CN115035541A - Large-size complex pdf engineering drawing text detection and identification method - Google Patents
- Publication number
- CN115035541A (application number CN202210735421.3A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- sub
- resolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/42—Document-oriented image-based pattern recognition based on the type of document
- G06V30/422—Technical drawings; Geographical maps
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/1444—Selective acquisition, locating or processing of specific regions, e.g. highlighted text, fiducial marks or predetermined fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/148—Segmentation of character regions
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Engineering & Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Artificial Intelligence (AREA)
- Character Input (AREA)
Abstract
The invention provides a text detection and identification method for large-size complex PDF engineering drawings, comprising the following steps. Step S1: preprocessing the PDF engineering drawing to generate a corresponding high-resolution image. Step S2: cutting the high-resolution image into a plurality of low-resolution sub-images and recording the order of the sub-images according to their positions. Step S3: performing a first text detection on the sub-images, preliminarily locating the text regions in each sub-image, and outputting the corresponding position coordinates. Step S4: mapping the position coordinates of the text regions in the sub-images to the original large image, removing duplicate data in the large image, and obtaining the corresponding text region images according to the deduplicated position coordinates. Step S5: performing a second text detection, accurately locating the text within each text region, and cropping the corresponding text blocks. Step S6: performing text recognition on the text blocks and extracting the text content of each block together with its coordinate position. The method improves text recognition accuracy on complex drawings.
Description
Technical Field
The invention relates to the technical field of text drawing management of nuclear power plants, in particular to a text detection and identification method for large-size complex pdf engineering drawings.
Background
In engineering, it is often necessary to link a drawing to the text it contains, so that information such as material codes and component numbers, and the drawings in which they appear, can be queried quickly. In the past this work was done mostly by hand, which is inefficient and incurs very high labor costs when the text of a large number of drawings must be processed. A method that automatically recognizes the text content of drawings is therefore needed to replace manual work, so that text can be extracted from large numbers of PDF drawings at lower labor cost and with higher efficiency.
Currently, text recognition for drawings generally requires two steps: text detection and text recognition. Text detection locates the text regions in a drawing and outputs the position coordinates of each region; text recognition outputs the text contained in each detected region.
Conventional methods of extracting text from drawings therefore suffer from high cost and low efficiency, and have difficulty with complex drawing content.
Disclosure of Invention
The invention aims to overcome the defects in the prior art, and provides a method for detecting and identifying the text of a large-size complex pdf engineering drawing, which is low in cost and high in efficiency.
In order to achieve the above purpose, the invention provides the following technical scheme:
a method for detecting and identifying text in a large-size complex PDF engineering drawing, comprising the following steps:
step S1: preprocessing the PDF engineering drawing to generate a corresponding high-resolution image;
step S2: cutting the high-resolution image into a plurality of low-resolution sub-images, and recording the order of the sub-images according to their positions;
step S3: performing a first text detection on the sub-images, preliminarily locating the text regions in each sub-image, and outputting the corresponding position coordinates;
step S4: mapping the position coordinates of the text regions in the sub-images to the original large image, removing duplicate data in the large image, and obtaining the corresponding text region images according to the deduplicated position coordinates;
step S5: performing a second text detection, accurately locating the text within each text region, and cropping the corresponding text blocks;
step S6: performing text recognition on the text blocks, and extracting the text content of each block together with its coordinate position.
In step S2, the high resolution image is cut into several low resolution sub-images by using sliding window cropping.
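The sliding-window cropping of step S2 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `tile_origins` is hypothetical, and the default sizes (a 736×736 window, a step of 368, a 3680×2944 image) are the values given in the embodiment of the Detailed Description. The sketch assumes the image dimensions fit the window grid exactly, as they do for those values; other sizes would need padding or an extra final window.

```python
from typing import List, Tuple


def tile_origins(width: int, height: int,
                 w: int = 736, h: int = 736,
                 dx: int = 368, dy: int = 368) -> List[Tuple[int, int, int, int]]:
    """Return (i, j, left, top) for every sliding-window position.

    i and j count the horizontal and vertical slides of the window,
    so the sub-image I_{i,j} covers [left, left + w) x [top, top + h).
    """
    n_cols = (width - w) // dx + 1   # number of horizontal positions
    n_rows = (height - h) // dy + 1  # number of vertical positions
    return [(i, j, i * dx, j * dy)
            for j in range(n_rows)
            for i in range(n_cols)]


tiles = tile_origins(3680, 2944)
print(len(tiles))  # 9 columns x 7 rows = 63 sub-images
```

Keeping `i` and `j` with each tile is what later allows detections to be mapped back to the large image.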
In step S3, text detection on the sub-images is performed with the AdvancedEAST method, giving a preliminary, rough position for each text region in the sub-image.
Step S4 includes:
step S41: mapping the coordinate position in the step S3 to the original high-resolution large image;
step S42: removing repeated data in the coordinate information;
step S43: cropping the corresponding text region images according to the deduplicated position coordinates.
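The coordinate mapping of step S41 can be illustrated with a short sketch. It assumes, as in the embodiment, that sub-image I_{i,j} was cropped after i horizontal and j vertical slides with steps Δx = Δy = 368, so that each vertex is shifted by (i·Δx, j·Δy). The function name `map_quad_to_full_image` and the sample quadrilateral are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def map_quad_to_full_image(quad: List[Point], i: int, j: int,
                           dx: int = 368, dy: int = 368) -> List[Point]:
    """Shift the four vertices of a detected text region from
    sub-image coordinates to large-image coordinates:
    X_m = i * dx + x_m,  Y_m = j * dy + y_m.
    """
    return [(i * dx + x, j * dy + y) for (x, y) in quad]


# A region detected inside sub-image I_{2,3} (sample values only).
local_quad = [(10, 20), (110, 20), (110, 50), (10, 50)]
print(map_quad_to_full_image(local_quad, i=2, j=3))
```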
In step S5, the text region image obtained in step S4 is subjected to a second text detection, the text is accurately positioned, and a corresponding text image is cut out.
In step S6, a PaddleOCR text recognition scheme is used to complete text recognition of the text image obtained in step S5, and finally the text content and the corresponding image area coordinates are output.
Compared with the prior art, the text detection and identification method for the large-size complex pdf engineering drawing provided by the invention has the following beneficial effects:
the method can accurately detect the valid text regions in a large-size complex PDF engineering drawing, including the coordinates of both horizontal and vertical text regions, and accurately recognize the text content within them.
In addition, performing text detection twice in succession effectively avoids the adverse effect that interfering lines, patterns and the like have on recognition, improving text recognition accuracy on complex drawings.
Furthermore, sliding-window tiling allows the text detection and identification method to be applied to large-size drawings, while the overlap between adjacent windows avoids the risk of cutting a continuous piece of text in two.
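One way to see why overlapping windows reduce the risk of cutting text is a one-dimensional check. The sketch below makes assumptions that the patent does not state explicitly: a window of 736 with a step of 368 (the embodiment's values), and a text extent no longer than the step. Under those assumptions every such extent lies entirely inside at least one window. The function name `containing_tile` is hypothetical, and the same check would be applied independently to each axis.

```python
from typing import Optional


def containing_tile(a: int, length: int, total: int,
                    w: int = 736, s: int = 368) -> Optional[int]:
    """Return the index k of a window [k*s, k*s + w] that fully contains
    the interval [a, a + length] on an axis of size `total`, or None."""
    n = (total - w) // s + 1      # number of window positions on this axis
    k = min(a // s, n - 1)        # last window that starts at or before a
    if k * s <= a and a + length <= k * s + w:
        return k
    return None


# Every interval of length 368 on a 3680-pixel axis fits in some window.
print(all(containing_tile(a, 368, 3680) is not None
          for a in range(0, 3680 - 368 + 1)))
```

Text longer than the window cannot fit in any single sub-image, which is one reason the detections are merged again on the large image before recognition.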
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a text detection and identification method for a large-size complex pdf engineering drawing according to an embodiment of the present invention.
Detailed Description
The following is a detailed description of the preferred embodiments.
The invention provides a text detection and identification method for large-size complex PDF engineering drawings, which can be broadly divided into four parts. First, PDF drawing processing: the PDF drawing is converted into a high-resolution image, which is split in order into sub-images of fixed size. Second, two-pass text detection to locate text regions accurately: the first pass runs on each sub-image and finds the rough extent of the regions that contain text; the second pass runs on the regions found by the first pass, eliminates the interference present in them, and locates the text precisely. Third, text coordinate processing: coordinates in the sub-images are mapped to the high-resolution large image, and duplicate entries are filtered out. Fourth, text recognition: the text content of each region is recognized according to the detection result, and the text content is output together with its coordinate position.
As shown in fig. 1, the method for detecting and identifying the text of the large-size complex pdf engineering drawing provided by the invention comprises the following steps:
step S1: preprocessing the PDF engineering drawing to generate a corresponding high-resolution image, for example a ten-megapixel image of 3680×2944 pixels;
step S2: cutting the high-resolution image into a plurality of smaller sub-images by sliding-window cropping, and recording the index of each sub-image according to the numbers of horizontal and vertical slides, $i$ and $j$, of the cropping window. Specifically, for the 3680×2944 large image, the width $w$ and height $h$ of the sliding window for each sub-image $I_{i,j}$ are both 736, and the horizontal step $\Delta x$ and vertical step $\Delta y$ are both 368, which finally yields 63 sub-images;
step S3: performing text detection on the sub-images with the AdvancedEAST method to obtain a preliminary, rough position for each text region in the sub-image, expressed as the four vertices of a rectangular text region, i.e. 8 coordinate values $(x_0, y_0), \dots, (x_3, y_3)$;
Step S4: mapping the position coordinates of the text region in the sub-graph to the original large graph, removing repeated data in the large graph, and acquiring a corresponding text region image according to the position coordinates after the repeated data are removed;
step S41: mapping the coordinate positions obtained in step S3 to the original high-resolution large image, using the coordinate mapping formulas:
$X_m = i \cdot \Delta x + x_m, \quad m = 0, 1, 2, 3;$
$Y_n = j \cdot \Delta y + y_n, \quad n = 0, 1, 2, 3;$
step S42: removing duplicate data from the coordinate information. Because the sub-images are obtained by sliding-window cropping in step S2, the same text region may be detected several times, producing several sets of coordinates that point to the same region of the original image; these duplicates need to be merged into a single set of coordinates.
The criterion for merging duplicates is containment between detected text regions $S_i$: if one detected text region contains another, the two are merged by keeping the coordinates of the containing region and discarding those of the smaller region.
Step S43: cropping the corresponding text region images according to the deduplicated position coordinates.
Step S5: performing a second text detection on the text region images produced by the first detection, locating the text precisely, and cropping the corresponding text images. The second detection effectively removes the interference of lines or patterns other than text within the regions found by the first detection, giving more accurate text localization and safeguarding the accuracy of the subsequent recognition.
Step S6: performing text recognition on the precise text regions obtained by text detection, using the PaddleOCR text recognition scheme, and finally outputting the text content together with the coordinate position of the corresponding image region.
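The duplicate removal of step S42 can be sketched as a containment filter. The sketch relies on assumptions not fixed by the text: each detected region is reduced to its axis-aligned bounding box, containment is tested on those boxes, and the names `bbox`, `contains` and `drop_contained` are hypothetical. Regions that only partially overlap are left untouched, since the description mentions only the containment case.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom)


def bbox(quad: List[Point]) -> Box:
    """Axis-aligned bounding box of a four-vertex text region."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    return (min(xs), min(ys), max(xs), max(ys))


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer` (edges may touch)."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and outer[2] >= inner[2] and outer[3] >= inner[3])


def area(b: Box) -> float:
    return (b[2] - b[0]) * (b[3] - b[1])


def drop_contained(boxes: List[Box]) -> List[Box]:
    """Keep the larger region whenever one region contains another."""
    kept: List[Box] = []
    # Visit larger boxes first so that a contained (or identical) box
    # always meets the box that contains it.
    for b in sorted(boxes, key=area, reverse=True):
        if not any(contains(k, b) for k in kept):
            kept.append(b)
    return kept


detections = [
    (100, 100, 300, 140),  # region seen in one sub-image
    (120, 105, 280, 135),  # same text, smaller box from a neighbouring sub-image
    (100, 100, 300, 140),  # exact duplicate
    (500, 500, 600, 530),  # unrelated region
]
print(drop_contained(detections))
```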
The text detection method adopts the open-source AdvancedEAST text detection scheme, which uses a VGG16 network as the backbone to extract pixel-level features from the drawing, fuses multi-channel features by means of upsampling, convolution and the like, and predicts text regions from the fused features. The text recognition part uses the open-source PaddleOCR text recognition scheme, based on a CRNN model with CTC loss as the loss function.
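For readers unfamiliar with CTC-trained recognizers, the following sketch shows greedy CTC decoding, the usual way per-frame predictions of a CRNN-style model are turned into a string: consecutive repeats are collapsed and blank symbols are dropped. It is a generic illustration, not PaddleOCR's API. The function name `ctc_greedy_decode`, the toy alphabet, and the choice of index 0 as the blank are assumptions.

```python
from typing import Sequence


def ctc_greedy_decode(frame_ids: Sequence[int], alphabet: str,
                      blank: int = 0) -> str:
    """Collapse repeated indices, then remove blanks.

    `frame_ids` holds the most probable class index for each time step;
    `alphabet[k]` is the character for class k (position `blank` is unused).
    """
    chars = []
    prev = None
    for idx in frame_ids:
        # Emit a character only when the class changes and is not blank.
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)


# Toy alphabet: index 0 is the blank, then 'A', 'B', '1'.
print(ctc_greedy_decode([0, 1, 1, 0, 1, 2, 2, 0, 3], "-AB1"))  # AAB1
```

The blank between the two runs of class 1 is what lets the decoder output a doubled character ("AA") instead of collapsing it.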
The invention provides an application-oriented base technology. It solves the problem of text detection and identification for PDF engineering drawings that are large (note: the whole PDF drawing cannot be used directly as the input) and complex in content (note: horizontal text, vertical text, and interfering lines or patterns that resemble text are present), and can provide technical support for applications involving specific text in large-size complex PDF engineering drawings, such as recognition of equipment codes or material codes, code error-correction recommendation, code location query, and association of codes with files.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.
Claims (6)
1. A text detection and identification method for a large-size complex pdf engineering drawing is characterized by comprising the following steps:
step S1: preprocessing a pdf engineering drawing to generate a corresponding high-resolution image;
step S2: cutting the high-resolution image into a plurality of low-resolution subgraphs, and recording the corresponding sequence of the subgraphs according to the positions;
step S3: carrying out first sub-image text detection, preliminarily positioning a text region range in the sub-image, and outputting position coordinates corresponding to the range;
step S4: mapping the position coordinates of the text regions in the sub-image to the original large image, removing repeated data in the large image, and acquiring corresponding text region images according to the position coordinates after the repetition removal;
step S5: performing second text detection, accurately positioning the text in the text area, and cutting a corresponding text block;
step S6: performing text recognition on the text block, and extracting the text content in the text block and the corresponding coordinate position.
2. The method for detecting and recognizing text in large-sized complex pdf engineering drawing according to claim 1, wherein in step S2, the high-resolution image is cut into several low-resolution sub-images by using sliding window cropping.
3. The method for detecting and recognizing the text of the large-sized complex pdf engineering drawing according to claim 1, wherein in step S3, the text detection of the sub-graph is completed by using the AdvancedEAST method, and rough position information of the text region in the sub-graph is obtained preliminarily.
4. The method for detecting and identifying the text of the large-size complex pdf engineering drawing according to claim 1, wherein the step S4 comprises:
step S41: mapping the coordinate position in the step S3 to the original high-resolution large image;
step S42: removing repeated data in the coordinate information;
step S43: cutting the corresponding text area image according to the position coordinates after the duplication removal.
5. The method for detecting and recognizing text in large-sized complex pdf engineering drawing according to claim 1, wherein in step S5, the text region image obtained in step S4 is subjected to a second text detection, the text is precisely located, and the corresponding text image is cut out.
6. The method for detecting and recognizing text in large-sized complex pdf engineering drawing according to claim 1, wherein in step S6, using PaddleOCR text recognition scheme, completing text recognition of the text image obtained in step S5, and finally outputting the text content and the corresponding image area coordinates.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210735421.3A CN115035541A (en) | 2022-06-27 | 2022-06-27 | Large-size complex pdf engineering drawing text detection and identification method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210735421.3A CN115035541A (en) | 2022-06-27 | 2022-06-27 | Large-size complex pdf engineering drawing text detection and identification method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115035541A | 2022-09-09 |
Family
ID=83126782
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210735421.3A Pending CN115035541A (en) | 2022-06-27 | 2022-06-27 | Large-size complex pdf engineering drawing text detection and identification method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115035541A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118570066A (en) * | 2024-08-02 | 2024-08-30 | 中国科学技术大学 | Text image super-resolution method and system based on mask reconstruction paradigm |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190266394A1 (en) * | 2018-02-26 | 2019-08-29 | Abc Fintech Co., Ltd. | Method and device for parsing table in document image |
CN111414906A (en) * | 2020-03-05 | 2020-07-14 | 北京交通大学 | Data synthesis and text recognition method for paper bill picture |
CN111860348A (en) * | 2020-07-21 | 2020-10-30 | 国网山东省电力公司青岛供电公司 | Deep learning-based weak supervision power drawing OCR recognition method |
CN112069985A (en) * | 2020-08-31 | 2020-12-11 | 华中农业大学 | High-resolution field image rice ear detection and counting method based on deep learning |
CN112633277A (en) * | 2020-12-30 | 2021-04-09 | 杭州电子科技大学 | Channel ship board detection, positioning and identification method based on deep learning |
CN113269049A (en) * | 2021-04-30 | 2021-08-17 | 天津科技大学 | Method for detecting handwritten Chinese character area |
WO2021190171A1 (en) * | 2020-03-25 | 2021-09-30 | 腾讯科技(深圳)有限公司 | Image recognition method and apparatus, terminal, and storage medium |
CN113569629A (en) * | 2021-06-11 | 2021-10-29 | 杭州玖欣物联科技有限公司 | Model method for extracting key information and desensitizing sensitive information of machining drawing |
CN114140803A (en) * | 2022-01-30 | 2022-03-04 | 杭州实在智能科技有限公司 | Document single word coordinate detection and correction method and system based on deep learning |
CN114170608A (en) * | 2021-12-01 | 2022-03-11 | 上海东普信息科技有限公司 | Super-resolution text image recognition method, device, equipment and storage medium |
CN114220091A (en) * | 2021-12-16 | 2022-03-22 | 广东电网有限责任公司 | Image text detection method and system based on fast Rcnn |
Non-Patent Citations (1)
Title |
---|
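Li Hao et al.: "Wiring diagram detection and verification based on deep learning and graph matching" (基于深度学习和图匹配的接线图检测与校核), Journal of Beijing University of Aeronautics and Astronautics (北京航空航天大学学报), vol. 47, no. 3, 2 November 2020 (2020-11-02) * |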
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111814722B (en) | Method and device for identifying table in image, electronic equipment and storage medium | |
CN110363102B (en) | Object identification processing method and device for PDF (Portable document Format) file | |
US8401333B2 (en) | Image processing method and apparatus for multi-resolution feature based image registration | |
RU2651144C2 (en) | Data input from images of the documents with fixed structure | |
US9230382B2 (en) | Document image capturing and processing | |
CN104751142B (en) | A kind of natural scene Method for text detection based on stroke feature | |
CN110781877B (en) | Image recognition method, device and storage medium | |
Chen et al. | Shadow-based Building Detection and Segmentation in High-resolution Remote Sensing Image. | |
CN111626145B (en) | Simple and effective incomplete form identification and page-crossing splicing method | |
CN105139042A (en) | Image identification method and system | |
CN112883926B (en) | Identification method and device for form medical images | |
CN110321750A (en) | Two-dimensional code identification method and system in a kind of picture | |
CN116052193B (en) | RPA interface dynamic form picking and matching method and system | |
CN112016481A (en) | Financial statement information detection and identification method based on OCR | |
CN116311259B (en) | Information extraction method for PDF business document | |
CN113688688A (en) | Completion method of table lines in picture and identification method of table in picture | |
CN115035541A (en) | Large-size complex pdf engineering drawing text detection and identification method | |
US8897538B1 (en) | Document image capturing and processing | |
CN109635729B (en) | Form identification method and terminal | |
CN101425143A (en) | Image positioning method and device | |
CN112364863B (en) | Character positioning method and system for license document | |
CN112232390B (en) | High-pixel large image identification method and system | |
CN117218672A (en) | Deep learning-based medical records text recognition method and system | |
CN115861922B (en) | Sparse smoke detection method and device, computer equipment and storage medium | |
CN101901333A (en) | Method for segmenting word in text image and identification device using same |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||