CN117172212A - Catalog extraction method and device in drawing, electronic equipment and storage medium - Google Patents
- Publication number
- CN117172212A (CN202311031166.5A)
- Authority
- CN
- China
- Prior art keywords
- catalog
- information
- target
- text information
- grouping
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application relates to a method and device for extracting a catalog from a drawing, an electronic device, and a storage medium, applied in the field of computer technology. The method comprises: determining a target table belonging to a drawing catalog in a drawing to be identified; judging whether line segment information exists in the target table; if not, extracting first text information in the target table; classifying the first text information to obtain at least one classification result; based on the classification result, grouping the first text information longitudinally and transversely to obtain a grouping result; determining a table structure of the target table based on the grouping result; and extracting the target table based on the table structure and the first text information to obtain the extracted table. The method addresses the problems of the prior art, namely a large recognition workload, excessive dependence on line segment and cell information, and incompatibility with nonstandard tables and nonstandard cases.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and apparatus for extracting a catalog in a drawing, an electronic device, and a storage medium.
Background
Tables are a common element of documents. In practice, it is often necessary to convert a table in a picture into an editable file format. Manual entry is the simplest method, but it is inefficient for large numbers of tables and prone to error.
In the related art, a drawing catalog is generally identified as follows: after table images are acquired with an image acquisition device, OCR recognition and straight line detection are performed on each table image, and line segment features are then used to compute and extract the cells and the text inside them, thereby recognizing the table.
However, this recognition method not only involves a large recognition workload but also depends excessively on line segment and cell information, and it is not compatible with nonstandard tables and nonstandard cases.
Disclosure of Invention
The application provides a method and device for extracting a catalog from a drawing, an electronic device, and a storage medium, to solve the problems of the prior art: a large recognition workload, excessive dependence on line segment and cell information, and incompatibility with nonstandard tables and nonstandard cases.
In a first aspect, an embodiment of the present application provides a method for extracting a catalog in a drawing, including:
determining a target table belonging to a drawing catalog in a drawing to be identified;
judging whether line segment information exists in the target table;
if not, extracting first text information in the target table;
classifying the first text information to obtain at least one classification result;
based on the classification result, carrying out longitudinal grouping and transverse grouping on the first text information to obtain a grouping result;
determining a table structure of the target table based on the grouping result;
and extracting the target table based on the table structure and the first text information to obtain the target table.
Optionally, the determining the target table belonging to the drawing catalog in the drawing to be identified includes:
determining an initial form in the drawing to be identified;
extracting second text information and coordinate information in the initial table;
a target form in the initial form is determined based on the second text information and the coordinate information.
Optionally, the classifying the first text information to obtain at least one classification result includes:
inputting the first text information into a text classification model, and outputting the classification of each piece of the first text information through the text classification model to obtain the classification result.
Optionally, the performing, based on the classification result, vertical grouping and horizontal grouping on the first text information to obtain a grouping result includes:
determining first position information of each piece of first text information;
and carrying out longitudinal grouping and transverse grouping on the first text information according to the first position information.
Optionally, after the determining whether the line segment information exists in the target table, the method further includes:
if the line segment information exists, integrating the line segment information included in the target catalog to obtain target line segment information;
and determining a table structure of the drawing catalog based on the target line segment information.
Optionally, integrating the line segment information included in the target catalog to obtain target line segment information includes:
judging, based on the line segment information, whether any two line segments overlap each other;
if so, merging the first endpoints at one end of the two overlapping line segments so that the two line segments are merged, to obtain the target line segment information; and/or,
judging whether the distance between the second endpoints of any two line segments is within a preset range or not based on the line segment information;
if yes, merging the second endpoints to obtain the target line segment information.
Optionally, extracting the target table based on the table structure and the first text information, and after obtaining the target table, further includes:
determining sequence number information in the first text information;
if the sequence number information is discontinuous, determining the missing sequence number, and supplementing the missing sequence number in the target table.
In a second aspect, an embodiment of the present application provides a catalog extraction apparatus in a drawing, including:
the acquisition module is used for determining a target table belonging to a drawing catalog in the drawing to be identified;
the judging module is used for judging whether line segment information exists in the target table;
the first extraction module is used for extracting the first text information in the target table if the first text information does not exist;
the classification module is used for classifying the first text information to obtain at least one classification result;
the grouping module is used for longitudinally grouping and transversely grouping the first text information based on the classification result to obtain a grouping result;
a determining module, configured to determine a table structure of the target table based on the grouping result;
and the second extraction module is used for extracting the target table based on the table structure and the first text information to obtain the target table.
In a third aspect, an embodiment of the present application provides an electronic device, including: the device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory are communicated with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to execute the program stored in the memory, and implement the catalog extraction method in the drawing according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program, where the computer program when executed by a processor implements the method for extracting a catalog in a drawing according to the first aspect.
Compared with the prior art, the technical solution provided by the embodiment of the application has the following advantages. In the method provided by the embodiment of the application, a target table belonging to the drawing catalog in the drawing to be identified is determined; whether line segment information exists in the target table is judged; if not, first text information in the target table is extracted; the first text information is classified to obtain at least one classification result; based on the classification result, the first text information is grouped longitudinally and transversely to obtain a grouping result; a table structure of the target table is determined based on the grouping result; and the target table is extracted based on the table structure and the first text information. By first determining the target table that belongs to the drawing catalog and only then extracting it, the method avoids the large amount of calculation that would result from extracting every table before determining which one is the drawing catalog. In addition, because the table structure of the target table is determined by processing the first text information, and the target table is extracted using that table structure and the first text information, extraction does not rely on line segments and is compatible with nonstandard tables and nonstandard cases.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is an application scenario diagram of a catalog extraction method in a drawing provided by an embodiment of the present application;
FIG. 2 is a flowchart of a method for extracting a catalog in a drawing according to an embodiment of the present application;
FIG. 3 is a flowchart of a method for extracting a catalog in a drawing according to another embodiment of the present application;
FIG. 4 is a block diagram of a catalog extraction apparatus in a drawing provided by an embodiment of the present application;
FIG. 5 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
According to an embodiment of the application, a method for extracting a catalog from a drawing is provided. Optionally, in the embodiment of the present application, the method may be applied to a hardware environment composed of the terminal 101 and the server 102 as shown in FIG. 1. As shown in FIG. 1, the server 102 is connected to the terminal 101 through a network and may provide services (such as application services) to the terminal or to clients installed on the terminal. A database may be provided on the server, or independently of the server, to provide data storage services to the server 102. The type of network is not limited, and the terminal 101 may be, without limitation, a PC, a mobile phone, a tablet computer, or the like.
The catalog extraction method in the drawing according to the embodiment of the present application may be executed by the server 102, by the terminal 101, or jointly by the server 102 and the terminal 101. When the terminal 101 executes the method, the method may also be executed by a client installed on the terminal.
Taking a terminal to execute the catalog extraction method in the drawing according to the embodiment of the present application as an example, fig. 2 is a schematic flow chart of an alternative catalog extraction method in the drawing according to the embodiment of the present application, as shown in fig. 2, the flow of the method may include the following steps:
step 201, determining a target table belonging to a drawing catalog in a drawing to be identified;
step 202, extracting first text information in the target table;
step 203, judging whether line segment information exists in the target table; if not, executing step 204; if so, executing step 208;
step 204, classifying the first text information to obtain at least one classification result;
step 205, based on the classification result, performing longitudinal grouping and transverse grouping on the first text information to obtain a grouping result;
step 206, determining a table structure of the target table based on the grouping result;
and step 207, extracting the target table based on the table structure and the first text information to obtain the target table.
In some embodiments, the target table belonging to the drawing catalog is determined first and only then extracted, which avoids the large amount of calculation that would result from extracting every table before determining which one is the drawing catalog. In addition, because the table structure of the target table is determined by processing the first text information, and the target table is extracted using that table structure and the first text information, extraction does not rely on line segments and is compatible with nonstandard tables and nonstandard cases.
The primitive structure of a drawing has many levels and contains special structures such as blocks and viewports. The multi-level primitive structure of the original CAD file is therefore traversed, including nested structures, to locate the non-frozen primitives on visible layers and to obtain the primitive information of all line segment types and text types.
In an alternative embodiment, in the case where there is line segment information, the method further includes:
step 208, integrating the line segment information included in the target catalog to obtain target line segment information, and determining a table structure of the drawing catalog based on the target line segment information.
After step 208, step 207 is performed.
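The branch at step 203 can be illustrated with a minimal sketch. The function name `determine_structure` and the two callbacks are hypothetical stand-ins for step 208 and steps 204 to 206; they are assumptions for illustration, not names from the application.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical segment layout: (x1, y1, x2, y2)
Segment = Tuple[float, float, float, float]


def determine_structure(
    segments: Sequence[Segment],
    texts: Sequence[str],
    from_segments: Callable[[Sequence[Segment]], object],
    from_texts: Callable[[Sequence[str]], object],
):
    """Dispatch as in step 203.

    If line segment information exists, derive the table structure from the
    segments (step 208); otherwise fall back to the text-based path
    (steps 204 to 206). Both paths then feed step 207.
    """
    if segments:
        return from_segments(segments)
    return from_texts(texts)
```

In either case the returned structure would be passed on to the extraction of step 207.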
In an optional embodiment, integrating the line segment information included in the target catalog to obtain target line segment information includes:
judging, based on the line segment information, whether any two line segments overlap each other;
if so, merging the first endpoints at one end of the two overlapping line segments so that the two line segments are merged, to obtain the target line segment information; and/or,
judging whether the distance between the second endpoints of any two line segments is within a preset range or not based on the line segment information;
if yes, merging the second endpoints to obtain the target line segment information.
In some embodiments, part of a CAD catalog has a complete line-segment table structure, but a catalog may also have no vertical line segments, only horizontal lines, or no line segments at all. Because it is not known in advance whether the internal structure of the table is complete, an attempt to extract the internal table structure is made first. If a standard table structure is found, it is used; if not, the other steps are relied on.
Specifically, line segment error correction is performed first, because the original CAD line segments may have problems such as missing points, lines that appear to intersect but do not actually intersect, and repeated line segments. Collinear lines that partially overlap are merged, with the permitted angle difference between them controlled by a threshold (for example, 0.5 degrees). Next, perpendicularly intersecting line segments are checked for errors at their intersection point, and an intersection error within a certain threshold (for example, 0.001 meter) is corrected by extending or truncating the line.
Intersecting lines are then broken at their intersection points into multiple line segments, and finally endpoints that are close to each other are merged. Once the corrected and completed line segment data is obtained, a minimum closed polygon algorithm is used to recognize closed rectangles, and each closed rectangle is a final cell.
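Two of the correction steps above, merging overlapping collinear segments and merging nearby endpoints, can be sketched as follows. This is a simplified illustration that handles only horizontal segments and replaces the angle threshold with a y-coordinate tolerance; the function names and the default tolerances are assumptions.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
HSegment = Tuple[float, float, float]  # (y, x_start, x_end), hypothetical layout


def merge_horizontal(segments: List[HSegment], y_tol: float = 1e-3) -> List[HSegment]:
    """Merge horizontal segments that share a line and overlap or touch.

    Segments are bucketed by y (bucket width y_tol), sorted by start x, and
    merged whenever the next segment starts inside the previous one.
    """
    keyed = sorted((round(y / y_tol), min(a, b), max(a, b), y) for y, a, b in segments)
    merged: List[list] = []  # entries: [bucket, x_start, x_end, y]
    for bucket, x0, x1, y in keyed:
        if merged and merged[-1][0] == bucket and x0 <= merged[-1][2]:
            merged[-1][2] = max(merged[-1][2], x1)
        else:
            merged.append([bucket, x0, x1, y])
    return [(y, x0, x1) for _, x0, x1, y in merged]


def snap_endpoints(points: List[Point], tol: float = 1e-3) -> List[Point]:
    """Replace each point by the first earlier point lying within tol of it."""
    representatives: List[Point] = []
    snapped: List[Point] = []
    for x, y in points:
        for rx, ry in representatives:
            if math.hypot(x - rx, y - ry) <= tol:
                snapped.append((rx, ry))
                break
        else:
            representatives.append((x, y))
            snapped.append((x, y))
    return snapped
```

Bucketing by y keeps the overlap test simple, at the cost of treating two lines that fall on either side of a bucket boundary as distinct; a full implementation would also cover vertical and slanted segments.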
In an optional embodiment, the determining the target table belonging to the drawing catalog in the drawing to be identified includes:
determining an initial form in the drawing to be identified;
extracting second text information and coordinate information in the initial table;
a target form in the initial form is determined based on the second text information and the coordinate information.
In some embodiments, CAD drawings contain various table structures, but not every table is a drawing catalog. A table may instead be a title bar, a layer list, a building material list, a symbol table, a door and window list, and so on, and a simple keyword is not sufficient to decide whether a table is the drawing catalog. The type of each candidate table therefore needs to be determined.
The table type is determined using deep learning: all text and coordinate information within the frame enclosed by connected line segments is input, and the table type is output.
First, the text information is encoded into text features through a hidden layer, and the coordinate position of each text is converted into a relative position encoding based on the outer frame of the table; these two features form the input. The features are processed by a Transformer model followed by a linear classifier, which outputs a classification result indicating whether the table is a drawing catalog.
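The conversion of a text position into a position relative to the table's outer frame can be sketched as a simple normalization. The box layout `(x_min, y_min, x_max, y_max)` and the function name are assumptions; the application does not specify the exact encoding.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed layout


def relative_position(text_box: Box, frame: Box) -> Box:
    """Express a text bounding box in [0, 1] coordinates relative to the table frame."""
    fx0, fy0, fx1, fy1 = frame
    width, height = fx1 - fx0, fy1 - fy0
    if width <= 0 or height <= 0:
        raise ValueError("frame must have positive width and height")
    x0, y0, x1, y1 = text_box
    return (
        (x0 - fx0) / width,
        (y0 - fy0) / height,
        (x1 - fx0) / width,
        (y1 - fy0) / height,
    )
```

Normalizing against the frame makes the feature independent of where the table sits on the sheet and of the drawing scale, which is one plausible reason for using relative rather than absolute coordinates.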
Further, the determination of the target table may use keyword or rule matching in addition to deep learning.
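A rule-based check of this kind might look like the sketch below. The English header terms and the `min_hits` threshold are hypothetical; real drawings would use whatever header wording appears in practice, and, as noted above, a single keyword alone is not considered reliable.

```python
from typing import Iterable

# Hypothetical header terms; real drawings would use their own wording.
CATALOG_KEYWORDS = ("serial number", "figure number", "figure name")


def looks_like_catalog(texts: Iterable[str], min_hits: int = 2) -> bool:
    """Return True when at least min_hits distinct header keywords occur in the table text."""
    lowered = [t.strip().lower() for t in texts]
    hits = sum(1 for kw in CATALOG_KEYWORDS if any(kw in t for t in lowered))
    return hits >= min_hits
```

Requiring several distinct keywords rather than one reduces false positives from tables that happen to share a single header term with a catalog.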
In an alternative embodiment, classifying the first text information to obtain at least one classification result includes:
inputting the first text information into a text classification model, and outputting the classification of each piece of the first text information through the text classification model to obtain the classification result.
In some embodiments, the text within the rectangular frame of the catalog is not necessarily all catalog line content; there may be headers, title bars, personnel signatures, and other auxiliary text, so the content type of each text needs to be determined. For a catalog in a CAD drawing, the most important items in a catalog line are the serial number, the figure number, and the figure name. This step uses a text classification model to classify every text into one of four types: serial number, figure number, figure name, and other. The network uses BERT encoding followed by an MLP (multi-layer perceptron) for classification, which yields the classification result of the first text information.
Besides deep learning, serial numbers, figure names, and figure numbers can also be recognized by text matching with regular expressions.
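A minimal sketch of the regular-expression alternative is shown below. The three patterns are assumptions chosen for illustration (short integers as serial numbers, codes such as `A-101` as figure numbers, and a few English drawing terms as figure names); they are not patterns given in the application.

```python
import re

# All three patterns are illustrative assumptions.
SERIAL_RE = re.compile(r"^\d{1,3}$")
FIGURE_NO_RE = re.compile(r"^[A-Za-z]{1,4}-?\d{1,4}(?:[-.]\d+)*$")
FIGURE_NAME_RE = re.compile(r"(plan|elevation|section|detail|layout)", re.IGNORECASE)


def classify_text(text: str) -> str:
    """Classify a text into serial_number, figure_number, figure_name, or other."""
    s = text.strip()
    if not s:
        return "other"
    if SERIAL_RE.match(s):
        return "serial_number"
    if FIGURE_NO_RE.match(s):
        return "figure_number"
    if FIGURE_NAME_RE.search(s):
        return "figure_name"
    return "other"
```

The order of the checks matters: the most restrictive patterns are tried first so that a bare number is not mistaken for part of a figure number.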
In an alternative embodiment, based on the classification result, the first text information is vertically grouped and horizontally grouped to obtain a grouping result, which includes:
determining first position information of each piece of first text information;
and carrying out longitudinal grouping and transverse grouping on the first text information according to the first position information.
In an alternative embodiment, after extracting the target table based on the table structure and the first text information, the method further includes:
determining sequence number information in the first text information;
if the sequence number information is discontinuous, determining the missing sequence number, and supplementing the missing sequence number in the target table.
In some embodiments, referring to FIG. 3, the extracted text classification results are grouped by column according to cross-column information. If table data was extracted earlier, the column grouping uses the natural spatial structure of the table data and is filtered with reference to the header information. If no table was extracted, the classified text data is grouped in a self-organizing way (in the vertical direction), and the groups are filtered, checked, split, and merged using the header and type information.
When there is no header, a voting mechanism is used for compatibility, and finally the items within each group are sorted to produce valid grouped data for serial numbers, figure numbers, and figure names. The valid serial numbers, figure numbers, and figure names are then grouped and paired (in the horizontal direction) according to the row in which the centroid of each text's bounding rectangle lies, a second check is made on the relative position and distance of each match, and the final pairing produces the catalog line object data.
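The vertical and horizontal grouping can be sketched in simplified form. In this sketch the class label alone defines the column, rows are anchored on serial numbers, and the y axis is assumed to point upward as is usual in CAD coordinates; the `TextItem` type, the `y_tol` tolerance, and the omission of the header filtering, voting, and secondary check are all simplifying assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class TextItem:
    text: str
    label: str  # "serial_number", "figure_number", "figure_name", or "other"
    cx: float   # centroid x of the text's bounding rectangle
    cy: float   # centroid y of the text's bounding rectangle


def _nearest(column: List[TextItem], cy: float, y_tol: float) -> Optional[TextItem]:
    """Return the item in the column whose centroid row is closest to cy, if within y_tol."""
    best = min(column, key=lambda it: abs(it.cy - cy), default=None)
    if best is not None and abs(best.cy - cy) <= y_tol:
        return best
    return None


def pair_rows(items: List[TextItem], y_tol: float = 1.0) -> List[Dict[str, Optional[str]]]:
    """Group items into columns by label, then pair them into catalog lines by centroid row."""
    # Vertical grouping: one column per text type; "other" text is discarded.
    columns: Dict[str, List[TextItem]] = {
        "serial_number": [], "figure_number": [], "figure_name": [],
    }
    for it in items:
        if it.label in columns:
            columns[it.label].append(it)

    # Horizontal grouping: walk serial numbers top to bottom (y assumed to
    # increase upward) and attach the nearest number and name in the same row.
    rows: List[Dict[str, Optional[str]]] = []
    for serial in sorted(columns["serial_number"], key=lambda it: -it.cy):
        number = _nearest(columns["figure_number"], serial.cy, y_tol)
        name = _nearest(columns["figure_name"], serial.cy, y_tol)
        rows.append({
            "serial_number": serial.text,
            "figure_number": number.text if number else None,
            "figure_name": name.text if name else None,
        })
    return rows
```

A row whose figure number or figure name cannot be matched within the tolerance is still emitted with `None` in that field, so that a later step can decide how to repair it.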
Abnormal serial numbers in the catalog line matching result are supplemented automatically. Because serial numbers are generally consecutive numbers and letters, a text search range is obtained from the catalog line with the abnormal serial number, and the abnormal serial number is supplemented using text whose centroid lies in the same row.
Catalog lines missed in the middle are retrieved within an automatically obtained middle search range, and catalog lines missed at the end are retrieved within a tail search range. The catalog line data is then sorted globally to output the final result.
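Detecting which serial numbers are missing from an otherwise continuous run can be sketched as follows, assuming purely numeric serial numbers; the function name is hypothetical. Gaps found this way correspond to catalog lines missed in the middle; lines missed at the end leave no gap and would need the separate tail search range described above.

```python
from typing import List


def missing_serials(serials: List[int]) -> List[int]:
    """Return the serial numbers absent between the smallest and largest ones found."""
    if not serials:
        return []
    present = set(serials)
    return [n for n in range(min(present), max(present) + 1) if n not in present]
```

For example, `missing_serials([1, 2, 4, 7])` reports 3, 5, and 6 as the numbers to search for.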
Compared with existing methods, the catalog extraction method provided by the application determines the table type in advance using deep learning, which avoids recognizing non-catalog tables. Because the actual semantics of the text content are judged first, invalid interfering text is prevented from being added to the catalog.
Determining the table type in advance reduces the generation of invalid data to some extent and prevents irrelevant table information from being misidentified as a catalog. Combining deep learning with rule-based pairing lowers the dependence on line data and accommodates more irregular cases. Deep learning prevents interfering text from being recognized as valid information and judges the start and end of the catalog well, while rule-based pairing, by means of post-processing, handles some omissions and special cases. Combining the two gives a better result.
Based on the same conception, the embodiment of the present application provides a catalog extraction device in a drawing, and the specific implementation of the device may be referred to the description of the embodiment of the method, and the repetition is omitted, as shown in fig. 4, where the device mainly includes:
an obtaining module 401, configured to determine a target table belonging to a drawing catalog in a drawing to be identified;
a judging module 402, configured to judge whether line segment information exists in the target table;
a first extracting module 403, configured to extract, if not present, first text information in the target table;
the classification module 404 is configured to classify the first text information to obtain at least one classification result;
a grouping module 405, configured to perform a vertical grouping and a horizontal grouping on the first text information based on the classification result, so as to obtain a grouping result;
a determining module 406, configured to determine a table structure of the target table based on the grouping result;
and a second extraction module 407, configured to extract the target table based on the table structure and the first text information, so as to obtain the target table.
Based on the same conception, the embodiment of the application also provides an electronic device, as shown in fig. 5, which mainly comprises: processor 501, memory 502 and communication bus 503, wherein processor 501 and memory 502 accomplish the communication between each other through communication bus 503. The memory 502 stores a program executable by the processor 501, and the processor 501 executes the program stored in the memory 502 to implement the following steps:
determining a target table belonging to a drawing catalog in a drawing to be identified;
judging whether line segment information exists in the target table;
if not, extracting first text information in the target table;
classifying the first text information to obtain at least one classification result;
based on the classification result, carrying out longitudinal grouping and transverse grouping on the first text information to obtain a grouping result;
determining a table structure of the target table based on the grouping result;
and extracting the target table based on the table structure and the first text information to obtain the target table.
The communication bus 503 mentioned in the above electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus 503 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this does not mean that there is only one bus or one type of bus.
The memory 502 may include random access memory (Random Access Memory, simply RAM) or may include non-volatile memory (non-volatile memory), such as at least one disk memory. Optionally, the memory may also be at least one memory device located remotely from the aforementioned processor 501.
The processor 501 may be a general-purpose processor, such as a central processing unit (Central Processing Unit, CPU) or a network processor (Network Processor, NP); it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In yet another embodiment of the present application, there is also provided a computer-readable storage medium having stored therein a computer program which, when run on a computer, causes the computer to execute the catalog extraction method in the drawing described in the above embodiment.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another by wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, microwave, etc.) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape, etc.), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), etc.
It should be noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely specific embodiments of the application, provided to enable those skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A method for extracting a catalog from a drawing, characterized by comprising the following steps:
determining a target table belonging to a drawing catalog in a drawing to be identified;
extracting first text information in the target table;
judging whether line segment information exists in the target table;
if not, classifying the first text information to obtain at least one classification result;
based on the classification result, carrying out longitudinal grouping and transverse grouping on the first text information to obtain a grouping result;
determining a table structure of the target table based on the grouping result;
and extracting the target table based on the table structure and the first text information to obtain the target table.
2. The method for extracting a catalog from a drawing according to claim 1, wherein the determining a target form belonging to the catalog of the drawing in the drawing to be identified comprises:
determining an initial form in the drawing to be identified;
extracting second text information and coordinate information in the initial table;
a target form in the initial form is determined based on the second text information and the coordinate information.
3. The method for extracting a catalog from a drawing according to claim 1, wherein the classifying the first text information to obtain at least one classification result comprises:
inputting the first text information into a text classification model, and outputting the classification of each piece of the first text information through the text classification model to obtain the classification result.
4. The method for extracting a catalog from a drawing according to claim 1, wherein the step of vertically grouping and horizontally grouping the first text information based on the classification result to obtain a grouping result includes:
determining first position information of each piece of first text information;
and carrying out longitudinal grouping and transverse grouping on the first text information according to the first position information.
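As an illustration only, the position-based grouping of claim 4 could be sketched as follows. The sketch assumes that each piece of first text information carries a single anchor coordinate, that "longitudinal grouping" collects texts sharing a similar x position (columns) and "transverse grouping" collects texts sharing a similar y position (rows), and that a fixed tolerance decides similarity. The names `TextItem`, `group_by_axis`, `group_rows_and_columns` and the parameter `tol` are hypothetical and do not appear in the application.

```python
from dataclasses import dataclass


@dataclass
class TextItem:
    text: str
    x: float  # assumed horizontal anchor coordinate of the text
    y: float  # assumed vertical anchor coordinate of the text


def group_by_axis(items, key, tol):
    """Cluster items whose coordinate is within tol of the previous item."""
    groups = []
    for item in sorted(items, key=key):
        if groups and key(item) - key(groups[-1][-1]) <= tol:
            groups[-1].append(item)  # close enough: same row or column
        else:
            groups.append([item])    # gap too large: start a new group
    return groups


def group_rows_and_columns(items, tol=2.0):
    """Return (rows, columns) under the assumptions stated above."""
    rows = group_by_axis(items, key=lambda t: t.y, tol=tol)     # transverse
    columns = group_by_axis(items, key=lambda t: t.x, tol=tol)  # longitudinal
    return rows, columns
```

The number of row groups and column groups then gives a candidate table structure for a table that has no ruling lines. Because each item is compared with its immediate predecessor, a long run of slightly shifted texts can chain into one group; a real implementation might compare against the group mean instead.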
5. The method for extracting a catalog in a drawing according to claim 1, wherein after determining whether line segment information exists in the target table, further comprising:
if the line segment information exists, integrating the line segment information included in the target catalog to obtain target line segment information;
and determining a table structure of the drawing catalog based on the target line segment information.
6. The method for extracting a catalog from a drawing of claim 5, wherein integrating the line segment information included in the target catalog to obtain target line segment information comprises:
judging whether any two line segments overlap each other based on the line segment information;
if so, merging the first endpoints at one end of the two overlapping line segments so that the two line segments are merged, to obtain the target line segment information; and/or,
judging whether the distance between the second endpoints of any two line segments is within a preset range or not based on the line segment information;
if yes, merging the second endpoints to obtain the target line segment information.
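The two merging conditions of claim 6 can be illustrated with a simplified one-dimensional sketch. It assumes the segments have already been reduced to intervals `(start, end)` along one shared line (for example, horizontal ruling lines at the same height), and it treats "within a preset range" as a gap of at most `gap_tol` between facing endpoints. The function name `merge_collinear_segments` and the parameter `gap_tol` are hypothetical; the application does not specify how overlap or distance is computed.

```python
def merge_collinear_segments(segments, gap_tol=1.0):
    """Merge intervals that overlap or whose facing endpoints are near.

    segments: iterable of (start, end) positions along one shared line.
    gap_tol:  assumed preset range for joining nearby endpoints.
    """
    # Normalise direction so that start <= end, then sort by start.
    normalized = sorted((min(a, b), max(a, b)) for a, b in segments)
    merged = []
    for start, end in normalized:
        if merged and start - merged[-1][1] <= gap_tol:
            # Overlapping (negative gap) or nearly touching: extend the
            # previous segment instead of keeping two separate ones.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For instance, `merge_collinear_segments([(0, 5), (4, 10), (10.5, 12), (20, 25)])` joins the first three intervals into one and leaves the last one separate.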
7. The method for extracting a catalog in a drawing according to claim 1, wherein the extracting the target form based on the form structure and the first text information, after obtaining the target form, further comprises:
determining sequence number information in the first text information;
if the sequence number information is discontinuous, determining the missing sequence number, and supplementing the missing sequence number in the target table.
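A minimal sketch of the discontinuity check in claim 7 is given below, assuming the sequence numbers have already been parsed into integers. The function name `find_missing_sequence_numbers` is hypothetical, and the sketch only detects the gaps; how a missing number is written back into the target table is not specified here.

```python
def find_missing_sequence_numbers(sequence_numbers):
    """Return the integers absent between the smallest and largest number."""
    present = set(sequence_numbers)
    if not present:
        return []
    return [n for n in range(min(present), max(present) + 1)
            if n not in present]
```

A caller could, for example, add a placeholder row for each returned number so that the extracted catalog stays aligned with the drawing numbering.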
8. A catalog extraction apparatus in a drawing sheet, comprising:
the acquisition module is used for determining a target table belonging to a drawing catalog in the drawing to be identified;
the judging module is used for judging whether line segment information exists in the target table;
the first extraction module is used for extracting the first text information in the target table if the line segment information does not exist;
the classification module is used for classifying the first text information to obtain at least one classification result;
the grouping module is used for longitudinally grouping and transversely grouping the first text information based on the classification result to obtain a grouping result;
a determining module, configured to determine a table structure of the target table based on the grouping result;
and the second extraction module is used for extracting the target table based on the table structure and the first text information to obtain the target table.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is used for storing a computer program;
the processor is configured to execute the program stored in the memory, and implement the catalog extraction method in the drawing according to any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the method for extracting a catalog in a drawing according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311031166.5A CN117172212A (en) | 2023-08-15 | 2023-08-15 | Catalog extraction method and device in drawing, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311031166.5A CN117172212A (en) | 2023-08-15 | 2023-08-15 | Catalog extraction method and device in drawing, electronic equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117172212A (en) | 2023-12-05 |
Family
ID=88946026
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311031166.5A Pending CN117172212A (en) | 2023-08-15 | 2023-08-15 | Catalog extraction method and device in drawing, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117172212A (en) |
- 2023-08-15: CN application CN202311031166.5A, published as CN117172212A, status Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117390812A (en) * | 2023-12-11 | 2024-01-12 | 江西少科智能建造科技有限公司 | CAD drawing warm ventilation pipe structured information extraction method and system |
CN117390812B (en) * | 2023-12-11 | 2024-03-08 | 江西少科智能建造科技有限公司 | CAD drawing warm ventilation pipe structured information extraction method and system |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN114821622B (en) | Text extraction method, text extraction model training method, device and equipment | |
EP3117369B1 (en) | Detecting and extracting image document components to create flow document | |
CN111931774B (en) | Method and system for warehousing medicine data | |
CN107204960B (en) | Webpage identification method and device and server | |
CN110222695B (en) | Certificate picture processing method and device, medium and electronic equipment | |
CN117172212A (en) | Catalog extraction method and device in drawing, electronic equipment and storage medium | |
CN111627015A (en) | Small sample defect identification method, device, equipment and storage medium | |
WO2020056968A1 (en) | Data denoising method and apparatus, computer device, and storage medium | |
CN116935430A (en) | Picture frame identification method and device, electronic equipment and storage medium | |
CN109299205B (en) | Method and device for warehousing spatial data used by planning industry | |
CN114005126A (en) | Table reconstruction method and device, computer equipment and readable storage medium | |
CN114780746A (en) | Knowledge graph-based document retrieval method and related equipment thereof | |
CN113076961B (en) | Image feature library updating method, image detection method and device | |
CN114448664A (en) | Phishing webpage identification method and device, computer equipment and storage medium | |
CN113704184A (en) | File classification method, device, medium and equipment | |
EP3564833A1 (en) | Method and device for identifying main picture in web page | |
CN112199499A (en) | Text division method, text classification method, device, equipment and storage medium | |
CN111611388A (en) | Account classification method, device and equipment | |
CN117009968A (en) | Homology analysis method and device for malicious codes, terminal equipment and storage medium | |
US9530070B2 (en) | Text parsing in complex graphical images | |
CN114996360B (en) | Data analysis method, system, readable storage medium and computer equipment | |
CN114969439A (en) | Model training and information retrieval method and device | |
CN114627462A (en) | Chemical formula identification method and device, computer equipment and storage medium | |
CN114417860A (en) | Information detection method, device and equipment | |
CN113989632A (en) | Bridge detection method and device for remote sensing image, electronic equipment and storage medium |
Legal Events
Code | Title | Description | Date |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||