CN115995092A - Drawing text information extraction method, device and equipment - Google Patents

Drawing text information extraction method, device and equipment

Info

Publication number
CN115995092A
Authority
CN
China
Prior art keywords
text
extraction
information
processed
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310128345.4A
Other languages
Chinese (zh)
Inventor
邹军利
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Bangtu Information Technology Co., Ltd.
Original Assignee
Shanghai Bangtu Information Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Bangtu Information Technology Co., Ltd.
Priority to CN202310128345.4A
Publication of CN115995092A
Legal status: Pending

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The application relates to a drawing text information extraction method, device and equipment. For a drawing to be processed, the drawing is divided into regions according to differences in how drawing information is expressed, yielding a text description region and a table region; because the text semantic association characteristics of the regions may differ, each region is processed with a different text semantic relation analysis technique; the outputs of the two techniques are then combined to obtain the final text information extraction result of the drawing to be processed, so that the key information to be examined in a building drawing is extracted efficiently and accurately and drawing examination efficiency is improved.

Description

Drawing text information extraction method, device and equipment
Technical Field
The application relates to the field of character recognition processing, and in particular to a drawing text information extraction method, device and equipment.
Background
In the field of engineering construction, a building drawing is not used directly for construction once its design is finished: the examination department to which the drawing is submitted must check whether the design meets the relevant standard points and other requirements of the building industry, and only a drawing that passes examination can be used for production and construction.
Purely manual examination relies on the experience and ability of the examiners and is inefficient; for large-scale engineering projects in particular, the volume of drawing content makes examination time-consuming and labor-intensive, so semi-automatic drawing examination is now favored. Semi-automatic examination requires extracting and recognizing the important expressed information of a drawing, outputting it in the form of QA (Question-Answer) pairs, and then either automatically judging whether the relevant standard points are met or displaying the information to examiners for manual examination or rechecking, thereby reducing the workload of manual examination.
At present, the important expressed information in a drawing is mainly extracted through OCR (optical character recognition) text recognition combined with regular-expression rules. However, the layout styles of drawing text regions and table regions are numerous, the rules must be compiled in advance, and in theory not all layout styles can be exhausted, so the drawing text information of certain styles cannot be extracted effectively by rules; moreover, compiling the rules in advance is itself a time-consuming and labor-intensive process.
Disclosure of Invention
In order to solve the problems that current drawing examination cannot effectively extract the drawing text information of certain styles and is time-consuming and labor-intensive, the application provides a drawing text information extraction method, device and equipment.
In a first aspect, the drawing text information extraction method provided by the application adopts the following technical scheme:
a drawing text information extraction method comprises the following steps:
obtaining a region division result of a drawing to be processed; the area comprises a text description area and a table area;
inputting first data to be processed corresponding to the text description area into an NLP (Natural Language Processing) algorithm model, and outputting first extraction information containing a plurality of question answer pairs; the first data to be processed comprises data corresponding to a set first data type in the text description area;
inputting second data to be processed corresponding to the table area into a multi-modal algorithm model, and outputting second extraction information containing a plurality of question answer pairs; the second data to be processed comprises data corresponding to a set second data type in the table area;
based on the first extraction information and the second extraction information, obtaining the final text information extraction result of the drawing to be processed and outputting it for display.
By adopting the above technical scheme, the drawing to be processed is divided into regions according to differences in how drawing information is expressed, yielding a text description region and a table region; because the text semantic association characteristics of the regions may differ, each region is processed with a different text semantic relation analysis technique; the outputs of the two techniques are then combined to obtain the final text information extraction result of the drawing to be processed, so that the key information to be examined in a building drawing (namely the question answer pairs) is extracted efficiently and accurately, improving drawing examination efficiency.
Optionally, the model architecture of the NLP algorithm model comprises m LSTM (Long Short-Term Memory) networks, m Multi-Head Attention networks, m LayerNorm layer normalization networks, m Conv 1×1 convolution kernels and m ADD feature fusion networks, where m is a natural number greater than or equal to 2.
Optionally, the model architecture of the NLP algorithm model is formed by m parsing modules connected in series, one parsing module comprising one LSTM long short-term memory network, one Multi-Head Attention network, one LayerNorm layer normalization network and one Conv 1×1 convolution kernel connected in series in sequence; the output of the LSTM network is also passed unchanged to the Conv 1×1 convolution kernel through an ADD feature fusion network; and m ∈ [5, 15].
Optionally, the multi-modal algorithm model comprises an encoder composed of Transformer block modules, a Mobile-ViT (Mobile Vision Transformer) network, a PAN (Pyramid Attention Network for Semantic Segmentation) semantic segmentation network and a Concat feature fusion network, and a decoder composed of a Concat feature fusion network, Bi-LSTM bidirectional long short-term memory networks and CRF layers.
Optionally, the encoder comprises two branches: one branch is formed by n Transformer block modules connected in series; the other branch is composed of the Mobile-ViT network and the PAN semantic segmentation network; and the outputs of the two branches are fused and encoded through the Concat feature fusion network.
Optionally, the decoder obtains the second extraction information by passing the output of the Concat feature fusion network sequentially through K Bi-LSTM (Bi-directional Long Short-Term Memory; a combination of a forward LSTM and a backward LSTM that captures bidirectional semantic dependencies) networks and P CRF layers for prediction; K is a natural number greater than or equal to 2, P is a natural number greater than or equal to 2, and K is equal to P.
Optionally, the set first data type includes a text data type, and the set second data type includes a text data type and a table layout image data type.
Optionally, obtaining the final text information extraction result of the drawing to be processed based on the first extraction information and the second extraction information includes:
determining, for each question answer pair in the second extraction information, a first cell coordinate corresponding to the question of the pair and a second cell coordinate corresponding to the answer;
calculating a row coordinate difference and a column coordinate difference between the first cell coordinate and the second cell coordinate;
judging whether the sum of the row coordinate difference and the column coordinate difference is smaller than a set threshold;
if yes, judging the question answer pair to be normal; if not, judging the question answer pair to be abnormal and marking it as abnormal;
and dividing the question answer pairs in the first extraction information and the second extraction information into normal question answer pairs and abnormal question answer pairs, obtaining the final text information extraction result of the drawing to be processed for output and display.
By adopting the above technical scheme, the question answer pairs output by the multi-modal algorithm model are further checked against the cell coordinates, improving the accuracy of the final extraction result.
In a second aspect, the drawing text information extraction device provided by the application adopts the following technical scheme:
a drawing text information extraction device, comprising:
the acquisition module is used for acquiring the region division result of the drawing to be processed; the area comprises a text description area and a table area;
the NLP algorithm model is used for carrying out prediction processing on the first data to be processed corresponding to the text description area and outputting first extraction information containing a plurality of question answer pairs; the first data to be processed comprises data corresponding to a set first data type in the text description area;
the multi-modal algorithm model is used for performing prediction processing on the second data to be processed corresponding to the table area and outputting second extraction information containing a plurality of question answer pairs; the second data to be processed comprises data corresponding to a set second data type in the table area;
and the comprehensive processing module is used for obtaining a final text information extraction result of the drawing to be processed based on the first extraction information and the second extraction information, and outputting and displaying the final text information extraction result.
In a third aspect, the drawing text information extraction device provided by the present application adopts the following technical scheme:
the drawing text information extraction device comprises a processor, a memory and a computer program stored in the memory and capable of running on the processor, wherein the drawing text information extraction method is realized when the processor executes the computer program.
In summary, the present application includes at least the following beneficial technical effects:
1. The key information that needs to be examined in a building drawing is extracted efficiently and accurately, which helps improve drawing examination efficiency.
2. The question answer pairs output by the multi-modal algorithm model are further checked against the cell coordinates, improving the accuracy of the final extraction result.
Drawings
FIG. 1 is a flowchart of a drawing text information extraction method in an embodiment of the application;
FIG. 2 is a block diagram of an NLP algorithm model architecture in an embodiment of the present application;
FIG. 3 is a block diagram of a multimodal algorithm model architecture in an embodiment of the present application;
FIG. 4 is a schematic diagram of a text description area of a drawing in an embodiment of the present application;
FIG. 5 is a schematic diagram of a table area of a drawing in an embodiment of the present application;
FIG. 6 is a block diagram of a drawing text information extraction device in an embodiment of the present application;
fig. 7 is a block diagram of a drawing text information extraction apparatus according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The embodiment of the application discloses a drawing text information extraction method, which can be implemented by a software system installed and run on hardware equipment. Referring to fig. 1, the method mainly comprises the following steps:
s101, obtaining a region division result of the drawing to be processed.
In an alternative embodiment of the present application, the drawing to be processed supports multiple format inputs, such as CAD format, PDF format, or picture format.
Drawing information is a fusion of different types of data; line images alone can hardly express a designer's complete intent or design content, so other forms of expression are needed. Information such as fire resistance grade, room noise and total building area is difficult to represent schematically with images, so building designers typically describe it in explanatory text boxes or display it in tables, which makes the important information of a building design easier to express.
In an alternative embodiment of the application, the drawing to be processed is divided into regions according to the different ways drawing information is expressed. A more suitable analysis technique can then be adopted for each region according to its information expression attributes, improving the accuracy and efficiency of text information extraction.
In the embodiment of the application, the regions of the drawing to be processed may include a text description area, a table area and a line image area. It should be understood that different drawings to be processed may yield different region division results, that is, different information expression modes.
It should be noted that the division of drawing regions may be implemented by any existing technology or combination of technologies, such as image recognition, table recognition and text recognition, which are not limited here; the application mainly performs targeted text information extraction and processing based on the obtained region division result.
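As an illustration only (the region division is not limited to any particular technique, as noted above), a classic morphological approach to locating table regions keeps only the long horizontal and vertical strokes that form table grids and takes the bounding boxes of the resulting connected components. The sketch below assumes OpenCV, with illustrative kernel lengths and area thresholds.

import cv2

def find_table_regions(image_path: str):
    """Return bounding boxes (x, y, w, h) of candidate table regions."""
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # invert and binarize so dark drawing lines become white foreground
    binary = cv2.adaptiveThreshold(~img, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                   cv2.THRESH_BINARY, 15, -2)
    # morphological opening keeps only long horizontal / vertical strokes
    horizontal = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                                  cv2.getStructuringElement(cv2.MORPH_RECT, (40, 1)))
    vertical = cv2.morphologyEx(binary, cv2.MORPH_OPEN,
                                cv2.getStructuringElement(cv2.MORPH_RECT, (1, 40)))
    grid = cv2.add(horizontal, vertical)
    contours, _ = cv2.findContours(grid, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    # discard tiny fragments; the area threshold is illustrative
    return [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 1000]

Regions of dense text outside the detected table boxes can then be treated as candidate text description areas.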
S102, inputting first data to be processed corresponding to a text description area into an NLP algorithm model, and outputting first extraction information containing a plurality of question answer pairs; wherein the first data to be processed includes data corresponding to the set first data type in the text description area.
In the embodiment of the application, aiming at the data of the text description area, an NLP algorithm model is adopted to extract text semantics, so that classification and matching output of answers to questions are realized.
The first data type is set as a text data type (e.g., a text format); that is, the first data to be processed is the text data contained in the text description area, which is obtained by performing text recognition on the text description area of the drawing to be processed.
In order to identify and acquire text data more efficiently, in an alternative embodiment of the application, primitives may be identified and exported quickly by calling a CAD primitive export interface, and the text data acquired from them.
It should be understood that points, lines, arcs, spline curves, text and the like may all be called primitives; every individual object in CAD can be regarded as a primitive, such as a straight line segment (line), a circle (c), an arc, a polyline (pl), single-line text or multi-line text.
The exported primitives can be divided into line-segment primitives and text primitives; the text primitives correspond to the text data, so the text data can be identified and exported quickly. Of course, in other alternative embodiments of the application the text data may be obtained in other existing ways, which are not described here.
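A minimal sketch of such text-primitive export, assuming the open-source ezdxf library as the export interface (the patent only names "a CAD primitive export interface", so the library choice and entity handling here are illustrative):

import ezdxf

def export_text_primitives(dxf_path: str) -> list[str]:
    """Collect single-line (TEXT) and multi-line (MTEXT) text primitives."""
    doc = ezdxf.readfile(dxf_path)
    msp = doc.modelspace()
    texts = []
    for entity in msp.query("TEXT"):        # single-line text primitives
        texts.append(entity.dxf.text)
    for entity in msp.query("MTEXT"):       # multi-line text primitives
        texts.append(entity.plain_text())   # strips inline formatting codes
    return texts

Line-segment primitives (LINE, ARC, LWPOLYLINE, ...) are simply not queried, which realizes the line/text split described above.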
For the text data in the text description area, an NLP algorithm model obtained through the improvement and training described in this application is used for analysis: the semantic associations among the texts are analyzed, and each text is classified to judge whether it is a question or an answer; after classification, the texts are semantically matched to mine the answer corresponding to each question, realizing question-to-answer mapping and outputting first extraction information containing a plurality of question answer pairs.
It should be noted that a question answer pair maps one question to one answer. In drawing examination, the essence is to extract question answer pairs and judge whether they meet the design specifications or requirements, thereby achieving the purpose of examining the drawing.
For example, 'fire-fighting lane' corresponds to a 'QUESTION' and 'clear width and clear height greater than 4 meters' corresponds to an 'ANSWER'; the two form a question answer pair. For another example, 'project location' corresponds to a 'question' and 'Wulian Road to the south and Jining Road to the west' corresponds to an 'answer'; these also constitute a question answer pair. Much of the information in a drawing is presented in this manner, so by extracting the question answer pairs in drawings automatically, examiners no longer need to search, screen and confirm item by item in a large amount of drawing information, which improves examination efficiency.
In the field of natural language processing, semantic relation extraction is an important semantic processing task of NLP technology, but the NLP techniques currently adopted still depend on keywords and rely on parsers and named entity recognizers (NER) to obtain high-level features. In building drawings, however, important information can appear at any position in a sentence, with arbitrary typesetting and sentence descriptions of varying length, so the extraction algorithms currently based on NLP perform poorly in efficiency and accuracy on long-text relation extraction. For this reason, the application adopts an NLP algorithm model with a specific model architecture to analyze the text data of the text description area of the drawing to be processed, achieving efficient and accurate extraction and output of question answer pairs.
Referring to fig. 2, the NLP algorithm model adopted in the embodiment of the application comprises m (m ≥ 2) parsing modules connected in series (i.e., the output of the previous parsing module serves as the input of the next). The input of the first parsing module is the text data of the text description area of the drawing to be processed, which is fed into the model line by line for parsing.
A parsing module is composed of one LSTM long short-term memory network, one Multi-Head Attention network, one LayerNorm layer normalization network and one Conv 1×1 convolution kernel connected in series in sequence.
The LayerNorm layer normalization network normalizes all the features of each sample; normalizing the activation values of the layer accelerates training and makes the model converge faster.
For the Multi-Head Attention network and the LayerNorm network, a positive effect on parsing cannot be guaranteed. Therefore, in the embodiment of the application, the output of the LSTM network is also passed to the Conv 1×1 convolution kernel through an ADD feature fusion network: one part of the output is transformed by the Multi-Head Attention and LayerNorm networks, the other part is passed directly to the next Conv 1×1 convolution kernel, and the results of the two parts are added as the input of the next layer. This guarantees that at least the information of the LSTM layer is retained and effectively improves the performance of the model.
Experiments show that when m is in the range of 5 to 15, the NLP algorithm model extracts the high-level semantic information of abstract text features better.
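A minimal PyTorch sketch of one parsing module and the series connection described above; the feature dimension, the number of attention heads and the handling of the LSTM state are assumptions, since the patent does not fix them:

import torch
import torch.nn as nn

class ParsingModule(nn.Module):
    """LSTM -> Multi-Head Attention -> LayerNorm -> Conv 1x1, with the
    LSTM output added back (ADD fusion) before the convolution."""
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.conv = nn.Conv1d(dim, dim, kernel_size=1)  # Conv 1x1

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, dim)
        h, _ = self.lstm(x)
        a, _ = self.attn(h, h, h)      # self-attention over the line sequence
        y = self.norm(a) + h           # ADD fusion retains the LSTM features
        return self.conv(y.transpose(1, 2)).transpose(1, 2)

# m parsing modules connected in series, m in [5, 15] per the experiments above
nlp_model = nn.Sequential(*[ParsingModule() for _ in range(8)])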
S103, inputting the second data to be processed corresponding to the table area into a multi-modal algorithm model, and outputting second extraction information containing a plurality of question answer pairs; the second data to be processed comprises the data corresponding to the set second data type in the table area.
The second data type is set to include a text data type and a table layout image data type. The text data can be obtained through the primitive export described above, and the table layout image data can be obtained by clipping the table area and converting it into a format such as a picture or PDF.
Table data in drawings takes many forms and carries very rich and valuable semantic information. To improve text information extraction efficiency for tables of various types and layouts, the application uses the multi-modal algorithm model to complete the extraction and output of the semantic information of different types of tables.
Referring to fig. 3, the multi-modal algorithm model comprises an encoder and a decoder. The encoder is composed of Transformer block modules, a Mobile-ViT (Mobile Vision Transformer) network, a PAN semantic segmentation network and a Concat feature fusion network, and comprises two branches: one branch is formed by n (n ≥ 2) Transformer block modules connected in series; the other is composed of the Mobile-ViT network and the PAN semantic segmentation network. The outputs of the two branches are fused and encoded through the Concat feature fusion network.
On the premise that the global text features can still be captured, the number n of Transformer blocks is set in the range 4 to 8 to improve the running speed of the model. Measured data show that when n is smaller than 4 the long-text feature vectors cannot cover the global text, and when n is greater than 8 the model is large and runs slowly.
The Transformer block is insensitive to position information in the table, yet when matching answers to questions in a table the position information of the cells is important; Mobile-ViT is therefore introduced for position coding to solve this problem.
The decoder obtains the second extraction information by passing the output of the Concat feature fusion network through K (K ≥ 2) Bi-LSTM bidirectional long short-term memory networks and P (P ≥ 2) CRF layers for prediction (the CRF layers ensure that the final prediction is valid by adding constraints, which the CRF layers learn automatically from the training data). The second extraction information contains a number of question answer pairs.
The number K of Bi-LSTM networks is in the range 2 to 6; measured data show that when K equals 1 the decoded information is incomplete, and when K is greater than 6 the decoding is redundant. The number K of Bi-LSTM networks corresponds to the number P of CRF layers, and the two values are kept the same.
In this application, text semantic features are extracted through multiple layers of Transformer blocks; Mobile-ViT and PAN form the backbone network for detecting the universal information of table text and extract the multi-scale, rich visual features of the table; feature fusion encoding is performed through the Concat layer; the information is then decoded through the Bi-LSTM and CRF layers to finally obtain a table parsing result containing a plurality of question answer pairs, realizing an end-to-end structuring task (i.e., table text recognition, text classification and semantic matching into question answer pairs).
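The decoder side can be sketched with the pytorch-crf package; here the encoder branches are reduced to an already-fused Concat feature tensor, a single CRF stands in for the patent's P stacked CRF layers, and the hidden size and tag set are assumptions:

import torch
import torch.nn as nn
from torchcrf import CRF  # pip install pytorch-crf

class TableDecoder(nn.Module):
    def __init__(self, feat_dim: int = 512, hidden: int = 256,
                 num_tags: int = 3, k: int = 2):
        super().__init__()
        # K stacked bidirectional LSTMs (K in [2, 6] per the measurements above)
        self.bilstm = nn.LSTM(feat_dim, hidden, num_layers=k,
                              bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * hidden, num_tags)  # per-token emissions
        self.crf = CRF(num_tags, batch_first=True)   # learns label constraints

    def forward(self, fused, tags=None):             # fused: (B, T, feat_dim)
        h, _ = self.bilstm(fused)
        emissions = self.proj(h)
        if tags is not None:                         # training: NLL loss
            return -self.crf(emissions, tags)
        return self.crf.decode(emissions)            # inference: best tag path

# usage sketch: 40 fused table-text tokens tagged as question / answer / other
decoder = TableDecoder()
pred = decoder(torch.randn(1, 40, 512))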
In an alternative embodiment of the application, there is a special case in which a construction engineer directly draws text as multiple line segments; when the primitives are exported, such text becomes line-segment information, so the text information is missing and effective classification and QA-pair matching cannot be performed. In this case, OCR may be used directly to extract the drawing text information.
In an alternative embodiment of the application, for all the texts contained in the table area, if the output result contains abnormal texts that are not classified or not matched into QA pairs, OCR is used to re-detect and recognize the text in the table area (OCR may require converting the drawing to be processed into a picture format), after which the multi-modal algorithm model is applied again for parsing.
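A minimal sketch of this OCR fallback, assuming pytesseract as the OCR engine and a table region already rendered to a picture (the patent only says "OCR", so the engine choice is an assumption):

from PIL import Image
import pytesseract

def ocr_table_region(image_path: str) -> str:
    """Re-detect text in a table region that was rendered as a picture."""
    img = Image.open(image_path)
    # chi_sim: simplified-Chinese language pack, matching Chinese drawings
    return pytesseract.image_to_string(img, lang="chi_sim")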
It should be understood that step S102 and step S103 may be executed serially or in parallel.
S104, based on the first extraction information and the second extraction information, obtaining a final extraction result of the text information of the drawing to be processed, and outputting and displaying.
In an alternative embodiment of the application, the QA pairs output by the multi-modal algorithm model may also be checked, to determine whether the multi-modal output contains possible mismatches and so improve text information extraction accuracy.
Specifically, the cell coordinates corresponding to the question and the answer in a QA pair are obtained (the cell coordinates come from the multi-modal analysis), and it is judged whether the positions of the two coordinates satisfy a preset relationship: the difference of the corresponding cell row coordinates and the difference of the corresponding cell column coordinates are calculated, and if the sum of the row coordinate difference and the column coordinate difference equals 1, the positions satisfy the preset relationship and the output QA pair is judged correct. If the sum does not equal 1 — that is, it is greater than or equal to 2 (it cannot be 0, because a question and its answer cannot occupy one cell) — the cells containing the question and the answer are not adjacent, so the QA pair is judged to carry a risk of mismatching, marked as an abnormal QA pair and sent for manual check and confirmation, improving the accuracy of table text information extraction.
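The check reduces to requiring a Manhattan distance of exactly 1 between the two cell coordinates; a minimal sketch, assuming coordinates are (row, column) indices:

def qa_pair_is_normal(q_cell: tuple[int, int], a_cell: tuple[int, int]) -> bool:
    """A QA pair is normal when question and answer occupy adjacent cells."""
    row_diff = abs(q_cell[0] - a_cell[0])
    col_diff = abs(q_cell[1] - a_cell[1])
    # sum == 1 means horizontally or vertically adjacent; a sum of 0 cannot
    # occur (question and answer never share a cell) and >= 2 is non-adjacent
    return row_diff + col_diff == 1

assert qa_pair_is_normal((3, 1), (3, 2))      # answer right of question
assert not qa_pair_is_normal((3, 1), (5, 4))  # far apart: mark as abnormal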
The first extraction information and the second extraction information are displayed in columns (a text description area column and a table area column), and the QA pairs marked as abnormal in the second extraction information are highlighted (e.g., by font and color) or prompted (e.g., an abnormal-QA-pair prompt), which helps examiners review them in a targeted way and improves examination efficiency.
Text areas, table areas and image areas are extracted from a drawing, and these areas carry the drawing's important expressed information; the table area and the text area in particular express a large share of the drawing information. In the prior art, text relation information is extracted through regular-expression rules; because the layout styles of drawing text areas and table areas vary widely, the rules also require data to be compiled, at considerable cost in time and labor. Both the text area and the table area are textual descriptions, but when table-structured information is output, the position of the text matters in addition to its content, and current techniques require writing multiple matching rules for each type of table, which is inefficient. As the text area in fig. 4 shows, some textual descriptions are long and the text keywords and their corresponding relations vary, which current NLP techniques struggle to extract effectively, leading to a high relation-matching error rate. As fig. 5 shows, the table area format is complex and changeable: there is heavy line interference, and identifying table rows is a serious problem, while errors in identifying columns are easier to resolve.
For the drawing to be processed, region division is performed according to differences in how drawing information is expressed, yielding a text description region and a table region, whose text semantic association characteristics may differ. For the table area, the multi-modal algorithm completes the extraction of key information from different types of tables, classifying each detected text as a question, an answer, etc., and then finding the corresponding answer for each building-standard question; for the text description area, the text information is extracted through NLP relation extraction. The efficiency and accuracy of text relation extraction are thereby improved.
Based on the same design concept, the embodiment also discloses a drawing text information extraction device.
Referring to fig. 6, a drawing text information extraction apparatus includes:
the obtaining module 61 is used for obtaining a region division result of the drawing to be processed; the area includes a text description area and a form area.
The NLP algorithm model 62 is configured to perform prediction processing on the first data to be processed corresponding to the text description area and output first extraction information containing a plurality of question answer pairs; the first data to be processed includes the data corresponding to the set first data type in the text description area.
The multi-modal algorithm model 63 is configured to perform prediction processing on the second data to be processed corresponding to the table area and output second extraction information containing a plurality of question answer pairs; the second data to be processed includes the data corresponding to the set second data type in the table area.
The comprehensive processing module 64 is configured to obtain a final text information extraction result of the drawing to be processed based on the first extraction information and the second extraction information, and output a display.
The various modifications and specific examples of the method provided in the foregoing embodiment also apply to the drawing text information extraction device of this embodiment; from the foregoing detailed description of the method, those skilled in the art can clearly understand how the device is implemented, so for brevity it is not described here again.
In order to better execute the program of the above method, the embodiment of the present application further provides a drawing text information extraction apparatus, as shown in fig. 7, including a processor 71 and a memory 72.
The drawing text information extraction device may be implemented in various forms including a mobile phone, a tablet computer, a palm computer, a notebook computer, a desktop computer, and the like.
Wherein the memory may be used to store instructions, programs, code, code sets or instruction sets. The memory may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function, and instructions for implementing the drawing text information extraction method provided in the above embodiments; the data storage area may store the data involved in the drawing text information extraction method provided in the above embodiments.
The processor may include one or more processing cores. The processor performs the various functions of the application and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory and calling the data stored in the memory. The processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller and a microprocessor. It will be appreciated that the electronic device implementing the above processor functions may differ for different apparatuses, and the embodiments of the application do not specifically limit it.
Embodiments of the application provide a computer-readable storage medium, for example comprising a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code. The computer-readable storage medium stores a computer program that can be loaded by a processor to perform the drawing text information extraction method of the above embodiment.
The foregoing embodiments are only used to describe the technical solution of the application in detail; their descriptions are only meant to help understand the method and core idea of the application and should not be construed as limiting it. Variations or alternatives readily conceived by those skilled in the art within the technical scope disclosed herein shall fall within the scope of protection of the application.

Claims (10)

1. A drawing text information extraction method, characterized by comprising the following steps:
obtaining a region division result of a drawing to be processed; the area comprises a text description area and a table area;
inputting first data to be processed corresponding to the text description area into an NLP algorithm model, and outputting first extraction information containing a plurality of question answer pairs; the first data to be processed comprises data corresponding to a set first data type in the text description area;
inputting second data to be processed corresponding to the table area into a multi-modal algorithm model, and outputting second extraction information containing a plurality of question answer pairs; the second data to be processed comprises data corresponding to a set second data type in the table area;
based on the first extraction information and the second extraction information, obtaining the final text information extraction result of the drawing to be processed and outputting it for display.
2. The drawing text information extraction method according to claim 1, wherein the model architecture of the NLP algorithm model comprises m LSTM long short-term memory networks, m Multi-Head Attention networks, m LayerNorm layer normalization networks, m Conv 1×1 convolution kernels and m ADD feature fusion networks; and m is a natural number greater than or equal to 2.
3. The drawing text information extraction method according to claim 2, wherein the model architecture of the NLP algorithm model is formed by m parsing modules connected in series, one parsing module comprising one LSTM long short-term memory network, one Multi-Head Attention network, one LayerNorm layer normalization network and one Conv 1×1 convolution kernel connected in series in sequence; the output of the LSTM network is also passed unchanged to the Conv 1×1 convolution kernel through an ADD feature fusion network; and m ∈ [5, 15].
4. The drawing text information extraction method according to claim 1, wherein the multi-modal algorithm model comprises an encoder composed of Transformer block modules, a Mobile-ViT network, a PAN semantic segmentation network and a Concat feature fusion network, and a decoder composed of a Concat feature fusion network, Bi-LSTM bidirectional long short-term memory networks and CRF layers.
5. The drawing text information extraction method according to claim 4, wherein the encoder comprises two branches: one branch is formed by n Transformer blocks connected in series; the other branch is composed of the Mobile-ViT network and the PAN semantic segmentation network; and the outputs of the two branches are fused and encoded through the Concat feature fusion network.
6. The drawing text information extraction method according to claim 5, wherein the decoder obtains the second extraction information by passing the output of the Concat feature fusion network sequentially through K Bi-LSTM bidirectional long short-term memory networks and P CRF layers for prediction; K is a natural number greater than or equal to 2, P is a natural number greater than or equal to 2, and K is equal to P.
7. The drawing text information extraction method according to any one of claims 1 to 6, wherein the set first data type includes a text data type, and the set second data type includes a text data type and a table layout image data type.
8. The drawing text information extraction method according to any one of claims 1 to 6, wherein obtaining the final text information extraction result of the drawing to be processed based on the first extraction information and the second extraction information comprises:
determining, for each question answer pair in the second extraction information, a first cell coordinate corresponding to the question of the pair and a second cell coordinate corresponding to the answer;
calculating a row coordinate difference and a column coordinate difference between the first cell coordinate and the second cell coordinate;
judging whether the sum of the row coordinate difference and the column coordinate difference is smaller than a set threshold;
if yes, judging the question answer pair to be normal; if not, judging the question answer pair to be abnormal and marking it as abnormal;
and dividing the question answer pairs in the first extraction information and the second extraction information into normal question answer pairs and abnormal question answer pairs, obtaining the final text information extraction result of the drawing to be processed for output and display.
9. A drawing text information extraction device, characterized by comprising:
the acquisition module is used for acquiring the region division result of the drawing to be processed; the area comprises a text description area and a table area;
the NLP algorithm model is used for carrying out prediction processing on the first data to be processed corresponding to the text description area and outputting first extraction information containing a plurality of question answer pairs; the first data to be processed comprises data corresponding to a set first data type in the text description area;
the multi-modal algorithm model is used for performing prediction processing on the second data to be processed corresponding to the table area and outputting second extraction information containing a plurality of question answer pairs; the second data to be processed comprises data corresponding to a set second data type in the table area;
and the comprehensive processing module is used for obtaining a final text information extraction result of the drawing to be processed based on the first extraction information and the second extraction information, and outputting and displaying the final text information extraction result.
10. A drawing text information extraction apparatus comprising a processor, a memory and a computer program stored in the memory and executable on the processor, the processor implementing the drawing text information extraction method according to any one of claims 1 to 7 when executing the computer program.
CN202310128345.4A (filed 2023-02-16) — Drawing text information extraction method, device and equipment — Pending — CN115995092A

Priority Applications (1)

CN202310128345.4A — priority/filing date 2023-02-16 — Drawing text information extraction method, device and equipment


Publications (1)

CN115995092A — published 2023-04-21

Family

ID=85993498

Family Applications (1)

CN202310128345.4A (Pending) — filed 2023-02-16 — Drawing text information extraction method, device and equipment

Country Status (1)

CN — CN115995092A


Cited By (2)

* Cited by examiner, † Cited by third party
CN117237977A * — priority 2023-11-16, published 2023-12-15 — 江西少科智能建造科技有限公司 — Area division method and system for CAD drawing
CN117237977B * — priority 2023-11-16, published 2024-03-08 — 江西少科智能建造科技有限公司 — Area division method and system for CAD drawing


Legal Events

PB01 — Publication
SE01 — Entry into force of request for substantive examination