CN111160157B

CN111160157B - Text extraction method based on DWG drawing and related products

Info

Publication number: CN111160157B
Application number: CN201911304280.4A
Authority: CN
Inventors: 张泽斌; 张华安; 张健
Original assignee: Shenzhen Wanyi Digital Technology Co ltd
Current assignee: Shenzhen Wanyi Digital Technology Co ltd
Priority date: 2019-12-17
Filing date: 2019-12-17
Publication date: 2023-08-08
Anticipated expiration: 2039-12-17
Also published as: CN111160157A

Abstract

The embodiments of the application disclose a text extraction method based on a DWG drawing, and related products, applied to an electronic device. The method includes: acquiring a DWG drawing and determining a title bar area of the DWG drawing; determining a target keyword in the title bar area and acquiring keyword coordinates of the target keyword; determining a first rectangular area in the DWG drawing according to the keyword coordinates; and determining a target text according to the first rectangular area and extracting the target text. The embodiments of the application improve the user experience.

Description

Text extraction method based on DWG drawing and related products

Technical Field

The application relates to the field of electronic technology, and in particular to a text extraction method based on a DWG drawing and related products.

Background

With the rapid development of electronic devices, more and more designers produce engineering design drawings on them. Such drawings are generally stored in the DWG format, a proprietary file format that AutoCAD and AutoCAD-based software use to store design data.

To extract text information from a DWG file, the file usually has to be opened in AutoCAD or AutoCAD-based software, and a designer then records the relevant drawing information by hand. The whole extraction operation is therefore complex and tedious, takes a long time, and has low efficiency, which results in a poor user experience.

Disclosure of Invention

The embodiments of the application provide a text extraction method based on a DWG drawing, and related products. A target keyword is determined in the DWG drawing, a first rectangular area is determined based on the keyword coordinates of the target keyword, and a target text is extracted based on the first rectangular area. This simplifies the flow for extracting text information from a DWG drawing, shortens the extraction period, improves text extraction efficiency, and improves the user experience.

In a first aspect, an embodiment of the present application provides a text extraction method based on DWG drawings, which is applied to an electronic device, and the method includes:

acquiring a DWG drawing, and determining a title bar area of the DWG drawing;

determining a target keyword in the title bar area, and acquiring keyword coordinates of the target keyword;

determining a first rectangular area in the DWG drawing according to the keyword coordinates;

and determining a target text according to the first rectangular area, and extracting the target text.

In a second aspect, an embodiment of the present application provides a text extraction device based on DWG drawings, which is applied to an electronic device, and the device includes:

an acquisition unit, configured to acquire the DWG drawing and determine a title bar area of the DWG drawing;

a first determining unit, configured to determine a target keyword in the title bar area and obtain keyword coordinates of the target keyword;

a second determining unit, configured to determine a first rectangular area in the DWG drawing according to the keyword coordinates; and

an extraction unit, configured to determine a target text according to the first rectangular area and extract the target text.

In a third aspect, an embodiment of the present application provides an electronic device, including a controller, a memory, a communication interface, and one or more programs, where the one or more programs are stored in the memory and configured to be executed by the controller, the programs including instructions for performing steps in any of the methods of the first aspect of the embodiments of the present application.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, where the computer program causes a computer to perform some or all of the steps as described in any of the methods of the first aspect of the embodiments of the present application.

In a fifth aspect, embodiments of the present application provide a computer program product, wherein the computer program product comprises a non-transitory computer readable storage medium storing a computer program operable to cause a computer to perform some or all of the steps described in any of the methods of the first aspect of embodiments of the present application. The computer program product may be a software installation package.

It can be seen that, in the embodiments of the application, an electronic device acquires a DWG drawing and determines a title bar area of the DWG drawing; determines a target keyword in the title bar area and acquires keyword coordinates of the target keyword; determines a first rectangular area in the DWG drawing according to the keyword coordinates; and determines a target text according to the first rectangular area and extracts the target text. Because the first rectangular area is located from the keyword coordinates of the target keyword and the target text is extracted from that area, the flow for extracting text information from a DWG drawing is simplified, the extraction period is shortened, text extraction efficiency is improved, and the user experience is improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

Fig. 1 is a schematic flow chart of a text extraction method based on DWG drawings according to an embodiment of the application;

Fig. 2 is a schematic flow chart of another text extraction method based on DWG drawings according to an embodiment of the application;

Fig. 3 is a schematic flow chart of another text extraction method based on DWG drawings according to an embodiment of the application;

Fig. 4 is a schematic flow chart of another text extraction method based on DWG drawings according to an embodiment of the application;

Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the application;

Fig. 6 is a block diagram of the functional units of a text extraction device based on DWG drawings according to an embodiment of the application.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The terms "first," "second," "third," and "fourth" and the like in the description and in the claims and drawings are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.

Reference herein to "an embodiment" means that a particular feature, result, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.

The electronic devices may include various handheld devices, vehicle-mounted devices, wearable devices (e.g., smart watches, smart bracelets, pedometers), computing devices or other processing devices communicatively coupled to wireless modems, as well as various forms of user equipment (UE), mobile stations (MS), terminal devices, and the like. For convenience of description, the devices above are collectively referred to as electronic devices.

Referring to Fig. 1, Fig. 1 is a flow chart of a text extraction method based on a DWG drawing provided in an embodiment of the application. The method is applied to an electronic device and includes the following steps:

Step 101, acquiring a DWG drawing, and determining a title bar area of the DWG drawing;

Optionally, acquiring the DWG drawing may include: receiving a drawing transmission request sent by a server, where the drawing transmission request includes the DWG drawing and is used to request the electronic device to receive the DWG drawing transmitted by the server.

Optionally, acquiring the DWG drawing may include: starting a DWG drawing acquisition module, which sends a drawing acquisition instruction to a preset mobile terminal instructing it to send the DWG drawing to the electronic device; and receiving a drawing acquisition response returned by the preset mobile terminal, where the drawing acquisition response includes the DWG drawing.

Optionally, determining the title bar area of the DWG drawing includes: obtaining a preset title bar area coordinate set, and determining the title bar area in the DWG drawing according to the title bar area coordinate set.
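The title bar step can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: it assumes the text primitives have already been read out of the DWG file (which requires a DWG-capable library) into simple records holding the string and its insertion point, and that the preset title bar area coordinate set is a list of (x, y) corner points. `Rect`, `TextPrimitive`, `title_bar_area` and `primitives_in_area` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in drawing coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max


@dataclass(frozen=True)
class TextPrimitive:
    """Hypothetical record for one text primitive: its string and insertion point."""
    text: str
    x: float
    y: float


def title_bar_area(corner_points: list[tuple[float, float]]) -> Rect:
    """Bounding rectangle of the preset coordinate set (assumed to be corner points)."""
    xs = [x for x, _ in corner_points]
    ys = [y for _, y in corner_points]
    return Rect(min(xs), min(ys), max(xs), max(ys))


def primitives_in_area(primitives: list[TextPrimitive], area: Rect) -> list[TextPrimitive]:
    """Keep only the text primitives whose insertion point lies inside the area."""
    return [p for p in primitives if area.contains(p.x, p.y)]


# Usage with made-up coordinates for a title bar in the lower-right of a sheet.
area = title_bar_area([(300, 0), (420, 0), (420, 40), (300, 40)])
prims = [TextPrimitive("drawing name", 310, 30), TextPrimitive("kitchen", 100, 150)]
print([p.text for p in primitives_in_area(prims, area)])  # ['drawing name']
```

Using the insertion point for containment is a simplification; a stricter version might require the whole text extent to lie inside the title bar.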

In the embodiment of the present application, the DWG format is a proprietary file format used for storing design data by AutoCAD and AutoCAD-based software, and the DWG file is a file stored in DWG format.

The terminal may enable the DWG drawing acquisition module in various ways. In one optional embodiment, a specific button determines whether the DWG drawing acquisition module is enabled. In another optional embodiment, the module is started when a set trigger condition is satisfied. The trigger condition may be a specific operation, including but not limited to a specific gesture or biometric verification, such as face recognition verification, fingerprint recognition verification, or vein recognition verification. The way the DWG drawing acquisition module is started in the embodiments of the application is not limited to the schemes above.

Step 102, determining a target keyword in the title bar area, and acquiring keyword coordinates of the target keyword;

Optionally, a plurality of text primitives are determined in the title bar area, and the target keyword is determined according to the plurality of text primitives.

Here, a primitive file (Windows Metafile, WMF) is a graphics file format defined by Microsoft for the Windows platform; in this application, a text primitive denotes a primitive file that stores text data.

Step 103, determining a first rectangular area in the DWG drawing according to the keyword coordinates;

Optionally, before determining the first rectangular area in the DWG drawing according to the keyword coordinates, the method further includes: obtaining rectangular range data corresponding to the target keyword, where the rectangular range data includes rectangular long-side data and rectangular short-side data. The first rectangular area is then determined by taking the keyword coordinates as the center and applying the rectangular long-side data and the rectangular short-side data.

In a specific implementation, assume that the target keyword is "drawing name". A preset mapping between keywords and rectangular range data is obtained, and the rectangular range data corresponding to "drawing name" is determined to be rectangular long-side data 6 and rectangular short-side data 4. If the keyword coordinates of the target keyword are (3, 4), then taking the keyword coordinates as the center and applying the rectangular range data gives the first rectangular area, whose four vertices are (0, 6), (6, 6), (0, 2), and (6, 2).
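The worked example can be checked with a short function that builds a rectangle centred on the keyword coordinates. As in the example, it assumes the long side runs along the x axis. The mapping `RANGE_DATA` and the function name `rect_around` are hypothetical.

```python
# Hypothetical preset mapping from keyword to (long side, short side).
RANGE_DATA = {"drawing name": (6, 4)}


def rect_around(center: tuple[float, float], long_side: float, short_side: float):
    """Vertices of a rectangle centred on `center`, long side horizontal.

    Returned in the order used in the text: top-left, top-right,
    bottom-left, bottom-right.
    """
    cx, cy = center
    half_w, half_h = long_side / 2, short_side / 2
    return [
        (cx - half_w, cy + half_h),
        (cx + half_w, cy + half_h),
        (cx - half_w, cy - half_h),
        (cx + half_w, cy - half_h),
    ]


print(rect_around((3, 4), *RANGE_DATA["drawing name"]))
# [(0.0, 6.0), (6.0, 6.0), (0.0, 2.0), (6.0, 2.0)]
```

Keeping the side lengths in a per-keyword table reflects the idea that different title bar fields have differently sized value cells.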

Step 104, determining a target text according to the first rectangular area, and extracting the target text.

Optionally, after extracting the target text, storing the target text in a preset database, and sending a text return request to a server transmitting the DWG file, where the text return request is used to request the server to receive the target text.
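Storing the extracted text can be as simple as one table keyed by drawing and keyword. The sketch below uses Python's built-in `sqlite3` module only as a stand-in for the "preset database". The table layout and the `store_target_text` name are assumptions, and the text return request to the server is not shown.

```python
import sqlite3


def store_target_text(conn: sqlite3.Connection, drawing_id: str, keyword: str, text: str) -> None:
    """Insert (or overwrite) the target text for one keyword of one drawing."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS target_text ("
        "drawing_id TEXT, keyword TEXT, text TEXT, "
        "PRIMARY KEY (drawing_id, keyword))"
    )
    # INSERT OR REPLACE keeps exactly one row per (drawing, keyword) pair.
    conn.execute(
        "INSERT OR REPLACE INTO target_text VALUES (?, ?, ?)",
        (drawing_id, keyword, text),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_target_text(conn, "plan-01.dwg", "drawing name", "Ground floor plan")
```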

In one possible example, determining the target keyword in the title bar area includes: acquiring a plurality of text primitives from the title bar area; obtaining a preset keyword set containing at least one keyword, and executing a keyword matching algorithm on the plurality of text primitives according to the keyword set; and, if the text primitives are successfully matched with the keyword set, determining the successfully matched keyword as the target keyword.

Each keyword in the keyword set represents an item of drawing information of the DWG drawing. The keyword set may include, for example, drawing name, drawing content, drawing number, print, version number, and version, without limitation.

Optionally, executing keyword matching on the plurality of text primitives according to the keyword set includes performing a keyword matching operation on each of the text primitives. The keyword matching operation includes: taking any one of the text primitives as a first text primitive; acquiring the keyword set; acquiring the text data contained in the first text primitive; matching the text data against each keyword in the keyword set in turn; and, if the text data contains any keyword in the keyword set, determining that the first text primitive is successfully matched with the keyword set and that the matched keyword is a target keyword.
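The matching operation amounts to a substring test of each primitive's text against each keyword in turn. The sketch below rests on a few assumptions: primitives are (text, x, y) tuples, the keyword list is an illustrative English one, and the primitive's insertion point serves as the keyword coordinates (the text does not say which point is used). The function names are hypothetical. Keyword order matters when one keyword contains another, for example "version" inside "version number", because the first hit wins.

```python
from typing import Optional

# Illustrative keyword set; a real deployment would use its own preset list.
KEYWORDS = ["drawing name", "drawing number", "version"]


def match_keyword(text: str, keywords: list[str]) -> Optional[str]:
    """Compare the text against each keyword in turn; return the first one it contains."""
    for kw in keywords:
        if kw in text:
            return kw
    return None


def find_target_keywords(primitives, keywords):
    """Run the matching operation on every (text, x, y) primitive.

    Returns (keyword, (x, y)) pairs: each target keyword with its keyword
    coordinates. An empty list means matching failed for every primitive.
    """
    hits = []
    for text, x, y in primitives:
        kw = match_keyword(text, keywords)
        if kw is not None:
            hits.append((kw, (x, y)))
    return hits


prims = [("drawing name:", 3, 4), ("1:100", 10, 4), ("version", 20, 4)]
print(find_target_keywords(prims, KEYWORDS))
# [('drawing name', (3, 4)), ('version', (20, 4))]
```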

Optionally, a preset keyword matching model is obtained, and the plurality of text primitives and the keyword set are used as input to the keyword matching model to obtain a plurality of target keywords corresponding to the text primitives.

In one possible example, the method further includes: if the plurality of text primitives are not successfully matched with the keyword set, generating a plurality of text images according to the plurality of text primitives, executing a keyword search algorithm based on optical character recognition (OCR) on the text images according to the keyword set, and determining the target keyword.

Optionally, a plurality of text primitive coordinate sets corresponding to the plurality of text primitives are obtained, the positions of the text primitives are determined from these coordinate sets, and the text images corresponding to the coordinate sets are cropped from the DWG drawing, the text images corresponding one-to-one with the text primitives.

Further, executing the OCR-based keyword search algorithm on the plurality of text images according to the keyword set includes: acquiring a preset optical character recognition algorithm; executing it on the text images to obtain the text contents corresponding to the text images; executing a keyword search operation on the text contents according to the keyword set; and determining the keywords contained in the text contents as target keywords.
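The OCR fallback can be sketched without committing to a particular OCR engine by passing the cropping and recognition steps in as functions. `crop` and `ocr` are hypothetical callables (for example, a renderer that rasterises part of the drawing, and an OCR library wrapped to return a string). The sketch only shows how coordinate sets, cropped text images, and the keyword search fit together; the usage at the end substitutes fake stand-ins for both.

```python
from typing import Callable, Sequence

Box = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def bounding_box(coords: Sequence[tuple[float, float]]) -> Box:
    """Smallest axis-aligned box around one text primitive coordinate set."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def ocr_keyword_search(
    coord_sets: Sequence[Sequence[tuple[float, float]]],
    keywords: Sequence[str],
    crop: Callable[[Box], object],
    ocr: Callable[[object], str],
) -> list[tuple[str, Box]]:
    """Crop one text image per coordinate set, OCR it, and search the result for keywords."""
    found = []
    for coords in coord_sets:
        box = bounding_box(coords)
        content = ocr(crop(box))  # one text image per text primitive
        for kw in keywords:
            if kw in content:
                found.append((kw, box))
    return found


# Fake crop/OCR stand-ins: the "image" is just the box, mapped to known text.
fake_text = {(0, 0, 10, 5): "drawing name", (20, 0, 30, 5): "1:100"}
print(ocr_keyword_search([[(0, 0), (10, 5)], [(20, 0), (30, 5)]], ["drawing name"],
                         crop=lambda box: box, ocr=lambda img: fake_text[img]))
# [('drawing name', (0, 0, 10, 5))]
```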

In one possible example, determining the target text according to the first rectangular area includes: judging whether the first rectangular area contains text data; and, if so, acquiring the text data in the first rectangular area as the target text.
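When the text primitives are available as data, the question of whether the rectangle contains text data can be answered geometrically before any image model is involved. The sketch below makes these assumptions: primitives are (text, x, y) tuples, a primitive counts as inside when its insertion point is, and the keyword's own primitive is skipped because it sits at the centre of the first rectangular area. The function name is hypothetical.

```python
from typing import Optional

Rect = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def target_text_in_rect(primitives, rect: Rect, keyword: str) -> Optional[str]:
    """Return the text inside `rect`, excluding the keyword itself, or None if there is none."""
    x0, y0, x1, y1 = rect
    inside = [
        (t, x, y)
        for t, x, y in primitives
        if x0 <= x <= x1 and y0 <= y <= y1 and t.strip() != keyword
    ]
    if not inside:
        return None
    # Assumed reading order: top to bottom, then left to right.
    inside.sort(key=lambda p: (-p[2], p[1]))
    return " ".join(t for t, _, _ in inside)
```

For the first rectangular area of the worked example, (0, 2, 6, 6), a "drawing name" primitive at (3, 4) is skipped and a value primitive at (4, 3) is returned.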

Optionally, a first rectangular image corresponding to the first rectangular area is cropped from the DWG drawing, a preset text detection model is obtained, and the first rectangular image is used as input to the text detection model to obtain a detection result, from which it is judged whether the first rectangular area contains text data. If it does, a text format corresponding to the target keyword is obtained, the text data is obtained from the first rectangular area, the data format of the text data is extracted, and it is judged whether the text format is consistent with the data format. If they are inconsistent, the text data is determined to be in an invalid state, a second rectangular area is determined according to the first rectangular area, and the judgment operation is executed for the second rectangular area. If they are consistent, the text data is determined to be in a valid state: a preset text recognition model is obtained, and the first rectangular image is used as input to the text recognition model to obtain the text content corresponding to the first rectangular image. The text content is then compared with the target keyword; if the text content differs from the target keyword, it is determined to be the target text corresponding to the first rectangular area and is extracted.

In one possible example, the method further includes: if the first rectangular area does not contain text data, acquiring a preset search step and determining a second rectangular area according to the search step and the first rectangular area; judging whether the second rectangular area contains text data; if so, extracting a text to be detected from the second rectangular area; determining a target text format corresponding to the target keyword according to a preset mapping between keywords and text formats; acquiring the text format of the text to be detected and comparing it with the target text format; and, if the comparison succeeds, determining the text to be detected as the target text.
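
The text does not specify what a "text format" is. One plausible reading, shown here only as an assumption, is a regular expression per keyword: the candidate text is in a valid state when it fully matches the pattern registered for the target keyword. The `TEXT_FORMATS` mapping and its patterns are made up for illustration.

```python
import re

# Hypothetical mapping between keywords and text formats.
TEXT_FORMATS = {
    "drawing number": re.compile(r"[A-Z]{1,3}-\d{2,4}"),  # e.g. A-101
    "version": re.compile(r"[A-Z]?\d+(\.\d+)*"),          # e.g. 2.1 or V3
}


def format_is_consistent(keyword: str, candidate: str) -> bool:
    """True if the candidate text has the format registered for the keyword.

    Keywords without a registered format accept any non-empty text (an assumption).
    """
    candidate = candidate.strip()
    pattern = TEXT_FORMATS.get(keyword)
    if pattern is None:
        return bool(candidate)
    return pattern.fullmatch(candidate) is not None
```

`fullmatch` is used rather than `search` so that a value such as "A-101 (old)" is rejected instead of being accepted on a partial hit.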

Optionally, a preset search step is obtained; the search step may be, for example, 6, 8, or 10, without limitation. A second rectangular area is determined based on the search step and the first rectangular area; that is, the second rectangular area is located below the first rectangular area at a distance of one search step from it.
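Read literally, the second rectangular area has the same size as the first and lies one search step below it. The minimal sketch below implements that reading. It makes two assumptions: "a distance of one search step" is the gap between the bottom edge of the first area and the top edge of the second, and rectangles are (x_min, y_min, x_max, y_max) with y increasing upwards. The function name is hypothetical.

```python
Rect = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), y grows upwards


def next_rect_below(rect: Rect, step: float) -> Rect:
    """Same-size rectangle whose top edge lies `step` units below `rect`'s bottom edge."""
    x_min, y_min, x_max, y_max = rect
    height = y_max - y_min
    top = y_min - step
    return (x_min, top - height, x_max, top)


# First rectangular area from the worked example, search step 6.
print(next_rect_below((0, 2, 6, 6), 6))  # (0, -8, 6, -4)
```

If "distance" instead means translating the whole area down by one step, the last line would become `(x_min, y_min - step, x_max, y_max - step)`.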

Optionally, if the detection result shows that the first rectangular area does not contain text data, a second rectangular area is determined for the first rectangular area, a second rectangular image corresponding to the second rectangular area is cropped from the DWG file, a preset text detection model is obtained, and a detection result corresponding to the second rectangular image is obtained, from which it is judged whether the second rectangular area contains text data. If it does, a text to be detected is extracted from the second rectangular area, the text format corresponding to the target keyword is obtained, the format of the text to be detected is extracted, and it is judged whether the two formats are consistent. If they are inconsistent, the text to be detected is determined to be in an invalid state, a third rectangular area is determined according to the first rectangular area and the second rectangular area, and the judgment operation is executed for the third rectangular area. If they are consistent, the text to be detected is determined to be in a valid state: a preset text recognition model is obtained, the second rectangular image is used as input to the text recognition model to obtain the corresponding text content, and the text content is determined to be the target text.
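Repeating the judgment on a second, a third, and further rectangular areas is naturally a loop. The sketch below is a minimal illustration under explicit assumptions, not the application's implementation. Primitives are (text, x, y) tuples, and the keyword's own text is skipped. Each new area has the same size and lies one search step below the previous one. The detection and recognition models are replaced by plain geometric and regular-expression checks. The cap `max_areas` is hypothetical, since the text does not say when the search stops.

```python
import re
from typing import Optional

Rect = tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), y grows upwards


def texts_in(primitives, rect: Rect, keyword: str) -> list[str]:
    """Texts whose insertion point lies in `rect`, skipping the keyword itself."""
    x0, y0, x1, y1 = rect
    return [t for t, x, y in primitives
            if x0 <= x <= x1 and y0 <= y <= y1 and t.strip() != keyword]


def search_target_text(primitives, first_rect: Rect, keyword: str,
                       fmt: re.Pattern[str], step: float,
                       max_areas: int = 3) -> Optional[str]:
    """Check the first area, then successive areas below it, for text in the expected format."""
    rect = first_rect
    for _ in range(max_areas):
        for text in texts_in(primitives, rect, keyword):
            if fmt.fullmatch(text.strip()):  # format consistent: valid state
                return text.strip()
        # No text, or only text in an invalid state: move one search step further down.
        x0, y0, x1, y1 = rect
        height = y1 - y0
        rect = (x0, y0 - step - height, x1, y0 - step)
    return None


prims = [("drawing number", 3, 4), ("see note", 3, 5), ("A-101", 3, -6)]
print(search_target_text(prims, (0, 2, 6, 6), "drawing number", re.compile(r"[A-Z]-\d+"), 6))
# A-101
```

In this example the first area holds only "see note", which fails the format check, so the search moves to the second area and finds "A-101".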

Referring to Fig. 2, Fig. 2 is a flow chart of another text extraction method based on DWG drawings provided in an embodiment of the application. The method is applied to an electronic device and includes the following steps:

Step 201, acquiring a DWG drawing, and determining a title bar area of the DWG drawing;

Step 202, acquiring a plurality of text primitives from the title bar area;

Step 203, obtaining a preset keyword set containing at least one keyword, and executing a keyword matching algorithm on the plurality of text primitives according to the keyword set;

Step 204, if the text primitives are successfully matched with the keyword set, determining the successfully matched keyword as the target keyword, and acquiring the keyword coordinates of the target keyword;

Step 205, determining a first rectangular area in the DWG drawing according to the keyword coordinates;

Step 206, determining a target text according to the first rectangular area, and extracting the target text.

For a detailed description of steps 201 to 206, refer to the corresponding steps of the text extraction method based on DWG drawings described in Fig. 1; the details are not repeated here.

It can be seen that, in the embodiments of the application, an electronic device acquires a DWG drawing and determines a title bar area of the DWG drawing; acquires a plurality of text primitives from the title bar area; obtains a preset keyword set containing at least one keyword and executes a keyword matching algorithm on the text primitives according to the keyword set; if the text primitives are successfully matched with the keyword set, determines the successfully matched keyword as the target keyword and acquires the keyword coordinates of the target keyword; determines a first rectangular area in the DWG drawing according to the keyword coordinates; and determines a target text according to the first rectangular area and extracts the target text. Because the target keyword is found by running the keyword matching algorithm on the text primitives, the first rectangular area is located from its keyword coordinates, and the target text is extracted from that area, the flow for extracting text information from a DWG drawing is simplified, the extraction period is shortened, text extraction efficiency is improved, and the user experience is improved.

Referring to Fig. 3, Fig. 3 is a flow chart of another text extraction method based on DWG drawings provided in an embodiment of the application. The method is applied to an electronic device and includes the following steps:

Step 301, acquiring a DWG drawing, and determining a title bar area of the DWG drawing;

Step 302, determining a target keyword in the title bar area, and acquiring keyword coordinates of the target keyword;

Step 303, determining a first rectangular area in the DWG drawing according to the keyword coordinates;

Step 304, judging whether the first rectangular area contains text data;

Step 305, if the first rectangular area contains text data, acquiring the text data as the target text, and extracting the target text.

For a detailed description of steps 301 to 305, refer to the corresponding steps of the text extraction method based on DWG drawings described in Fig. 1; the details are not repeated here.

It can be seen that, in the embodiments of the application, an electronic device acquires a DWG drawing and determines a title bar area of the DWG drawing; determines a target keyword in the title bar area and acquires keyword coordinates of the target keyword; determines a first rectangular area in the DWG drawing according to the keyword coordinates; judges whether the first rectangular area contains text data; and, if it does, acquires the text data in the first rectangular area as the target text and extracts the target text. Because the target keyword is determined in the title bar area, the first rectangular area is located from its keyword coordinates, and the target text is extracted from that area, the flow for extracting text information from a DWG drawing is simplified, the extraction period is shortened, text extraction efficiency is improved, and the user experience is improved.

Referring to Fig. 4, Fig. 4 is a flow chart of another text extraction method based on DWG drawings provided in an embodiment of the application. The method is applied to an electronic device and includes the following steps:

Step 401, acquiring a DWG drawing, and determining a title bar area of the DWG drawing;

Step 402, determining a target keyword in the title bar area, and acquiring keyword coordinates of the target keyword;

Step 403, determining a first rectangular area in the DWG drawing according to the keyword coordinates;

Step 404, judging whether the first rectangular area contains text data;

Step 405, if not, acquiring a preset search step, and determining a second rectangular area according to the search step and the first rectangular area;

Step 406, judging whether the second rectangular area contains text data;

Step 407, if so, extracting a text to be detected from the second rectangular area, and determining a target text format corresponding to the target keyword according to a preset mapping between keywords and text formats;

Step 408, acquiring the text format of the text to be detected, and comparing the target text format with the text format of the text to be detected;

Step 409, if the comparison succeeds, determining the text to be detected as the target text, and extracting the target text.

For a detailed description of steps 401 to 409, refer to the corresponding steps of the text extraction method based on DWG drawings described in Fig. 1; the details are not repeated here.

It can be seen that, in the embodiments of the application, an electronic device acquires a DWG drawing and determines a title bar area of the DWG drawing; determines a target keyword in the title bar area and acquires keyword coordinates of the target keyword; determines a first rectangular area in the DWG drawing according to the keyword coordinates; judges whether the first rectangular area contains text data; if it does not, acquires a preset search step and determines a second rectangular area according to the search step and the first rectangular area; judges whether the second rectangular area contains text data; if it does, extracts a text to be detected from the second rectangular area and determines a target text format corresponding to the target keyword according to a preset mapping between keywords and text formats; acquires the text format of the text to be detected and compares it with the target text format; and, if the comparison succeeds, determines the text to be detected as the target text and extracts the target text. Because the target keyword is determined in the DWG drawing, the first rectangular area is located from its keyword coordinates, and the target text is extracted starting from that area, the flow for extracting text information from a DWG drawing is simplified, the extraction period is shortened, text extraction efficiency is improved, and the user experience is improved.

Referring to fig. 5, fig. 5 is a schematic structural diagram of an electronic device 500 according to an embodiment of the present application, as shown in the drawing, the electronic device 500 includes: an application processor 510, a memory 520, a communication interface 530, and one or more programs 521, wherein the one or more programs 521 are stored in the memory 520 and configured to be executed by the application processor 510, the one or more programs 521 comprising instructions for performing the steps of: the embodiment of the application also provides a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, where the computer program causes a computer to execute part or all of the steps of any one of the methods described in the embodiments of the method, where the computer includes an electronic device.

Acquiring a DWG drawing, and determining a title bar area of the DWG drawing;

In one possible example, in said determining a target keyword in said title bar area, the instructions in said program are specifically for: acquiring a plurality of text primitives from the title bar area; obtaining a preset keyword set, wherein the keyword set comprises: at least one keyword, executing a keyword matching algorithm on the plurality of text primitives according to the keyword set; and if the text primitives are successfully matched with the keyword set, determining that the successfully matched keywords are the target keywords.

In one possible example, the instructions in the program are further for performing the following: if the matching of the text primitives with the keyword set is unsuccessful; generating a plurality of text images according to the plurality of text primitives, executing a keyword searching algorithm based on optical character recognition on the plurality of text images according to the keyword set, and determining the target keywords.

In one possible example, in the determining the target text according to the first rectangular area, the instructions in the program are specifically configured to: judging whether the first rectangular area contains text data or not; and if so, acquiring the text data in the first rectangular area as the target text.

In one possible example, the instructions in the program are further used to perform the following: if the first rectangular area does not contain text data, acquire a preset search step length and determine a second rectangular area according to the search step length and the first rectangular area; judge whether the second rectangular area contains text data; if so, extract a text to be detected from the second rectangular area; determine a target text format corresponding to the target keyword according to a preset mapping between keywords and text formats; acquire the text format of the text to be detected and compare the target text format with it; and, if the comparison succeeds, determine the text to be detected as the target text.
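The step-length fallback and format comparison can be sketched as below, under stated assumptions: the second rectangular area is obtained by growing the first one by the step on every side (the application does not fix how the step is applied), text formats are expressed as regular expressions, and the entries in `KEYWORD_FORMATS` are invented examples. `find_text` stands for any area-to-text lookup, such as a geometric containment test on the text primitives. With the default `max_steps=1` the function performs exactly one expansion, matching the single second area described here; larger values are a hypothetical extension.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Rect:
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def expand(self, step: float) -> "Rect":
        """Grow the rectangle by `step` on every side (hypothetical policy)."""
        return Rect(self.x_min - step, self.y_min - step,
                    self.x_max + step, self.y_max + step)


# Hypothetical mapping from keyword to the expected format of its value.
KEYWORD_FORMATS: Dict[str, str] = {
    "Drawing No.": r"[A-Z]{1,3}-\d{2,4}",
    "Date": r"\d{4}-\d{2}-\d{2}",
}


def search_with_step(
    keyword: str,
    first: Rect,
    step: float,
    find_text: Callable[[Rect], List[str]],
    max_steps: int = 1,
) -> Optional[str]:
    """Expand the search area by the step length and return the first text to
    be detected whose format matches the format mapped to the keyword."""
    pattern = re.compile(KEYWORD_FORMATS[keyword])  # target text format
    area = first
    for _ in range(max_steps):
        area = area.expand(step)  # second rectangular area (and beyond)
        for candidate in find_text(area):
            if pattern.fullmatch(candidate):  # format comparison succeeded
                return candidate
    return None  # no text with the expected format was found
```

Using `fullmatch` rather than `search` means a value such as `A-101 rev` is rejected instead of partially accepted, which is one reasonable reading of "comparing" two formats.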

The foregoing mainly describes the solutions of the embodiments of the present application from the perspective of the method-side implementation. It will be appreciated that, in order to achieve the above functions, the electronic device includes corresponding hardware structures and/or software modules for performing each function. Those skilled in the art will readily appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by hardware driven by computer software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered as departing from the scope of the present application.

In the embodiments of the present application, the electronic device may be divided into functional units according to the above method example. For example, each functional unit may correspond to one function, or two or more functions may be integrated into one control unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. It should be noted that the division of units in the embodiments of the present application is schematic and is merely a logical function division; other division manners may be used in actual implementation.

Fig. 6 is a functional unit block diagram of a DWG drawing-based text extraction apparatus 600 according to an embodiment of the application. The text extraction apparatus 600 is applied to an electronic device and includes a first obtaining unit 601, a first determining unit 602, a second determining unit 603, and an extracting unit 604, wherein:

a first obtaining unit 601, configured to obtain a DWG drawing, and determine a title bar area of the DWG drawing;

a first determining unit 602, configured to determine a target keyword in the title bar area, and obtain keyword coordinates of the target keyword;

a second determining unit 603, configured to determine a first rectangular area in the DWG drawing according to the keyword coordinates;

an extracting unit 604, configured to determine a target text according to the first rectangular area, and extract the target text.
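The rule used by the second determining unit 603 to place the first rectangular area is not specified in this section. The sketch below assumes, purely for illustration, that the value cell lies immediately to the right of the keyword and is vertically centred on the keyword's insertion point; `first_rect_from_keyword`, its parameters, and the default cell size are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def first_rect_from_keyword(
    kx: float,
    ky: float,
    keyword_width: float,
    cell_width: float = 40.0,
    cell_height: float = 8.0,
) -> Rect:
    """Hypothetical rule: the value sits in the cell immediately to the right
    of the keyword, vertically centred on the keyword's insertion point."""
    left = kx + keyword_width  # start just past the end of the keyword text
    return Rect(left, ky - cell_height / 2, left + cell_width, ky + cell_height / 2)
```

Title bars whose values sit below their labels would need a different placement rule, which is one reason a fallback search area is useful.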

In one possible example, for determining the target keyword in the title bar area, the first determining unit 602 is specifically configured to: acquire a plurality of text primitives from the title bar area; obtain a preset keyword set comprising at least one keyword; execute a keyword matching algorithm on the plurality of text primitives according to the keyword set; and, if a text primitive is successfully matched with the keyword set, determine the successfully matched keyword as the target keyword.

In one possible example, the first determining unit 602 is further configured to: if the plurality of text primitives fail to match the keyword set, generate a plurality of text images from the plurality of text primitives, execute a keyword search algorithm based on optical character recognition (OCR) on the plurality of text images according to the keyword set, and determine the target keyword.

In one possible example, for determining the target text according to the first rectangular area, the extracting unit 604 is specifically configured to: judge whether the first rectangular area contains text data; and, if so, acquire the text data in the first rectangular area as the target text.

In one possible example, the extracting unit 604 is further configured to: if the first rectangular area does not contain text data, acquire a preset search step length and determine a second rectangular area according to the search step length and the first rectangular area; judge whether the second rectangular area contains text data; if so, extract a text to be detected from the second rectangular area; determine a target text format corresponding to the target keyword according to a preset mapping between keywords and text formats; acquire the text format of the text to be detected and compare the target text format with it; and, if the comparison succeeds, determine the text to be detected as the target text.

An embodiment of the present application further provides a computer storage medium storing a computer program for electronic data exchange, wherein the computer program causes a computer to perform some or all of the steps of any method described in the above method embodiments, and the computer includes an electronic device.

An embodiment of the present application further provides a computer program product comprising a non-transitory computer-readable storage medium storing a computer program, the computer program being operable to cause a computer to perform some or all of the steps of any method described in the above method embodiments. The computer program product may be a software installation package, and the computer includes an electronic device.

It should be noted that, for simplicity of description, the foregoing method embodiments are all expressed as a series of action combinations, but it should be understood by those skilled in the art that the present application is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required in the present application.

In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is merely a logical function division, and other divisions are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical or in other forms.

The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units, may be located in one place, or may be distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable memory. Based on such an understanding, the technical solution of the present application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.

Those of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware. The program may be stored in a computer-readable memory, which may include a flash drive, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or the like.

The embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, those of ordinary skill in the art may make changes to the specific implementations and the application scope according to the idea of the present application. In conclusion, the content of this specification should not be construed as limiting the present application.

Claims

1. A text extraction method based on DWG drawings, which is applied to electronic equipment, the method comprising:

acquiring a DWG drawing, and determining a title bar area of the DWG drawing;

determining a target keyword in the title bar area, which specifically comprises: acquiring a plurality of text primitives from the title bar area; obtaining a preset keyword set, wherein the keyword set comprises at least one keyword; determining any one of the plurality of text primitives as a first text primitive, acquiring text data contained in the first text primitive, and sequentially matching the text data with the at least one keyword in the keyword set; if the text data contains any keyword in the keyword set, determining that the first text primitive is successfully matched with the keyword set, and determining the successfully matched keyword as a target keyword; if the plurality of text primitives fail to match the keyword set, acquiring a plurality of text primitive coordinate sets corresponding to the plurality of text primitives, wherein each text primitive coordinate set indicates the position of the corresponding text primitive; determining the positions of the plurality of text primitives through the plurality of text primitive coordinate sets, and cropping a plurality of text images corresponding to the plurality of text primitive coordinate sets from the DWG drawing; acquiring a preset optical character recognition algorithm, executing the optical character recognition algorithm on the plurality of text images to obtain a plurality of text contents corresponding to the plurality of text images, performing a keyword search operation on the plurality of text contents according to the keyword set, and determining keywords contained in the plurality of text contents as target keywords;

acquiring keyword coordinates of the target keyword, and determining a first rectangular area in the DWG drawing according to the keyword coordinates;

determining a target text according to the first rectangular area, and extracting the target text, which specifically comprises: cropping a first rectangular image corresponding to the first rectangular area from the DWG drawing, acquiring a preset text detection model, and inputting the first rectangular image into the text detection model to obtain a detection result corresponding to the first rectangular image; judging, according to the detection result, whether the first rectangular area contains text data; if the first rectangular area contains text data, acquiring a text format corresponding to the target keyword, extracting a data format corresponding to the text data, and judging whether the text format is consistent with the data format, and if the text format is inconsistent with the data format, determining that the text data is in an invalid state; if the first rectangular area does not contain text data, acquiring a preset search step length, and determining a second rectangular area according to the search step length and the first rectangular area; judging whether the second rectangular area contains text data; if so, extracting a text to be detected from the second rectangular area; determining a target text format corresponding to the target keyword according to a preset mapping between keywords and text formats, acquiring the text format of the text to be detected, and comparing the target text format with the text format of the text to be detected; and if the comparison succeeds, determining the text to be detected as the target text.

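2. A text extraction device based on DWG drawings, applied to an electronic device, the device comprising:

a first obtaining unit, configured to acquire a DWG drawing and determine a title bar area of the DWG drawing;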

a first determining unit, configured to determine a target keyword in the title bar area, and specifically configured to: acquire a plurality of text primitives from the title bar area; obtain a preset keyword set, wherein the keyword set comprises at least one keyword; determine any one of the plurality of text primitives as a first text primitive, acquire text data contained in the first text primitive, and sequentially match the text data with the at least one keyword in the keyword set; if the text data contains any keyword in the keyword set, determine that the first text primitive is successfully matched with the keyword set, and determine the successfully matched keyword as a target keyword; if the plurality of text primitives fail to match the keyword set, acquire a plurality of text primitive coordinate sets corresponding to the plurality of text primitives, wherein each text primitive coordinate set indicates the position of the corresponding text primitive; determine the positions of the plurality of text primitives through the plurality of text primitive coordinate sets, and crop a plurality of text images corresponding to the plurality of text primitive coordinate sets from the DWG drawing; acquire a preset optical character recognition algorithm, execute the optical character recognition algorithm on the plurality of text images to obtain a plurality of text contents corresponding to the plurality of text images, perform a keyword search operation on the plurality of text contents according to the keyword set, and determine keywords contained in the plurality of text contents as target keywords; and acquire keyword coordinates of the target keyword;

a second determining unit, configured to determine a first rectangular area in the DWG drawing according to the keyword coordinates;

an extraction unit, configured to determine a target text according to the first rectangular area and extract the target text, and specifically configured to: crop a first rectangular image corresponding to the first rectangular area from the DWG drawing, acquire a preset text detection model, and input the first rectangular image into the text detection model to obtain a detection result corresponding to the first rectangular image; judge, according to the detection result, whether the first rectangular area contains text data; if the first rectangular area contains text data, acquire a text format corresponding to the target keyword, extract a data format corresponding to the text data, and judge whether the text format is consistent with the data format, and if the text format is inconsistent with the data format, determine that the text data is in an invalid state; if the first rectangular area does not contain text data, acquire a preset search step length, and determine a second rectangular area according to the search step length and the first rectangular area; judge whether the second rectangular area contains text data; if so, extract a text to be detected from the second rectangular area; determine a target text format corresponding to the target keyword according to a preset mapping between keywords and text formats, acquire the text format of the text to be detected, and compare the target text format with the text format of the text to be detected; and if the comparison succeeds, determine the text to be detected as the target text.

3. An electronic device comprising a processor, a memory, a communication interface, and one or more programs stored in the memory, the one or more programs being executed by the processor to implement the method of claim 1.

4. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program, which is executed by a processor to implement the method of claim 1.