CN111401312B - PDF drawing text recognition method, system and equipment - Google Patents


Info

Publication number
CN111401312B
Authority
CN
China
Prior art keywords
image
recognition
region
text
processing
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010278085.5A
Other languages
Chinese (zh)
Other versions
CN111401312A (en)
Inventor
张东锋
曾雏鹏
李俊波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xinzhi Software Co ltd
Original Assignee
Shenzhen Xinzhi Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-04-10
Filing date
2020-04-10
Publication date
2024-04-26
Application filed by Shenzhen Xinzhi Software Co ltd (2020-04-10)
Priority to CN202010278085.5A (2020-04-10)
Publication of CN111401312A (2020-07-10)
Application granted
Publication of CN111401312B (2024-04-26)
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/42 Document-oriented image-based pattern recognition based on the type of document
    • G06V30/422 Technical drawings; Geographical maps
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Abstract

The invention provides a PDF drawing text recognition method, system and equipment. The method comprises a deep-learning-based optical character recognition step, a customized recognition and general recognition step, and a mobile device low-quality image recognition step. The deep-learning-based optical character recognition step detects regions containing text in a scene and recognizes the text in those regions: text detection is performed with the CTPN, SegLink, TextBoxes, FTSN, PixelLink and CRAFT algorithms, and text recognition with CNN and CRNN algorithms. The customized recognition step comprises: identifying the type of the PDF drawing from the table text or the frame content in the PDF; extracting the content of regions according to structural features; and extracting key regions, then recognizing the text in those regions or extracting key text through a deep neural network.

Description

PDF drawing text recognition method, system and equipment
Technical Field
The present invention relates to the field of image processing, and in particular, to a PDF drawing text recognition method, system, and device.
Background
Artificial intelligence has advanced rapidly in data, algorithms and computing power and, against the broad background of the global digital transformation of the economy, has entered a new wave of development. The influence of this wave is far-reaching; its most notable characteristic is that it has spread from specialist fields to the general public.
High-precision PDF recognition is a mature technology on the market today, and methods based on conventional OCR and on deep learning have been applied across many industries. Recognition of bank bills, PDF forms and industrial drawings are all widely used, well-established techniques. For formatted, template-based PDFs, recognition achieves notable precision and speed, improving the efficiency and capacity of practitioners in many industries.
Traditional PDF recognition handles PDFs with essentially fixed layouts and places certain requirements on PDF quality. With the spread of smartphones, traditional methods offer no good solution for low-quality PDF images photographed with personal mobile phones.
Moreover, most current PDF recognition processes the document as a whole and provides no extraction and recognition of specific areas. For structured PDF drawings, a further feature of the present solution is that it extracts the user's region of interest (POI) and parses its content.
Disclosure of Invention
An object of the invention is to provide a PDF drawing text recognition method, system and equipment that offer several recognition schemes, such as customized recognition and general-scene recognition, and that, by combining recent deep-learning-based OCR algorithms with corresponding image processing techniques, give users a solution for recognizing low-quality images.
A further object of the invention is to provide a PDF drawing text recognition method, system and equipment that can recognize a variety of scenes, such as industrial drawings, bills and images captured with personal devices, meeting the different requirements of different users.
In order to achieve at least one of the above objects, the present invention provides a PDF drawing text recognition method, which includes the following steps:
performing a deep-learning-based optical character recognition step; performing a customized recognition and general recognition step; and performing a mobile device low-quality image recognition step;
wherein the deep-learning-based optical character recognition step includes: detecting regions containing text in a scene and recognizing the text in those regions, wherein text detection is performed based on the CTPN, SegLink, TextBoxes, FTSN, PixelLink and CRAFT algorithms, and text recognition is performed based on CNN and CRNN algorithms;
wherein the customized recognition step includes: identifying the type of the PDF drawing from the table text or the frame content in the PDF; extracting the content of regions according to structural features; and extracting key regions, then recognizing the text in those regions or extracting key text through a deep neural network.
In some embodiments, the key region extraction in the customized recognition step further comprises:
Extracting a key region according to the POI proportion;
Extracting all frames by Hough transform and corner detection;
Extracting a key region by fuzzy matching and exact matching of text;
Extracting a key region according to the edge characteristics of the region; and
Extracting a key region according to the text characteristics within the region.
In some embodiments, in the key region extraction step, key regions are extracted according to edge characteristics of the region, such as shape, symmetry, angle or edge granularity, and according to characteristics of the text within the region, such as font, size or text type.
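The text does not say how the POI proportion is specified. A minimal sketch, assuming the key region is given as fractions of page width and height (for example, a title block in the bottom-right corner) and the page is a 2-D list of pixel values; `crop_by_ratio` and `TITLE_BLOCK_RATIO` are hypothetical names and the ratio values are illustrative only:

```python
from typing import List, Tuple

Image = List[List[int]]

# Hypothetical ratio box (left, top, right, bottom) as fractions of the page.
# The values are illustrative assumptions, not taken from the patent.
TITLE_BLOCK_RATIO: Tuple[float, float, float, float] = (0.6, 0.8, 1.0, 1.0)


def crop_by_ratio(image: Image, ratio_box: Tuple[float, float, float, float]) -> Image:
    """Crop the sub-image whose bounds are given as fractions of width and height."""
    left, top, right, bottom = ratio_box
    if not (0.0 <= left < right <= 1.0 and 0.0 <= top < bottom <= 1.0):
        raise ValueError("ratio box must satisfy 0 <= left < right <= 1 and 0 <= top < bottom <= 1")
    height = len(image)
    width = len(image[0]) if height else 0
    # Convert fractional bounds to pixel indices; right and bottom are exclusive.
    x1, x2 = int(left * width), int(right * width)
    y1, y2 = int(top * height), int(bottom * height)
    return [row[x1:x2] for row in image[y1:y2]]
```

Expressing the region as proportions rather than pixels keeps the same rule usable across scans rendered at different resolutions.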
In some embodiments, the mobile device low-quality image recognition step further comprises:
Performing filtering on the image;
Performing image enhancement on the image;
Performing edge sharpening on the image;
Performing texture analysis on the image;
Performing segmentation on the image;
Performing geometric analysis on the image;
Performing matching on the image; and
Performing morphological processing on the image.
In some embodiments, the filtering smooths the image and reduces its noise; the texture analysis performs skeletonization and connectivity analysis on the image; the image matching performs template matching and search matching on the image; and the morphological processing performs dilation, erosion, and opening and closing operations on the image.
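The filtering and morphological operations named above are standard image operations. A minimal pure-Python sketch, assuming a grayscale or binary image stored as a 2-D list and a 3x3 neighbourhood that is truncated at the image border (a choice the patent does not specify); all function names are illustrative:

```python
from typing import Iterator, List

Image = List[List[int]]


def _neighbourhood(image: Image, y: int, x: int) -> Iterator[int]:
    """Yield the 3x3 neighbourhood of (y, x), skipping positions outside the image."""
    h, w = len(image), len(image[0])
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                yield image[yy][xx]


def mean_filter(image: Image) -> Image:
    """Smooth by replacing each pixel with the integer mean of its neighbourhood."""
    out = []
    for y in range(len(image)):
        row = []
        for x in range(len(image[0])):
            values = list(_neighbourhood(image, y, x))
            row.append(sum(values) // len(values))
        out.append(row)
    return out


def erode(image: Image) -> Image:
    """Erosion: each pixel becomes the minimum of its neighbourhood."""
    return [[min(_neighbourhood(image, y, x)) for x in range(len(image[0]))]
            for y in range(len(image))]


def dilate(image: Image) -> Image:
    """Dilation: each pixel becomes the maximum of its neighbourhood."""
    return [[max(_neighbourhood(image, y, x)) for x in range(len(image[0]))]
            for y in range(len(image))]


def opening(image: Image) -> Image:
    """Erosion then dilation: removes specks smaller than the 3x3 element."""
    return dilate(erode(image))


def closing(image: Image) -> Image:
    """Dilation then erosion: fills small gaps and holes."""
    return erode(dilate(image))
```

Because the neighbourhood is truncated rather than padded, shapes touching the border are not eroded from that side; padding with zeros is an equally valid alternative.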
According to another aspect of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the PDF drawing text recognition method.
According to another aspect of the present invention, there is also provided a PDF drawing text recognition apparatus, including: a software application, a memory for storing the software application, and a processor for executing the software application; wherein the programs of the software application respectively perform the corresponding steps of the PDF drawing text recognition method.
According to another aspect of the present invention, there is also provided a PDF drawing text recognition system, including an optical character recognition unit, a customized recognition and general recognition unit, and a mobile device low-quality image recognition unit. The optical character recognition unit includes a text detection module and a text recognition module. The text detection module is configured to detect regions containing text in a scene, performing text detection based on the CTPN, SegLink, TextBoxes, FTSN, PixelLink and CRAFT algorithms. The text recognition module is configured to recognize the text in the detected regions based on CRNN and CNN algorithms. The customized recognition and general recognition unit is provided with a customized recognition module configured to extract the content of regions according to structural features and to recognize the text in those regions or extract key text through a deep neural network. The mobile device low-quality image recognition unit is configured to perform filtering, enhancement, sharpening, texture analysis, segmentation, geometric analysis, matching and morphological processing on the image to be recognized.
In some embodiments, the customized recognition module includes a key region extraction module, which further includes a POI proportion extraction module, a Hough transform and corner detection extraction module, a text fuzzy and exact match extraction module, a region edge characteristic extraction module, and a region text characteristic extraction module. The POI proportion extraction module is configured to extract a key region according to the POI proportion; the Hough transform and corner detection extraction module is configured to extract all frames by Hough transform and corner detection; the text fuzzy and exact match extraction module is configured to extract a key region by fuzzy and exact matching of text; the region edge characteristic extraction module is configured to extract a key region according to the edge characteristics of the region; and the region text characteristic extraction module is configured to extract a key region according to the text characteristics within the region.
In some embodiments, the edge characteristics used by the region edge characteristic extraction module are shape, symmetry, angle and edge granularity, and the text characteristics used by the region text characteristic extraction module are font, size and text type.
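Symmetry is listed as an edge characteristic, but no formula is given. One hypothetical measure, assumed here purely for illustration, is the fraction of pixels in a binary region that equal their left-right mirror; a score near 1.0 suggests a horizontally symmetric shape such as a framed symbol:

```python
from typing import List


def horizontal_symmetry(region: List[List[int]]) -> float:
    """Return the fraction of pixels equal to their mirror across the vertical axis.

    This is an assumed, illustrative metric; the patent does not define one.
    """
    total = sum(len(row) for row in region)
    if total == 0:
        raise ValueError("region is empty")
    same = sum(
        1
        for row in region
        for x, value in enumerate(row)
        if value == row[len(row) - 1 - x]
    )
    return same / total
```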
Drawings
Fig. 1 is a flowchart of steps of a PDF drawing text recognition method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a PDF drawing text recognition system according to an embodiment of the present invention.
Detailed Description
The following description is presented to enable one of ordinary skill in the art to make and use the invention. The preferred embodiments in the following description are by way of example only and other obvious variations will occur to those skilled in the art. The basic principles of the invention defined in the following description may be applied to other embodiments, variations, modifications, equivalents, and other technical solutions without departing from the spirit and scope of the invention.
It will be understood that the terms "a" and "an" should be interpreted as referring to "at least one" or "one or more," i.e., in one embodiment, the number of elements may be one, while in another embodiment, the number of elements may be plural, and the term "a" should not be interpreted as limiting the number.
The present invention relates to computer programs. Fig. 1 is a flowchart of the PDF drawing text recognition method according to the present invention; it shows a solution in which a computer program written according to this process flow is executed to control or process objects external or internal to the computer. Using a computer system, the method combines manual experience with machine learning results, can recognize a variety of scenes such as industrial drawings, bills and images captured with personal devices, and can meet the different requirements of different users. It will be understood that the computer is not limited to a desktop computer, notebook computer, tablet and the like, but may be any other intelligent electronic device capable of running programs and processing data.
As shown in fig. 1, the PDF drawing text recognition method includes the following steps:
S10: performing the optical character recognition step based on deep learning.
The optical character recognition step includes the steps of:
Detecting regions containing text in a scene, wherein text detection is performed based on the CTPN, SegLink, TextBoxes, FTSN, PixelLink and CRAFT algorithms; and
Recognizing the text in the detected regions, wherein the text is recognized based on CRNN and CNN algorithms.
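CRNN recognizers are commonly trained with a CTC loss and decoded by collapsing per-frame predictions; the patent does not state its decoding scheme. A minimal sketch of greedy (best-path) CTC decoding, assuming the model outputs one probability vector per horizontal frame and that class 0 is the CTC blank; `ctc_greedy_decode` and the toy character set are hypothetical:

```python
from typing import List, Sequence


def ctc_greedy_decode(frame_probs: Sequence[Sequence[float]], charset: str, blank: int = 0) -> str:
    """Best-path CTC decoding: argmax per frame, merge repeats, drop blanks.

    charset[i] is the symbol for class i; the entry at index `blank` is never emitted.
    """
    best_path = [max(range(len(frame)), key=lambda k: frame[k]) for frame in frame_probs]
    chars: List[str] = []
    previous = None
    for label in best_path:
        # Emit a label only when it differs from the previous frame and is not blank;
        # a blank between two equal labels therefore keeps both (e.g. "AA").
        if label != previous and label != blank:
            chars.append(charset[label])
        previous = label
    return "".join(chars)
```

Greedy decoding is the fastest option; beam search with a lexicon can be more accurate on drawing vocabularies at extra cost.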
Further, the PDF drawing text recognition method includes step S20: the customized recognition and general recognition step.
The customized recognition and general recognition step comprises:
A customized recognition step; and
A general recognition step.
The customized recognition step further comprises:
Identifying the type of the PDF drawing from the table text or the frame content in the PDF;
Extracting the content of regions according to structural features; and
Extracting key regions, and recognizing the text in those regions or extracting key text through a deep neural network.
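How the drawing type is derived from table text is not specified. One simple hypothetical approach is keyword voting over the text read from the title block or frame; the type names and keywords in `DRAWING_TYPE_KEYWORDS` are invented for illustration, and a trained classifier could replace the rule:

```python
from typing import Dict, Tuple

# Hypothetical keyword table; a real deployment would configure its own.
DRAWING_TYPE_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "architectural": ("floor plan", "elevation", "section"),
    "electrical": ("wiring", "circuit", "single line"),
    "mechanical": ("tolerance", "part no", "material"),
}


def classify_drawing(table_text: str) -> str:
    """Return the drawing type whose keywords occur most often, or 'unknown'."""
    text = table_text.lower()
    scores = {
        drawing_type: sum(keyword in text for keyword in keywords)
        for drawing_type, keywords in DRAWING_TYPE_KEYWORDS.items()
    }
    best = max(scores, key=lambda k: scores[k])
    return best if scores[best] > 0 else "unknown"
```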
The key region extraction further comprises:
Extracting according to the POI proportion;
Extracting all frames by Hough transform and corner detection;
Extracting by fuzzy matching and exact matching of text;
Extracting according to the edge characteristics of the region, such as shape, symmetry, angle and edge granularity; and
Extracting according to the text characteristics within the region, such as font, size and text type.
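Fuzzy and exact text matching can locate a key field even when OCR misreads a character. A minimal sketch, assuming OCR output as a list of text lines; Python's `difflib.SequenceMatcher` is a stand-in (the patent names no fuzzy matching algorithm), and comparing the label with a same-length prefix of each line and the 0.8 threshold are illustrative assumptions:

```python
from difflib import SequenceMatcher
from typing import List, Optional, Tuple


def find_field(lines: List[str], label: str, threshold: float = 0.8) -> Optional[Tuple[int, float]]:
    """Locate the line holding `label`: exact substring match first, then fuzzy.

    Returns (line_index, score), or None when no line scores >= threshold.
    """
    # Exact pass: a verbatim occurrence wins outright.
    for i, line in enumerate(lines):
        if label in line:
            return i, 1.0
    # Fuzzy pass: compare the label with a same-length prefix of each line,
    # which tolerates OCR substitutions such as 'l' for 'i'.
    best_index, best_score = -1, 0.0
    for i, line in enumerate(lines):
        score = SequenceMatcher(None, label, line[: len(label)]).ratio()
        if score > best_score:
            best_index, best_score = i, score
    return (best_index, best_score) if best_score >= threshold else None
```

For example, the label "Drawing No" matches the misread line "Drawlng No: A-102" with a ratio of 0.9, so the value following it can then be extracted.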
Further, the PDF drawing text recognition method includes step S30: the mobile device low-quality image recognition step.
The mobile device low-quality image recognition step further comprises:
Filtering, such as image smoothing and denoising;
Image enhancement;
Image edge sharpening;
Image texture analysis, such as skeletonization and connectivity analysis;
Image segmentation;
Geometric analysis;
Image matching, such as template matching and search matching; and
Morphological processing, such as dilation, erosion, and opening and closing operations.
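Template matching, one of the matching methods listed above, can be illustrated with an exhaustive sum-of-squared-differences (SSD) search. A minimal sketch, assuming grayscale images as 2-D lists; SSD is one common criterion (normalized cross-correlation is another), and the function name is hypothetical:

```python
from typing import List, Tuple

Image = List[List[int]]


def match_template(image: Image, template: Image) -> Tuple[int, int]:
    """Return (row, col) of the top-left position with the lowest sum of squared differences."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template larger than image")
    best_ssd, best_pos = None, (0, 0)
    # Slide the template over every valid position (exhaustive search).
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            ssd = sum(
                (image[y + dy][x + dx] - template[dy][dx]) ** 2
                for dy in range(th)
                for dx in range(tw)
            )
            if best_ssd is None or ssd < best_ssd:
                best_ssd, best_pos = ssd, (y, x)
    return best_pos
```

SSD is sensitive to global brightness changes, which is one reason the enhancement step would typically run before matching on phone photographs.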
When an enterprise provides a recognition portal, individual users can photograph PDF drawings and submit them for recognition. However, images taken by users are often of poor quality because of lighting conditions, shooting angles and the like. The PDF drawing text recognition method improves image quality and thereby recognition precision. After a low-quality image from a user's personal device has been processed by the mobile device low-quality image recognition step, it presents its content more clearly and its quality approaches that of a high-precision PDF, so the recognition algorithm performs better. This also broadens the applicability and generalization of the overall algorithm.
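No specific enhancement method is named. Histogram equalization is one common choice for poorly lit photographs; a minimal sketch, assuming 8-bit grayscale pixels in a 2-D list, using the classic cumulative-histogram mapping:

```python
from typing import List

Image = List[List[int]]


def equalize_histogram(image: Image, levels: int = 256) -> Image:
    """Spread intensities using the cumulative histogram (classic equalization)."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    if n == 0:
        raise ValueError("image is empty")
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution function of the intensities.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    if n == cdf_min:  # single-intensity image: nothing to spread
        return [row[:] for row in image]
    # Multiply before dividing so exact integer results are not lost to rounding.
    lut = [max(0, round((c - cdf_min) * (levels - 1) / (n - cdf_min))) for c in cdf]
    return [[lut[p] for p in row] for row in image]
```

Global equalization can over-amplify noise in flat areas; a local (tiled) variant is a common refinement for uneven lighting.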
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided in the form of a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
It will be appreciated by those skilled in the art that the PDF drawing text recognition method of the present invention may be implemented by hardware, software, or a combination of hardware and software. The invention may be implemented in a centralized fashion in at least one computer system or in a distributed fashion where different elements are spread across several interconnected computer systems. Any kind of computer system or other apparatus adapted for carrying out the methods is suited. The combination of hardware and software may be a general-purpose computer system with a computer program installed thereon, and the computer system may be controlled to operate according to the method by installing and executing the program.
The present invention can be embedded in a computer program product, which comprises all the features enabling the implementation of the methods described herein. The computer program product is embodied in one or more computer-readable storage media having computer-readable program code embodied therein. According to another aspect of the invention there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, is capable of performing the steps of the method of the invention. A computer storage medium is a medium, in computer memory, that stores information as discrete physical quantities. Computer storage media include, but are not limited to, semiconductor memory, disk storage, magnetic cores, magnetic drums, magnetic tapes, optical discs and the like. It will be appreciated by those skilled in the art that the computer storage media is not limited to the foregoing examples, which are provided by way of example only and are not limiting of the invention.
A typical combination of hardware and software could be a general purpose computer system with a computer program that, when being loaded and executed, controls the computer system such that it carries out the methods disclosed herein. According to another aspect of the present invention, there is also provided a PDF drawing text recognition apparatus, including: a software application, a memory for storing the software application, and a processor for executing the software application. The programs of the software application respectively perform the corresponding steps of the PDF drawing text recognition method.
Corresponding to the method embodiment, according to another aspect of the invention, a PDF drawing text recognition system is also provided; the system applies the PDF drawing text recognition method as an improvement to computer programs.
As shown in fig. 2, in this embodiment of the present invention, the PDF drawing text recognition system includes an optical character recognition unit 100, a customized recognition and general recognition unit 200, and a mobile device low-quality image recognition unit 300.
Specifically, the optical character recognition unit 100 includes a text detection module 110 and a text recognition module 120. The text detection module 110 detects regions containing text in the scene, preferably performing text detection based on the CTPN, SegLink, TextBoxes, FTSN, PixelLink and CRAFT algorithms. The text recognition module 120 recognizes the text in the detected regions, preferably based on CRNN and CNN algorithms. Those skilled in the art will appreciate that other embodiments of the present invention may use algorithms other than CRNN and CNN, and the present invention is not limited in this respect.
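Detectors such as TextBoxes typically emit overlapping candidate boxes that are pruned with non-maximum suppression (NMS). The patent does not describe this post-processing, so the following is a generic greedy-NMS sketch, assuming axis-aligned boxes given as (x1, y1, x2, y2) with one confidence score each; the 0.5 IoU threshold is an illustrative default:

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: keep the best-scoring box, drop boxes overlapping it too much, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```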
With the development of neural networks in computer vision, the precision of optical character recognition (OCR) has improved greatly over conventional techniques. Driven by deep learning, text recognition has expanded from traditional scenes to general scenes, that is, text recognition in natural scenes. The algorithm models of the optical character recognition unit are intended to maximize recognition accuracy for the user. Moreover, different users weigh speed and precision differently, so the system adapts its choice of algorithm to the scene and the user: for simple scenes, the text recognition module 120 replaces more complex text recognition algorithms (such as CRNN) with a CNN in order to offer a faster recognition scheme.
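The switch between CNN and CRNN is described only qualitatively. One hypothetical selection rule, sketched below, uses the aspect ratio of a detected region as a proxy for whether it holds a single character (suited to a fast CNN classifier) or a text line (needing a CRNN sequence model); `TextRegion`, `choose_recognizer` and the 1.5 cut-off are all assumptions:

```python
from dataclasses import dataclass


@dataclass
class TextRegion:
    """A detected text region (hypothetical structure for this sketch)."""
    width: int
    height: int


def choose_recognizer(region: TextRegion, prefer_speed: bool, max_cnn_aspect: float = 1.5) -> str:
    """Pick 'cnn' for near-square crops when speed is preferred, otherwise 'crnn'.

    The aspect-ratio rule and the 1.5 cut-off are illustrative assumptions.
    """
    if region.height <= 0:
        raise ValueError("region height must be positive")
    aspect = region.width / region.height
    # A near-square crop probably holds one character, which a CNN classifier
    # can label directly; wider crops are text lines that need a sequence model.
    if prefer_speed and aspect <= max_cnn_aspect:
        return "cnn"
    return "crnn"
```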
The customized recognition and general recognition unit 200 is provided with a customized recognition module 210 configured to extract the content of regions according to structural features and to recognize the text in those regions or extract key text through a deep neural network. Preferably, the customized recognition module 210 is provided with a key region extraction module 220, which further includes a POI proportion extraction module 221, a Hough transform and corner detection extraction module 222, a text fuzzy and exact match extraction module 223, a region edge characteristic extraction module 224, and a region text characteristic extraction module 225. The POI proportion extraction module 221 extracts according to the POI proportion; the Hough transform and corner detection extraction module 222 extracts all frames by Hough transform and corner detection; the text fuzzy and exact match extraction module 223 extracts the content of the key region by fuzzy and exact matching of text; the region edge characteristic extraction module 224 extracts the content of the key region according to the region's edge characteristics, such as shape, symmetry, angle and edge granularity; and the region text characteristic extraction module 225 extracts the content of the key region according to the characteristics of the text within it, such as font, size and text type.
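Module 222 is described without parameters. The sketch below, assuming a binary image in a 2-D list, votes foreground pixels into a (theta, rho) Hough accumulator restricted by default to 0° and 90°, on the assumption that drawing frames are axis-aligned; `frame_corners` then intersects vertical and horizontal lines as a simple stand-in for a dedicated corner detector such as Harris, which is not shown. Function names and the vote threshold are hypothetical:

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Image = List[List[int]]
Line = Tuple[int, int, int]  # (theta_deg, rho, votes)


def hough_lines(image: Image, thetas_deg: Sequence[int] = (0, 90), min_votes: int = 10) -> List[Line]:
    """Vote foreground pixels into a (theta, rho) accumulator; return lines with enough votes.

    rho = x*cos(theta) + y*sin(theta), rounded to the nearest pixel.
    """
    trig = [(t, math.cos(math.radians(t)), math.sin(math.radians(t))) for t in thetas_deg]
    acc: Dict[Tuple[int, int], int] = defaultdict(int)
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value:
                for theta, c, s in trig:
                    acc[(theta, round(x * c + y * s))] += 1
    return sorted((theta, rho, votes) for (theta, rho), votes in acc.items() if votes >= min_votes)


def frame_corners(lines: List[Line]) -> List[Tuple[int, int]]:
    """Intersect vertical (theta=0, x=rho) and horizontal (theta=90, y=rho) lines.

    A simplification standing in for a dedicated corner detector.
    """
    verticals = [rho for theta, rho, _ in lines if theta == 0]
    horizontals = [rho for theta, rho, _ in lines if theta == 90]
    return [(x, y) for x in verticals for y in horizontals]
```

Passing a finer set of angles (for example `range(0, 180)`) detects skewed lines as well, at the cost of neighbouring angles sharing votes, which then calls for peak suppression in the accumulator.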
The mobile device low-quality image recognition unit 300 is configured to perform filtering, image enhancement, image edge sharpening, image texture analysis, image segmentation, geometric analysis, image matching and morphological processing on the image to be recognized.
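For the edge sharpening step, a standard technique is Laplacian sharpening, g = f - ∇²f. A minimal sketch, assuming 8-bit grayscale pixels in a 2-D list and the 4-neighbour Laplacian kernel; the patent does not name a sharpening method, so this is one common choice:

```python
from typing import List

Image = List[List[int]]


def laplacian_sharpen(image: Image, max_value: int = 255) -> Image:
    """Sharpen with g = f - laplacian(f) using the 4-neighbour kernel.

    Border pixels are copied unchanged; results are clipped to [0, max_value].
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # 4-neighbour Laplacian: sum of neighbours minus 4 * centre.
            lap = (image[y - 1][x] + image[y + 1][x] + image[y][x - 1]
                   + image[y][x + 1] - 4 * image[y][x])
            out[y][x] = min(max_value, max(0, image[y][x] - lap))
    return out
```

Since sharpening also amplifies noise, it is usually applied after the filtering step.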
When an enterprise provides a recognition portal, individual users can photograph PDF drawings and submit them for recognition. However, images taken by users are often of poor quality because of lighting conditions, shooting angles and the like. The mobile device low-quality image recognition unit 300 applies a range of image processing schemes to improve image quality and thereby recognition accuracy.
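Among these schemes, segmentation of dark text from a light background is often done by global thresholding. The sketch below uses Otsu's method as one standard choice (the patent does not name a segmentation algorithm), assuming 8-bit grayscale pixels in a 2-D list; function names are illustrative:

```python
from typing import List

Image = List[List[int]]


def otsu_threshold(image: Image, levels: int = 256) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method).

    Pixels <= t form one class and pixels > t the other.
    """
    hist = [0] * levels
    for row in image:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    weight_bg, sum_bg = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        weight_bg += hist[t]
        sum_bg += t * hist[t]
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        # Between-class variance, up to the constant factor 1 / total**2.
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def binarize(image: Image, threshold: int) -> Image:
    """Map pixels above the threshold to 1 and the rest to 0."""
    return [[1 if p > threshold else 0 for p in row] for row in image]
```

A single global threshold struggles with strong shadows across a photographed sheet; adaptive (local) thresholding is the usual alternative in that case.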
After a low-quality image from a user's personal device has been processed by the PDF drawing text recognition system, it presents its content more clearly and its quality approaches that of a high-precision PDF, so the recognition algorithm performs better and the application scenarios and generalization of the overall algorithm are broadened.
It will be appreciated by persons skilled in the art that the present invention has been described with reference to flowchart illustrations and/or block diagrams of methods, systems and computer program products according to the invention. Each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart and/or block diagram block or blocks.
It will be appreciated by persons skilled in the art that the embodiments of the invention described above and shown in the drawings are by way of example only and are not limiting. The objects of the present invention have been fully and effectively achieved. The functional and structural principles of the present invention have been shown and described in the examples, and the embodiments of the invention may be modified or practiced without departing from such principles.

Claims (4)

1. A PDF drawing text recognition method, characterized by comprising the following steps:
performing a deep-learning-based optical character recognition step; performing a customized recognition and general recognition step; and performing a mobile device low-quality image recognition step;
wherein the deep-learning-based optical character recognition step includes: detecting regions containing text in a scene and recognizing the text in those regions, wherein text detection is performed based on the CTPN, SegLink, TextBoxes, FTSN, PixelLink and CRAFT algorithms, and text recognition is performed based on CNN and CRNN algorithms;
wherein the customized recognition step includes: identifying the type of the PDF drawing from the table text or the frame content in the PDF; extracting the content of regions according to structural features; and extracting key regions, then recognizing the text in those regions or extracting key text through a deep neural network;
wherein the key region extraction of the customized recognition step further includes: extracting a key region according to the POI proportion; extracting all frames by Hough transform and corner detection; extracting a key region by fuzzy matching and exact matching of text; extracting a key region according to the edge characteristics of the region; and
extracting a key region according to the text characteristics within the region;
wherein, in the key region extraction of the customized recognition step, key regions are extracted according to edge characteristics of the region, namely shape, symmetry, angle or edge granularity, and according to characteristics of the text within the region, namely font, size or text type;
wherein the mobile device low-quality image recognition step further includes: performing filtering on the image; performing image enhancement on the image; performing edge sharpening on the image; performing texture analysis on the image; performing segmentation on the image; performing geometric analysis on the image; performing matching on the image; and performing morphological processing on the image;
wherein the filtering performs image smoothing and noise reduction on the image; the texture analysis performs skeletonization and connectivity analysis on the image; the image matching performs template matching and search matching on the image; and the morphological processing performs dilation, erosion, and opening and closing operations on the image.
2. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor performs the steps of the PDF drawing text recognition method of claim 1.
3. A PDF drawing text recognition device, characterized in that it comprises: a software application, a memory for storing the software application, and a processor for executing the software application; wherein the programs of the software application respectively perform the corresponding steps of the PDF drawing text recognition method of claim 1.
4. A PDF drawing text recognition system, comprising an optical character recognition unit, a customized recognition and general recognition unit, and a mobile-device low-quality image recognition unit, wherein:

the optical character recognition unit comprises a text detection module and a text recognition module. The text detection module is configured to detect regions containing characters in a scene, performing text detection based on the CTPN, SegLink, TextBoxes, FTSN, PixelLink and CRAFT algorithms. The text recognition module is configured to recognize the characters in the detected regions based on the CRNN and CNN algorithms;

the customized recognition and general recognition unit is provided with a customized recognition module. This module is configured to structure the content of the feature extraction region and, through a deep neural network, either recognize the characters in the region or extract key characters;

the mobile-device low-quality image recognition unit is configured to perform the following on the image to be recognized: filtering, image enhancement, image edge sharpening, image texture analysis, image segmentation, geometric shape analysis, image matching and morphological processing;

the key region extraction module further comprises a POI proportion extraction module, a Hough transform and corner detection extraction module, a text fuzzy and exact matching extraction module, a region edge feature extraction module and a region text feature extraction module, wherein:
- the POI proportion extraction module is configured to extract a key region according to the POI proportion;
- the Hough transform and corner detection extraction module is configured to extract all frames by Hough transform and corner detection;
- the text fuzzy and exact matching extraction module is configured to extract a key region according to fuzzy matching and exact matching of text;
- the region edge feature extraction module is configured to extract a key region according to the edge features of the region, namely shape, symmetry, angle and edge granularity;
- the region text feature extraction module is configured to extract a key region according to the text features in the region, namely font, size and character type; and

in the mobile-device low-quality image recognition unit:
- the filtering performs image smoothing and image noise reduction;
- the image texture analysis performs skeletonization and connectivity processing;
- the image matching performs template matching and search matching; and
- the morphological processing performs dilation, erosion, opening and closing operations.
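The filtering and morphological steps recited for the mobile-device low-quality image recognition unit can be illustrated with a minimal pure-Python sketch. The claim does not fix kernel sizes or name a particular smoothing filter. The sketch therefore assumes the following:
- a 3x3 median filter for noise reduction;
- a 3x3 square structuring element for dilation and erosion;
- images stored as lists of pixel rows.

All function names are hypothetical.

```python
from typing import Callable, Iterator, List

Image = List[List[int]]


def _neighbourhood(img: Image, r: int, c: int, replicate_edges: bool) -> Iterator[int]:
    """Yield the 9 pixels of the 3x3 window centred on (r, c).

    replicate_edges=True clamps coordinates to the border (used for filtering);
    replicate_edges=False treats pixels outside the image as 0 (used for morphology).
    """
    h, w = len(img), len(img[0])
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if replicate_edges:
                yield img[min(max(rr, 0), h - 1)][min(max(cc, 0), w - 1)]
            elif 0 <= rr < h and 0 <= cc < w:
                yield img[rr][cc]
            else:
                yield 0


def _apply(img: Image, fn: Callable[[int, int], int]) -> Image:
    """Build a new image by evaluating fn(row, col) at every pixel."""
    return [[fn(r, c) for c in range(len(img[0]))] for r in range(len(img))]


def median_filter(img: Image) -> Image:
    """Noise reduction: replace each pixel by the median of its 3x3 window."""
    return _apply(img, lambda r, c: sorted(_neighbourhood(img, r, c, True))[4])


def dilate(img: Image) -> Image:
    """Binary dilation: a pixel becomes 1 if any pixel in its 3x3 window is 1."""
    return _apply(img, lambda r, c: int(any(_neighbourhood(img, r, c, False))))


def erode(img: Image) -> Image:
    """Binary erosion: a pixel stays 1 only if every pixel in its 3x3 window is 1."""
    return _apply(img, lambda r, c: int(all(_neighbourhood(img, r, c, False))))


def opening(img: Image) -> Image:
    """Erosion followed by dilation: removes specks smaller than the window."""
    return dilate(erode(img))


def closing(img: Image) -> Image:
    """Dilation followed by erosion: fills holes smaller than the window."""
    return erode(dilate(img))
```

On a scanned drawing, opening removes isolated noise pixels while keeping solid strokes. Closing fills small gaps inside strokes. A production system would more likely call an optimized image library than loop in pure Python.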
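The "connectivity processing" named in the image texture analysis step is read here, as an assumption, as connected-component labelling. The sketch labels 4-connected foreground regions of a binary image using breadth-first search. It returns each region's bounding box, which is a common way to group stroke pixels into candidate characters or symbols. The function name and the box layout (min_row, min_col, max_row, max_col) are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[int, int, int, int]  # (min_row, min_col, max_row, max_col)


def label_components(img: Image) -> Tuple[List[List[int]], List[Box]]:
    """Label 4-connected foreground pixels; labels start at 1 in row-major order."""
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    boxes: List[Box] = []
    for r in range(h):
        for c in range(w):
            if not img[r][c] or labels[r][c]:
                continue  # background, or already part of a labelled component
            label = len(boxes) + 1
            labels[r][c] = label
            queue = deque([(r, c)])
            r0, c0, r1, c1 = r, c, r, c
            while queue:
                y, x = queue.popleft()
                # Grow the bounding box to include the current pixel.
                r0, c0, r1, c1 = min(r0, y), min(c0, x), max(r1, y), max(c1, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not labels[ny][nx]:
                        labels[ny][nx] = label
                        queue.append((ny, nx))
            boxes.append((r0, c0, r1, c1))
    return labels, boxes
```

Using 4-connectivity keeps diagonally touching strokes separate. Switching to 8-connectivity would merge them, and which choice suits drawing text better is not stated in the source.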
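The Hough transform and corner detection extraction module extracts frames using the Hough transform. Below is a minimal sketch of the line-voting part only. It assumes the following:
- edge pixels are already available as (x, y) coordinates;
- the angle step is 1 degree;
- rho is rounded to whole pixels.

Corner detection and the grouping of lines into frames are not shown. The function name and its parameters are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable, List, Tuple


def hough_lines(
    points: Iterable[Tuple[int, int]],
    theta_step_deg: int = 1,
    min_votes: int = 10,
) -> List[Tuple[int, int, int]]:
    """Return (votes, theta_deg, rho) for accumulator cells with at least min_votes.

    Each edge point (x, y) votes for every line rho = x*cos(theta) + y*sin(theta)
    through it, with theta sampled over [0, 180) degrees and rho rounded to an int.
    Results are sorted by votes, highest first.
    """
    # Precompute cos/sin once per sampled angle.
    angles = [(t, math.cos(math.radians(t)), math.sin(math.radians(t)))
              for t in range(0, 180, theta_step_deg)]
    accumulator: Counter = Counter()
    for x, y in points:
        for t, cos_t, sin_t in angles:
            accumulator[(t, round(x * cos_t + y * sin_t))] += 1
    peaks = [(votes, t, rho) for (t, rho), votes in accumulator.items() if votes >= min_votes]
    return sorted(peaks, reverse=True)
```

Drawing frames are mostly horizontal and vertical, so their peaks appear near theta = 90 and theta = 0 degrees. A single straight line also collects many votes in neighbouring angle cells. A practical implementation would therefore apply non-maximum suppression before treating peaks as frame edges.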
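The text fuzzy and exact matching extraction module locates key regions, such as a title-block field, by matching recognized text against expected keywords. The patent does not specify the matching metric, so the sketch rests on these assumptions:
- OCR output is already available as text with a bounding box;
- substring containment counts as an exact match;
- the fuzzy measure is the `SequenceMatcher.ratio()` similarity from Python's standard `difflib` module, taken over keyword-length windows;
- the fuzzy threshold is 0.8.

The names `Region`, `fuzzy_score` and `find_key_regions` are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Sequence, Tuple


class Region(NamedTuple):
    text: str                       # recognized text of the region
    box: Tuple[int, int, int, int]  # (x, y, width, height), hypothetical layout


def fuzzy_score(keyword: str, text: str) -> float:
    """Best difflib similarity between keyword and any keyword-length window of text."""
    k = len(keyword)
    if len(text) <= k:
        return SequenceMatcher(None, keyword, text).ratio()
    return max(SequenceMatcher(None, keyword, text[i:i + k]).ratio()
               for i in range(len(text) - k + 1))


def find_key_regions(regions: Sequence[Region], keywords: Sequence[str],
                     threshold: float = 0.8) -> List[Tuple[Region, str, str]]:
    """Return (region, keyword, 'exact' or 'fuzzy') for each region matching a keyword."""
    hits = []
    for region in regions:
        for keyword in keywords:
            if keyword in region.text:  # exact match takes priority
                hits.append((region, keyword, "exact"))
                break
            if fuzzy_score(keyword, region.text) >= threshold:
                hits.append((region, keyword, "fuzzy"))
                break
    return hits
```

With these assumptions, an OCR misreading such as "Drawlng No. B-7" still matches the keyword "Drawing No.". The window "Drawlng No." shares 10 of 11 characters with the keyword, giving a ratio of 20/22, about 0.91.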
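For the template matching performed by the image matching process, one simple formulation slides the template over the image. It keeps the position with the smallest sum of squared differences (SSD). The sketch assumes grayscale images stored as lists of rows and SSD as the matching score. The patent does not state which score it uses, and `match_template_ssd` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]


def match_template_ssd(image: Image, template: Image) -> Tuple[Tuple[int, int], int]:
    """Return ((row, col), ssd) for the top-left corner where the template fits best."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    if th > ih or tw > iw:
        raise ValueError("template is larger than the image")
    best_pos: Tuple[int, int] = (0, 0)
    best_ssd: Optional[int] = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            # Sum of squared pixel differences between the template and this window.
            ssd = sum(
                (image[r + i][c + j] - template[i][j]) ** 2
                for i in range(th)
                for j in range(tw)
            )
            if best_ssd is None or ssd < best_ssd:
                best_pos, best_ssd = (r, c), ssd
    assert best_ssd is not None  # at least one position is always evaluated
    return best_pos, best_ssd
```

SSD is sensitive to overall brightness changes. Low-quality phone captures may therefore call for a normalized score, such as normalized cross-correlation, instead.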
CN202010278085.5A (priority date 2020-04-10, filing date 2020-04-10): PDF drawing text recognition method, system and equipment. Status: Active (CN111401312B).

Priority Applications (1)

Application CN202010278085.5A (CN111401312B), priority date 2020-04-10, filing date 2020-04-10: PDF drawing text recognition method, system and equipment

Applications Claiming Priority (1)

Application CN202010278085.5A (CN111401312B), priority date 2020-04-10, filing date 2020-04-10: PDF drawing text recognition method, system and equipment

Publications (2)

Publication Number Publication Date
CN111401312A 2020-07-10
CN111401312B 2024-04-26

Family ID: 71435039

Family Applications (1)

Application CN202010278085.5A (Active, CN111401312B), priority date 2020-04-10, filing date 2020-04-10: PDF drawing text recognition method, system and equipment

Country Status (1)

CN (1): CN111401312B

Families Citing this family (6)

* Cited by examiner, † Cited by third party
CN111814791B* (priority date 2020-07-24, published 2024-03-19), Siemens Ltd., China: Method and device for identifying components in a system diagram
CN112183029B* (priority date 2020-09-25, published 2023-10-31), Sichuan Qiaoduo Tiangong Information Security Intelligent Equipment Co., Ltd.: Digital conversion method for PDF drawings in the sheet metal industry
CN112733735B* (priority date 2021-01-13, published 2024-04-09), State Grid Shanghai Municipal Electric Power Company: Method for classifying and identifying drawing layouts using machine learning
CN113094786A* (priority date 2021-04-06, published 2021-07-09), Wanyi Technology Co., Ltd.: Construction drawing structured organization method and device based on drawing POI
CN113743052A* (priority date 2021-08-17, published 2021-12-03), Dilu Technology Co., Ltd.: Multimodal-fusion resume layout analysis method and device
CN117558019A* (priority date 2024-01-11, published 2024-02-13), Wuhan University of Technology: Method for automatically extracting symbol map parameters from PDF-format component manuals

Citations (6)

CN110291536A* (priority date 2017-01-30, published 2019-09-27), Symantec Corporation: Structured text and pattern matching for data loss prevention in object-specific image domains
CN110334346A* (priority date 2019-06-26, published 2019-10-15), Jingdong Digital Technology Holding Co., Ltd.: Information extraction method and device for PDF documents
CN110363102A* (priority date 2019-06-24, published 2019-10-22), Beijing Ronghui Jinxin Information Technology Co., Ltd.: Object identification processing method and device for PDF documents
CN110543844A* (priority date 2019-08-26, published 2019-12-06), CETC Big Data Research Institute Co., Ltd.: Metadata extraction method for government-affairs metadata PDF files
CN110569769A* (priority date 2019-08-29, published 2019-12-13), Zhejiang Dasouche Software Technology Co., Ltd.: Image recognition method and device, computer equipment and storage medium
CN110751143A* (priority date 2019-09-26, published 2020-02-04), Zhongdian Wanwei Information Technology Co., Ltd.: Electronic invoice information extraction method and electronic equipment

Family Cites Families (2)

US20050289182A1* (priority date 2004-06-15, published 2005-12-29), Sand Hill Systems Inc.: Document management system with enhanced intelligent document recognition capabilities
US10628668B2* (priority date 2017-08-09, published 2020-04-21), Open Text Sa Ulc: Systems and methods for generating and using semantic images in deep learning for classification and data extraction

Also Published As

Publication number Publication date
CN111401312A 2020-07-10

Similar Documents

CN111401312B: PDF drawing text recognition method, system and equipment
US11151363B2: Expression recognition method, apparatus, electronic device, and storage medium
Saxena: Niblack’s binarization method and its modifications to real-time applications: a review
WO2019232862A1: Mouth model training method and apparatus, mouth recognition method and apparatus, device, and medium
WO2019232866A1: Human eye model training method, human eye recognition method, apparatus, device and medium
US9971929B2: Fingerprint classification system and method using regular expression machines
WO2018103608A1: Text detection method, device and storage medium
Flores et al.: Application of convolutional neural networks for static hand gestures recognition under different invariant features
US9575566B2: Technologies for robust two-dimensional gesture recognition
CN106845384B: Gesture recognition method based on recursive model
CN103208004A: Automatic recognition and extraction method and device for bill information area
CN103902977A: Face identification method and device based on Gabor binary mode
CN108121946B: Fingerprint image preprocessing method and device
CN117197904A: Training method for a face liveness detection model, face liveness detection method and face liveness detection device
CN112926379A: Method and device for constructing face recognition model
CN110232381B: License plate segmentation method, license plate segmentation device, computer equipment and computer-readable storage medium
Yahya et al.: A new technique for iris localization in iris recognition systems
CN112101293A: Facial expression recognition method, device, equipment and storage medium
Lahiani et al.: Hand pose estimation system based on a cascade approach for mobile devices
Lahiani et al.: Comparative study between hand pose estimation systems for mobile devices
CN113657364A: Method, device, equipment and storage medium for recognizing character mark
Singh et al.: Detecting face region in binary image
CN111222116A: Intelligent terminal
Mehta et al.: An Efficient Way to Detect and Recognize the Overlapped Coins using Otsu's Algorithm based on Hough Transform Technique
Patil: Comparative Approach for Face Detection in Python, OpenCV and Hardware

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant