CN110852039A - Method and device for converting characters into curves in a PDF (Portable Document Format) file


Info

Publication number
CN110852039A
CN110852039A CN201810829748.0A CN201810829748A CN110852039A CN 110852039 A CN110852039 A CN 110852039A CN 201810829748 A CN201810829748 A CN 201810829748A CN 110852039 A CN110852039 A CN 110852039A
Authority
CN
China
Prior art keywords
character
converted
description
pdf
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810829748.0A
Other languages
Chinese (zh)
Inventor
郭相军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Original Assignee
Peking University Founder Group Co Ltd
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co Ltd and Beijing Founder Electronics Co Ltd
Priority to CN201810829748.0A
Publication of CN110852039A
Legal status: Pending

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention provides a method and a device for converting characters into curves in a PDF (Portable Document Format) file. The method identifies the font type of a character to be converted in the PDF file; acquires the character description of the character to be converted from the corresponding font library according to the font type; and obtains the curve information in the corresponding PDF description according to the character description and writes the curve information into a content stream of the PDF file. The method converts the characters in the PDF file into curves quickly and accurately, and thereby prevents the characters in the PDF file from being replaced.

Description

Method and device for converting characters into curves in PDF (Portable document Format) file
Technical Field
The invention relates to the technical field of computer information processing, in particular to a method and a device for converting characters in a PDF file into curves.
Background
PDF stands for Portable Document Format, an electronic file format. The format is independent of the operating system platform, i.e., PDF files can be used on Windows, Unix, or MacOS alike. This feature makes it an ideal file format for electronic document distribution and digital information dissemination, and a growing number of electronic books, product descriptions, company reports, network materials, and e-mails have begun to use PDF documents. PDF has become a de facto industry standard for digitized information.
In some special applications, all or part of the characters in a PDF file need to be converted into curves so that the characters cannot be replaced. Examples include preventing the fonts or formatting of the characters from changing when the file is opened on a different computer or with different software, and preventing others from maliciously tampering with the contents of the PDF file.
Disclosure of Invention
The invention provides a method and a device for converting characters in a PDF file into curves, which are used for quickly and accurately converting the characters in the PDF file into the curves so as to achieve the aim of avoiding the characters from being replaced.
One aspect of the present invention provides a method for converting characters into curves in a PDF file, where the method includes:
identifying the font type of the character to be converted in the PDF file;
acquiring the character description of the character to be converted from a corresponding font library according to the font type;
and acquiring curve information in the corresponding PDF description according to the character description, and writing the curve information into a content stream of the PDF file.
Further, the character description is a straight line or a Bezier curve described in the form of points;
the obtaining of the curve information in the corresponding PDF description according to the character description includes:
and converting a straight line or a Bezier curve described in the character description in the form of points into a curve in the PDF description so as to obtain curve information in the PDF description.
Further, after the curve information is written into the content stream of the PDF file, the method further includes:
and deleting the display instruction of the character to be converted in the PDF file.
Further, the acquiring of the character description of the character to be converted from the corresponding font library according to the font type includes:
acquiring charcode information of the character to be converted;
and acquiring the character description of the character to be converted from the corresponding font library according to the charcode information.
Further, the acquiring charcode information of the character to be converted includes:
if the font type is a simple font type, directly acquiring charcode information of the character to be converted, wherein the simple font type comprises Type1, TrueType or Type3;
and if the font type is a composite font type, acquiring the CMap of the font type, parsing the CMap, and acquiring charcode information of the character to be converted by combining the CMap of the font referenced by the content stream in the PDF file, wherein the composite font type comprises a CID font.
Another aspect of the present invention provides an apparatus for converting characters into curves in a PDF file, comprising:
the identification module is used for identifying the font type of the character to be converted in the PDF file;
the character description acquisition module is used for acquiring the character description of the character to be converted from a corresponding font library according to the font type;
and the conversion module is used for acquiring curve information in the corresponding PDF description according to the character description and writing the curve information into the content stream of the PDF file.
Further, the character description is a straight line or a Bezier curve described in the form of points;
the conversion module is specifically configured to:
and converting a straight line or a Bezier curve described in the character description in the form of points into a curve in the PDF description so as to obtain curve information in the PDF description.
Further, the conversion module is further configured to:
and deleting the display instruction of the character to be converted in the PDF file.
Further, the character description acquiring module is configured to:
acquiring charcode information of the character to be converted;
and acquiring the character description of the character to be converted from the corresponding font library according to the charcode information.
Further, the character description acquiring module is configured to:
if the font type is a simple font type, directly acquiring charcode information of the character to be converted, wherein the simple font type comprises Type1, TrueType or Type3;
and if the font type is a composite font type, acquiring the CMap of the font type, parsing the CMap, and acquiring charcode information of the character to be converted by combining the CMap of the font referenced by the content stream in the PDF file, wherein the composite font type comprises a CID font.
The method and the device for converting characters into curves in a PDF file provided by the invention identify the font type of the character to be converted in the PDF file; acquire the character description of the character to be converted from the corresponding font library according to the font type; and obtain the curve information in the corresponding PDF description according to the character description and write the curve information into a content stream of the PDF file. The characters in the PDF file can thus be converted into curves quickly and accurately, which prevents the characters in the PDF file from being replaced.
Drawings
In order to illustrate the specific embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for converting characters into curves in a PDF file according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for converting characters into curves in a PDF file according to another embodiment of the present invention;
Fig. 3 is a structural diagram of an apparatus for converting characters into curves in a PDF file according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a flowchart of a method for converting characters into curves in a PDF file according to an embodiment of the present invention. As shown in fig. 1, this embodiment provides a method for converting characters into curves in a PDF file, which includes the following specific steps:
S101, identifying the font type of the character to be converted in the PDF file.
In this embodiment, the font type of the character to be converted in the PDF file must first be identified. The character either belongs to a simple font type such as Type1, TrueType or Type3, in which one byte represents one character, or to a composite font type such as CID, in which one character is represented by several bytes. Different font types describe characters differently. Type1 and Type3 are font formats from different periods of the PostScript language published by Adobe; they describe font outlines with cubic curves, so the font library is small, output is fast, and glyphs can be enlarged and reduced without distortion. TrueType is a computer font format, proposed jointly by Apple and Microsoft, that uses a mathematical glyph description technique. It describes the outline of a glyph with mathematical functions and includes instructions for glyph construction, color filling, numerical description functions, flow condition control, grid processing control, and additional hint control; TrueType describes glyphs with quadratic Bezier curves. CID is a PostScript font library format published by Adobe that consists of two parts, the CIDFont and the CMap table. It describes glyph outlines with cubic curves and uses the CMap correspondence to organize thousands of characters, which makes it easy to extend, fast, and practical.
In this embodiment, because a simple font represents one character with one byte and a composite font represents one character with several bytes, the number of bytes of the character to be converted may be determined first, and the font type of the character may then be determined from that number of bytes. Specifically, the determination may be made based on the CMap description in the font dictionary of the PDF.
S102, acquiring the character description of the character to be converted from the corresponding font library according to the font type.
In this embodiment, because the character description of the same character differs between the font libraries of different font types, the character description of the character to be converted must be acquired from the font library that corresponds to its font type. Specifically, charcode information of the character to be converted may be acquired first, where a charcode is a character code, and the character description may then be acquired from the corresponding font library according to the charcode information. Of course, other means may also be used in this embodiment to acquire the character description from the font library; they are not described here.
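The identification in S101 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the font dictionary is modeled as a plain Python dict, and `classify_font` and the example dictionaries are hypothetical names. It relies on the PDF convention that simple fonts carry a Subtype of Type1, TrueType or Type3, while composite (CID-keyed) fonts carry the Subtype Type0.

```python
# Hypothetical helper: the font dictionary is a plain dict here,
# not the object type of any particular PDF library.
SIMPLE_SUBTYPES = {"Type1", "TrueType", "Type3"}


def classify_font(font_dict):
    """Return 'simple' or 'composite' based on the font's Subtype entry."""
    subtype = font_dict.get("Subtype")
    if subtype in SIMPLE_SUBTYPES:
        return "simple"       # one byte per character
    if subtype == "Type0":
        return "composite"    # CID-keyed font, possibly several bytes per character
    raise ValueError(f"unsupported font subtype: {subtype!r}")


# Illustrative font dictionaries (values are made up for the example).
example_simple = {"Type": "Font", "Subtype": "TrueType", "BaseFont": "ArialMT"}
example_composite = {"Type": "Font", "Subtype": "Type0", "Encoding": "Identity-H"}

print(classify_font(example_simple))
print(classify_font(example_composite))
```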
S103, curve information in the corresponding PDF description is obtained according to the character description, and the curve information is written into a content stream of the PDF file.
In this embodiment, the character description of fonts such as Type1, TrueType, Type3 and CID is usually a straight line or a Bezier curve described in the form of points. The straight line or Bezier curve described in the form of points may therefore be converted into a curve in the PDF description, which yields the curve information in the PDF description. The curve information is then written into a content stream of the PDF file, where a content stream contains a series of instructions that describe the appearance of a page or of other graphic entities. This completes the conversion of the character to be converted into a curve and prevents the character from being replaced.
In the method for converting characters into curves in a PDF file provided by this embodiment, the font type of the character to be converted in the PDF file is identified; the character description of the character to be converted is acquired from the corresponding font library according to the font type; and the curve information in the corresponding PDF description is obtained according to the character description and written into a content stream of the PDF file. The method of this embodiment converts the characters in the PDF file into curves quickly and accurately, and prevents the characters in the PDF file from being replaced.
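As an illustration of S103, the sketch below turns a glyph outline into PDF path operators (`m` move-to, `l` line-to, `c` cubic Bezier, `h` close path, `f` fill). The outline representation, `fmt`, `contour_to_pdf_path` and `glyph_to_pdf` are assumptions made for this example; scaling from font units to page coordinates (font size, text matrix) is deliberately left out.

```python
def fmt(value):
    """Format a coordinate compactly for a content stream."""
    return f"{value:.3f}".rstrip("0").rstrip(".")


def contour_to_pdf_path(start, segments):
    """Translate one closed contour into PDF path-construction operators.

    segments is a list of ("line", (x, y)) or
    ("cubic", (x1, y1), (x2, y2), (x3, y3)) tuples.
    """
    ops = [f"{fmt(start[0])} {fmt(start[1])} m"]
    for kind, *points in segments:
        if kind == "line":
            x, y = points[0]
            ops.append(f"{fmt(x)} {fmt(y)} l")
        elif kind == "cubic":
            coords = " ".join(fmt(v) for point in points for v in point)
            ops.append(f"{coords} c")
        else:
            raise ValueError(f"unknown segment kind: {kind!r}")
    ops.append("h")  # close the contour
    return ops


def glyph_to_pdf(contours):
    """Join all contours of a glyph and fill the resulting path."""
    lines = []
    for start, segments in contours:
        lines.extend(contour_to_pdf_path(start, segments))
    lines.append("f")  # fill using the nonzero winding rule
    return "\n".join(lines)


# A made-up triangular "glyph" with a single contour.
triangle = [((0, 0), [("line", (100, 0)), ("line", (50, 80))])]
print(glyph_to_pdf(triangle))
```

The returned text is what would be appended to the page's content stream in place of the original text-showing instruction.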
Fig. 2 is a flowchart of a method for converting characters into curves in a PDF file according to another embodiment of the present invention. As shown in fig. 2, on the basis of the above embodiment, the embodiment provides a method for converting characters into curves in a PDF file, which includes the following specific steps:
S201, identifying the font type of the character to be converted in the PDF file.
In this embodiment, the font type of the character to be converted in the PDF file is identified first. The character either belongs to a simple font type such as Type1, TrueType or Type3, in which one byte represents one character, or to a composite font type such as CID, in which one character is represented by several bytes. Different font types describe characters differently. The font type of the character to be converted is stated explicitly in a font dictionary of the PDF and can be identified through that font dictionary. Specifically, the number of bytes of the character to be converted is determined first, and the font type of the character is then determined from that number of bytes.
S202, acquiring charcode information of the character to be converted.
In this embodiment, the obtaining of the charcode information of the character to be converted can be specifically realized by the following method:
if the font type is a simple font type, directly acquiring charcode information of the character to be converted, wherein the charcode information is one byte in a character string, and the simple font type comprises Type1, TrueType or Type3;
if the font type is a composite font type, acquiring the CMap of the font type, parsing the CMap, and acquiring charcode information of the character to be converted by combining the CMap of the font referenced by the content stream in the PDF file, wherein the composite font type comprises a CID font. Specifically, the CMap table defines which bytes represent a character; for example, if the CMap contains 0x15, then 0x15 appearing in the content stream is interpreted as a charcode, that is, as one character.
S203, acquiring the character description of the character to be converted from the corresponding font library according to the charcode information.
In this embodiment, the character description is looked up in the corresponding font library according to the charcode information of the character to be converted. For fonts such as Type1, TrueType, Type3 or CID, the character description is a straight line or a Bezier curve described in the form of points.
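The two cases of S202 can be sketched as below. This is an assumption-laden illustration: `split_charcodes` is a hypothetical helper, and the codespace ranges are passed in as already-parsed `(low, high)` byte-string pairs rather than read from a real CMap. For a simple font every byte is one charcode; for a composite font the ranges decide how many bytes form one charcode.

```python
def split_charcodes(data, codespace_ranges=None):
    """Split the bytes of a shown string into character codes.

    codespace_ranges=None models a simple font (one byte per charcode).
    Otherwise each (low, high) pair gives the per-byte bounds of a code
    of len(low) bytes, as a parsed CMap might provide them.
    """
    if codespace_ranges is None:
        return list(data)

    codes = []
    pos = 0
    while pos < len(data):
        for low, high in codespace_ranges:
            n = len(low)
            chunk = data[pos:pos + n]
            in_range = len(chunk) == n and all(
                lo <= b <= hi for lo, b, hi in zip(low, chunk, high)
            )
            if in_range:
                codes.append(int.from_bytes(chunk, "big"))
                pos += n
                break
        else:
            raise ValueError(f"no codespace range matches at offset {pos}")
    return codes


# Made-up ranges: one-byte codes 0x00-0x80, two-byte codes 0x8140-0xFEFE.
example_ranges = [(b"\x00", b"\x80"), (b"\x81\x40", b"\xfe\xfe")]

print(split_charcodes(b"AB"))                          # simple font
print(split_charcodes(b"\x15\x81\x40", example_ranges))  # composite font
```

In the second call the byte 0x15 falls in the one-byte range and is taken as a charcode on its own, matching the 0x15 example in the text, while 0x81 0x40 is consumed as a single two-byte charcode.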
S204, converting the straight line or Bezier curve described in the character description in the form of points into a curve in PDF description, thereby obtaining curve information in the PDF description.
In this embodiment, a straight line or Bezier curve described in the form of points may be converted into a curve in the PDF description (a Bezier curve), thereby obtaining the curve information in the PDF description.
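One concrete case of S204 is a TrueType outline, whose quadratic Bezier segments have to be expressed as cubic segments before they can be written with the PDF `c` operator. The sketch below uses the standard degree-elevation identity; `quad_to_cubic` and the helper evaluators are hypothetical names introduced for the example, and the source does not state which conversion the inventors use.

```python
def quad_to_cubic(p0, q, p2):
    """Raise a quadratic Bezier (p0, q, p2) to an equivalent cubic.

    The cubic control points are c1 = (p0 + 2q) / 3 and c2 = (p2 + 2q) / 3;
    the end points are unchanged.
    """
    c1 = ((p0[0] + 2 * q[0]) / 3, (p0[1] + 2 * q[1]) / 3)
    c2 = ((p2[0] + 2 * q[0]) / 3, (p2[1] + 2 * q[1]) / 3)
    return p0, c1, c2, p2


def eval_quadratic(p0, q, p2, t):
    """Point on the quadratic curve at parameter t."""
    u = 1 - t
    return tuple(u * u * a + 2 * u * t * b + t * t * c for a, b, c in zip(p0, q, p2))


def eval_cubic(c0, c1, c2, c3, t):
    """Point on the cubic curve at parameter t."""
    u = 1 - t
    return tuple(
        u ** 3 * a + 3 * u * u * t * b + 3 * u * t * t * c + t ** 3 * d
        for a, b, c, d in zip(c0, c1, c2, c3)
    )


cubic = quad_to_cubic((0, 0), (3, 3), (6, 0))
print(cubic)
```

Both curves trace the same shape, which can be checked by evaluating them at the same parameter values.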
S205, curve information in the corresponding PDF description is obtained according to the character description, and the curve information is written into a content stream of the PDF file.
Further, after the curve information is written into the content stream of the PDF file, the method may further include:
S206, deleting the display instruction of the character to be converted in the PDF file.
In this embodiment, after the curve information is written into the content stream of the PDF file, the display instruction of the character to be converted is removed from the PDF file, so that the character is no longer displayed as text and appears in the PDF file only as the curve described in the PDF description. This prevents the character from being replaced.
Finally, it is determined whether all characters of a given font library in the PDF file have been converted into curves; if so, that font library embedded in the PDF file is deleted.
In the method for converting characters into curves in a PDF file provided by this embodiment, the font type of the character to be converted in the PDF file is identified; the character description of the character to be converted is acquired from the corresponding font library according to the font type; and the curve information in the corresponding PDF description is obtained according to the character description and written into a content stream of the PDF file. The method of this embodiment converts the characters in the PDF file into curves quickly and accurately, and prevents the characters in the PDF file from being replaced.
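A rough sketch of S206 is shown below, assuming the content stream has already been tokenized into `(operands, operator)` pairs; that representation, `drop_text_showing` and the sample stream are hypothetical. The text-showing operators in PDF are `Tj`, `TJ`, `'` and `"`, and `Tf` selects the current font. The sketch ignores the fact that these operators also advance the text position, which a real implementation would need to compensate for when later text in the same text object is kept.

```python
TEXT_SHOWING_OPS = {"Tj", "TJ", "'", '"'}


def drop_text_showing(instructions, converted_fonts):
    """Remove text-showing instructions drawn with already-converted fonts.

    instructions: list of (operands, operator) pairs from a tokenized stream.
    converted_fonts: resource names of fonts whose characters are now curves.
    """
    kept = []
    current_font = None
    for operands, operator in instructions:
        if operator == "Tf":
            current_font = operands[0]  # font resource name, e.g. "/F1"
        if operator in TEXT_SHOWING_OPS and current_font in converted_fonts:
            continue  # the glyphs are already drawn as paths
        kept.append((operands, operator))
    return kept


# Made-up token stream: two fonts, only /F1 has been converted to curves.
stream = [
    ([], "BT"),
    (["/F1", 12], "Tf"),
    ([72, 700], "Td"),
    (["(Hello)"], "Tj"),
    (["/F2", 10], "Tf"),
    (["(World)"], "Tj"),
    ([], "ET"),
]
result = drop_text_showing(stream, {"/F1"})
for operands, operator in result:
    print(operands, operator)
```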
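The final check can be sketched as a set comparison. The bookkeeping structures here (`used_codes`, `converted_codes`) and the function name are assumptions for illustration; the source only states that an embedded font library is deleted once all of its characters have been converted.

```python
def fonts_safe_to_remove(used_codes, converted_codes):
    """Return the fonts whose every used charcode has been converted to curves.

    Both arguments map a font resource name to a set of charcodes.
    """
    return sorted(
        name
        for name, codes in used_codes.items()
        if codes <= converted_codes.get(name, set())
    )


# Made-up bookkeeping: /F1 is fully converted, /F2 is not.
used = {"/F1": {65, 66}, "/F2": {0x8140, 0x8141}}
converted = {"/F1": {65, 66}, "/F2": {0x8140}}
print(fonts_safe_to_remove(used, converted))
```

Only fonts returned by this check would have their embedded font program removed; a font with even one unconverted character must stay in the file.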
Fig. 3 is a structural diagram of an apparatus for converting characters into curves in a PDF file according to an embodiment of the present invention. This embodiment provides a device for converting characters into curves in a PDF file, which can execute the processing flow provided by the method for converting characters into curves in a PDF file. As shown in fig. 3, the apparatus of this embodiment includes an identification module 301, a character description obtaining module 302, and a conversion module 303.
The identification module 301 is configured to identify a font type of a character to be converted in a PDF file;
a character description obtaining module 302, configured to obtain a character description of the character to be converted from a corresponding font library according to the font type;
and the conversion module 303 is configured to obtain curve information in the corresponding PDF description according to the character description, and write the curve information into a content stream of the PDF file.
Further, the character description is a straight line or a Bezier curve described in the form of points;
the conversion module 303 is specifically configured to:
and converting a straight line or a Bezier curve described in the character description in the form of points into a curve in the PDF description so as to obtain curve information in the PDF description.
Further, the conversion module 303 is further configured to:
and deleting the display instruction of the character to be converted in the PDF file.
Further, the character description obtaining module 302 is configured to:
acquiring charcode information of the character to be converted;
and acquiring the character description of the character to be converted from the corresponding font library according to the charcode information.
Further, the character description obtaining module 302 is configured to:
if the font type is a simple font type, directly acquiring charcode information of the character to be converted, wherein the simple font type comprises Type1, TrueType or Type3;
and if the font type is a composite font type, acquiring the CMap of the font type, parsing the CMap, and acquiring charcode information of the character to be converted by combining the CMap of the font referenced by the content stream in the PDF file, wherein the composite font type comprises a CID font.
The device for converting characters into curves in a PDF file according to the embodiments of the present invention may be specifically configured to execute the method embodiments provided in fig. 1 and fig. 2, and specific functions are not described herein again.
The device for converting characters into curves in a PDF file provided by this embodiment identifies the font type of the character to be converted in the PDF file; acquires the character description of the character to be converted from the corresponding font library according to the font type; and obtains the curve information in the corresponding PDF description according to the character description and writes the curve information into a content stream of the PDF file. The device of this embodiment converts the characters in the PDF file into curves quickly and accurately, and prevents the characters from being replaced.
The present invention also provides a computer-readable storage medium having a computer program stored thereon. When executed by a processor, the computer program implements the embodiments of the method for converting characters into curves in a PDF file provided in fig. 1 and fig. 2: it identifies the font type of the character to be converted in the PDF file; acquires the character description of the character to be converted from the corresponding font library according to the font type; and obtains the curve information in the corresponding PDF description according to the character description and writes the curve information into the content stream of the PDF file. The characters in the PDF file can thus be converted into curves quickly and accurately, which prevents the characters in the PDF file from being replaced.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer readable storage medium. The software functional unit is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device) or a processor (processor) to execute some steps of the methods according to the embodiments of the present invention. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
It is obvious to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions. For the specific working process of the device described above, reference may be made to the corresponding process in the foregoing method embodiment, which is not described herein again.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for converting characters into curves in a PDF file is characterized by comprising the following steps:
identifying the font type of the character to be converted in the PDF file;
acquiring the character description of the character to be converted from a corresponding font library according to the font type;
and acquiring curve information in the corresponding PDF description according to the character description, and writing the curve information into a content stream of the PDF file.
2. The method according to claim 1, characterized in that the character description is a straight line or a bezier curve described in the form of points;
the obtaining of the curve information in the corresponding PDF description according to the character description includes:
and converting a straight line or a Bezier curve described in the character description in the form of points into a curve in the PDF description so as to obtain curve information in the PDF description.
3. The method according to claim 1, wherein after writing the curve information into the content stream of the PDF file, the method further comprises:
and deleting the display instruction of the character to be converted in the PDF file.
4. The method according to any one of claims 1 to 3, wherein the acquiring of the character description of the character to be converted from the corresponding font library according to the font type includes:
acquiring charcode information of the character to be converted;
and acquiring the character description of the character to be converted from the corresponding font library according to the charcode information.
5. The method according to claim 4, wherein the acquiring charcode information of the character to be converted comprises:
if the font type is a simple font type, directly acquiring charcode information of the character to be converted, wherein the simple font type comprises Type1, TrueType or Type3;
and if the font type is a composite font type, acquiring the CMap of the font type, parsing the CMap, and acquiring charcode information of the character to be converted by combining the CMap of the font referenced by the content stream in the PDF file, wherein the composite font type comprises a CID font.
6. An apparatus for converting characters into curves in a PDF file, comprising:
the identification module is used for identifying the font type of the character to be converted in the PDF file;
the character description acquisition module is used for acquiring the character description of the character to be converted from a corresponding font library according to the font type;
and the conversion module is used for acquiring curve information in the corresponding PDF description according to the character description and writing the curve information into the content stream of the PDF file.
7. The apparatus of claim 6, wherein the character description is a straight line or a bezier curve described in the form of points;
the conversion module is specifically configured to:
and converting a straight line or a Bezier curve described in the character description in the form of points into a curve in the PDF description so as to obtain curve information in the PDF description.
8. The apparatus of claim 6, wherein the conversion module is further configured to:
and deleting the display instruction of the character to be converted in the PDF file.
9. The apparatus according to any one of claims 6-8, wherein the character description acquisition module is configured to:
acquiring charcode information of the character to be converted;
and acquiring the character description of the character to be converted from the corresponding font library according to the charcode information.
10. The apparatus of claim 9, wherein the character description acquisition module is configured to:
if the font type is a simple font type, directly acquire the charcode information of the character to be converted, wherein the simple font type comprises Type1, TrueType, or Type3;
and if the font type is a composite font type, acquire the CMap of the font type, parse the CMap, and acquire the charcode information of the character to be converted in combination with the CMap of the font referenced by the content stream in the PDF file, wherein the composite font type comprises a CID font.
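By way of non-limiting illustration, the two charcode acquisition branches may be sketched as follows. In a simple font each byte of a shown string is one charcode, whereas in a composite font the codespace ranges of the CMap determine how many bytes form each code. The sketch assumes the string bytes and the codespace ranges have already been extracted from the PDF file and the parsed CMap; the function names are hypothetical, and `Type0` is the font subtype under which composite fonts appear in a PDF font dictionary.

```python
from typing import List, Optional, Sequence, Tuple

# A codespace range is a (low, high) pair of byte strings of equal length.
CodespaceRange = Tuple[bytes, bytes]

SIMPLE_FONT_SUBTYPES = {"Type1", "TrueType", "Type3"}


def charcodes_simple(data: bytes) -> List[int]:
    """Simple font: every byte of the string is one charcode."""
    return list(data)


def charcodes_composite(data: bytes, ranges: Sequence[CodespaceRange]) -> List[int]:
    """Composite font: split the string using the CMap codespace ranges."""
    codes: List[int] = []
    i = 0
    while i < len(data):
        for low, high in ranges:
            n = len(low)
            chunk = data[i:i + n]
            # A code matches when each byte lies within the range for its position.
            if len(chunk) == n and all(
                lo <= b <= hi for lo, b, hi in zip(low, chunk, high)
            ):
                codes.append(int.from_bytes(chunk, "big"))
                i += n
                break
        else:
            raise ValueError(f"no codespace range matches at offset {i}")
    return codes


def get_charcodes(
    subtype: str, data: bytes, ranges: Optional[Sequence[CodespaceRange]] = None
) -> List[int]:
    """Dispatch on the font subtype as described in the claim."""
    if subtype in SIMPLE_FONT_SUBTYPES:
        return charcodes_simple(data)
    if subtype == "Type0":
        if not ranges:
            raise ValueError("a composite font requires CMap codespace ranges")
        return charcodes_composite(data, ranges)
    raise ValueError(f"unsupported font subtype: {subtype}")
```

For example, a CMap whose only codespace range is `<0000> <FFFF>` reads the string two bytes at a time, while a CMap that mixes one-byte and two-byte ranges yields codes of varying length.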
CN201810829748.0A 2018-07-25 2018-07-25 Method and device for converting characters into curves in PDF (Portable document Format) file Pending CN110852039A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810829748.0A CN110852039A (en) 2018-07-25 2018-07-25 Method and device for converting characters into curves in PDF (Portable document Format) file

Publications (1)

Publication Number Publication Date
CN110852039A (en) 2020-02-28

Family ID: 69594630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810829748.0A Pending CN110852039A (en) 2018-07-25 2018-07-25 Method and device for converting characters into curves in PDF (Portable document Format) file

Country Status (1)

Country Link
CN (1) CN110852039A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1551013A (en) * 2003-04-01 2004-12-01 三星电子株式会社 Method and apparatus for generating vector font
CN102063415A (en) * 2009-11-16 2011-05-18 北大方正集团有限公司 Method and system for embedding single-byte fonts in PDF (Portable Document Format) file
CN103544408A (en) * 2013-09-23 2014-01-29 中山大学 Method for embedment and extraction of PDF document hidden information according to composite font
US20140215325A1 (en) * 2013-01-30 2014-07-31 Hewlett-Packard Development Company, L.P. Embedding bitmap fonts in pdf files
CN105488471A (en) * 2015-11-30 2016-04-13 北大方正集团有限公司 Character pattern recognition method and device
CN105528345A (en) * 2014-09-28 2016-04-27 北大方正集团有限公司 Terminal, server and character complementing method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHONGDEDE: "How PDF identifies the characters in its files", 《HTTPS://BBS.CSDN.NET/TOPICS/340109816》 *
百度经验 (Baidu Jingyan): "Converting text in a PDF file to curves (outlining)", 《HTTPS://JINGYAN.BAIDU.COM/ARTICLE/A681BODEE5DB593BL 9434667.HTML》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111444675A (en) * 2020-03-12 2020-07-24 稿定(厦门)科技有限公司 Character round angle processing method, medium, equipment and device
CN111444675B (en) * 2020-03-12 2022-08-16 稿定(厦门)科技有限公司 Character round angle processing method, medium, equipment and device
CN112613277B (en) * 2020-12-09 2024-05-28 万兴科技(湖南)有限公司 Method, system and storage medium for converting PDF document into DXF document

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200228)