CN105335339A - Pdf document conversion method - Google Patents

Info

Publication number: CN105335339A
Authority: CN (China)
Prior art keywords: picture, word, formula, mark, text
Legal status: Pending (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number: CN201510674740.8A
Other languages: Chinese (zh)
Inventor: 孙锡元
Current Assignee: Jiangsu Woeasy Software Co Ltd (the listed assignees may be inaccurate)
Original Assignee: Jiangsu Woeasy Software Co Ltd
Priority date: 2015-10-19 (an assumption, not a legal conclusion)
Filing date: 2015-10-19
Publication date: 2016-02-17
Application filed by Jiangsu Woeasy Software Co Ltd
Priority to CN201510674740.8A
Publication of CN105335339A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents
    • G06F40/117 Tagging; Marking up; Designating a block; Setting of attributes

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Editing Of Facsimile Originals (AREA)

Abstract

The invention discloses a pdf document conversion method that converts a Word document containing a large amount of rich information into a pdf document identical to the original, in which the style, size, and position of the text, marks, pictures, and tables remain exactly the same as in the original document. In addition, a jpg picture can be merged with the Word document at the same time to form a pdf document with a picture watermark. Moreover, even if the document is tampered with, the important information in it is not lost, which provides higher security.

Description

Pdf document conversion method
Technical field
The invention belongs to the field of electronic information technology and specifically relates to a pdf document conversion method.
Background
In the PC world today, the two most widely used document formats are the Doc format of Microsoft Word and the Pdf format of Adobe Acrobat. Because of Microsoft's market penetration, most manuscripts and reports are now written in Doc format, while Pdf files are also widely used. Pdf is an open standard that preserves a document's original formatting and is convenient and secure for transmission over networks, especially across platforms. Word documents are therefore commonly converted to pdf files when files must be transferred with high security. Many pdf conversion tools exist, but in practice, when a Word document contains pictures, formulas, and tables in addition to text, the converted pdf often differs from the original Word document and may even show misplaced content and garbled characters. Inconsistencies can appear even when converting plain text.
Summary of the invention
To solve the above problems, the invention discloses a pdf document conversion method.
To achieve this, the invention provides the following technical solution:
A Pdf document conversion method, comprising the following steps:
Obtain the text, pictures, formulas, and tables in the Word document separately, and record the position of each item on the page. The textual information comprises plain text and text formatting, which are extracted separately. When recording positions, first obtain, for each mark, picture, formula, and table, the page coordinates of the top-left corner pixel and of the bottom-right corner pixel of its edge. Using the edge positions of these marks, pictures, formulas, and tables as split points, obtain the text segments before and after each of them. Obtain the text, marks, pictures, formulas, and tables inside each table separately, and record their positions. Finally, insert the text, marks, pictures, formulas, and tables of the Word document into the pdf file at their recorded positions, where the text is formed by merging the separately obtained plain text and text formatting.
Further, while the items in the Word document and their positions are being obtained, a selected jpg picture is obtained at the same time. During conversion to pdf, the size, position, and transparency of the jpg picture are first adjusted according to preset parameters and the picture is embedded in the pdf file as a digital watermark; the text, marks, pictures, formulas, and tables obtained from the Word document are then inserted into the pdf file at their recorded positions.
Further, the text, marks, pictures, and formulas obtained separately from the Word document, together with the text, marks, pictures, and formulas inside tables, are each encrypted individually and embedded in the pdf file as an invisible digital watermark.
Further, when the positions of items inside a table are obtained, what is obtained is their positions on the overall page.
Further, when the positions of items inside a table are obtained, what is obtained is their positions relative to the table grid.
Further, when a position is obtained, the page coordinates of the top-left corner pixel and of the bottom-right corner pixel are obtained.
Beneficial effect:
The pdf document conversion method provided by the invention converts a Word document containing a large amount of rich information into a pdf file consistent with the original document: the style, size, and position of the text, marks, pictures, and tables all remain identical to the original. In addition, a jpg picture can be merged with the Word document at the same time to form a pdf file with a picture watermark. Moreover, even if the file is tampered with, the important information in it is not lost, which provides high security.
Embodiment
The technical solution provided by the invention is described in detail below with reference to a specific embodiment. It should be understood that the following embodiment only illustrates the invention and does not limit its scope.
This example provides a Pdf document conversion method comprising the following steps:
First, obtain the text, marks, pictures, formulas, tables, and other items in the Word document separately, and record the position of each item on the page. The textual information comprises plain text and text formatting: the plain text is the text data (including punctuation) without format information such as font, size, or color, while the text formatting comprises the font, size, color, and similar attributes of the text. Text data and formatting data are obtained separately. In a document, marks, pictures, formulas, and tables are usually interspersed with the text. When recording positions, first obtain, for each mark, picture, formula, and table, the page coordinates of the top-left corner pixel and of the bottom-right corner pixel of its edge (a two-dimensional coordinate system can be set up with the lower-left corner of the page as the origin). This gives the overall position of each mark, picture, formula, and table on the page. Then, using the edge positions of these marks, pictures, formulas, and tables as split points, obtain the text segments before and after each of them. When obtaining the position of this text on the original Word page, it is preferable to work paragraph by paragraph: take the page coordinates of the top-left corner pixel and of the bottom-right corner pixel of each paragraph, which gives the overall position of that paragraph on the page. Positions can even be obtained line by line, taking the page coordinates of the top-left and bottom-right corner pixels of each line of text to obtain the overall position of that line, so that the text positions in the converted pdf match those in the original Word document more closely.
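The position record and the splitting step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the names `Box`, `Element`, and `split_text_runs` are hypothetical, and the document flow is assumed to be already available as an ordered list of items.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Rectangle recorded by its top-left and bottom-right corners.

    Assumes an origin at the lower-left corner of the page, so y grows
    upward and the top edge has the larger y value.
    """
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.top - self.bottom


@dataclass(frozen=True)
class Element:
    kind: str          # "text", "mark", "picture", "formula" or "table"
    box: Box           # recorded position on the page
    payload: str = ""  # plain text for text items (hypothetical field)


def split_text_runs(flow: List[Element]) -> List[List[Element]]:
    """Group consecutive text items, using non-text items as split points."""
    runs: List[List[Element]] = []
    current: List[Element] = []
    for element in flow:
        if element.kind == "text":
            current.append(element)
        else:
            if current:
                runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs
```

For a flow of one paragraph, one picture, and two further paragraphs, `split_text_runs` returns two runs: the text before the picture and the text after it.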
It should be further noted that, because tables usually contain text, pictures, marks, and even nested tables, the items inside a table and their positions should also be obtained separately by the same means as described above. The table information obtained in the preceding paragraph is in fact the information about the table's ruling lines. When obtaining table data, the text, marks, pictures, formulas, and tables inside the table should be obtained separately, and the position of each item on the overall page should be recorded. More preferably, the positions of the text, marks, pictures, formulas, and tables inside the table are obtained relative to the table grid, which ensures that the items inside the table are not shifted or distorted during conversion.
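The difference between page positions and relative positions can be illustrated with a short sketch. It assumes that the reference for the relative position is the enclosing table cell, which the source does not state explicitly; the function names and the `Rect` tuple layout are hypothetical.

```python
from typing import Tuple

# (left, top, right, bottom) with the origin at the lower-left of the page
Rect = Tuple[float, float, float, float]


def to_cell_relative(item: Rect, cell: Rect) -> Rect:
    """Express an item's box as offsets from the cell's top-left corner.

    x offsets run rightward from the cell's left edge and
    y offsets run downward from the cell's top edge.
    """
    cell_left, cell_top, _, _ = cell
    left, top, right, bottom = item
    return (left - cell_left, cell_top - top, right - cell_left, cell_top - bottom)


def to_page_absolute(relative: Rect, cell: Rect) -> Rect:
    """Recover page coordinates from cell-relative offsets."""
    cell_left, cell_top, _, _ = cell
    left, top, right, bottom = relative
    return (cell_left + left, cell_top - top, cell_left + right, cell_top - bottom)
```

Because the offsets are tied to the cell rather than to the page, an item keeps its place inside the cell even if the cell itself lands at a different page position after conversion.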
In addition, when a formula in the original Word document was generated by a formula editor, the letters and symbols in it should be obtained individually and their positions on the page recorded. The positions can again be recorded with the top-left and bottom-right pixels as described above. Using the top-left and bottom-right pixels is of course not the only option: the top-right and bottom-left pixels can be used instead, or even other schemes based on corner pixels. Choosing the top-left and bottom-right pixels is simply more consistent with the way documents are conventionally laid out.
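Since any pair of opposite corners identifies the same rectangle, records made with different corner conventions can be normalized to one form. The helper below is a hypothetical sketch of that normalization, assuming the lower-left page origin used elsewhere in this description.

```python
from typing import Tuple

Point = Tuple[float, float]  # (x, y), origin at the lower-left of the page


def normalize_corners(p: Point, q: Point) -> Tuple[Point, Point]:
    """Return (top_left, bottom_right) for any two opposite corners."""
    (x1, y1), (x2, y2) = p, q
    left, right = min(x1, x2), max(x1, x2)
    bottom, top = min(y1, y2), max(y1, y2)
    return (left, top), (right, bottom)
```

A symbol recorded by its top-right and bottom-left pixels therefore yields the same normalized box as one recorded by its top-left and bottom-right pixels.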
After the above information has been obtained, the text, marks, pictures, formulas, and tables are inserted into the pdf file at their recorded positions, where the text is formed by merging the separately obtained plain text and text formatting.
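As a rough illustration of merging text with its formatting and placing it at a recorded position, the sketch below emits a fragment of a pdf page content stream using the standard text operators `BT`, `Tf`, `Td`, `Tj`, and `ET`. Several points are assumptions rather than details from the source: the font resource name `F1` is hypothetical and would have to exist in the page's resources, the text baseline is simplified to the bottom of the recorded box, and a literal string like this only suits fonts with a simple single-byte encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextFormat:
    font: str    # pdf font resource name, e.g. "F1" (hypothetical)
    size: float  # font size in points


def merge(text: str, fmt: TextFormat, left: float, bottom: float) -> dict:
    """Recombine separately extracted plain text and formatting."""
    return {"text": text, "font": fmt.font, "size": fmt.size, "x": left, "y": bottom}


def escape_pdf_string(value: str) -> str:
    """Escape backslashes and parentheses for a pdf literal string."""
    return value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_operators(run: dict) -> str:
    """Build the content-stream operators that draw one text run."""
    return "BT /{font} {size:g} Tf {x:g} {y:g} Td ({text}) Tj ET".format(
        font=run["font"],
        size=run["size"],
        x=run["x"],
        y=run["y"],
        text=escape_pdf_string(run["text"]),
    )
```

The default pdf user space also has its origin at the lower-left corner of the page, so coordinates recorded in the system described earlier can be passed through without flipping the y axis, provided the units are converted consistently.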
As an improvement of the invention, a separate jpg picture can also be merged into the Word document during pdf conversion. While the items in the Word document and their positions are being obtained, the selected jpg picture is obtained at the same time. During conversion to pdf, the size, position, and transparency of the jpg picture are first adjusted according to preset parameters and the picture is embedded in the pdf file as a digital watermark. The text, marks, pictures, formulas, and tables obtained from the Word document are then inserted into the pdf file at their recorded positions. The jpg picture is thereby merged into the document as a watermark background that serves as an important identifying mark.
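The adjustment of size, position, and transparency from preset parameters might look like the following sketch. The parameter set (`scale` as a fraction of page width, centered placement, clamped `opacity`) is an assumption chosen for illustration; the source does not specify which parameters are used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WatermarkParams:
    scale: float    # watermark width as a fraction of page width (assumed)
    opacity: float  # 0.0 (invisible) to 1.0 (opaque)


def place_watermark(image_w: float, image_h: float,
                    page_w: float, page_h: float,
                    params: WatermarkParams) -> dict:
    """Compute a centered, aspect-preserving placement for the jpg picture."""
    width = page_w * params.scale
    height = width * image_h / image_w
    return {
        "left": (page_w - width) / 2,
        "bottom": (page_h - height) / 2,
        "width": width,
        "height": height,
        "opacity": min(max(params.opacity, 0.0), 1.0),
    }
```

Drawing the watermark before the document items, as the paragraph above describes, keeps it behind the text as a background.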
To further improve security and prevent important data in the pdf file from being tampered with, the text, marks, pictures, and formulas obtained separately from the Word document, together with the text, marks, pictures, and formulas inside tables, can each be encrypted individually and embedded in the pdf file as an invisible digital watermark. When the pdf file is inspected, the information is decrypted with the corresponding key and compared with what the pdf actually displays, which reveals whether the pdf file has been tampered with.
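The compare-and-detect check can be illustrated with the Python standard library. Note the substitution: the source describes encrypting each item and later decrypting it with the key, whereas this sketch uses a keyed hash (HMAC-SHA256), which supports the same tamper check but is not encryption and cannot recover the original content. The item names and the functions `seal` and `tampered_items` are hypothetical.

```python
import hashlib
import hmac
from typing import Dict, List


def seal(key: bytes, items: Dict[str, bytes]) -> Dict[str, str]:
    """Compute one keyed digest per item (stand-in for per-item encryption)."""
    return {
        name: hmac.new(key, data, hashlib.sha256).hexdigest()
        for name, data in items.items()
    }


def tampered_items(key: bytes, displayed: Dict[str, bytes],
                   sealed: Dict[str, str]) -> List[str]:
    """List the sealed items whose displayed content no longer matches."""
    current = seal(key, displayed)
    return sorted(
        name for name in sealed
        if not hmac.compare_digest(sealed[name], current.get(name, ""))
    )
```

Sealing each item separately means the check reports which item was altered, not just that the file as a whole changed.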
The technical means disclosed in this solution are not limited to those disclosed in the above embodiment, and also include technical solutions formed by any combination of the above technical features. It should be pointed out that those skilled in the art can make improvements and modifications without departing from the principles of the invention, and such improvements and modifications also fall within the scope of protection of the invention.

Claims (6)

  1. A Pdf document conversion method, characterized in that it comprises the following steps:
    Obtain the text, pictures, formulas, and tables in the Word document separately, and record the position of each item on the page, wherein the textual information comprises plain text and text formatting, which are extracted separately; when recording positions, first obtain, for each mark, picture, formula, and table, the page coordinates of the top-left corner pixel and of the bottom-right corner pixel of its edge; using the edge positions of these marks, pictures, formulas, and tables as split points, obtain the text segments before and after each of them; obtain the text, marks, pictures, formulas, and tables inside each table separately, and record their positions; insert the text, marks, pictures, formulas, and tables of the Word document into the pdf file at their recorded positions, wherein the text is formed by merging the separately obtained plain text and text formatting.
  2. The Pdf document conversion method according to claim 1, characterized in that: while the items in the Word document and their positions are being obtained, a selected jpg picture is obtained at the same time; during conversion to pdf, the size, position, and transparency of the jpg picture are first adjusted according to preset parameters and the picture is embedded in the pdf file as a digital watermark; the text, marks, pictures, formulas, and tables obtained from the Word document are then inserted into the pdf file at their recorded positions.
  3. The Pdf document conversion method according to claim 1 or 2, characterized in that: the text, marks, pictures, and formulas obtained separately from the Word document, together with the text, marks, pictures, and formulas inside tables, are each encrypted individually and embedded in the pdf file as an invisible digital watermark.
  4. The Pdf document conversion method according to claim 1 or 2, characterized in that: when the positions of items inside a table are obtained, what is obtained is their positions on the overall page.
  5. The Pdf document conversion method according to claim 1 or 2, characterized in that: when the positions of items inside a table are obtained, what is obtained is their positions relative to the table grid.
  6. The Pdf document conversion method according to claim 1 or 2, characterized in that: when a position is obtained, the page coordinates of the top-left corner pixel and of the bottom-right corner pixel are obtained.
CN201510674740.8A 2015-10-19 2015-10-19 Pdf document conversion method Pending CN105335339A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510674740.8A CN105335339A (en) 2015-10-19 2015-10-19 Pdf document conversion method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510674740.8A CN105335339A (en) 2015-10-19 2015-10-19 Pdf document conversion method

Publications (1)

Publication Number Publication Date
CN105335339A true CN105335339A (en) 2016-02-17

Family

ID=55285884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510674740.8A Pending CN105335339A (en) 2015-10-19 2015-10-19 Pdf document conversion method

Country Status (1)

Country Link
CN (1) CN105335339A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101853246A (en) * 2010-06-14 2010-10-06 深圳市万兴软件有限公司 Method and device for converting document format
CN102087692A (en) * 2009-12-02 2011-06-08 北大方正集团有限公司 Data replication prevention method and system for layout file
CN102663324A (en) * 2012-03-09 2012-09-12 北京神州数码思特奇信息技术股份有限公司 Method and device for electronic document anti-counterfeit
CN103488619A (en) * 2013-07-05 2014-01-01 百度在线网络技术(北京)有限公司 Method and device for processing document file

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SIMONE MARINAI 等: "Table of Contents Recognition for Converting PDF Documents in E-book Formats", 《PROCEEDINGS OF THE 10TH ACM SYMPOSIUM ON DOCUMENT ENGINEERING》 *
扈小燕 等: "将Word文档自动转换成PDF格式的编程实现", 《计算机与现代化》 *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI608374B (en) * 2016-05-23 2017-12-11 資通電腦股份有限公司 Invisible watermark applying method of digital document and verifying method for the invisible watermark
CN107239677A (en) * 2017-05-25 2017-10-10 广东省水利电力勘测设计研究院 A kind of result of design store method based on Pdf documents
CN108984491A (en) * 2018-07-18 2018-12-11 沈文策 A kind of method and apparatus of document format conversion
CN109271613A (en) * 2018-09-25 2019-01-25 四川译讯信息科技有限公司 A kind of pdf document analytic method
CN109271613B (en) * 2018-09-25 2022-12-06 四川译讯信息科技有限公司 PDF file analysis method
CN109739981A (en) * 2018-12-17 2019-05-10 四川译讯信息科技有限公司 A kind of pdf document kind judging method and text extraction method
CN109739981B (en) * 2018-12-17 2020-12-29 四川译讯信息科技有限公司 PDF file type judgment method and character extraction method
CN109829139A (en) * 2019-01-30 2019-05-31 中国软件与技术服务股份有限公司 The method and apparatus that a kind of stream-oriented file of DOC/DOCX format is converted into the layout files of OFD format
CN109829139B (en) * 2019-01-30 2023-04-18 中国软件与技术服务股份有限公司 Method and device for converting DOC/DOCX format streaming file into OFD format file
CN110334585A (en) * 2019-05-22 2019-10-15 平安科技(深圳)有限公司 Table recognition method, apparatus, computer equipment and storage medium
CN110334585B (en) * 2019-05-22 2023-10-24 平安科技(深圳)有限公司 Table identification method, apparatus, computer device and storage medium

Similar Documents

Publication Publication Date Title
CN105335339A (en) Pdf document conversion method
CN108415887B (en) Method for converting PDF file into OFD file
US7240209B2 (en) Methods of invisibly embedding and hiding data into soft-copy text documents
CN103577729B (en) Method for stamping electronic seal on PDF (portable document format) file
WO2009144796A1 (en) Electronic document processing system, method, and program
JP2010191944A (en) Creation and placement of two-dimensional barcode stamp on printed document for storing authentication information
CN102360413A (en) Steganographic method with misguiding function of controllable secret key sequence
CN102881034B (en) A kind of system and method inserting watermark in profile
CN102567938B (en) Watermark image blocking method and device for western language watermark processing
CN100367274C (en) Method for embedding and extracting watermark in English texts
CN105139334A (en) Multiline text watermark production device
CN107666550B (en) Image forming apparatus and document electronization method
CN100517299C (en) Typesetting method for implementing multiple alignment in word rows
CN106777061B (en) Information hiding system and method based on webpage text and image and extraction method
CN102142073A (en) System for preventing and identifying disclosure of paper documents based on hidden watermarks
CN103065101A (en) Anti-counterfeiting method for documents
CN103310130A (en) Text document digital watermark embedding and extracting method
CN115048665A (en) Excel file-based information hiding method, device, equipment and storage medium
JP5923981B2 (en) Image processing apparatus and image processing program
CN102609896A (en) Reversible watermark embedding and extracting method based on middle value keeping of histogram
JP4260076B2 (en) Document creation device, document verification device, document creation method, document verification method, document creation program, document verification program, recording medium storing document creation program, and recording medium storing document verification program
WO2010061456A1 (en) Information processing device, information processing method and image processing program
CN113296773B (en) Copyright labeling method and system for cascading style sheets
CN102609897A (en) Technology for implementing digital watermarking in digital image signals and vector track signals
CN101719246B (en) Method for authenticating and fast outputting soft-dog of text electronic seal

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20160217