CN105335339A - Pdf document conversion method - Google Patents
- Publication number
- CN105335339A
- Authority
- CN
- China
- Prior art keywords
- picture
- word
- formula
- mark
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/103—Formatting, i.e. changing of presentation of documents
- G06F40/117—Tagging; Marking up; Designating a block; Setting of attributes
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Editing Of Facsimile Originals (AREA)
Abstract
The invention discloses a PDF document conversion method that converts a Word document containing a large amount of rich content into a PDF document consistent with the original, in which the style, size, and position of the text, marks, pictures, and tables remain identical to those in the original document. In addition, a JPG picture can be merged with the Word document during conversion to form a PDF document with a picture watermark. Furthermore, even if the document is tampered with, the important information in it is not lost, which provides higher security.
Description
Technical field
The invention belongs to the technical field of electronic information and specifically relates to a PDF file conversion method.
Background technology
In the current PC world, the two most widely used document formats are the DOC format of Microsoft Word and the PDF format of Adobe Acrobat. Because of Microsoft's market penetration, most manuscripts and reports in use today are in DOC format, and PDF files are also widely used. PDF is an open standard that preserves the original formatting of a document and is convenient and secure for transmission over networks, especially across platforms. Word documents are therefore usually converted to PDF files for high-security file transfer. Many PDF conversion tools exist, but in practice we have found that when a Word document contains pictures, formulas, and tables in addition to text, the converted PDF often differs from the original Word document and may even show misplaced content and garbled characters. Misalignment can occur even when converting text alone.
Summary of the invention
To solve the above problems, the invention discloses a PDF file conversion method.
To achieve the above object, the invention provides the following technical solution:
A PDF file conversion method, comprising the following steps:
Separately obtain the text, picture, formula, and table information in the Word document, and record the position of each kind of information in the page. The text information comprises text content and text formatting, which are extracted separately. When recording positions, first obtain, for each mark, picture, formula, and table, the page coordinates of the top-left corner pixel and of the bottom-right corner pixel of its edge; then, using the edges of these marks, pictures, formulas, and tables as separation points, obtain the text segments before and after each mark, picture, formula, and table. Separately obtain the text, mark, picture, formula, and table information inside each table, and record the position of each item. Finally, insert the text, mark, picture, formula, and table information from the Word document into the PDF file at the corresponding recorded positions, where the text is formed by merging the separately obtained text content and text formatting.
Further, while the information items and their positions are obtained from the Word document, a selected JPG picture is obtained at the same time. When converting to a PDF file, the size, position, and transparency of the JPG picture are first adjusted according to preset parameters and the picture is embedded in the PDF file as a digital watermark; the text, mark, picture, formula, and table information obtained from the Word document is then inserted into the PDF file at the corresponding recorded positions.
Further, the text, marks, pictures, and formulas obtained separately from the Word document, together with the text, mark, picture, and formula information inside tables, are each encrypted and embedded in the PDF file as an invisible digital watermark.
Further, when the positions of information inside a table are obtained, what is obtained is the position of that information in the overall page.
Further, when the positions of information inside a table are obtained, what is obtained is the position of that information relative to the table grid.
Further, when a position is obtained, the page coordinates of the top-left corner pixel and of the bottom-right corner pixel are obtained.
Beneficial effects:
The PDF file conversion method provided by the invention can convert a Word document containing a large amount of rich content into a PDF file consistent with the original document, with the style, size, and position of the text, marks, pictures, and tables all kept identical to the original. In addition, a JPG picture can be merged with the Word document at the same time to form a PDF file with a picture watermark. Furthermore, even if the file is tampered with, the important information in it is not lost, giving high security.
Embodiment
The technical solution provided by the invention is described in detail below with reference to a specific embodiment. It should be understood that the following embodiment only illustrates the invention and does not limit its scope.
This example provides a PDF file conversion method comprising the following steps:
First, separately obtain the various kinds of information in the Word document, such as text, marks, pictures, formulas, and tables, and record the position of each in the page. The text information comprises text content and text formatting: the text content is plain text data (including punctuation) without format information such as font, size, or color, while the text formatting comprises the font, size, color, and similar attributes of the text. The text data and the formatting data are obtained separately. In a document, marks, pictures, formulas, and tables are usually interspersed with the text. When recording positions, first obtain, for each mark, picture, formula, and table, the page coordinates of the top-left corner pixel and of the bottom-right corner pixel of its edge (a two-dimensional coordinate system can be set up with the bottom-left corner of the page as the origin); this gives the overall position of the mark, picture, formula, or table in the page. Then, using the edges of these marks, pictures, formulas, and tables as separation points, obtain the text segments before and after each of them. When obtaining the position of this text in the original Word page, it is preferable to proceed paragraph by paragraph: taking the page coordinates of the top-left corner pixel and of the bottom-right corner pixel of each paragraph gives the overall position of that paragraph in the page. Positions can even be obtained line by line, taking the same two corner coordinates for each line of text to get the position of that line in the page, which makes the text positions in the converted PDF match the original Word positions more closely.
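The extraction step can be pictured with a small data model. The following is only a sketch under stated assumptions: the names `Box`, `TextFormat`, `Element`, and `split_text_at_elements` are hypothetical and do not come from the patent, and the input is assumed to be a list of elements already in reading order. It shows text content stored apart from formatting, corner-based positions with the origin at the bottom-left of the page, and non-text elements acting as separators between text segments.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Box:
    """Bounding box in page pixels; origin at the bottom-left page corner."""
    left: float
    top: float      # y of the top-left corner pixel (larger y is higher up)
    right: float
    bottom: float   # y of the bottom-right corner pixel

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.top - self.bottom


@dataclass(frozen=True)
class TextFormat:
    """Formatting kept separately from the plain text (hypothetical fields)."""
    font: str
    size: float
    color: str


@dataclass(frozen=True)
class Element:
    kind: str                         # "text", "mark", "picture", "formula", "table"
    box: Box
    text: Optional[str] = None        # plain text only, no formatting
    fmt: Optional[TextFormat] = None  # stored apart from the text content


def split_text_at_elements(flow: List[Element]) -> List[List[Element]]:
    """Group consecutive text elements; any non-text element ends a segment."""
    segments: List[List[Element]] = []
    current: List[Element] = []
    for element in flow:
        if element.kind == "text":
            current.append(element)
        elif current:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments
```

Each text `Element` can stand for a paragraph or a single line, depending on how finely positions are recorded; the finer granularity costs more records but pins each line to its own box.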
It should be further noted that, because a table usually contains text, pictures, marks, and even nested tables, this information and its positions inside the table should also be obtained separately by the same means as described above. The table information obtained in the preceding paragraph is in fact the table's ruling-line information. When obtaining table information, the text, marks, pictures, formulas, nested tables, and other information inside the table should be obtained separately, and the position of each in the overall page recorded. More preferably, the positions of the text, marks, pictures, formulas, and nested tables are obtained relative to the table grid, which ensures that the information inside the table is not shifted or distorted during conversion.
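One way to read the "relative position" option is as an offset from the table's own top-left corner, so that table contents follow the table if it is placed slightly differently in the output. The sketch below assumes that interpretation; `Box`, `to_table_relative`, and `to_page_absolute` are hypothetical names, and page coordinates use a bottom-left origin as in the embodiment.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Corner-based box: (left, top) is top-left, (right, bottom) is bottom-right."""
    left: float
    top: float
    right: float
    bottom: float


def to_table_relative(item: Box, table: Box) -> Box:
    """Express an item's corners as offsets from the table's top-left corner.

    x offsets grow to the right; y offsets grow downwards from the table top.
    """
    return Box(
        item.left - table.left,
        table.top - item.top,
        item.right - table.left,
        table.top - item.bottom,
    )


def to_page_absolute(rel: Box, table: Box) -> Box:
    """Recover page coordinates from table-relative offsets."""
    return Box(
        table.left + rel.left,
        table.top - rel.top,
        table.left + rel.right,
        table.top - rel.bottom,
    )
```

Storing offsets rather than page coordinates means that only the table's own position has to be exact for its contents to stay aligned with the grid.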
In addition, when a formula in the original Word document was generated by a formula editor, the letters and symbols in the formula should be obtained individually and their positions in the page recorded. The positions can again be recorded using the top-left and bottom-right pixel method described above. Of course, top-left and bottom-right pixels are not the only option: the top-right and bottom-left pixels, or other schemes such as corner pixels, can also be used. Choosing the top-left and bottom-right pixels better matches the way text is normally laid out in a document.
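Because any pair of opposite corners describes the same rectangle, records taken with a different corner convention can be normalized to the top-left and bottom-right form. A minimal sketch, assuming a bottom-left page origin with y increasing upwards; `normalize_corners` is a hypothetical helper name.

```python
from typing import Tuple

Point = Tuple[float, float]


def normalize_corners(a: Point, b: Point) -> Tuple[Point, Point]:
    """Return (top_left, bottom_right) for any two opposite corners a and b.

    Assumes the page origin is at the bottom-left, so a larger y is higher up.
    """
    (ax, ay), (bx, by) = a, b
    top_left = (min(ax, bx), max(ay, by))
    bottom_right = (max(ax, bx), min(ay, by))
    return top_left, bottom_right
```

For example, a symbol recorded by its top-right corner `(300, 700)` and bottom-left corner `(100, 650)` normalizes to top-left `(100, 700)` and bottom-right `(300, 650)`.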
After the above information is obtained, the text, marks, pictures, formulas, and tables are inserted into the PDF file at the corresponding recorded positions, where the text is formed by merging the separately obtained text content and text formatting.
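The insertion step needs the recorded pixel positions in PDF units. In PDF's default user space the origin is the bottom-left corner of the page and one unit is 1/72 inch, which lines up with the bottom-left origin used in this embodiment. The sketch below merges plain text with its formatting and converts a recorded corner to points; the 96 DPI default, the `PlacedText` type, and the dictionary keys are assumptions for illustration, and no real PDF library is called.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

POINTS_PER_INCH = 72.0  # one PDF default user space unit is 1/72 inch


def px_to_pt(value_px: float, dpi: float) -> float:
    """Convert a pixel measurement to PDF points at the given resolution."""
    return value_px * POINTS_PER_INCH / dpi


@dataclass(frozen=True)
class PlacedText:
    """Text merged with its formatting and positioned in PDF points."""
    text: str
    font: str
    size_pt: float
    x_pt: float
    y_pt: float


def merge_and_place(
    text: str,
    fmt: Dict[str, object],
    top_left_px: Tuple[float, float],
    dpi: float = 96.0,  # assumed source resolution, not stated in the patent
) -> PlacedText:
    """Recombine separately stored text and formatting at a recorded corner."""
    x_px, y_px = top_left_px
    return PlacedText(
        text=text,
        font=str(fmt["font"]),
        size_pt=float(fmt["size_pt"]),
        x_pt=px_to_pt(x_px, dpi),
        y_pt=px_to_pt(y_px, dpi),
    )
```

A real writer would still have to translate the top-left corner into a text baseline position, since PDF text is positioned by its baseline rather than by the top of its box.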
As an improvement of the invention, an independent JPG picture can also be merged into the Word document during PDF conversion. While the information items and their positions are obtained from the Word document, the selected JPG picture is obtained at the same time. When converting to a PDF file, the size, position, and transparency of the JPG picture are first adjusted according to preset parameters and the picture is embedded in the PDF file as a digital watermark; the text, mark, picture, formula, and table information obtained from the Word document is then inserted into the PDF file at the corresponding recorded positions. The JPG picture is thereby merged into the document as a watermark background that serves as an important identifying mark.
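The watermark step comes down to three preset parameters (size, position, transparency) plus a drawing order in which the watermark goes down first. The sketch below is illustrative only: `WatermarkParams`, its default values, the centering rule, and both function names are assumptions, since the patent does not say how the preset parameters are chosen.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class WatermarkParams:
    """Preset parameters (hypothetical defaults, not taken from the patent)."""
    max_width_ratio: float = 0.5  # watermark spans this fraction of page width
    opacity: float = 0.15         # 0.0 is invisible, 1.0 is fully opaque


def place_watermark(
    img_w: float,
    img_h: float,
    page_w: float,
    page_h: float,
    params: WatermarkParams = WatermarkParams(),
) -> Dict[str, float]:
    """Scale the picture, preserving aspect ratio, and center it on the page."""
    scale = (page_w * params.max_width_ratio) / img_w
    width, height = img_w * scale, img_h * scale
    return {
        "x": (page_w - width) / 2,   # left edge of the placed picture
        "y": (page_h - height) / 2,  # bottom edge of the placed picture
        "width": width,
        "height": height,
        "opacity": params.opacity,
    }


def build_draw_order(
    watermark: Dict[str, float], elements: List[Any]
) -> List[Tuple[str, Any]]:
    """Put the watermark first so later content is drawn on top of it."""
    return [("watermark", watermark)] + [("element", e) for e in elements]
```

Drawing the watermark before the document content is what makes it a background: later content is painted over it.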
To further improve security and prevent important data in the PDF file from being tampered with, the text, marks, pictures, and formulas obtained separately from the Word document, together with the text, marks, pictures, and formulas inside tables, can each be encrypted and embedded in the PDF file as an invisible digital watermark. When the PDF file is inspected, the information is decrypted with the corresponding key and compared with what is actually displayed in the PDF, which reveals whether the PDF file has been tampered with.
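The compare-on-inspection idea can be sketched with the Python standard library. Note the substitution: the patent describes encrypting the information and later decrypting it with a key, whereas this sketch uses a keyed hash (HMAC-SHA256), which detects changes but cannot recover the original content. The function names and the dictionary layout are hypothetical, and embedding the tag invisibly in the PDF is not shown.

```python
import hashlib
import hmac
import json
from typing import Dict


def fingerprint(items: Dict[str, str], key: bytes) -> str:
    """Compute a keyed tag over the extracted items (stand-in for encryption)."""
    # Sorting keys makes the serialization, and so the tag, deterministic.
    payload = json.dumps(items, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def is_untampered(displayed: Dict[str, str], stored_tag: str, key: bytes) -> bool:
    """Recompute the tag from what the PDF shows and compare with the stored tag."""
    return hmac.compare_digest(fingerprint(displayed, key), stored_tag)
```

With a reversible cipher in place of the keyed hash, the embedded data could also be used to restore the original content after tampering, which is closer to the claim that important information is not lost.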
The technical means disclosed in the solution of the invention are not limited to those disclosed in the above embodiment, but also include technical solutions formed by any combination of the above technical features. It should be noted that a person skilled in the art can make improvements and modifications without departing from the principle of the invention, and these improvements and modifications are also regarded as falling within the scope of protection of the invention.
Claims (6)
- 1. A PDF file conversion method, characterized by comprising the following steps: separately obtaining the text, picture, formula, and table information in a Word document, and recording the position of each kind of information in the page, wherein the text information comprises text content and text formatting, which are extracted separately; when recording positions, first obtaining, for each mark, picture, formula, and table, the page coordinates of the top-left corner pixel and of the bottom-right corner pixel of its edge, and, using the edges of these marks, pictures, formulas, and tables as separation points, obtaining the text segments before and after each mark, picture, formula, and table; separately obtaining the text, mark, picture, formula, and table information inside each table, and recording the position of each item; and inserting the text, mark, picture, formula, and table information from the Word document into the PDF file at the corresponding recorded positions, wherein the text is formed by merging the separately obtained text content and text formatting.
- 2. The PDF file conversion method according to claim 1, characterized in that: while the information items and their positions are obtained from the Word document, a selected JPG picture is obtained at the same time; when converting to a PDF file, the size, position, and transparency of the JPG picture are first adjusted according to preset parameters and the picture is embedded in the PDF file as a digital watermark, and the text, mark, picture, formula, and table information obtained from the Word document is then inserted into the PDF file at the corresponding recorded positions.
- 3. The PDF file conversion method according to claim 1 or 2, characterized in that: the text, marks, pictures, and formulas obtained separately from the Word document, together with the text, mark, picture, and formula information inside tables, are each encrypted and embedded in the PDF file as an invisible digital watermark.
- 4. The PDF file conversion method according to claim 1 or 2, characterized in that: when the positions of information inside a table are obtained, what is obtained is the position of that information in the overall page.
- 5. The PDF file conversion method according to claim 1 or 2, characterized in that: when the positions of information inside a table are obtained, what is obtained is the position of that information relative to the table grid.
- 6. The PDF file conversion method according to claim 1 or 2, characterized in that: when a position is obtained, the page coordinates of the top-left corner pixel and of the bottom-right corner pixel are obtained.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510674740.8A CN105335339A (en) | 2015-10-19 | 2015-10-19 | Pdf document conversion method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510674740.8A CN105335339A (en) | 2015-10-19 | 2015-10-19 | Pdf document conversion method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105335339A true CN105335339A (en) | 2016-02-17 |
Family
ID=55285884
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510674740.8A Pending CN105335339A (en) | 2015-10-19 | 2015-10-19 | Pdf document conversion method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105335339A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107239677A (en) * | 2017-05-25 | 2017-10-10 | 广东省水利电力勘测设计研究院 | A kind of result of design store method based on Pdf documents |
TWI608374B (en) * | 2016-05-23 | 2017-12-11 | 資通電腦股份有限公司 | Invisible watermark applying method of digital document and verifying method for the invisible watermark |
CN108984491A (en) * | 2018-07-18 | 2018-12-11 | 沈文策 | A kind of method and apparatus of document format conversion |
CN109271613A (en) * | 2018-09-25 | 2019-01-25 | 四川译讯信息科技有限公司 | A kind of pdf document analytic method |
CN109739981A (en) * | 2018-12-17 | 2019-05-10 | 四川译讯信息科技有限公司 | A kind of pdf document kind judging method and text extraction method |
CN109829139A (en) * | 2019-01-30 | 2019-05-31 | 中国软件与技术服务股份有限公司 | The method and apparatus that a kind of stream-oriented file of DOC/DOCX format is converted into the layout files of OFD format |
CN110334585A (en) * | 2019-05-22 | 2019-10-15 | 平安科技(深圳)有限公司 | Table recognition method, apparatus, computer equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101853246A (en) * | 2010-06-14 | 2010-10-06 | 深圳市万兴软件有限公司 | Method and device for converting document format |
CN102087692A (en) * | 2009-12-02 | 2011-06-08 | 北大方正集团有限公司 | Data replication prevention method and system for layout file |
CN102663324A (en) * | 2012-03-09 | 2012-09-12 | 北京神州数码思特奇信息技术股份有限公司 | Method and device for electronic document anti-counterfeit |
CN103488619A (en) * | 2013-07-05 | 2014-01-01 | 百度在线网络技术(北京)有限公司 | Method and device for processing document file |
Non-Patent Citations (2)
Title |
---|
SIMONE MARINAI 等: "Table of Contents Recognition for Converting PDF Documents in E-book Formats", 《PROCEEDINGS OF THE 10TH ACM SYMPOSIUM ON DOCUMENT ENGINEERING》 * |
扈小燕 等: "将Word文档自动转换成PDF格式的编程实现", 《计算机与现代化》 * |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI608374B (en) * | 2016-05-23 | 2017-12-11 | 資通電腦股份有限公司 | Invisible watermark applying method of digital document and verifying method for the invisible watermark |
CN107239677A (en) * | 2017-05-25 | 2017-10-10 | 广东省水利电力勘测设计研究院 | A kind of result of design store method based on Pdf documents |
CN108984491A (en) * | 2018-07-18 | 2018-12-11 | 沈文策 | A kind of method and apparatus of document format conversion |
CN109271613A (en) * | 2018-09-25 | 2019-01-25 | 四川译讯信息科技有限公司 | A kind of pdf document analytic method |
CN109271613B (en) * | 2018-09-25 | 2022-12-06 | 四川译讯信息科技有限公司 | PDF file analysis method |
CN109739981A (en) * | 2018-12-17 | 2019-05-10 | 四川译讯信息科技有限公司 | A kind of pdf document kind judging method and text extraction method |
CN109739981B (en) * | 2018-12-17 | 2020-12-29 | 四川译讯信息科技有限公司 | PDF file type judgment method and character extraction method |
CN109829139A (en) * | 2019-01-30 | 2019-05-31 | 中国软件与技术服务股份有限公司 | The method and apparatus that a kind of stream-oriented file of DOC/DOCX format is converted into the layout files of OFD format |
CN109829139B (en) * | 2019-01-30 | 2023-04-18 | 中国软件与技术服务股份有限公司 | Method and device for converting DOC/DOCX format streaming file into OFD format file |
CN110334585A (en) * | 2019-05-22 | 2019-10-15 | 平安科技(深圳)有限公司 | Table recognition method, apparatus, computer equipment and storage medium |
CN110334585B (en) * | 2019-05-22 | 2023-10-24 | 平安科技(深圳)有限公司 | Table identification method, apparatus, computer device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105335339A (en) | Pdf document conversion method | |
CN108415887B (en) | Method for converting PDF file into OFD file | |
US7240209B2 (en) | Methods of invisibly embedding and hiding data into soft-copy text documents | |
CN103577729B (en) | Method for stamping electronic seal on PDF (portable document format) file | |
WO2009144796A1 (en) | Electronic document processing system, method, and program | |
JP2010191944A (en) | Creation and placement of two-dimensional barcode stamp on printed document for storing authentication information | |
CN102360413A (en) | Steganographic method with misguiding function of controllable secret key sequence | |
CN102881034B (en) | A kind of system and method inserting watermark in profile | |
CN102567938B (en) | Watermark image blocking method and device for western language watermark processing | |
CN100367274C (en) | Method for embedding and extracting watermark in English texts | |
CN105139334A (en) | Multiline text watermark production device | |
CN107666550B (en) | Image forming apparatus and document electronization method | |
CN100517299C (en) | Typesetting method for implementing multiple alignment in word rows | |
CN106777061B (en) | Information hiding system and method based on webpage text and image and extraction method | |
CN102142073A (en) | System for preventing and identifying disclosure of paper documents based on hidden watermarks | |
CN103065101A (en) | Anti-counterfeiting method for documents | |
CN103310130A (en) | Text document digital watermark embedding and extracting method | |
CN115048665A (en) | Excel file-based information hiding method, device, equipment and storage medium | |
JP5923981B2 (en) | Image processing apparatus and image processing program | |
CN102609896A (en) | Reversible watermark embedding and extracting method based on middle value keeping of histogram | |
JP4260076B2 (en) | Document creation device, document verification device, document creation method, document verification method, document creation program, document verification program, recording medium storing document creation program, and recording medium storing document verification program | |
WO2010061456A1 (en) | Information processing device, information processing method and image processing program | |
CN113296773B (en) | Copyright labeling method and system for cascading style sheets | |
CN102609897A (en) | Technology for implementing digital watermarking in digital image signals and vector track signals | |
CN101719246B (en) | Method for authenticating and fast outputting soft-dog of text electronic seal |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160217 |