Method and apparatus for document format conversion
Technical Field
The application relates to the field of electronic document processing, and in particular to a method and an apparatus for document format conversion.
Background
PDF has become the most widely used of the many electronic document formats. In printing and publishing it is dominant: for both front-end typesetting and back-end output, PDF is the standard format for page-description documents.
Before PDF came into use, however, the standard page-description format defined by Adobe was PostScript. Although PostScript falls short of PDF in page independence and device independence, its long history of use and rich peripheral support mean that a large number of users still print with PostScript.
Because PostScript files cause a series of problems during processing, a PostScript file must first be normalized, that is, converted to PDF, before it is submitted to the prepress output system for rasterization.
An important goal of normalizing PostScript files is to preserve, as far as possible, each object's original description type during conversion. A text object in PostScript should remain a text object in PDF, a graphic object should remain a graphic object, an image object should remain an image object, and so on. This preserves the original appearance and device independence of the page description to the greatest extent and thus reproduces the front-end user's design intent as faithfully as possible. For a text object, the ideal outcome is that it remains a text object after conversion and that both the outline data describing the glyph shapes and the control data are retained in full, without any loss.
At present, text vector paths obtained through the charpath operator in PostScript are commonly converted in one of the following two ways:
(1) The vector path of the text is obtained through the charpath operator in PostScript, described with a fill or stroke mode, and drawn as a general vector path. In other words, the text vector path obtained from PostScript is converted directly to a corresponding PDF vector path.
(2) Based on the Tr (text rendering mode) attribute provided for PDF text objects, the PostScript-format vector path obtained through the charpath operator is converted to a corresponding PDF text object.
However, in the course of implementing the technical solutions of the embodiments of this application, the applicant found that the prior art has at least the following shortcomings:
(1) Because the first prior-art approach converts PostScript-format text paths directly to corresponding PDF vector paths, the control information of the text is lost.
(2) Because the first prior-art approach converts PostScript-format text paths directly to corresponding PDF vector paths, the conversion is inaccurate and ghosting may occur.
(3) Because the second prior-art approach relies on the Tr attribute of PDF text objects to convert the PostScript-format vector paths obtained through the charpath operator to PDF text objects, vector paths that were not obtained through charpath are lost.
Summary of the Invention
The invention provides a method and an apparatus for document format conversion, to solve the technical problem in the prior art that text control information or general vector paths are lost.
Through the embodiments of this application, the invention provides the following technical solutions.
In one aspect, an embodiment of this application provides the following technical solution:
A document format conversion method for converting a PostScript file to a PDF file, the method comprising:
determining that the vector path set of a PostScript-format file includes a text path set and a graphic path set, wherein the text path set corresponds to text objects in PostScript format and the graphic path set corresponds to graphic objects in PostScript format;
recording, in a variable of the vector path set, parameter information relating to the PostScript-format text object;
converting, based on the parameter information, the PostScript-format text object to a PDF text object;
converting the PostScript-format graphic object to a PDF graphic object.
Further, the parameter information relating to the PostScript-format text object specifically comprises:
the font information of the text object, the character code of the text object, the position coordinates of the text object, and the transformation matrix of the text object.
Further, before determining that the vector path set of a PostScript-format file includes a text path set and a graphic path set, the method also comprises:
setting first font information as the pre-stored font information;
setting a first transformation matrix as the pre-stored transformation matrix;
setting a first text rendering mode as the pre-stored text rendering mode.
Further, converting, based on the parameter information, the PostScript-format text object to a PDF text object specifically comprises:
changing the pre-stored font information from the first font information to the font information of the text object;
changing the pre-stored transformation matrix from the first transformation matrix to the transformation matrix of the text object;
changing the pre-stored text rendering mode from the first text rendering mode to the text rendering mode of the text object;
converting the PostScript-format text object to a PDF text object based on the font information of the text object, the transformation matrix of the text object, and the rendering mode of the text object.
Further, the rendering mode of the text object is specifically:
fill mode or stroke mode.
Further, before changing the pre-stored font information from the first font information to the font information of the text object, the method also comprises:
saving the first font information;
saving the first transformation matrix;
saving the first text rendering mode.
Further, after converting the PostScript-format text object to a PDF text object, the method also comprises:
restoring the pre-stored font information to the first font information;
restoring the pre-stored transformation matrix to the first transformation matrix;
restoring the pre-stored text rendering mode to the first text rendering mode.
In another aspect, another embodiment of this application provides the following technical solution:
A document format conversion apparatus for converting a PostScript file to a PDF file, the apparatus comprising:
Determining unit: configured to determine that the vector path set of a PostScript-format file includes a text path set and a graphic path set, wherein the text path set corresponds to text objects in PostScript format and the graphic path set corresponds to graphic objects in PostScript format;
Recording unit: configured to record, in a variable of the vector path set, parameter information relating to the PostScript-format text object;
Text conversion unit: configured to convert, based on the parameter information, the PostScript-format text object to a PDF text object;
Graphic conversion unit: configured to convert the PostScript-format graphic object to a PDF graphic object.
Further, the apparatus also comprises:
Font setting unit: configured to set first font information as the pre-stored font information before it is determined that the vector path set of a PostScript-format file includes a text path set and a graphic path set;
Matrix setting unit: configured to set a first transformation matrix as the pre-stored transformation matrix before it is determined that the vector path set of a PostScript-format file includes a text path set and a graphic path set;
Rendering setting unit: configured to set a first text rendering mode as the pre-stored text rendering mode before it is determined that the vector path set of a PostScript-format file includes a text path set and a graphic path set.
Further, the text conversion unit specifically comprises:
Font conversion module: configured to change the pre-stored font information from the first font information to the font information of the text object;
Matrix conversion module: configured to change the pre-stored transformation matrix from the first transformation matrix to the transformation matrix of the text object;
Rendering conversion module: configured to change the pre-stored text rendering mode from the first text rendering mode to the text rendering mode of the text object;
Conversion module: configured to convert the PostScript-format text object to a PDF text object based on the font information of the text object, the transformation matrix of the text object, and the rendering mode of the text object.
One or more of the above technical solutions have the following technical effects or advantages:
(1) Because the text control parameters are retained when text is converted from PostScript to PDF, the text control information is preserved.
(2) Because the text control parameters are retained when text is converted from PostScript to PDF, ghosting is eliminated.
(3) Because text objects and graphic objects are handled by separate schemes when PostScript is converted to PDF, the control parameters are preserved during text conversion and, at the same time, graphic paths are not lost.
Brief Description of the Drawings
Fig. 1 is a flowchart of the document format conversion method in Embodiment One of this application;
Fig. 2 is a flowchart of the steps for converting a PostScript-format text object to a PDF text object in Embodiment One of this application;
Fig. 3 is a flowchart of the specific conversion performed when a PostScript-format text object is converted to a PDF text object in Embodiment One of this application;
Fig. 4 is a block diagram of the document format conversion apparatus in Embodiment Two of this application;
Fig. 5 is a block diagram of the text conversion unit in Embodiment Two of this application;
Fig. 6 is a detailed block diagram of the document format conversion apparatus in Embodiment Two of this application.
Detailed Description
So that those skilled in the art to which this application belongs can understand it more clearly, the technical solutions are described in detail below through specific embodiments and with reference to the accompanying drawings.
Referring to Fig. 1 to Fig. 3, Embodiment One of this application provides a document format conversion method for converting a PostScript file to a PDF file. As shown in Fig. 1, the method comprises the following steps:
S101: determining that the vector path set of a PostScript-format file includes a text path set and a graphic path set, wherein the text path set corresponds to text objects in PostScript format and the graphic path set corresponds to graphic objects in PostScript format;
In a specific implementation, before document format conversion is carried out, the apparatus that performs the conversion holds the following information:
Pre-stored font information: in this embodiment, the first font information serves as the pre-stored font information. The first font information may be any font, such as KaiTi (regular script), SongTi, or LiShu; this embodiment places no restriction on it.
Pre-stored transformation matrix: in this embodiment, the first transformation matrix serves as the pre-stored transformation matrix.
Pre-stored text rendering mode: in this embodiment, the first text rendering mode serves as the pre-stored text rendering mode, and it may be fill mode or stroke mode. If the first text rendering mode is fill mode, the corresponding Tr value is set to 0; if it is stroke mode, the corresponding Tr value is set to 1. In a specific implementation, the application does not restrict which text rendering mode is set, nor which of the values 0 and 1 is assigned to a given mode.
In a specific implementation, a parameter may be set in the path mark to distinguish the text path set from the graphic path set. For example, the value is set to 0 for a text path and to 1 for a graphic path. The parameter values themselves are not restricted, provided that text paths and graphic paths can be told apart. In addition, in this embodiment, encountering a charpath operator indicates that the current path is a text path; the application likewise places no restriction on how a path is judged to be a text path.
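The path marking described above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the names `PathKind`, `VectorPath`, `source_op`, `classify`, and `split_paths` are hypothetical, and the sketch assumes each path remembers the name of the PostScript operator that produced it.

```python
from dataclasses import dataclass
from enum import IntEnum


class PathKind(IntEnum):
    TEXT = 0      # path produced by charpath (example value from the text)
    GRAPHIC = 1   # any other vector path


@dataclass
class VectorPath:
    source_op: str                     # hypothetical: operator that built the path
    segments: list                     # path segments, opaque in this sketch
    kind: PathKind = PathKind.GRAPHIC  # the "path mark" parameter


def classify(path: VectorPath) -> VectorPath:
    """Mark the path as text if it came from charpath, otherwise as graphic."""
    path.kind = PathKind.TEXT if path.source_op == "charpath" else PathKind.GRAPHIC
    return path


def split_paths(paths):
    """Split a vector path set into a text path set and a graphic path set."""
    text, graphic = [], []
    for p in paths:
        (text if classify(p).kind is PathKind.TEXT else graphic).append(p)
    return text, graphic
```

Any other pair of distinct mark values would serve equally well, since the description only requires that the two kinds of path be distinguishable.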
S102: recording, in a variable of the vector path set, parameter information relating to the PostScript-format text object;
Once a path has been judged to be a text path, its related parameters are recorded, such as the font information, position coordinates, character code, and transformation matrix of the text path, so that these parameters can be retained in the subsequent conversion to PDF. The parameters recorded for a text path are not limited to those listed above.
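As a rough sketch of the recording step, the parameters of each text path could be kept in a small record keyed by path. The class and field names below (`TextParams`, `PathSetVariables`, `record`, `lookup`) are assumptions made for illustration; the source does not specify the data layout.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Matrix = Tuple[float, float, float, float, float, float]


@dataclass(frozen=True)
class TextParams:
    font_name: str                  # font information of the text path
    char_codes: bytes               # character code(s) passed to charpath
    position: Tuple[float, float]   # position coordinates of the text
    matrix: Matrix                  # transformation matrix in effect


class PathSetVariables:
    """Hypothetical stand-in for 'the variable of the vector path set'."""

    def __init__(self) -> None:
        self._records: Dict[int, TextParams] = {}

    def record(self, path_id: int, params: TextParams) -> None:
        """Store the parameters recorded for one text path."""
        self._records[path_id] = params

    def lookup(self, path_id: int) -> Optional[TextParams]:
        """Return the recorded parameters, or None for a non-text path."""
        return self._records.get(path_id)
```

A graphic path simply has no record, which gives the later conversion step a second way to tell the two kinds of path apart.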
S103: converting, based on the parameter information, the PostScript-format text object to a PDF text object;
In a specific implementation, converting the PostScript-format text object to a PDF text object specifically comprises, as shown in Fig. 2, the following process:
S201: saving the pre-stored parameters.
In a specific implementation, the pre-stored parameters here are the following information held, before document format conversion is carried out, in the apparatus that performs the conversion:
the pre-stored font information;
the pre-stored transformation matrix;
the pre-stored text rendering mode.
S202: setting the conversion parameters.
The conversion parameters here are the parameters of the text currently being converted. Setting them means replacing the pre-stored parameters in the document format conversion apparatus with the parameters of the text to be converted. Specifically, this comprises:
changing the pre-stored font information from the first font information to the font information of the text object;
for example, if the pre-stored font is SongTi and the font of the text to be converted is KaiTi, the pre-stored font information is changed from SongTi to KaiTi;
changing the pre-stored transformation matrix from the first transformation matrix to the transformation matrix of the text object;
changing the pre-stored text rendering mode from the first text rendering mode to the text rendering mode of the text object;
for example, if the pre-stored text rendering mode is fill mode and the text rendering mode of the text to be converted is stroke mode, the pre-stored text rendering mode is changed from fill mode to stroke mode.
S203: format conversion.
In a specific implementation, converting the PostScript-format text object to a PDF text object itself comprises, as shown in Fig. 3, the following process:
S301: converting the text path to be converted to a text object according to the character code and position coordinates recorded for the current text path;
S302: setting the text rendering mode Tr of the resulting text object to 0 or 1 according to whether the current text rendering mode is fill or stroke.
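Steps S301 and S302 might be illustrated by the following Python sketch, which builds the content-stream fragment for one PDF text object. The operators `BT`, `Tf`, `Tm`, `Tr`, `Tj`, and `ET` are standard PDF text operators; the function name, its parameters, and the choice to fold the position coordinates into the translation part of `Tm` are assumptions of this sketch, not details given in the source.

```python
def pdf_text_object(font_resource, font_size, matrix, position, char_codes, stroke):
    """Build a PDF text object from recorded text-path parameters.

    matrix     -- (a, b, c, d) scale/rotation part of the text matrix (assumed layout)
    position   -- (x, y) position coordinates of the text
    char_codes -- bytes holding the character codes
    stroke     -- True for stroke mode (Tr = 1), False for fill mode (Tr = 0)
    """
    tr = 1 if stroke else 0                      # S302: fill -> 0, stroke -> 1
    tm = " ".join(f"{v:g}" for v in (*matrix, *position))
    hex_string = char_codes.hex().upper()        # hex string avoids escaping issues
    return "\n".join([
        "BT",                                    # begin text object
        f"/{font_resource} {font_size:g} Tf",    # select font and size
        f"{tm} Tm",                              # text matrix including position
        f"{tr} Tr",                              # text rendering mode
        f"<{hex_string}> Tj",                    # S301: show the character codes
        "ET",                                    # end text object
    ])
```

For instance, `pdf_text_object("F1", 12, (1, 0, 0, 1), (72, 700), b"AB", stroke=True)` yields a stroked text object rather than an outline path, so the font and character codes survive in the PDF.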
S204: restoring the pre-stored parameters.
That is, the pre-stored font information above is restored from KaiTi to SongTi;
and the pre-stored text rendering mode above is restored from stroke mode to fill mode.
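The save, set, convert, and restore sequence of S201 to S204 resembles a scoped state change. One way to sketch it, assuming a hypothetical `Converter` object that holds the pre-stored parameters in a `ConverterState`, is a Python context manager that guarantees restoration even if the conversion step fails:

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace


@dataclass
class ConverterState:
    font: str          # pre-stored font information
    matrix: tuple      # pre-stored transformation matrix
    render_mode: int   # pre-stored text rendering mode: 0 = fill, 1 = stroke


class Converter:
    """Hypothetical holder of the apparatus's pre-stored parameters."""

    def __init__(self, state: ConverterState) -> None:
        self.state = state


@contextmanager
def text_parameters(converter, font, matrix, render_mode):
    saved = replace(converter.state)                              # S201: save a copy
    converter.state = ConverterState(font, matrix, render_mode)   # S202: set
    try:
        yield converter                                           # S203: convert here
    finally:
        converter.state = saved                                   # S204: restore
```

With the examples from the text, a converter pre-stored with SongTi and fill mode would see KaiTi and stroke mode inside the `with` block and SongTi and fill mode again afterwards.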
S104: converting the PostScript-format graphic object to a PDF graphic object.
In a specific implementation, path information that does not belong to a text object is converted directly to the corresponding graphic object.
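For the graphic branch, a direct conversion could look like the sketch below. The mapping from the PostScript path operators `moveto`, `lineto`, `curveto`, and `closepath` to the PDF operators `m`, `l`, `c`, and `h`, and the painting operators `f` and `S`, are standard; the segment representation and the function name are assumptions made for illustration.

```python
# PostScript path-construction operator -> PDF path-construction operator
PS_TO_PDF = {"moveto": "m", "lineto": "l", "curveto": "c", "closepath": "h"}


def pdf_graphic_object(segments, stroke):
    """Convert a non-text path to PDF path operators.

    segments -- iterable of tuples such as ("moveto", x, y) or ("closepath",)
    stroke   -- True to stroke the path (S), False to fill it (f)
    """
    lines = []
    for op, *coords in segments:
        operands = " ".join(f"{v:g}" for v in coords)
        # strip() drops the leading space for operators without operands
        lines.append(f"{operands} {PS_TO_PDF[op]}".strip())
    lines.append("S" if stroke else "f")   # paint the finished path
    return "\n".join(lines)
```

Because graphic paths take this branch unconditionally, paths not produced by charpath are still emitted.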
Referring to Fig. 4 to Fig. 6, Embodiment Two of this application provides a document format conversion apparatus for converting a PostScript file to a PDF file. As shown in Fig. 4, the apparatus comprises:
Determining unit 401: configured to determine that the vector path set of a PostScript-format file includes a text path set and a graphic path set, wherein the text path set corresponds to text objects in PostScript format and the graphic path set corresponds to graphic objects in PostScript format;
Recording unit 402: configured to record, in a variable of the vector path set, parameter information relating to the PostScript-format text object;
Text conversion unit 403: configured to convert, based on the parameter information, the PostScript-format text object to a PDF text object;
Further, as shown in Fig. 5, the text conversion unit specifically comprises:
Font conversion module 501: configured to change the pre-stored font information from the first font information to the font information of the text object;
Matrix conversion module 502: configured to change the pre-stored transformation matrix from the first transformation matrix to the transformation matrix of the text object;
Rendering conversion module 503: configured to change the pre-stored text rendering mode from the first text rendering mode to the text rendering mode of the text object;
Conversion module 504: configured to convert the PostScript-format text object to a PDF text object based on the font information of the text object, the transformation matrix of the text object, and the rendering mode of the text object.
Graphic conversion unit 404: configured to convert the PostScript-format graphic object to a PDF graphic object.
Further, as shown in Fig. 6, the apparatus also comprises:
Font setting unit 405: configured to set first font information as the pre-stored font information before it is determined that the vector path set of a PostScript-format file includes a text path set and a graphic path set;
Matrix setting unit 406: configured to set a first transformation matrix as the pre-stored transformation matrix before it is determined that the vector path set of a PostScript-format file includes a text path set and a graphic path set;
Rendering setting unit 407: configured to set a first text rendering mode as the pre-stored text rendering mode before it is determined that the vector path set of a PostScript-format file includes a text path set and a graphic path set.
Because the apparatus of Embodiment Two corresponds to the method of Embodiment One, a person skilled in the art can, on the basis of that method, understand the specific implementation of the apparatus of Embodiment Two and its variants. The operation of the apparatus is therefore not described in detail here. Any apparatus that a person skilled in the art uses to carry out the method of Embodiment One falls within the scope of protection sought by this application.
One or more of the above technical solutions have the following technical effects or advantages:
(1) Because the text control parameters are retained when text is converted from PostScript to PDF, the text control information is preserved.
(2) Because the text control parameters are retained when text is converted from PostScript to PDF, ghosting is eliminated.
(3) Because text objects and graphic objects are handled by separate schemes when PostScript is converted to PDF, the control parameters are preserved during text conversion and, at the same time, graphic paths are not lost.
Although preferred embodiments of this application have been described, those skilled in the art can make further changes and modifications to these embodiments once they grasp the basic inventive concept. The appended claims are therefore intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of this application.
Obviously, those skilled in the art can make various changes and variations to this application without departing from its spirit and scope. If such changes and variations fall within the scope of the claims of this application and their technical equivalents, this application is intended to cover them as well.