Method and device for document format conversion
Technical field
This application relates to the field of electronic document processing, and in particular to a method and device for document format conversion.
Background
PDF has become the most widely used of the many electronic document formats in use today. It is especially dominant in the printing and publishing industry: in both front-end typesetting and back-end output, PDF is the standard format for page description documents.
Before PDF came into use, however, the standard page description format defined by Adobe was PostScript. Although PostScript falls short of PDF in page independence and device dependence, its long history of use and rich ecosystem of supporting tools mean that a large number of users still use PostScript for printing and publishing.
Because PostScript files raise a series of problems during processing, a PostScript file must first be normalized before it is submitted to the print output system for rasterization; that is, the PostScript file is converted to PDF.
In the normalization of PostScript files, an important goal is to preserve the original description type of each object as far as possible during conversion. Text objects in PostScript should remain text objects in the PDF, graphics objects should remain graphics objects, image objects should remain image objects, and so on. This preserves the original appearance and device independence of the page description to the greatest extent, and thus reproduces the end user's design intent as faithfully as possible. For text objects, the ideal outcome is that they remain text objects after conversion, with both the outline data describing the glyph shapes and the control data preserved completely, without any loss.
The methods currently in common use for converting text vector paths obtained with the PostScript charpath operator fall into two categories:
(1) The vector path of the text is obtained with the PostScript charpath operator, described in fill or stroke mode, and drawn with an ordinary graphics drawing operation, just like any other vector path. In other words, the text vector path obtained from PostScript is converted directly into a corresponding PDF vector path.
(2) Using the Tr (text rendering mode) attribute provided for PDF objects, the PostScript vector path obtained by the charpath operator is converted into a corresponding PDF text object.
However, in the course of implementing the technical solutions of the embodiments of this application, the applicant found that the prior art has at least the following shortcomings:
(1) Because the first prior-art method converts the outlines of PostScript text directly into corresponding PDF vector paths, the control information of the text is lost;
(2) Because the first prior-art method converts the outlines of PostScript text directly into corresponding PDF vector paths, the conversion can be inaccurate and ghosting may occur;
(3) Because the second prior-art method uses the Tr attribute of PDF objects to convert the PostScript vector paths obtained by the charpath operator into PDF text objects, vector paths that were not obtained by charpath may be lost.
Summary of the invention
The invention provides a method and device for document format conversion, to solve the technical problem in the prior art that text control information or ordinary vector paths are lost.
The present invention provides the following technical solutions through the embodiments of this application:
In one aspect, the present invention provides the following technical solution through one embodiment of this application:
A document format conversion method for converting a PostScript file into a PDF file, the method comprising:
Determining that the vector path set of a PostScript file includes a text path set and a graphics path set, wherein the text path set corresponds to text objects in PostScript format and the graphics path set corresponds to graphics objects in PostScript format;
Recording parameter information related to the PostScript text objects in a variable of the vector path set;
Converting the PostScript text objects into PDF text objects based on the parameter information;
Converting the PostScript graphics objects into PDF graphics objects.
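The four steps above can be illustrated with a minimal sketch. It is not the applicant's implementation: the `VectorPath` type, the flag values `TEXT_PATH = 0` and `GRAPHIC_PATH = 1`, the parameter keys, and the font resource name `F1` are hypothetical, and each converter emits a heavily simplified PDF content-stream fragment.

```python
from dataclasses import dataclass, field

TEXT_PATH = 0     # hypothetical flag value for a text (type) path
GRAPHIC_PATH = 1  # hypothetical flag value for a graphics (figure) path


@dataclass
class VectorPath:
    kind: int                                   # TEXT_PATH or GRAPHIC_PATH (determined in S101)
    ops: list = field(default_factory=list)     # PDF path operators, graphics paths only
    params: dict = field(default_factory=dict)  # parameters recorded in S102, text paths only


def convert_text(p: VectorPath) -> str:
    """S103 (simplified): rebuild a PDF text object from the recorded parameters."""
    prm = p.params
    return (f"BT /{prm['font']} {prm['size']} Tf {prm['mode']} Tr "
            f"{prm['x']} {prm['y']} Td <{prm['codes']}> Tj ET")


def convert_graphic(p: VectorPath) -> str:
    """S104 (simplified): emit the path operators as a PDF path."""
    return " ".join(p.ops)


def convert(paths: list) -> str:
    """Dispatch each path by its flag, keeping the original painting order."""
    out = []
    for p in paths:
        if p.kind == TEXT_PATH:
            out.append(convert_text(p))
        else:
            out.append(convert_graphic(p))
    return "\n".join(out)


paths = [
    VectorPath(GRAPHIC_PATH, ops=["0 0 m", "10 0 l", "S"]),
    VectorPath(TEXT_PATH, params={"font": "F1", "size": 12, "mode": 0,
                                  "x": 72, "y": 700, "codes": "4142"}),
]
print(convert(paths))
```

The sketch converts paths in their original order instead of emitting all text first, on the assumption that the PDF page should keep the PostScript painting order.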
Further, the parameter information related to the PostScript text object specifically comprises:
The font information, the character codes, the position coordinates, and the transformation matrix of the text object.
Further, before determining that the vector path set of the PostScript file includes a text path set and a graphics path set, the method further comprises:
Setting first font information as the preset font information;
Setting a first transformation matrix as the preset transformation matrix;
Setting a first text rendering mode as the preset text rendering mode.
Further, converting the PostScript text object into a PDF text object based on the parameter information specifically comprises:
Changing the preset font information from the first font information to the font information of the text object;
Changing the preset transformation matrix from the first transformation matrix to the transformation matrix of the text object;
Changing the preset text rendering mode from the first text rendering mode to the text rendering mode of the text object;
Converting the PostScript text object into a PDF text object based on the font information, the transformation matrix, and the rendering mode of the text object.
Further, the rendering mode of the text object is specifically:
Fill mode or stroke mode.
Further, before the preset font information is changed from the first font information to the font information of the text object, the method further comprises:
Saving the first font information;
Saving the first transformation matrix;
Saving the first text rendering mode.
Further, after the PostScript text object is converted into a PDF text object, the method further comprises:
Restoring the preset font information to the first font information;
Restoring the preset transformation matrix to the first transformation matrix;
Restoring the preset text rendering mode to the first text rendering mode.
In another aspect, the present invention provides the following technical solution through another embodiment of this application:
A document format conversion device for converting a PostScript file into a PDF file, the device comprising:
Determining unit: configured to determine that the vector path set of a PostScript file includes a text path set and a graphics path set, wherein the text path set corresponds to text objects in PostScript format and the graphics path set corresponds to graphics objects in PostScript format;
Recording unit: configured to record parameter information related to the PostScript text objects in a variable of the vector path set;
Text conversion unit: configured to convert the PostScript text objects into PDF text objects based on the parameter information;
Graphics conversion unit: configured to convert the PostScript graphics objects into PDF graphics objects.
Further, the device also comprises:
Font setting unit: configured to set first font information as the preset font information before it is determined that the vector path set of the PostScript file includes a text path set and a graphics path set;
Matrix setting unit: configured to set a first transformation matrix as the preset transformation matrix before it is determined that the vector path set of the PostScript file includes a text path set and a graphics path set;
Rendering setting unit: configured to set a first text rendering mode as the preset text rendering mode before it is determined that the vector path set of the PostScript file includes a text path set and a graphics path set.
Further, the text conversion unit specifically comprises:
Font conversion module: configured to change the preset font information from the first font information to the font information of the text object;
Matrix conversion module: configured to change the preset transformation matrix from the first transformation matrix to the transformation matrix of the text object;
Rendering conversion module: configured to change the preset text rendering mode from the first text rendering mode to the text rendering mode of the text object;
Conversion module: configured to convert the PostScript text object into a PDF text object based on the font information, the transformation matrix, and the rendering mode of the text object.
One or more of the above technical solutions have the following technical effects or advantages:
(1) Because the text control parameters are preserved when text is converted from PostScript to PDF, the text control information is retained;
(2) Because the text control parameters are preserved when text is converted from PostScript to PDF, ghosting is eliminated;
(3) Because text objects and graphics objects are handled by different schemes when PostScript is converted to PDF, the control parameters are preserved during text conversion while graphics paths are not lost.
Brief description of the drawings
Fig. 1 is a flowchart of the document format conversion method in Embodiment 1 of this application;
Fig. 2 is a flowchart of the steps for converting a PostScript text object into a PDF text object in Embodiment 1 of this application;
Fig. 3 is a detailed flowchart of the conversion of a PostScript text object into a PDF text object in Embodiment 1 of this application;
Fig. 4 is a block diagram of the document format conversion device in Embodiment 2 of this application;
Fig. 5 is a block diagram of the text conversion unit in Embodiment 2 of this application;
Fig. 6 is a detailed block diagram of the document format conversion device in Embodiment 2 of this application.
Detailed description
To help those skilled in the art understand this application more clearly, its technical solutions are described in detail below through specific embodiments with reference to the accompanying drawings.
Referring to Fig. 1 to Fig. 3, Embodiment 1 of this application provides a document format conversion method for converting a PostScript file into a PDF file. As shown in Fig. 1, the method comprises the following steps:
S101: Determine that the vector path set of a PostScript file includes a text path set and a graphics path set, wherein the text path set corresponds to text objects in PostScript format and the graphics path set corresponds to graphics objects in PostScript format.
In a specific implementation, before document format conversion is performed, the device that performs the conversion already holds the following information:
Preset font information: in the embodiments of this application, the first font information serves as the preset font information. The first font information may be any font, such as regular script (Kaiti), Song, or clerical script (Lishu); the embodiments of this application do not limit this.
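Preset transformation matrix: in the embodiments of this application, the first transformation matrix serves as the preset transformation matrix.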
Preset text rendering mode: in the embodiments of this application, the first text rendering mode serves as the preset text rendering mode. The first text rendering mode may be fill or stroke: if it is fill, the corresponding Tr value is set to 0; if it is stroke, the corresponding Tr value is set to 1. Of course, in a specific implementation this application does not limit which text rendering mode is preset, nor whether 0 or 1 is assigned to a given mode.
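The Tr values mentioned above are the operand of the PDF `Tr` (text rendering mode) operator, in which 0 means fill and 1 means stroke; PDF also defines modes 2 to 7 for fill-and-stroke, invisible, and clipping variants. The sketch below assumes, as a hypothetical rule not stated in the application, that the rendering mode of a text path is taken from the PostScript painting operator that ends it (`fill`, `eofill`, or `stroke`).

```python
from enum import IntEnum


class TextRenderMode(IntEnum):
    """PDF text rendering modes, i.e. the operand of the Tr operator."""
    FILL = 0
    STROKE = 1
    FILL_STROKE = 2
    INVISIBLE = 3
    FILL_CLIP = 4
    STROKE_CLIP = 5
    FILL_STROKE_CLIP = 6
    CLIP = 7


# Assumption: the painting operator that ends a charpath-built path decides the mode.
_PAINT_TO_MODE = {
    "fill": TextRenderMode.FILL,
    "eofill": TextRenderMode.FILL,  # mapped to plain fill here for simplicity
    "stroke": TextRenderMode.STROKE,
}


def mode_for_paint_op(paint_op: str) -> TextRenderMode:
    """Return the Tr value for a text path ended by the given PostScript operator."""
    try:
        return _PAINT_TO_MODE[paint_op]
    except KeyError:
        raise ValueError(f"unsupported painting operator: {paint_op}") from None


def tr_operator(mode: TextRenderMode) -> str:
    """Format the Tr operator for a PDF content stream."""
    return f"{int(mode)} Tr"


print(tr_operator(mode_for_paint_op("fill")))    # 0 Tr
print(tr_operator(mode_for_paint_op("stroke")))  # 1 Tr
```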
In a specific implementation, a flag can be set for each path to distinguish text paths from graphics paths: for example, the flag is set to 0 for a text path and to 1 for a graphics path. The values themselves are not limited, as long as they distinguish text paths from graphics paths. In addition, in the embodiments of this application, a path is treated as a text path when a charpath operator is encountered while it is being built; naturally, this application does not limit the way in which a path is judged to be a text path either.
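The flag-based classification described above can be sketched as a walk over a pre-tokenized operator stream. The `(operator, operands)` tuple input, the flag constants, and the `tag_paths` helper are hypothetical; a real converter would obtain this information from its PostScript interpreter. Following the embodiment, a path is tagged as text once a `charpath` operator contributes to it, and each painting operator (`fill`, `eofill`, `stroke`) ends the current path, as it does in PostScript.

```python
TEXT_PATH = 0     # flag value for a text (type) path, as in the embodiment
GRAPHIC_PATH = 1  # flag value for a graphics (figure) path

PAINT_OPS = {"fill", "eofill", "stroke"}


def tag_paths(ops):
    """Split a simplified operator stream into painted paths and tag each one.

    ops: list of (operator, operands) tuples, e.g. ("lineto", [100, 0]).
    Returns a list of (flag, painting operator, construction operators).
    Note: a path mixing charpath with ordinary segments is tagged as text here;
    a real converter would need an explicit policy for that case.
    """
    tagged, current, saw_charpath = [], [], False
    for name, args in ops:
        if name == "newpath":
            current, saw_charpath = [], False
        elif name in PAINT_OPS:
            flag = TEXT_PATH if saw_charpath else GRAPHIC_PATH
            tagged.append((flag, name, current))
            current, saw_charpath = [], False  # painting implicitly starts a new path
        else:
            if name == "charpath":
                saw_charpath = True
            current.append((name, args))
    return tagged


stream = [
    ("newpath", []),
    ("moveto", [72, 700]),
    ("charpath", ["AB", False]),
    ("fill", []),
    ("moveto", [0, 0]),
    ("lineto", [100, 0]),
    ("stroke", []),
]
for flag, paint, body in tag_paths(stream):
    print(flag, paint, body)
```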
S102: Record parameter information related to the PostScript text objects in a variable of the vector path set.
After a path is determined to be a text path, its parameters, such as the font information, position coordinates, character codes, and transformation matrix of the text path, are recorded so that they can be preserved during the subsequent PDF conversion. The recorded parameters of a text path are of course not limited to those listed above.
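A minimal sketch of S102, assuming that the "variable of the vector path set" is a list attached to a hypothetical `VectorPathSet` object. The field names, the Latin-1 encoding of character codes, and the validation checks are illustrative choices; the six-element `[a b c d e f]` layout is the matrix form used by both PostScript and PDF.

```python
from dataclasses import dataclass, field
from typing import Tuple

TEXT_PATH = 0  # hypothetical flag value for a text (type) path


@dataclass
class TextPathParams:
    font_name: str                  # font in effect when charpath ran
    font_size: float
    char_codes: bytes               # codes of the string passed to charpath
    position: Tuple[float, float]   # current point when charpath ran
    matrix: Tuple[float, ...]       # transformation matrix [a b c d e f]


@dataclass
class VectorPathSet:
    kind: int                                        # 0 = text path, 1 = graphics path
    text_params: list = field(default_factory=list)  # the "variable" holding S102 records


def record_text_params(path_set, font_name, font_size, text, position, matrix,
                       encoding="latin-1"):
    """S102: store the parameters of one charpath call on a text path set."""
    if path_set.kind != TEXT_PATH:
        raise ValueError("only text paths carry text parameters")
    if len(matrix) != 6:
        raise ValueError("matrix must have six elements [a b c d e f]")
    rec = TextPathParams(font_name, float(font_size), text.encode(encoding),
                         tuple(position), tuple(matrix))
    path_set.text_params.append(rec)
    return rec


ps = VectorPathSet(TEXT_PATH)
rec = record_text_params(ps, "Times-Roman", 12, "AB", (72, 700), (1, 0, 0, 1, 0, 0))
print(rec.char_codes, len(ps.text_params))  # b'AB' 1
```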
S103: Convert the PostScript text objects into PDF text objects based on the parameter information.
In a specific implementation, converting a PostScript text object into a PDF text object specifically comprises the following process, as shown in Fig. 2:
S201: Save the preset parameters.
In a specific implementation, the preset parameters here are those that the conversion device already holds before document format conversion is performed, namely:
Preset font information;
Preset transformation matrix;
Preset text rendering mode.
S202: Set the conversion parameters.
The conversion parameters here are mainly the parameters of the current text to be converted; setting them mainly means setting the preset parameters in the conversion device to the parameters of the text to be converted. Specifically:
The preset font information is changed from the first font information to the font information of the text object.
For example, if the preset font is Song and the font of the text to be converted is regular script (Kaiti), the preset font information is changed from Song to regular script.
The preset transformation matrix is changed from the first transformation matrix to the transformation matrix of the text object.
The preset text rendering mode is changed from the first text rendering mode to the text rendering mode of the text object.
For example, if the preset text rendering mode is fill and the text to be converted uses stroke, the preset text rendering mode is changed from fill to stroke.
S203: Convert the format.
In a specific implementation, converting the PostScript text object into a PDF text object further comprises the following process, as shown in Fig. 3:
S301: Convert the text path to be converted into a text object according to the character codes and position coordinates recorded for the current text path;
S302: Set the text rendering mode Tr of the resulting text object to 0 or 1, according to whether the current text rendering mode is fill or stroke.
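Steps S301 and S302 can be illustrated by building the operators of a PDF text object (`BT` ... `ET`) from the recorded parameters. This sketch rests on assumptions not stated in the application: the font is referenced through a hypothetical page resource name such as `F1`, the linear part `a b c d` of the recorded matrix is reused in `Tm` with the recorded position as its translation, and the character codes are written as a hexadecimal string shown with `Tj`.

```python
def fmt(n: float) -> str:
    """Format a number for a content stream without exponent notation."""
    s = f"{n:.4f}".rstrip("0").rstrip(".")
    return "0" if s == "-0" else s


def pdf_text_object(font_resource, font_size, char_codes, position, matrix, mode):
    """Build a PDF text object from recorded text-path parameters.

    font_resource: name of a /Font entry in the page resources (hypothetical, e.g. "F1").
    matrix: recorded [a b c d e f]; only a b c d are reused, the position supplies e f.
    mode: 0 for fill, 1 for stroke (S302).
    """
    a, b, c, d = matrix[:4]
    x, y = position
    tm = " ".join(fmt(v) for v in (a, b, c, d, x, y))
    return "\n".join([
        "BT",
        f"/{font_resource} {fmt(font_size)} Tf",  # font information
        f"{tm} Tm",                               # text matrix, carries the S301 position
        f"{mode} Tr",                             # text rendering mode (S302)
        f"<{char_codes.hex().upper()}> Tj",       # character codes (S301)
        "ET",
    ])


print(pdf_text_object("F1", 12, b"AB", (72, 700), (1, 0, 0, 1, 0, 0), 1))
```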
S204: Restore the preset parameters.
That is, in the example above, the preset font information is restored from regular script to Song, and the preset transformation matrix is restored to the first transformation matrix;
The preset text rendering mode is restored from stroke to fill.
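The save, set, and restore sequence of S201, S202, and S204 maps naturally onto a Python context manager, which also guarantees that the presets are restored if the conversion in S203 raises an error. The `ConverterState` class and the font names `SimSun` (Song) and `KaiTi` (regular script) are illustrative assumptions.

```python
from contextlib import contextmanager
from dataclasses import dataclass, replace

FILL, STROKE = 0, 1  # Tr values used in the embodiment


@dataclass
class ConverterState:
    font: str      # preset font information
    matrix: tuple  # preset transformation matrix [a b c d e f]
    mode: int      # preset text rendering mode


@contextmanager
def text_params_applied(state, font, matrix, mode):
    saved = replace(state)                                     # S201: save the presets
    state.font, state.matrix, state.mode = font, matrix, mode  # S202: set conversion parameters
    try:
        yield state                                            # S203 runs inside the with-block
    finally:
        # S204: restore the presets, even if S203 raised
        state.font, state.matrix, state.mode = saved.font, saved.matrix, saved.mode


state = ConverterState("SimSun", (1, 0, 0, 1, 0, 0), FILL)
with text_params_applied(state, "KaiTi", (12, 0, 0, 12, 72, 700), STROKE) as s:
    during = (s.font, s.mode)  # what the S203 conversion would see
print(during, (state.font, state.mode))  # ('KaiTi', 1) ('SimSun', 0)
```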
S104: Convert the PostScript graphics objects into PDF graphics objects.
In a specific implementation, the information of all other paths, which do not belong to text objects, is converted directly into the corresponding graphics objects.
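Converting a non-text path into a PDF graphics object can be as simple as mapping each PostScript path-construction operator to its PDF counterpart (`moveto` to `m`, `lineto` to `l`, `curveto` to `c`, `closepath` to `h`) and appending a painting operator (`f`, `f*`, or `S`). The sketch assumes absolute coordinates already expressed in PDF user space; relative operators such as `rlineto`, and arcs, which would first have to be approximated by Bézier curves, are deliberately left unsupported.

```python
PATH_OPS = {"moveto": "m", "lineto": "l", "curveto": "c", "closepath": "h"}
PAINT_OPS = {"fill": "f", "eofill": "f*", "stroke": "S"}


def fmt(n: float) -> str:
    """Format a coordinate without exponent notation."""
    s = f"{n:.4f}".rstrip("0").rstrip(".")
    return "0" if s == "-0" else s


def pdf_path(segments, paint_op):
    """Translate PostScript path segments into PDF path operators.

    segments: list of (operator, operands) tuples with absolute coordinates.
    Raises KeyError for unsupported operators (rlineto, arc, ...).
    """
    lines = []
    for name, args in segments:
        op = PATH_OPS[name]
        lines.append(" ".join([*(fmt(a) for a in args), op]))
    lines.append(PAINT_OPS[paint_op])
    return "\n".join(lines)


square = [("moveto", [0, 0]), ("lineto", [100, 0]), ("lineto", [100, 50]), ("closepath", [])]
print(pdf_path(square, "fill"))
```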
Referring to Fig. 4 to Fig. 6, Embodiment 2 of this application provides a document format conversion device for converting a PostScript file into a PDF file. As shown in Fig. 4, the device comprises:
Determining unit 401: configured to determine that the vector path set of a PostScript file includes a text path set and a graphics path set, wherein the text path set corresponds to text objects in PostScript format and the graphics path set corresponds to graphics objects in PostScript format;
Recording unit 402: configured to record parameter information related to the PostScript text objects in a variable of the vector path set;
Text conversion unit 403: configured to convert the PostScript text objects into PDF text objects based on the parameter information;
Further, as shown in Fig. 5, the text conversion unit specifically comprises:
Font conversion module 501: configured to change the preset font information from the first font information to the font information of the text object;
Matrix conversion module 502: configured to change the preset transformation matrix from the first transformation matrix to the transformation matrix of the text object;
Rendering conversion module 503: configured to change the preset text rendering mode from the first text rendering mode to the text rendering mode of the text object;
Conversion module 504: configured to convert the PostScript text object into a PDF text object based on the font information, the transformation matrix, and the rendering mode of the text object.
Graphics conversion unit 404: configured to convert the PostScript graphics objects into PDF graphics objects.
Further, as shown in Figure 6, described device also comprises:
Font arranges unit 405: be used for the vector path of determining a PostScript formatted file concentrate include type path collection and figure path collection before, first font information is set is the font information that prestores;
Arranged in matrix unit 406: be used for concentrating in the vector path of determining a PostScript formatted file first transformation matrix being set before including type path collection and figure path collection and being the transformation matrix that prestores;
Drafting arranges unit 407: be used for the vector path of determining a PostScript formatted file concentrate include type path collection and figure path collection before, the first literal drawing mode is set is the type mode that prestores.
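The unit structure of Figs. 4 to 6 can be mirrored as a composition of callables, one per unit. This is a hypothetical sketch: the class name, the callable signatures, and the default preset values are assumptions, and a full text conversion unit would also carry out the save, set, and restore steps handled by modules 501 to 504.

```python
from dataclasses import dataclass, field
from typing import Callable, List

TEXT_PATH = 0  # hypothetical flag for a text path


def _default_presets():
    # Units 405-407: preset font, transformation matrix, and rendering mode (assumed values)
    return {"font": "SimSun", "matrix": (1, 0, 0, 1, 0, 0), "mode": 0}


@dataclass
class ConversionDevice:
    determine: Callable        # determining unit 401: paths -> [(flag, path), ...]
    record: Callable           # recording unit 402: text path -> parameter record
    convert_text: Callable     # text conversion unit 403: (record, presets) -> PDF text
    convert_graphic: Callable  # graphics conversion unit 404: path -> PDF path
    presets: dict = field(default_factory=_default_presets)

    def run(self, paths) -> List[str]:
        out = []
        for flag, path in self.determine(paths):
            if flag == TEXT_PATH:
                out.append(self.convert_text(self.record(path), self.presets))
            else:
                out.append(self.convert_graphic(path))
        return out


device = ConversionDevice(
    determine=lambda ps: [(TEXT_PATH if p.startswith("T:") else 1, p) for p in ps],
    record=lambda p: {"codes": p[2:]},
    convert_text=lambda rec, presets: f"text {rec['codes']} in {presets['font']}",
    convert_graphic=lambda p: f"path {p}",
)
print(device.run(["T:AB", "G:square"]))
```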
Because the device in Embodiment 2 corresponds to the method in Embodiment 1, those skilled in the art can, based on the method in Embodiment 1, understand the specific implementation of the device in Embodiment 2 and its various variations. The device is therefore not described in further detail here; any device that those skilled in the art adopt to implement the method in Embodiment 1 falls within the scope of protection sought by this application.
One or more of the above technical solutions have the following technical effects or advantages:
(1) Because the text control parameters are preserved when text is converted from PostScript to PDF, the text control information is retained;
(2) Because the text control parameters are preserved when text is converted from PostScript to PDF, ghosting is eliminated;
(3) Because text objects and graphics objects are handled by different schemes when PostScript is converted to PDF, the control parameters are preserved during text conversion while graphics paths are not lost.
Although preferred embodiments of this application have been described, those skilled in the art, once they learn of the basic inventive concept, can make further changes and modifications to these embodiments. Therefore, the appended claims are intended to be interpreted as covering the preferred embodiments and all changes and modifications that fall within the scope of this application.
Obviously, those skilled in the art can make various changes and modifications to this application without departing from its spirit and scope. Thus, if these modifications and variations fall within the scope of the claims of this application and their technical equivalents, this application is also intended to include them.