Summary of the invention
The object of the present invention is to provide a system and method for converting XML format files into Word format files that support the typesetting styles of the Word format, avoid the heavy consumption of system resources caused by frequently calling the COM interface of MS-Word, and convert XML format files into Word format files in batches stably and efficiently.
The technical solution of the present invention is as follows:
A method for converting an XML format file into a Word format file, comprising the following steps:
Reading the basic element information in the XML format file;
Typesetting the basic element information that has been read according to the typesetting style of the Word format file;
Generating a rich text format document from the typeset information;
Calling the COM interface of MS-Word to convert the rich text format document into the Word format file in a single pass, including: calling the CoInitialize interface of the system to set up the COM environment for MS-Word; calling CreateInstance to initialize the application object ApplicationPtr; at the same time, calling put_Visible to set the application object to background conversion mode; calling get_Documents to obtain the DocumentsPtr object, which represents the collection of Word documents; and calling the open interface of DocumentsPtr to open the RTF intermediate file in the background.
In the above method for converting an XML format file into a Word format file, the step of reading the basic element information in the XML format file specifically comprises the following steps:
Reading the basic elements through the XML reading library in the XML file reading module;
Grouping the elements according to the type of basic element, and rearranging the elements of each group in hierarchical order;
Inputting the rearranged element information into an intermediate data structure.
In the above method for converting an XML format file into a Word format file, the step of typesetting in the Word format file style comprises, for basic elements of the text type, the following steps:
Cutting a plurality of text blocks transversely;
Judging whether, within a longitudinal interval, two text blocks lie on the same vertical direction;
If not, merging the text blocks into a line.
In the above method for converting an XML format file into a Word format file, the step of typesetting in the Word format file style comprises, for basic elements of the text type, the following steps:
Cutting a plurality of text lines longitudinally;
Judging whether, within a transverse interval, two text lines lie on the same horizontal direction;
If not, merging the text lines into one text paragraph.
In the above method for converting an XML format file into a Word format file, the step of typesetting in the Word format file style comprises, for basic elements of the graphic primitive and image types, the following steps:
If the region of a graphic primitive or image lies within the region of a text paragraph, using the graphic primitive or image as background information of that text paragraph;
If the region of the graphic primitive or image exceeds the region of a text block, using the graphic primitive or image as background information of the whole page.
In the above method for converting an XML format file into a Word format file, the step of generating the rich text format document further comprises the following step:
Each time a text or graphic primitive is generated, first querying whether its color exists in the color table; if it exists, extracting the index value of the color; if it does not exist, creating a new color object in the color table and extracting the index value of that color.
In the above method for converting an XML format file into a Word format file, the step of generating the rich text format document further comprises the following step:
Each time a text object is generated, querying whether the font of the text exists in the font table; if it exists, using the index value of the font in the font table as the input value; if it does not exist, creating a new font object in the font table and using the index value of the new font object as the input value.
In the above method for converting an XML format file into a Word format file, the step of generating the rich text format document further comprises the following step:
Setting the spacing of each line to the ordinate of the bottom edge of the current line minus the ordinate of the bottom edge of the previous line of text.
A system for converting an XML format file into a Word format file, comprising an XML file reading module, a Word style typesetting module, an RTF file generation module and a Word file generation module that are connected for data transfer in sequence, wherein:
The XML file reading module is used to read the basic element information in the XML format file;
The Word style typesetting module is used to typeset the basic element information that has been read according to the typesetting style of the Word format file;
The RTF file generation module is used to generate a rich text format document from the typeset information;
The Word file generation module is used to call the COM interface of MS-Word to convert the rich text format document into the Word format file in a single pass, including: calling the CoInitialize interface of the system to set up the COM environment for MS-Word; calling CreateInstance to initialize the application object ApplicationPtr; at the same time, calling put_Visible to set the application object to background conversion mode; calling get_Documents to obtain the DocumentsPtr object, which represents the collection of Word documents; and calling the open interface of DocumentsPtr to open the RTF intermediate file in the background.
In the above system for converting an XML format file into a Word format file, the Word style typesetting module comprises a line merging unit, a paragraph merging unit, and a graphic primitive and image merging unit that are connected to one another for data transfer, wherein:
The line merging unit is used to cut a plurality of text blocks transversely and merge the text blocks into a line;
The paragraph merging unit is used to cut a plurality of text lines longitudinally and merge the text lines into one text paragraph;
The graphic primitive and image merging unit is used to treat a graphic primitive or image as background information of a text paragraph or of the whole page.
The system and method for converting an XML format file into a Word format file provided by the present invention use a rich text format document as the intermediate file of the conversion. Because the conversion passes through the rich text format document, it not only supports all elements and complex typesetting styles of MS-Word but also avoids frequent COM calls, which reduces excessive resource consumption, lightens the load on the machine, and improves the efficiency and stability of generating the rich text format document. The invention is therefore well suited to batch conversion.
Embodiment
The specific implementations and embodiments of the present invention are described in detail below with reference to the accompanying drawings. The specific embodiments described serve only to explain the present invention and are not intended to limit its specific implementations.
One embodiment of the method of the present invention for converting an XML format file into a Word format file, as shown in Figure 1, comprises the following steps:
Step S100: read the basic element information in the XML format file;
Step S200: typeset the basic element information that has been read according to the typesetting style of the Word format file;
Step S300: generate an RTF (Rich Text Format) document from the typeset information;
Step S400: call the COM interface of MS-Word to convert the rich text format document into the Word format file;
Step S500: judge whether there is a next XML format file to convert; if so, return to step S100; otherwise end the conversion.
Based on the above conversion method, the present invention also proposes a system for converting an XML format file into a Word format file. As shown in Figure 4, the system comprises at least an XML file reading module 100, a Word style typesetting module 200, an RTF file generation module 300 and a Word file generation module 400 that are connected for data transfer in sequence, wherein:
The XML file reading module 100 is used to read the basic element information in the XML format file;
The Word style typesetting module 200 is used to typeset the basic element information that has been read according to the typesetting style of the Word format file;
The RTF file generation module 300 is used to generate a rich text format document from the typeset information;
The Word file generation module 400 is used to call the COM interface of MS-Word to convert the rich text format document into the Word format file.
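The four modules and the loop of steps S100 to S500 can be pictured with the following minimal Python sketch. It is only an illustration: the function name convert_batch and the four stage callables are hypothetical, and the stand-in lambdas replace the real modules so that the control flow can be run anywhere.

```python
def convert_batch(xml_paths, read_xml, typeset, write_rtf, rtf_to_word):
    """Run the four stages once per XML file (all stage callables are hypothetical)."""
    outputs = []
    for path in xml_paths:                      # S500: continue while files remain
        elements = read_xml(path)               # S100: read basic element information
        layout = typeset(elements)              # S200: Word-style typesetting
        rtf_path = write_rtf(layout, path)      # S300: generate the RTF intermediate file
        outputs.append(rtf_to_word(rtf_path))   # S400: the only stage that touches COM
    return outputs


# Stand-in stages that only record what would happen.
com_calls = []
result = convert_batch(
    ["a.xml", "b.xml"],
    read_xml=lambda path: [path],
    typeset=lambda elements: elements,
    write_rtf=lambda layout, path: path.replace(".xml", ".rtf"),
    rtf_to_word=lambda rtf: com_calls.append(rtf) or rtf.replace(".rtf", ".doc"),
)
```

In this sketch the COM stage is entered exactly once per document, regardless of how many elements the document contains.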
The system and method for converting an XML format file into a Word format file provided by the present invention use a rich text format document as the intermediate file of the conversion. Because the conversion passes through the rich text format document, it not only supports all elements and complex typesetting styles of MS-Word but also avoids frequent COM calls, which reduces excessive resource consumption, lightens the load on the machine, and improves the efficiency and stability of generating the rich text format document. The invention is therefore well suited to batch conversion.
In a preferred implementation of the system and method of the present invention for converting an XML format file into a Word format file:
1. Step S100 and the XML file reading module 100:
In step S100, the XML file reading module 100 reads the required information from the XML file to be converted. The information read here is the physical information of the elements, including the size and position of each element, the number of pages of the document, and whether the document is encrypted. The XML file reading module 100 comprises an XML reading library, a basic element grouping and arrangement unit, and an element information input unit that are connected for data transfer in sequence.
Specifically, in step S100, the basic elements are first read through the XML reading library in the XML file reading module 100; the types of basic element include text, image, graphic primitive, table, document, page and the like. The basic element grouping and arrangement unit in the XML file reading module 100 then groups the elements according to the type of basic element and rearranges the elements of each group in hierarchical order. Finally, the element information input unit in the XML file reading module 100 inputs the rearranged element information into an intermediate data structure.
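The grouping and rearranging in step S100 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the Element class, its level field (a hypothetical hierarchy depth where a smaller number means a higher level) and the dictionary used as the intermediate data structure are not specified by the source.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Element:
    kind: str      # "text", "image", "primitive", "table", ...
    level: int     # hypothetical hierarchy depth; smaller means higher in the hierarchy
    payload: str


def group_and_order(elements):
    """Group elements by type, then order each group by hierarchy level."""
    groups = defaultdict(list)
    for element in elements:
        groups[element.kind].append(element)
    # sorted() is stable, so elements on the same level keep their reading order.
    return {kind: sorted(items, key=lambda e: e.level) for kind, items in groups.items()}


elements = [
    Element("text", 2, "body"),
    Element("image", 1, "logo"),
    Element("text", 1, "title"),
]
intermediate = group_and_order(elements)   # stands in for the intermediate data structure
```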
It should be noted that the XML file read in the present invention is the interface file between this system and other systems. Any other system that needs to generate Word documents only has to generate the XML file as required in order to connect seamlessly with this system.
2. Step S200 and the Word style typesetting module 200:
In step S200, the Word style typesetting module 200 can typeset text, graphic primitives and images in the Word format file style. The Word style typesetting module 200 comprises a line merging unit, a paragraph merging unit, and a graphic primitive and image merging unit that are connected to one another for data transfer.
Specifically, the typesetting of text comprises merging scattered text blocks into lines and merging lines into paragraphs. Through the following processing, step S200 converts the physical information of the text into logical information that can be input to Word, wherein:
Line merging rule: the line merging unit in the Word style typesetting module 200 first cuts several scattered text blocks transversely (horizontally); within a longitudinal interval, if no two text blocks lie on the same vertical direction, these text blocks are merged into one line. In other words, if several text blocks can be separated by horizontal cuts, that is, the text blocks lie within the same longitudinal interval, and this longitudinal interval does not contain two text blocks on the same vertical direction, these text blocks are merged into one line.
Paragraph merging rule: the paragraph merging unit in the Word style typesetting module 200 first cuts several text lines longitudinally (vertically); within a transverse interval, if no two text lines lie on the same horizontal direction, these text lines are merged into one text paragraph. In other words, if several text lines can be separated by vertical cuts, that is, the text lines lie within the same transverse interval, and this transverse interval does not contain two text lines on the same horizontal direction, these text lines are merged into one text paragraph.
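The two merging rules above can be sketched in Python as projection cuts on bounding boxes. This is one possible reading, not the patented implementation: the Box class, the helper names, the convention that the ordinate grows downward, and the choice to leave a group unmerged when two boxes clash are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from operator import attrgetter


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float      # assumption: the ordinate grows downward, so top < bottom
    text: str = ""


LEFT, TOP, RIGHT, BOTTOM = map(attrgetter, ("left", "top", "right", "bottom"))


def split_at_gaps(boxes, lo, hi):
    """Cut wherever the projections of the boxes onto one axis leave a gap."""
    groups, current, reach = [], [], None
    for box in sorted(boxes, key=lo):
        if current and lo(box) >= reach:       # gap found: a cut is possible here
            groups.append(current)
            current, reach = [], None
        current.append(box)
        reach = hi(box) if reach is None else max(reach, hi(box))
    if current:
        groups.append(current)
    return groups


def merge(boxes, cut, order, separator):
    (cut_lo, cut_hi), (ord_lo, ord_hi) = cut, order
    merged = []
    for group in split_at_gaps(boxes, cut_lo, cut_hi):
        group.sort(key=ord_lo)
        # Two boxes clash if they overlap along the ordering axis.
        clash = any(ord_lo(b) < ord_hi(a) for a, b in combinations(group, 2))
        if clash:
            merged.extend(group)               # assumption: leave the group unmerged
        else:
            merged.append(Box(min(map(LEFT, group)), min(map(TOP, group)),
                              max(map(RIGHT, group)), max(map(BOTTOM, group)),
                              separator.join(b.text for b in group)))
    return merged


def merge_lines(blocks):       # horizontal cuts, then join blocks left to right
    return merge(blocks, (TOP, BOTTOM), (LEFT, RIGHT), " ")


def merge_paragraphs(lines):   # vertical cuts, then join lines top to bottom
    return merge(lines, (LEFT, RIGHT), (TOP, BOTTOM), "\n")


blocks = [Box(0, 0, 40, 10, "Hello"), Box(50, 0, 90, 10, "world"),
          Box(0, 20, 60, 30, "second line")]
lines = merge_lines(blocks)
paragraphs = merge_paragraphs(lines)
```

Running the line rule first and the paragraph rule on its output mirrors the order in which the two units are described.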
The typesetting of graphic primitives and images can be performed by the graphic primitive and image merging unit in the Word style typesetting module 200, according to the following rule: if the region of a graphic primitive or image lies within the region of a text paragraph, the graphic primitive or image is used as background information of that text paragraph; if the region of the graphic primitive or image exceeds the region of a text block, the graphic primitive or image is used as background information of the whole page.
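The background rule can be illustrated with a simple rectangle containment test. The sketch below is an assumption-laden simplification: rectangles are plain (left, top, right, bottom) tuples, the function names are hypothetical, and any graphic that does not fit inside a single paragraph is treated as page background.

```python
def inside(inner, outer):
    """True if rectangle inner = (left, top, right, bottom) lies within outer."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def background_target(graphic, paragraphs):
    """Decide whether a graphic is the background of one paragraph or of the page."""
    for index, paragraph in enumerate(paragraphs):
        if inside(graphic, paragraph):
            return ("paragraph", index)
    return ("page", None)


paragraphs = [(0, 0, 200, 50), (0, 60, 200, 120)]
small = background_target((10, 70, 50, 100), paragraphs)   # fits in the second paragraph
large = background_target((0, 40, 200, 130), paragraphs)   # spans both paragraphs
```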
3. Step S300 and the RTF file generation module 300:
In step S300, the RTF file generation module 300 generates the rich text format document from the information processed in step S200. The RTF file generation module 300 comprises a file header generation unit, a color table generation unit, a font table generation unit, a layout information unit, a fixed line spacing unit, a text information unit, a graphic primitive information unit and an image information unit that are connected to one another for data transfer. Step S300 can be divided into several parts, such as generation of the RTF file header, the color table, the font table and the layout information, as well as date generation, permission generation and version number generation, wherein:
Generation of the file header includes author information generation, date generation, permission generation, version number generation and the like, and can be implemented by the file header generation unit in the RTF file generation module 300.
The color table is the palette of the RTF document and covers both text colors and graphic primitive colors. The generation rule of the color table in step S300 can be implemented by the color table generation unit in the RTF file generation module 300, namely: each time a text or graphic primitive is generated, first query whether its color exists in the color table; if it exists, extract the index value of the color; if it does not exist, create a new color object in the color table and extract the index value of that color.
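The look-up-or-create rule for the color table can be sketched as below. The ColorTable class and its method names are hypothetical; the emitted syntax follows the standard RTF \colortbl group, in which the empty first entry (index 0) denotes the automatic color, so user colors start at index 1.

```python
class ColorTable:
    """Document-wide color table: look a color up, or add it and return its index."""

    def __init__(self):
        self._index = {}                      # (r, g, b) -> index, in insertion order

    def index_of(self, rgb):
        if rgb not in self._index:
            # Index 0 is the empty "auto" entry of \colortbl, so start at 1.
            self._index[rgb] = len(self._index) + 1
        return self._index[rgb]

    def to_rtf(self):
        entries = "".join(f"\\red{r}\\green{g}\\blue{b};" for (r, g, b) in self._index)
        return "{\\colortbl;" + entries + "}"


colors = ColorTable()
red_first = colors.index_of((255, 0, 0))
blue = colors.index_of((0, 0, 255))
red_again = colors.index_of((255, 0, 0))     # already present: index is reused
```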
The font table is a table that manages the fonts used in the whole RTF document. The generation rule of the font table in step S300 can be implemented by the font table generation unit in the RTF file generation module 300, namely: each time a text object is generated, query whether the font of the text exists in the font table; if it exists, use the index value of the font in the font table as the input value; if it does not exist, create a new font object in the font table and use the index value of the new font object as the input value.
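The font table follows the same look-up-or-create pattern. In this hedged sketch the FontTable class is hypothetical, fonts are numbered from 0, and every entry is written with the generic \fnil family because the source does not say how font families are classified.

```python
class FontTable:
    """Document-wide font table: look a font up, or add it and return its index."""

    def __init__(self):
        self._index = {}                      # font name -> index, in insertion order

    def index_of(self, name):
        if name not in self._index:
            self._index[name] = len(self._index)
        return self._index[name]

    def to_rtf(self):
        entries = "".join("{\\f%d\\fnil %s;}" % (i, name) for name, i in self._index.items())
        return "{\\fonttbl" + entries + "}"


fonts = FontTable()
arial = fonts.index_of("Arial")
simsun = fonts.index_of("SimSun")
arial_again = fonts.index_of("Arial")        # already present: index is reused
```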
Preferably, in step S300 of the present invention, the generation strategy for the rich text format document uses a color table and a font table shared across the whole document, and the color table generation unit and the font table generation unit provide this document-wide sharing. The basic elements on every page therefore share the same font table and color table, which speeds up document generation and yields a smaller document.
The input of layout information includes the input of section information, column information, paragraph information, line information and the like, and can be implemented by the layout information unit in the RTF file generation module 300. The layout information unit comprises a section information subunit, a column information subunit, a paragraph information subunit and a line information subunit that are connected to one another for data transfer, wherein:
Section information needs to be input in two cases. In the first case, when a new page is created, a new section is needed so that page information such as the page size of this page is kept separate from that of other pages. In the second case, when the column layout changes, a new section needs to be input to separate the information of the new columns from that of the old columns. This can be implemented by the section information subunit in the layout information unit.
Column information includes the number of columns, the width of the columns, their spacing and the like, and can be implemented by the column information subunit in the layout information unit.
Paragraph information mainly includes the first-line indent, the left spacing of the paragraph, the right spacing of the paragraph and the like, and can be implemented by the paragraph information subunit in the layout information unit.
Line information input is mainly the line spacing setting, and can be implemented by the line information subunit in the layout information unit.
Preferably, in order to control line spacing precisely, step S300 of the present invention adopts a fixed line spacing strategy, which can be implemented by the fixed line spacing unit in the RTF file generation module 300, namely: the spacing of each line is the ordinate of the bottom edge of the current line minus the ordinate of the bottom edge of the previous line of text.
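The fixed line spacing rule reduces to one subtraction per line. The Python sketch below assumes coordinates in points with the ordinate growing downward, and expresses the result with the RTF control words \sl and \slmult0, where a negative \sl value requests exact spacing in twips (1 point = 20 twips). The function names are hypothetical.

```python
TWIPS_PER_POINT = 20


def fixed_line_spacing(previous_bottom, current_bottom):
    """Spacing of the current line: its bottom ordinate minus that of the previous line."""
    return current_bottom - previous_bottom


def spacing_control(spacing_points):
    """RTF control words for exact line spacing (negative \\sl means 'exactly')."""
    twips = round(spacing_points * TWIPS_PER_POINT)
    return f"\\sl-{twips}\\slmult0"


# Bottom ordinates of three consecutive lines, in points (assumed to grow downward).
bottoms = [100, 116, 134]
controls = [spacing_control(fixed_line_spacing(prev, cur))
            for prev, cur in zip(bottoms, bottoms[1:])]
```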
The input of text information includes the input of the content information and the format information of the text, and can be implemented by the text information unit in the RTF file generation module 300. The content information is the actual content of the text; the format (control) information includes the font information, color information, bold, italic, underline, strikethrough and similar information of the text.
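A text run that combines content with format information could be written as in the following sketch. The function names and parameters are hypothetical; the control words used (\f, \cf, \fs in half-points, \b, \i, \ul, \strike, and \uN? for non-ASCII characters) are standard RTF.

```python
def escape_rtf(text):
    """Escape RTF special characters and write non-ASCII text as \\uN? escapes."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)
        elif ord(ch) < 128:
            out.append(ch)
        else:
            data = ch.encode("utf-16-le")
            for i in range(0, len(data), 2):
                unit = int.from_bytes(data[i:i + 2], "little")
                if unit > 32767:              # \u takes a signed 16-bit value
                    unit -= 65536
                out.append(f"\\u{unit}?")
    return "".join(out)


def text_run(text, font_index, color_index, size_pt,
             bold=False, italic=False, underline=False, strike=False):
    """Build one formatted run; the indexes come from the font and color tables."""
    controls = f"\\f{font_index}\\cf{color_index}\\fs{round(size_pt * 2)}"
    controls += "\\b" if bold else ""
    controls += "\\i" if italic else ""
    controls += "\\ul" if underline else ""
    controls += "\\strike" if strike else ""
    return "{" + controls + " " + escape_rtf(text) + "}"


run = text_run("Hi {x}", font_index=0, color_index=1, size_pt=12, bold=True)
```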
The input of graphic primitive information can be implemented by the graphic primitive information unit in the RTF file generation module 300: first the position of the graphic primitive is located, then the graphic primitive is drawn, and finally the color table is read to set the color and line style information of the graphic primitive.
The input of image information can be implemented by the image information unit in the RTF file generation module 300 and comprises two parts: first the image is located, and then the image is converted into JPEG binary information with a JPEG library and input into the RTF document.
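Once an image is available as JPEG bytes, it can be embedded in RTF as a hex-encoded \pict group. The sketch below is an assumption-based illustration: the function name is hypothetical, the conversion to JPEG by a JPEG library is taken as already done, and positioning of the image is left out.

```python
def jpeg_to_rtf_picture(jpeg_bytes, width_twips, height_twips):
    """Embed already-encoded JPEG data as a hex \\pict group with a target size."""
    return "{\\pict\\jpegblip\\picwgoal%d\\pichgoal%d %s}" % (
        width_twips, height_twips, jpeg_bytes.hex())


# Four placeholder bytes (JPEG start and end markers only), not a real image.
picture = jpeg_to_rtf_picture(b"\xff\xd8\xff\xd9", 1440, 720)
```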
The specific process of generating the rich text format document from the information processed in step S200 is shown in Figure 2. Step S300 may comprise the following steps:
Step S310: begin generating a rich text format document and take the element information of one page;
Step S320: judge whether it is the last page; if so, generate the rich text format document and end the process; otherwise go to step S331;
Step S331: input the page information;
Step S333: input the layout information;
Step S335: obtain a basic element from the page;
Step S337: judge whether there is still an element; if so, go to step S340; otherwise return to step S310;
Step S340: judge whether the element is text; if so, go to step S350; otherwise go to step S360;
Step S350: input the text and update the color table and the font table;
Step S360: judge whether the element is a graphic primitive; if so, go to step S370; otherwise go to step S380;
Step S370: input the graphic primitive and update the color table;
Step S380: input the image, converting the image into JPEG binary information.
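The flow of steps S310 to S380 can be sketched as a pair of nested loops with a dispatch on element type. The Python below is a simplified illustration: the dictionary layout of pages and elements is hypothetical, text content is assumed to be plain ASCII without braces or backslashes, and render_primitive is a stub because drawing-object output is outside the scope of this sketch. The section controls \sect, \sectd, \pgwsxn, \pghsxn and \cols are standard RTF.

```python
def render_primitive(element, color_index):
    """Hypothetical stub: drawing-object output is omitted in this sketch."""
    return ""


def generate_rtf(pages):
    fonts, colors, body = {}, {}, []             # tables shared by the whole document
    for number, page in enumerate(pages):        # S310/S320: take pages until none remain
        if number > 0:
            body.append("\\sect")                # each new page starts a new section
        width, height = page["size_twips"]
        body.append(f"\\sectd\\pgwsxn{width}\\pghsxn{height}")   # S331: page information
        body.append(f"\\cols{page['columns']}")                  # S333: layout information
        for element in page["elements"]:         # S335/S337: take elements until none remain
            kind = element["type"]
            if kind == "text":                   # S340 -> S350: update both tables
                f = fonts.setdefault(element["font"], len(fonts))
                c = colors.setdefault(element["color"], len(colors) + 1)
                body.append(f"{{\\f{f}\\cf{c} {element['content']}\\par}}")
            elif kind == "primitive":            # S360 -> S370: update the color table
                c = colors.setdefault(element["color"], len(colors) + 1)
                body.append(render_primitive(element, c))
            else:                                # S380: image as JPEG hex data
                body.append("{\\pict\\jpegblip " + element["jpeg"].hex() + "}")
    font_table = "".join(f"{{\\f{i}\\fnil {name};}}" for name, i in fonts.items())
    color_table = "".join(f"\\red{r}\\green{g}\\blue{b};" for (r, g, b) in colors)
    return ("{\\rtf1\\ansi{\\fonttbl" + font_table + "}{\\colortbl;" + color_table + "}"
            + "".join(body) + "}")


pages = [{
    "size_twips": (11906, 16838),                # A4 in twips
    "columns": 1,
    "elements": [
        {"type": "text", "font": "Arial", "color": (0, 0, 0), "content": "Hello"},
        {"type": "primitive", "color": (255, 0, 0)},
    ],
}]
rtf = generate_rtf(pages)
```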
The above method of directly generating the rich text format document avoids frequent COM calls, reduces excessive resource consumption, lightens the load on the machine, and improves the efficiency and stability of generating the rich text format document.
4. Step S400 and the Word file generation module 400:
In step S400, the Word file generation module 400 converts the rich text format document generated in step S300 into a Word format file in the background. As shown in Figure 3, this comprises the following steps:
Step S410: initialize the COM environment, that is, initialize the COM interface;
Step S420: call the DocumentPtr object in the COM module to import the RTF file;
Step S430: query the registry to obtain the version information of the Word installed by the user;
Step S440: based on this version information, generate a Word document of the corresponding version in the background according to the rule: if the user has installed Word2002 or Word2003, generate a doc format file; if the user has installed Word2007 or Word2010, generate a docx format file;
Step S450: after the conversion, delete the RTF file used as the intermediate file, completing the conversion from the XML format file to the Word format file.
The detailed process of calling the COM module in the present invention is as follows:
Step S422: call the CoInitialize interface of the system to set up the COM environment for MS-Word;
Step S424: call CreateInstance to initialize the application object ApplicationPtr; at the same time, call put_Visible (VARIANT_FALSE) to set the application object to background conversion mode;
Step S426: call get_Documents to obtain the DocumentsPtr object, which represents the collection of Word documents;
Step S428: call the open interface of DocumentsPtr to open the RTF intermediate file in the background; for example, if the current system is Word2002, the Open2000 interface needs to be called.
As can be seen, the present invention does not call the automation COM interface of MS-Word dynamically to write basic element information and property control information into Word. The reasons are as follows. First, the COM interface of MS-Word places high demands on the configuration environment and on the format of the input data, and the program often hangs when the interface is called repeatedly and frequently. Second, every basic element that is input involves a COM call, which brings a large number of IO operations, so the conversion efficiency is low. Third, dynamically calling the automation COM interface of MS-Word occupies considerable system resources and easily puts a heavy load on the machine.
In addition, the present invention calls the SaveAs of DocumentsPtr to save the RTF file as a Word document in the background. The generation rule is: if the user has installed Word2002 or Word2003, a doc format file is generated by default; if the user has installed Word2007 or Word2010, a docx format document is generated by default. To save in the doc format, wdFormatDocument is set; to save in the docx format, wdFormatXMLDocument needs to be set. Saving the RTF file as a Word format file in this way realizes the conversion from the rich text format document to the Word format file in the background, and also supports generating Word files of various versions.
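The single-pass COM conversion and the version-dependent save format can be sketched in Python as follows. Several points are assumptions rather than the source's method: the sketch uses the pywin32 package (pythoncom, win32com.client) instead of the C++ smart pointers named above, reads the Word version from the Application.Version property instead of the registry, and uses hypothetical function names. The numeric values 0 and 12 are the WdSaveFormat values of wdFormatDocument and wdFormatXMLDocument. The convert_rtf function runs only on Windows with MS-Word and pywin32 installed; save_format_for is pure and runs anywhere.

```python
import os

WD_FORMAT_DOCUMENT = 0         # wdFormatDocument, saves as .doc
WD_FORMAT_XML_DOCUMENT = 12    # wdFormatXMLDocument, saves as .docx


def save_format_for(word_version):
    """Map a Word version string such as '11.0' to (extension, WdSaveFormat value)."""
    major = int(word_version.split(".")[0])
    if major >= 12:                                   # Word2007 (12.0), Word2010 (14.0)
        return ".docx", WD_FORMAT_XML_DOCUMENT
    return ".doc", WD_FORMAT_DOCUMENT                 # Word2002 (10.0), Word2003 (11.0)


def convert_rtf(rtf_path):
    """Open an RTF file in a hidden Word instance and save it as a Word document."""
    import pythoncom                                  # pywin32, imported lazily
    import win32com.client

    source = os.path.abspath(rtf_path)
    pythoncom.CoInitialize()                          # S422: set up the COM environment
    try:
        app = win32com.client.Dispatch("Word.Application")   # S424: application object
        try:
            app.Visible = False                       # background conversion mode
            document = app.Documents.Open(source)     # S426/S428: open the RTF file
            extension, file_format = save_format_for(app.Version)
            target = os.path.splitext(source)[0] + extension
            document.SaveAs(target, FileFormat=file_format)
            document.Close()
        finally:
            app.Quit()
    finally:
        pythoncom.CoUninitialize()
    os.remove(source)                                 # S450: delete the intermediate file
    return target
```

A call such as convert_rtf("report.rtf") would involve Word once for the whole document, which is the point of routing the conversion through an RTF intermediate file.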
In summary, the preferred implementation of the system and method of the present invention for converting an XML format file into a Word format file generates an RTF file that is identical to the Word file in content and typesetting, and then calls the COM interface of MS-Word once in the background to convert the RTF file into the Word file. Generating one Word document therefore requires only one COM call, which avoids the inefficiency and instability of COM calls to the greatest extent and reduces resource consumption. Its significant progress is reflected in the following: improved conversion efficiency and conversion stability; comprehensive element support with good typesetting results; support for generating files of all Word versions; and reduced consumption of system resources by the conversion.
It should be understood that the above is only a preferred embodiment of the present invention and is not sufficient to limit its technical solution. Those of ordinary skill in the art may, within the spirit and principles of the present invention, make additions, deletions, replacements, changes or improvements according to the above description. For example, as the intermediate document for converting the XML format file into the Word format file, the XML format file of Word2003 may be used as an equivalent replacement for the rich text format document. All technical solutions resulting from such additions, deletions, replacements, changes or improvements shall fall within the protection scope of the claims of the present invention.