CN116775939A - Word file combination method and system based on linux server - Google Patents
- Publication number
- CN116775939A (application number CN202310839005.2A)
- Authority
- CN
- China
- Prior art keywords
- file
- word
- xml
- main
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The application provides a word file combination method and system based on a linux server. The method comprises the following steps: S1: acquiring a main word file and a plurality of sub word files, converting the main word file into a main xml file, and converting each sub word file into a corresponding sub xml file; S2: combining each sub xml file with the main xml file to obtain a combined xml file; S3: converting the combined xml file into a combined word file. By constructing a mapping relation between the word files and the xml files, each xml file inherits the tag object ID, the logic structure and the data structure of its word file, so the xml files can be combined in any order by ordering the tag object IDs. The combined xml file is then converted into the combined word file, which realizes low-cost combination of word files.
Description
Technical Field
The application relates to the field of file combination, in particular to a word file combination method and system based on a linux server.
Background
Document generation mainly relies on the SDKs published by Microsoft to output documents from templates. Among Java-based implementations, Jacob runs on the Windows platform, while Aspose and docx4j are cross-platform. Jacob processes documents on Windows, mainly through macros or a programming language (VBA). Aspose is a commercial product that supports Word processing by installing a local library. Docx4j is a semi-open-source product: its basic Word processing is open, but advanced requirements such as combining Word files are supported only in the commercial version.
Jacob depends on the Windows platform and also requires Office to be installed on the server, so server availability is low and concurrent processing is not supported. The Word combination functions of Aspose and docx4j are closed-source software and cannot be used freely. Combining files with the existing software is therefore costly and inconvenient to call.
Disclosure of Invention
In order to solve the above technical problems, the application provides a word file combination method based on a linux server, which comprises the following steps:
S1: acquiring a main word file and a plurality of sub word files, converting the main word file into a main xml file, and converting each sub word file into a corresponding sub xml file;
S2: combining each sub xml file with the main xml file to obtain a combined xml file;
S3: converting the combined xml file into a combined word file.
Preferably, step S1 specifically includes:
S11: acquiring the logic structure and data structure of the main word file and the logic structure and data structure of each sub word file;
S12: obtaining a main tag object ID of the main word file through the logic structure and data structure of the main word file, and obtaining a sub tag object ID corresponding to each sub word file through the logic structure and data structure of that sub word file;
S13: converting the main word file into a main xml file, wherein the main xml file inherits the main tag object ID, the logic structure and the data structure of the main word file;
S14: converting each sub word file into a corresponding sub xml file, wherein each sub xml file inherits the sub tag object ID, the logic structure and the data structure of the corresponding sub word file.
Preferably, step S2 specifically includes:
S21: setting a logic structure of the combined xml file, and constructing a mapping relation between the main tag object ID and each sub tag object ID according to the logic structure of the combined xml file;
S22: combining each sub xml file into the main xml file according to the mapping relation to obtain the combined xml file.
Preferably, the construction process of the mapping relation is as follows:
the placement order of each xml file in the combined xml file is set, and the tag object IDs corresponding to the xml files are ordered according to the placement order.
Preferably, step S22 specifically includes:
ordering the positions of the sub xml files and the main xml file in the combined xml file according to the mapping relation, and generating character string text at the corresponding positions through the logic structure and data structure of each xml file, to generate the combined xml file.
A word file combination system based on a linux server, comprising:
a word-to-xml module, used for acquiring a main word file and a plurality of sub word files, converting the main word file into a main xml file, and converting each sub word file into a corresponding sub xml file;
a combination module, used for combining each sub xml file with the main xml file to obtain a combined xml file;
an xml-to-word module, used for converting the combined xml file into a combined word file.
The application has the following beneficial effects:
By constructing a mapping relation between the word files and the xml files, each xml file inherits the tag object ID, the logic structure and the data structure of its word file, so the xml files can be combined in any order by ordering the tag object IDs. The combined xml file is then converted into the combined word file, which realizes low-cost combination of word files.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present application;
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Referring to FIG. 1, the present application provides a word file combination method based on a linux server, including:
S1: acquiring a main word file and a plurality of sub word files, converting the main word file into a main xml file, and converting each sub word file into a corresponding sub xml file;
S2: combining each sub xml file with the main xml file to obtain a combined xml file;
S3: converting the combined xml file into a combined word file.
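Steps S1 to S3 can be illustrated with a minimal Python sketch that relies only on the standard library. It assumes that each word file is a .docx package (a zip archive) whose main content lives in the part `word/document.xml`; the function names `word_to_xml`, `combine_xml` and `xml_to_word` are hypothetical and are not taken from the application. The sketch handles body text only: it does not copy images or other related parts, and real documents that declare additional namespace prefixes may need more careful serialization than `xml.etree.ElementTree` provides.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
DOC_PART = "word/document.xml"  # main document part of a .docx package
ET.register_namespace("w", W_NS)


def word_to_xml(docx_bytes: bytes) -> ET.Element:
    """S1: read the main document part of a .docx package as an XML tree."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return ET.fromstring(package.read(DOC_PART))


def combine_xml(main_root: ET.Element, sub_roots: list) -> ET.Element:
    """S2: insert the block-level children of each sub body into the main body."""
    main_body = main_root.find(f"{{{W_NS}}}body")
    section = main_body.find(f"{{{W_NS}}}sectPr")
    # New content goes before the main section properties, which must stay last.
    position = list(main_body).index(section) if section is not None else len(main_body)
    for sub_root in sub_roots:
        for child in list(sub_root.find(f"{{{W_NS}}}body")):
            if child.tag == f"{{{W_NS}}}sectPr":
                continue  # keep the page setup of the main document only
            main_body.insert(position, child)
            position += 1
    return main_root


def xml_to_word(main_docx: bytes, combined_root: ET.Element) -> bytes:
    """S3: copy the main package, replacing its document part with the combined XML."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(main_docx)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == DOC_PART:
                data = ET.tostring(combined_root, encoding="utf-8", xml_declaration=True)
            dst.writestr(name, data)
    return out.getvalue()
```

The sub files are appended in the order of `sub_roots`, so the caller controls the placement order by ordering that list.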
Further, step S1 specifically includes:
S11: acquiring the logic structure and data structure of the main word file and the logic structure and data structure of each sub word file;
S12: obtaining a main tag object ID of the main word file through the logic structure and data structure of the main word file, and obtaining a sub tag object ID corresponding to each sub word file through the logic structure and data structure of that sub word file;
S13: converting the main word file into a main xml file, wherein the main xml file inherits the main tag object ID, the logic structure and the data structure of the main word file;
S14: converting each sub word file into a corresponding sub xml file, wherein each sub xml file inherits the sub tag object ID, the logic structure and the data structure of the corresponding sub word file.
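As an illustration of obtaining tag object IDs from the xml, the sketch below collects the relationship IDs referenced in a document part. It rests on an assumption: that a "tag object ID" corresponds to an Open XML relationship ID (such as `r:id` or `r:embed`), which the application does not state explicitly. The function name and the sample XML are hypothetical, and the sample is simplified rather than schema-valid.

```python
import xml.etree.ElementTree as ET

R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def collect_relationship_ids(document_xml: str) -> list:
    """Return the relationship IDs referenced in the XML, in first-seen order."""
    root = ET.fromstring(document_xml)
    seen = []
    for element in root.iter():
        for name, value in element.attrib.items():
            # Any attribute in the relationships namespace carries a relationship ID.
            if name.startswith(f"{{{R_NS}}}") and value not in seen:
                seen.append(value)
    return seen


# Simplified sample: one hyperlink (referenced twice) and one embedded image.
SAMPLE = (
    '<w:document'
    ' xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
    ' xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"'
    ' xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">'
    '<w:body>'
    '<w:p><w:hyperlink r:id="rId5"><w:r><w:t>link</w:t></w:r></w:hyperlink></w:p>'
    '<w:p><w:r><a:blip r:embed="rId2"/></w:r></w:p>'
    '<w:p><w:hyperlink r:id="rId5"/></w:p>'
    '</w:body></w:document>'
)

ids = collect_relationship_ids(SAMPLE)
```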
Specifically, the application adopts Open XML technology to convert word files into xml files.
The main features of Open XML include:
1. Openness: Open XML is an open standard whose specification can be read and implemented by anyone. This allows different software developers and vendors to freely use and support the Open XML format.
2. Extensibility: Open XML uses XML as its foundation, which allows the logical structure and content to be extended and customized. Users can define new elements, properties and styles according to their own needs.
3. Structure and hierarchy: Open XML documents are structured and hierarchical, with different elements and attributes representing the various parts and properties of the document. This structure makes parsing, editing and processing documents more convenient and reliable.
4. Compatibility: the Open XML format can be supported and parsed by different software applications and platforms. Users can therefore share and exchange Open XML documents between different software without losing format or content.
Open XML is widely used for the document formats of the Microsoft Office suite, such as .docx (Word document), .xlsx (Excel spreadsheet) and .pptx (PowerPoint presentation). It provides a general way to store and process document data and facilitates cross-platform use and interoperability. Based on this standard, Microsoft implemented the WordprocessingML document object. The following is excerpted from the description of WordprocessingML on Microsoft's official website; the version referenced is the Open XML SDK 2.5 API.
Word data structure: used to record all the attributes and operation methods that may be read from a word file.
The basic logical structure of a WordprocessingML document consists of the <document> and <body> elements, followed by one or more block-level elements such as <p>, which represents a paragraph. A paragraph contains one or more <r> elements. <r> represents a run, which is a region of text with a common set of properties (for example, formatting). A run contains one or more <t> elements. The <t> element contains a range of text.
The important WordprocessingML parts are as follows:
The Open XML SDK 2.5 API provides strongly typed classes in the DocumentFormat.OpenXml.Wordprocessing namespace that correspond to WordprocessingML elements.
Table 1 below lists some of the important WordprocessingML elements, the WordprocessingML document package part to which each element corresponds (if applicable), and the managed class that represents the element in the Open XML SDK 2.5 API.
TABLE 1
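The document, body, paragraph, run and text nesting described above can be walked with a short Python sketch. The sample XML and the function name `paragraphs_text` are hypothetical; the sketch simply joins the text of all runs in each paragraph and ignores run properties such as bold.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraphs_text(document_xml: str) -> list:
    """Return the text of each paragraph by following body -> p -> r -> t."""
    root = ET.fromstring(document_xml)          # the <document> element
    body = root.find(f"{W}body")
    result = []
    for paragraph in body.findall(f"{W}p"):     # block-level paragraphs
        pieces = []
        for run in paragraph.findall(f"{W}r"):  # runs share one set of properties
            for text in run.findall(f"{W}t"):   # text ranges inside the run
                pieces.append(text.text or "")
        result.append("".join(pieces))
    return result


SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body>'
    '<w:p>'
    '<w:r><w:t xml:space="preserve">Hello, </w:t></w:r>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r>'
    '</w:p>'
    '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
    '</w:body>'
    '</w:document>'
)

texts = paragraphs_text(SAMPLE)
```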
A WordprocessingML document is organized around the concept of stories. A story is a region of content in a WordprocessingML document. WordprocessingML stories include:
Comment
Endnote
Footer
Footnote
Frame
Glossary document
Header
Main story
Subdocument
Text box
Not all stories need to be present in a valid WordprocessingML document. The simplest valid WordprocessingML document requires only one story, the main document story. In WordprocessingML, the main document story is represented by the main document part. At a minimum, to create a valid WordprocessingML document using code, add a main document part to the document.
The following information from ISO/IEC 29500 introduces the WordprocessingML elements required in the main document part to complete the minimum document scenario.
The main document story of the simplest WordprocessingML document consists of the following XML elements:
document - the root element of the WordprocessingML main document part, which defines the main document story.
body - the container for the collection of block-level structures that make up the main story.
p - a paragraph.
r - a run, that is, a piece of continuous text.
t - a range of text.
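A minimal document of this kind can be packaged with Python's standard library, as sketched below. The main document part uses exactly the five elements listed above. The two supporting parts, `[Content_Types].xml` and `_rels/.rels`, are not described in this text; they are included here because the Open Packaging Conventions require them for a package that applications can open. The function names are hypothetical.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels"'
    ' ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def minimal_document_xml(text: str) -> str:
    """Build the main document part: document > body > p > r > t."""
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        f'<w:body><w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p></w:body>'
        '</w:document>'
    )


def build_minimal_docx(text: str) -> bytes:
    """Zip the three parts into an in-memory .docx package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", minimal_document_xml(text))
    return buffer.getvalue()
```

Writing the returned bytes to a file with a `.docx` extension gives a single-paragraph document.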
Docx4j defines a model object, WordprocessingMLPackage, based on the structure of Open XML, and uses JAXB in its bottom layer. JAXB establishes a bridge for converting between XML files and Java objects. JAXB (Java Architecture for XML Binding) is an industry standard and a technology that can generate Java classes from an XML Schema. JAXB also provides a way to generate a Java object tree from an XML instance document (unmarshalling) and to write the contents of a Java object tree back to an XML instance document (marshalling). JAXB thus offers a quick and easy way to bind XML schemas to Java representations, enabling Java developers to readily work with XML data and processing functions in Java applications.
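The idea of binding XML to an object tree and back can be sketched in Python as an analogy. This is not JAXB and not the docx4j API: the `Run` and `Paragraph` classes and the `marshal` and `unmarshal` functions are hypothetical stand-ins that cover only paragraphs, runs and text.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def q(tag: str) -> str:
    """Qualify a local tag name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


@dataclass
class Run:
    text: str


@dataclass
class Paragraph:
    runs: list = field(default_factory=list)


def unmarshal(xml_text: str) -> list:
    """XML instance document -> object tree."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for p in root.iter(q("p")):
        runs = [
            Run("".join(t.text or "" for t in r.findall(q("t"))))
            for r in p.findall(q("r"))
        ]
        paragraphs.append(Paragraph(runs))
    return paragraphs


def marshal(paragraphs: list) -> str:
    """Object tree -> XML instance document."""
    root = ET.Element(q("document"))
    body = ET.SubElement(root, q("body"))
    for paragraph in paragraphs:
        p = ET.SubElement(body, q("p"))
        for run in paragraph.runs:
            r = ET.SubElement(p, q("r"))
            ET.SubElement(r, q("t")).text = run.text
    return ET.tostring(root, encoding="unicode")


original = [Paragraph([Run("Hello, "), Run("world")]), Paragraph([Run("Second")])]
round_tripped = unmarshal(marshal(original))
```

Once the content is an object tree, operations such as appending the paragraphs of one document to another become ordinary list manipulation.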
Further, step S2 specifically includes:
S21: setting a logic structure of the combined xml file, and constructing a mapping relation between the main tag object ID and each sub tag object ID according to the logic structure of the combined xml file;
S22: combining each sub xml file into the main xml file according to the mapping relation to obtain the combined xml file.
Further, the construction process of the mapping relation comprises the following steps:
the placement order of each xml file in the combined xml file is set, and the tag object IDs corresponding to the xml files are ordered according to the placement order.
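One way to read this construction step is sketched below in Python: given a placement order of files and the tag object IDs found in each file, produce a single sequence of IDs that follows the placement order. The function name, the file names and the IDs are hypothetical, and the data shapes are an assumption made for illustration.

```python
def order_tag_ids(placement_order: list, ids_by_file: dict) -> list:
    """Return (file name, tag object ID) pairs following the placement order."""
    ordered = []
    for name in placement_order:
        # IDs within one file keep the order in which they were collected.
        for tag_id in ids_by_file.get(name, []):
            ordered.append((name, tag_id))
    return ordered


ids_by_file = {
    "main.xml": ["rId1", "rId2"],
    "sub_a.xml": ["rId1"],
    "sub_b.xml": ["rId3", "rId1"],
}
placement = ["main.xml", "sub_b.xml", "sub_a.xml"]
ordered_ids = order_tag_ids(placement, ids_by_file)
```

Changing `placement` changes the resulting sequence without touching the files themselves, which is what allows the xml files to be combined in any order.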
Specifically, the logic structure of the combined xml file is called, the sub xml is loaded into the main xml document area according to that logic structure, and the logic structure is stored in JAXB. All tag object IDs are obtained, including those of OLEObject items, image data, jump links and the like. The resource management objects (Relationship) contained in the sub xml logic structure are iterated, and the addTargetPart method of the main xml document area logic structure is called for each of them to generate a new part. A temporary map object caches the mapping relation between each sub tag object ID in the sub xml logic structure and the main tag object ID of the newly generated part.
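The relationship import and the temporary map can be sketched in Python as follows. This is a simplified stand-in, not docx4j code: the `Relationship` class and `import_relationships` function are hypothetical, new IDs are assumed to follow the common `rIdN` pattern, and copying the target parts themselves (for example image files, whose names may collide) is left out.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    rel_id: str
    rel_type: str
    target: str


def import_relationships(main_rels: list, sub_rels: list) -> dict:
    """Add each sub relationship to the main list under a fresh ID.

    Returns the temporary map from old sub IDs to new main IDs.
    """
    used = [
        int(match.group(1))
        for rel in main_rels
        if (match := re.fullmatch(r"rId(\d+)", rel.rel_id))
    ]
    next_number = max(used, default=0) + 1
    id_map = {}
    for rel in sub_rels:
        new_id = f"rId{next_number}"
        next_number += 1
        main_rels.append(Relationship(new_id, rel.rel_type, rel.target))
        id_map[rel.rel_id] = new_id
    return id_map


main_rels = [
    Relationship("rId1", "styles", "styles.xml"),
    Relationship("rId2", "image", "media/image1.png"),
    Relationship("rId3", "hyperlink", "https://example.com/"),
]
sub_rels = [
    Relationship("rId1", "image", "media/image1.png"),
    Relationship("rId2", "hyperlink", "https://example.org/"),
]
id_map = import_relationships(main_rels, sub_rels)
```

The sub file's `rId1` and the main file's `rId1` point at different things, which is why the IDs have to be reissued rather than copied.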
Further, step S22 specifically includes:
ordering the positions of the sub xml files and the main xml file in the combined xml file according to the mapping relation, and generating character string text at the corresponding positions through the logic structure and data structure of each xml file, to generate the combined xml file.
Specifically, the keys in the temporary map are sorted, the keys representing the order in which the tag object IDs were generated; the nodes with the p tag in the sub xml logic structure are iterated and the node information is converted into character string text; at the same time, the sorted temporary map is iterated and a regular expression is used to replace the old tag object IDs with the new tag object IDs in batches, generating new node information character string text.
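The batch replacement can be sketched in Python with a single regular expression. Two pitfalls motivate the design: replacing `rId1` must not also rewrite `rId10`, and replacing IDs one after another can chain (`rId1` becomes `rId2`, which is then rewritten again). The sketch avoids both by matching whole attribute values and substituting in one pass, so it does not depend on the order of the map. It assumes the relationships namespace is bound to the conventional prefix `r`; the function name and sample text are hypothetical.

```python
import re


def remap_ids(node_text: str, id_map: dict) -> str:
    """Replace old relationship IDs with new ones in serialized node text."""
    if not id_map:
        return node_text
    alternatives = "|".join(re.escape(old_id) for old_id in id_map)
    # Match r:<name>="<old id>" so that only whole attribute values are replaced.
    pattern = re.compile(r'(\br:\w+=")(' + alternatives + r')(")')
    return pattern.sub(
        lambda match: match.group(1) + id_map[match.group(2)] + match.group(3),
        node_text,
    )


paragraph_text = (
    '<w:p><w:hyperlink r:id="rId1"/>'
    '<a:blip r:embed="rId2"/>'
    '<w:hyperlink r:id="rId10"/></w:p>'
)
remapped = remap_ids(paragraph_text, {"rId1": "rId2", "rId2": "rId3"})
```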
The application also provides a word file combination system based on a linux server, which comprises the following components:
a word-to-xml module, used for acquiring a main word file and a plurality of sub word files, converting the main word file into a main xml file, and converting each sub word file into a corresponding sub xml file;
a combination module, used for combining each sub xml file with the main xml file to obtain a combined xml file;
an xml-to-word module, used for converting the combined xml file into a combined word file.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third, etc. do not denote any order, but rather the terms first, second, third, etc. are used to interpret the terms as labels.
The foregoing describes only preferred embodiments of the present application and is not intended to limit its scope. Any equivalent structure or equivalent process derived from the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present application.
Claims (6)
1. A word file combination method based on a linux server, characterized by comprising the following steps:
S1: acquiring a main word file and a plurality of sub word files, converting the main word file into a main xml file, and converting each sub word file into a corresponding sub xml file;
S2: combining each sub xml file with the main xml file to obtain a combined xml file;
S3: converting the combined xml file into a combined word file.
2. The word file combination method based on the linux server according to claim 1, wherein step S1 specifically includes:
S11: acquiring the logic structure and data structure of the main word file and the logic structure and data structure of each sub word file;
S12: obtaining a main tag object ID of the main word file through the logic structure and data structure of the main word file, and obtaining a sub tag object ID corresponding to each sub word file through the logic structure and data structure of that sub word file;
S13: converting the main word file into a main xml file, wherein the main xml file inherits the main tag object ID, the logic structure and the data structure of the main word file;
S14: converting each sub word file into a corresponding sub xml file, wherein each sub xml file inherits the sub tag object ID, the logic structure and the data structure of the corresponding sub word file.
3. The word file combination method based on the linux server according to claim 2, wherein step S2 specifically includes:
S21: setting a logic structure of the combined xml file, and constructing a mapping relation between the main tag object ID and each sub tag object ID according to the logic structure of the combined xml file;
S22: combining each sub xml file into the main xml file according to the mapping relation to obtain the combined xml file.
4. The word file combination method based on the linux server according to claim 3, wherein the construction process of the mapping relation is as follows:
the placement order of each xml file in the combined xml file is set, and the tag object IDs corresponding to the xml files are ordered according to the placement order.
5. The word file combination method based on the linux server according to claim 3, wherein step S22 is specifically:
ordering the positions of the sub xml files and the main xml file in the combined xml file according to the mapping relation, and generating character string text at the corresponding positions through the logic structure and data structure of each xml file, to generate the combined xml file.
6. A word file combination system based on a linux server, comprising:
a word-to-xml module, used for acquiring a main word file and a plurality of sub word files, converting the main word file into a main xml file, and converting each sub word file into a corresponding sub xml file;
a combination module, used for combining each sub xml file with the main xml file to obtain a combined xml file;
an xml-to-word module, used for converting the combined xml file into a combined word file.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310839005.2A CN116775939A (en) | 2023-07-07 | 2023-07-07 | Word file combination method and system based on linux server |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310839005.2A CN116775939A (en) | 2023-07-07 | 2023-07-07 | Word file combination method and system based on linux server |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116775939A (en) | 2023-09-19 |
Family
ID=88009851
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310839005.2A Pending CN116775939A (en) | 2023-07-07 | 2023-07-07 | Word file combination method and system based on linux server |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116775939A (en) |
- 2023-07-07: CN application CN202310839005.2A (CN116775939A), status Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||