CN116775939A - Word file combination method and system based on linux server - Google Patents


Publication number
CN116775939A
Authority
CN
China
Prior art keywords
file
word
xml
main
sub
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310839005.2A
Other languages
Chinese (zh)
Inventor
蔡祖雄
张玉
余嘉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei Changfengtong Information Technology Co ltd
Original Assignee
Hubei Changfengtong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei Changfengtong Information Technology Co ltd
Priority to CN202310839005.2A
Publication of CN116775939A

Abstract

The application provides a Word file combination method and system based on a Linux server. The method comprises the following steps: S1: acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file; S2: combining each sub XML file with the main XML file to obtain a combined XML file; S3: converting the combined XML file into a combined Word file. By constructing a mapping relation between the Word files and the XML files, each XML file inherits the tag object ID, the logical structure and the data structure of its Word file, so the XML files can be combined in any order by ordering the tag object IDs. The combined XML file is then converted into the combined Word file, which achieves low-cost combination of Word files.

Description

Word file combination method and system based on linux server
Technical Field
The application relates to the field of file combination, and in particular to a Word file combination method and system based on a Linux server.
Background
Document generation mainly relies on the SDK published by Microsoft to output documents from various templates. Among Java-based implementations, there is JACOB on the Windows platform, and Aspose and docx4j across platforms. JACOB processes documents on Windows, mainly through macros or a programming language (VBA). Aspose is a commercial product that supports Word processing by installing a local library. docx4j is partly open source: basic Word processing is open, but advanced requirements such as combining Word files are supported only in the commercial version.
JACOB is tied to the Windows platform and also requires Office to be installed on the server, so server availability is low and concurrent processing is not supported. The Word combination functions of Aspose and docx4j are closed source and cannot be used freely. As a result, combining files with existing software is costly and inconvenient to call.
Disclosure of Invention
In order to solve the above technical problems, the application provides a Word file combination method based on a Linux server, which comprises the following steps:
S1: acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file;
S2: combining each sub XML file with the main XML file to obtain a combined XML file;
S3: converting the combined XML file into a combined Word file.
Preferably, step S1 specifically includes:
S11: acquiring the logical structure and data structure of the main Word file and the logical structure and data structure of each sub Word file;
S12: obtaining a main tag object ID of the main Word file from the logical structure and data structure of the main Word file, and obtaining the sub tag object ID corresponding to each sub Word file from the logical structure and data structure of that sub Word file;
S13: converting the main Word file into a main XML file, wherein the main XML file inherits the main tag object ID, the logical structure and the data structure of the main Word file;
S14: converting each sub Word file into a corresponding sub XML file, wherein each sub XML file inherits the sub tag object ID, the logical structure and the data structure of the corresponding sub Word file.
Preferably, step S2 specifically includes:
S21: setting the logical structure of the combined XML file, and constructing a mapping relation between the main tag object ID and each sub tag object ID according to the logical structure of the combined XML file;
S22: combining each sub XML file into the main XML file according to the mapping relation to obtain the combined XML file.
Preferably, the mapping relation is constructed as follows:
the placement order of each XML file in the combined XML file is set, and the tag object IDs corresponding to the XML files are ordered according to the placement order.
Preferably, step S22 specifically includes:
ordering the positions of the sub XML files and the main XML file in the combined XML file according to the mapping relation, and generating string text at the corresponding positions from the logical structure and data structure of each XML file, thereby generating the combined XML file.
A Word file combination system based on a Linux server comprises:
a Word-to-XML module, used for acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file;
a combination module, used for combining each sub XML file with the main XML file to obtain a combined XML file;
an XML-to-Word module, used for converting the combined XML file into a combined Word file.
The application has the following beneficial effects:
By constructing a mapping relation between the Word files and the XML files, each XML file inherits the tag object ID, the logical structure and the data structure of its Word file, so the XML files can be combined in any order by ordering the tag object IDs. The combined XML file is then converted into the combined Word file, which achieves low-cost combination of Word files.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present application.
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Referring to FIG. 1, the present application provides a Word file combination method based on a Linux server, including:
S1: acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file;
S2: combining each sub XML file with the main XML file to obtain a combined XML file;
S3: converting the combined XML file into a combined Word file.
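Steps S1 to S3 can be sketched end to end with nothing but a zip reader and an XML parser, since a .docx file is a zip package whose main content lives in the part word/document.xml. The sketch below is an illustration under simplifying assumptions, not the implementation described in this application: it handles text-only bodies, ignores relationships (images, OLE objects), and all function names are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)
DOC_PART = "word/document.xml"  # main document part inside the .docx package


def word_to_xml(docx_bytes: bytes) -> ET.Element:
    """S1: read the main document part of a .docx package as an XML tree."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return ET.fromstring(package.read(DOC_PART))


def combine_xml(main_root: ET.Element, sub_roots: list) -> ET.Element:
    """S2: append the block-level content of each sub body to the main body."""
    main_body = main_root.find(f"{{{W_NS}}}body")
    # Detach the main section properties so they can stay the last child.
    sect_pr = main_body.find(f"{{{W_NS}}}sectPr")
    if sect_pr is not None:
        main_body.remove(sect_pr)
    for sub_root in sub_roots:
        for child in sub_root.find(f"{{{W_NS}}}body"):
            if child.tag != f"{{{W_NS}}}sectPr":
                main_body.append(child)
    if sect_pr is not None:
        main_body.append(sect_pr)
    return main_root


def xml_to_word(main_docx: bytes, combined_root: ET.Element) -> bytes:
    """S3: copy the main package, swapping in the combined document part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(main_docx)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == DOC_PART:
                data = ET.tostring(combined_root, encoding="utf-8",
                                   xml_declaration=True)
            dst.writestr(item.filename, data)
    return out.getvalue()


def combine_word_files(main_docx: bytes, sub_docx: list) -> bytes:
    """Run S1, S2 and S3 in sequence on in-memory .docx packages."""
    main_root = word_to_xml(main_docx)
    sub_roots = [word_to_xml(data) for data in sub_docx]
    return xml_to_word(main_docx, combine_xml(main_root, sub_roots))
```

Real documents declare many namespaces and reference other parts by relationship ID; ElementTree may rename unregistered prefixes, and copied content that points at images or embedded objects needs the ID remapping described later in this description.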
Further, step S1 specifically includes:
S11: acquiring the logical structure and data structure of the main Word file and the logical structure and data structure of each sub Word file;
S12: obtaining a main tag object ID of the main Word file from the logical structure and data structure of the main Word file, and obtaining the sub tag object ID corresponding to each sub Word file from the logical structure and data structure of that sub Word file;
S13: converting the main Word file into a main XML file, wherein the main XML file inherits the main tag object ID, the logical structure and the data structure of the main Word file;
S14: converting each sub Word file into a corresponding sub XML file, wherein each sub XML file inherits the sub tag object ID, the logical structure and the data structure of the corresponding sub Word file.
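As an illustration of step S12, the sketch below reads identifiers from a package relationships part. It assumes that the "tag object IDs" correspond to Open XML relationship IDs (such as rId1) stored in word/_rels/document.xml.rels; that reading is an interpretation of the text, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by Open Packaging Conventions relationship parts.
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def read_relationship_ids(rels_xml: bytes) -> dict:
    """Map each relationship Id (e.g. 'rId1') to the Target part it points at."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.findall(f"{{{REL_NS}}}Relationship")
    }
```

Calling this once for the main file and once per sub file gives the main and sub ID sets that the later mapping step works on.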
Specifically, the application adopts Open XML technology to convert Word files into XML files.
The main features of Open XML include:
1. Openness: Open XML is an open standard whose specification can be understood and implemented by anyone. This allows different software developers and vendors to freely use and support the Open XML format.
2. Extensibility: Open XML uses XML as its foundation, allowing the logical structure and content to be extended and customized. In this way, users can define new elements, properties and styles according to their own needs.
3. Structure and hierarchy: Open XML documents are structured and hierarchical, with different elements and attributes used to represent the various parts and properties of the document. This structure makes parsing, editing and processing documents more convenient and reliable.
4. Compatibility: the Open XML format can be supported and parsed by different software applications and platforms. Users can therefore share and exchange Open XML documents between different software without losing formatting or content.
Open XML is widely used for the document formats of the Microsoft Office suite, such as .docx (Word document), .xlsx (Excel spreadsheet) and .pptx (PowerPoint presentation). It provides a general way to store and process document data and facilitates cross-platform use and interoperability. On this basis, Microsoft implemented the WordprocessingML document object. The following is excerpted from the description of WordprocessingML on Microsoft's official website; the current version is the Open XML SDK 2.5 API.
Word data structure: records all the attributes and operations that Word may read.
The basic logical structure of a WordprocessingML document consists of the <document> and <body> elements, followed by one or more block-level elements such as <p>, which represents a paragraph. A paragraph contains one or more <r> elements. <r> stands for run, a region of text with a common set of properties (for example, formatting). A run contains one or more <t> elements. The <t> element contains a range of text. The important WordprocessingML parts are as follows:
The Open XML SDK 2.5 API provides strongly typed classes in the DocumentFormat.OpenXML.WordprocessingML namespace that correspond to WordprocessingML elements.
Table 1 below lists some of the important WordprocessingML elements, the WordprocessingML document package part to which each element corresponds (where applicable), and the managed class that represents the element in the Open XML SDK 2.5 API.
TABLE 1
A WordprocessingML document is organized around the concept of stories. A story is a region of content in a WordprocessingML document. WordprocessingML stories include:
comment
endnote
footer
footnote
frame, glossary document
header
main story
subdocument
text box
Not all stories need to exist in a valid WordprocessingML document. The simplest valid WordprocessingML document requires only one story, the main document story. In WordprocessingML, the main document story is represented by the main document part. At a minimum, to create a valid WordprocessingML document using code, add a main document part to the document.
The following information from ISO/IEC 29500 introduces the WordprocessingML elements required in the main document part to complete the minimum document scenario.
The main document story of the simplest WordprocessingML document consists of the following XML elements:
document: the root element of the WordprocessingML main document part, which defines the main document story.
body: the container for the set of block-level structures that make up the main story.
p: a paragraph.
r: a run, that is, a piece of continuous text.
t: a range of text.
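The document, body, p, r and t nesting listed above can be seen in a small example. The sketch below holds a minimal main document part as a string and walks it with the Python standard library; the sample text and the function name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# A minimal main document part: document > body > p > r > t.
MINIMAL_DOCUMENT = f"""<w:document xmlns:w="{W_NS}">
  <w:body>
    <w:p>
      <w:r><w:t>Hello, </w:t></w:r>
      <w:r><w:t>world</w:t></w:r>
    </w:p>
  </w:body>
</w:document>"""


def paragraph_texts(document_xml: str) -> list:
    """Return the text of each paragraph by joining the t elements of its runs."""
    root = ET.fromstring(document_xml)
    body = root.find(f"{{{W_NS}}}body")
    texts = []
    for paragraph in body.findall(f"{{{W_NS}}}p"):
        pieces = []
        for run in paragraph.findall(f"{{{W_NS}}}r"):
            for text in run.findall(f"{{{W_NS}}}t"):
                pieces.append(text.text or "")
        texts.append("".join(pieces))
    return texts
```

Here paragraph_texts(MINIMAL_DOCUMENT) yields a single paragraph whose two runs are joined into one string.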
docx4j defines a model object, WordprocessingMLPackage, based on the Open XML structure, and uses JAXB at the bottom layer. JAXB builds a bridge for converting between XML files and Java objects. JAXB (Java Architecture for XML Binding) is an industry standard and a technology that can generate Java classes from an XML Schema. JAXB also provides a way to generate a Java object tree from an XML instance document, and to write the contents of a Java object tree back to an XML instance document. In other words, JAXB provides a quick and easy way to bind XML schemas to Java representations, so that Java developers can easily incorporate XML data and processing functions into Java applications.
Further, step S2 specifically includes:
S21: setting the logical structure of the combined XML file, and constructing a mapping relation between the main tag object ID and each sub tag object ID according to the logical structure of the combined XML file;
S22: combining each sub XML file into the main XML file according to the mapping relation to obtain the combined XML file.
Further, the mapping relation is constructed as follows:
the placement order of each XML file in the combined XML file is set, and the tag object IDs corresponding to the XML files are ordered according to the placement order.
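The ordering step can be pictured as follows. This is a minimal sketch, assuming the placement order is given as a list of file names and each file's tag object IDs are already known; the names and the data layout are hypothetical.

```python
def order_tag_ids(placement: list, ids_by_file: dict) -> list:
    """Return (file name, tag object ID) pairs sorted by the chosen placement order.

    `placement` lists the XML files in the order they should appear in the
    combined file; `ids_by_file` maps each file name to its tag object IDs.
    """
    ordered = []
    for name in placement:
        for tag_id in ids_by_file[name]:
            ordered.append((name, tag_id))
    return ordered
```

Keeping the file name next to each ID matters because different files usually reuse the same identifiers (for example, every file may contain its own rId1).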
Specifically, the logical structure of the combined XML file is invoked, and each sub XML is loaded into the main XML document area according to that logical structure, which is held in JAXB. All tag object IDs are obtained, including those of OLEObjects, image data, jump links and the like. The resource management objects (Relationship) contained in the sub XML logical structure are iterated, and for each one the addTargetPart method of the main XML document area's logical structure is called to generate a new part. A temporary map object caches the mapping between each sub tag object ID in the sub XML logical structure and the main tag object ID of the newly generated part.
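The caching of old-to-new IDs can be sketched without docx4j. The code below is a simplified stand-in for the step in which a new part is added to the main document: relationships are modelled as plain dictionaries from ID to target, new IDs are assumed to follow the rIdN pattern, and copying the target parts themselves (and avoiding file-name clashes such as two media/image1.png files) is left out. All names are hypothetical.

```python
def next_free_id(used: set) -> str:
    """Return the lowest 'rIdN' identifier that is not yet in use."""
    n = 1
    while f"rId{n}" in used:
        n += 1
    return f"rId{n}"


def import_relationships(main_rels: dict, sub_rels: dict) -> dict:
    """Register every sub relationship in main_rels under a fresh ID.

    Returns the temporary map from each old sub ID to its new main ID.
    main_rels is modified in place.
    """
    id_map = {}
    for old_id, target in sub_rels.items():
        new_id = next_free_id(set(main_rels))
        main_rels[new_id] = target
        id_map[old_id] = new_id
    return id_map
```

The returned map is what the later replacement step consumes: every reference to an old ID in the copied content has to be rewritten to the new ID.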
Further, step S22 specifically includes:
ordering the positions of the sub XML files and the main XML file in the combined XML file according to the mapping relation, and generating string text at the corresponding positions from the logical structure and data structure of each XML file, thereby generating the combined XML file.
Specifically, the keys in the temporary map are arranged in order; the keys represent the generation sequence of the tag object IDs. The nodes with the P tag in the sub XML logical structure are iterated, and the node information is converted into string text. At the same time, the ordered temporary map is iterated, and a regular expression is used to replace the old tag object IDs with the new tag object IDs in batches, generating the new node information string text.
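The batch replacement can be sketched with a regular expression. Note that this sketch uses a single-pass substitution with a lookup callback rather than one pass per map entry, which is a deliberate variation: if rId1 maps to rId3 and rId3 maps to rId7, sequential passes could rewrite the same reference twice. It also assumes IDs only need replacing where they form a complete double-quoted attribute value; the function name is hypothetical.

```python
import re


def remap_ids(paragraph_xml: str, id_map: dict) -> str:
    """Replace every quoted old ID in the paragraph text with its new ID."""
    if not id_map:
        return paragraph_xml
    # Longest keys first, so 'rId10' is tried before 'rId1'.
    keys = sorted(id_map, key=len, reverse=True)
    alternatives = "|".join(re.escape(key) for key in keys)
    # Match an ID only when it is the whole content of a quoted attribute.
    pattern = re.compile(rf'(?<=")(?:{alternatives})(?=")')
    return pattern.sub(lambda match: id_map[match.group(0)], paragraph_xml)
```

Because each match is looked up once in the original map, a newly written ID is never matched again in the same pass.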
The application also provides a Word file combination system based on a Linux server, which comprises:
a Word-to-XML module, used for acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file;
a combination module, used for combining each sub XML file with the main XML file to obtain a combined XML file;
an XML-to-Word module, used for converting the combined XML file into a combined Word file.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third, etc. does not denote any order; these terms are to be interpreted as labels.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (6)

1. A Word file combination method based on a Linux server, characterized by comprising the following steps:
S1: acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file;
S2: combining each sub XML file with the main XML file to obtain a combined XML file;
S3: converting the combined XML file into a combined Word file.
2. The Word file format generating method based on the Linux server according to claim 1, wherein step S1 specifically includes:
S11: acquiring the logical structure and data structure of the main Word file and the logical structure and data structure of each sub Word file;
S12: obtaining a main tag object ID of the main Word file from the logical structure and data structure of the main Word file, and obtaining the sub tag object ID corresponding to each sub Word file from the logical structure and data structure of that sub Word file;
S13: converting the main Word file into a main XML file, wherein the main XML file inherits the main tag object ID, the logical structure and the data structure of the main Word file;
S14: converting each sub Word file into a corresponding sub XML file, wherein each sub XML file inherits the sub tag object ID, the logical structure and the data structure of the corresponding sub Word file.
3. The Word file format generating method based on the Linux server according to claim 2, wherein step S2 specifically includes:
S21: setting the logical structure of the combined XML file, and constructing a mapping relation between the main tag object ID and each sub tag object ID according to the logical structure of the combined XML file;
S22: combining each sub XML file into the main XML file according to the mapping relation to obtain the combined XML file.
4. The Word file format generating method based on the Linux server according to claim 3, wherein the mapping relation is constructed as follows:
the placement order of each XML file in the combined XML file is set, and the tag object IDs corresponding to the XML files are ordered according to the placement order.
5. The Word file format generating method based on the Linux server according to claim 3, wherein step S22 specifically includes:
ordering the positions of the sub XML files and the main XML file in the combined XML file according to the mapping relation, and generating string text at the corresponding positions from the logical structure and data structure of each XML file, thereby generating the combined XML file.
6. A Word file combination system based on a Linux server, comprising:
a Word-to-XML module, used for acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file;
a combination module, used for combining each sub XML file with the main XML file to obtain a combined XML file;
an XML-to-Word module, used for converting the combined XML file into a combined Word file.
CN202310839005.2A 2023-07-07 2023-07-07 Word file combination method and system based on linux server Pending CN116775939A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310839005.2A CN116775939A (en) 2023-07-07 2023-07-07 Word file combination method and system based on linux server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310839005.2A CN116775939A (en) 2023-07-07 2023-07-07 Word file combination method and system based on linux server

Publications (1)

Publication Number Publication Date
CN116775939A (en) 2023-09-19

Family

ID=88009851

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310839005.2A Pending CN116775939A (en) 2023-07-07 2023-07-07 Word file combination method and system based on linux server

Country Status (1)

Country Link
CN (1) CN116775939A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination