CN116775939A

CN116775939A - Word file combination method and system based on a Linux server

Info

Publication number: CN116775939A
Application number: CN202310839005.2A
Authority: CN
Inventors: 蔡祖雄; 张玉; 余嘉
Original assignee: Hubei Changfengtong Information Technology Co ltd
Current assignee: Hubei Changfengtong Information Technology Co ltd
Priority date: 2023-07-07
Filing date: 2023-07-07
Publication date: 2023-09-19

Abstract

The application provides a Word file combination method and system based on a Linux server. The method comprises the following steps. S1: acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file. S2: combining each sub XML file with the main XML file to obtain a combined XML file. S3: converting the combined XML file into a combined Word file. Because a mapping relation is constructed between each Word file and its XML file, the XML file inherits the tag object ID, the logical structure and the data structure of the Word file. The XML files can therefore be combined in any order by ordering the tag object IDs, and the combined XML file is then converted into the combined Word file, which achieves low-cost combination of Word files.

Description

Word file combination method and system based on a Linux server

Technical Field

The application relates to the field of file combination, and in particular to a Word file combination method and system based on a Linux server.

Background

Existing document generation technology mainly outputs template-based documents through the SDKs opened by Microsoft. Among Java-based implementations, Jacob runs on the Windows platform, while Aspose and Docx4j are cross-platform. Jacob processes documents on Windows and relies mainly on macros or a programming language (VBA). Aspose is a commercial product that supports Word processing by installing a local library. Docx4j is a semi-open-source product: basic Word processing is open, but advanced requirements such as combining Word files are supported only in its commercial version.

Jacob depends on the Windows platform and also requires Office to be installed on the server, so server availability is low and concurrent processing is not supported. The Word combination functions of Aspose and Docx4j are closed-source software that cannot be used freely. Combining files with the existing software is therefore costly and inconvenient to call.

Disclosure of Invention

In order to solve the above technical problems, the application provides a Word file combination method based on a Linux server, which comprises the following steps:

S1: acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file;

S2: combining each sub XML file with the main XML file to obtain a combined XML file;

S3: converting the combined XML file into a combined Word file.
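Steps S1 to S3 can be sketched with the Python standard library alone, under the assumption that "converting a Word file into an XML file" means reading the word/document.xml part out of the .docx zip package. The function names are hypothetical and are not part of the application. The sketch merges body content only: it does not handle images, OLE objects or other relationship-backed content, and ElementTree may rewrite namespace prefixes in real documents.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)
DOC_PART = "word/document.xml"


def word_to_xml(docx_bytes):
    """S1: read the main document part out of the .docx zip package."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as package:
        return ET.fromstring(package.read(DOC_PART))


def combine_xml(main_root, sub_roots):
    """S2: append the block-level children of each sub body to the main body."""
    main_body = main_root.find(f"{{{W_NS}}}body")
    for sub_root in sub_roots:
        for block in sub_root.find(f"{{{W_NS}}}body"):
            if block.tag == f"{{{W_NS}}}sectPr":
                continue  # keep only the main document's section properties
            main_body.append(block)
    # The section properties, if present, must stay the last child of the body.
    sect = main_body.find(f"{{{W_NS}}}sectPr")
    if sect is not None:
        main_body.remove(sect)
        main_body.append(sect)
    return main_root


def xml_to_word(main_docx_bytes, combined_root):
    """S3: copy the main package, swapping in the combined document part."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(main_docx_bytes)) as src, \
            zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for name in src.namelist():
            data = src.read(name)
            if name == DOC_PART:
                data = ET.tostring(combined_root, encoding="UTF-8",
                                   xml_declaration=True)
            dst.writestr(name, data)
    return out.getvalue()
```

A caller would read each file into bytes, pass the parsed roots to combine_xml in the desired order, and write the bytes returned by xml_to_word to the output .docx file.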

Preferably, step S1 specifically includes:

S11: acquiring the logical structure and data structure of the main Word file and the logical structure and data structure of each sub Word file;

S12: obtaining a main tag object ID of the main Word file from the logical structure and data structure of the main Word file, and obtaining the sub tag object ID corresponding to each sub Word file from the logical structure and data structure of that sub Word file;

S13: converting the main Word file into a main XML file, wherein the main XML file inherits the main tag object ID, the logical structure and the data structure of the main Word file;

S14: converting each sub Word file into a corresponding sub XML file, wherein each sub XML file inherits the sub tag object ID, the logical structure and the data structure of the corresponding sub Word file.
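As an illustration of steps S11 to S14, the sketch below collects the IDs found in a document's relationships part (word/_rels/document.xml.rels). Treating the "tag object ID" as an Open XML relationship ID (such as rId1) is an assumption made for this example; the function name and the sample data are hypothetical.

```python
import xml.etree.ElementTree as ET

REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/"

# Hypothetical relationships part of a sub document.
SAMPLE_RELS = (
    f'<Relationships xmlns="{REL_NS}">'
    f'<Relationship Id="rId1" Type="{REL_TYPE}image" Target="media/image1.png"/>'
    f'<Relationship Id="rId2" Type="{REL_TYPE}oleObject" '
    'Target="embeddings/oleObject1.bin"/>'
    f'<Relationship Id="rId3" Type="{REL_TYPE}hyperlink" '
    'Target="https://example.com/" TargetMode="External"/>'
    '</Relationships>'
)


def collect_relationship_ids(rels_xml):
    """Return {relationship ID: (kind, target)} for one relationships part."""
    root = ET.fromstring(rels_xml)
    ids = {}
    for rel in root.findall(f"{{{REL_NS}}}Relationship"):
        # The last path segment of the Type URI names the kind of resource.
        kind = rel.get("Type", "").rsplit("/", 1)[-1]
        ids[rel.get("Id")] = (kind, rel.get("Target"))
    return ids
```

Calling collect_relationship_ids(SAMPLE_RELS) yields one entry per image, OLE object and hyperlink, which is the information a later merge step needs in order to re-register those resources in the main document.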

Preferably, step S2 specifically includes:

S21: setting the logical structure of the combined XML file, and constructing a mapping relation between the main tag object ID and each sub tag object ID according to the logical structure of the combined XML file;

S22: combining each sub XML file into the main XML file according to the mapping relation to obtain the combined XML file.

Preferably, the mapping relation is constructed as follows:

the placement order of each XML file in the combined XML file is set, and the tag object IDs corresponding to the XML files are ordered according to that placement order.
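The ordering described above can be sketched as follows. This is a minimal illustration, assuming each XML file is identified by a name and that its tag object IDs end in a number (as relationship IDs such as rId2 do); the function names are hypothetical.

```python
import re


def id_sort_key(tag_id):
    """Sort IDs by their numeric suffix so that rId2 comes before rId10."""
    match = re.search(r"(\d+)$", tag_id)
    return (int(match.group(1)) if match else 0, tag_id)


def order_tag_ids(placement_order, ids_by_file):
    """List (file name, tag ID) pairs in the order the files are placed."""
    ordered = []
    for name in placement_order:
        for tag_id in sorted(ids_by_file[name], key=id_sort_key):
            ordered.append((name, tag_id))
    return ordered
```

For example, with the placement order ["main.xml", "b.xml", "a.xml"], the IDs of main.xml come first, then those of b.xml, then those of a.xml, regardless of the order in which the files were loaded.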

Preferably, step S22 specifically includes:

ordering the positions of the sub XML files and the main XML file in the combined XML file according to the mapping relation, and generating string text at the corresponding positions from the logical structure and data structure of each XML file, thereby generating the combined XML file.

A Word file combination system based on a Linux server, comprising:

a Word-to-XML module, used for acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file;

a combination module, used for combining each sub XML file with the main XML file to obtain a combined XML file;

and an XML-to-Word module, used for converting the combined XML file into a combined Word file.

The application has the following beneficial effects:

Because a mapping relation is constructed between each Word file and its XML file, the XML file inherits the tag object ID, the logical structure and the data structure of the Word file. The XML files can therefore be combined in any order by ordering the tag object IDs, and the combined XML file is then converted into the combined Word file, which achieves low-cost combination of Word files.

Drawings

FIG. 1 is a flow chart of a method according to an embodiment of the present application;

The achievement of the objects, the functional features and the advantages of the present application will be further described in conjunction with the embodiments and with reference to the accompanying drawings.

Detailed Description

It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.

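Referring to FIG. 1, the present application provides a Word file combination method based on a Linux server, including:

S1: acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file;

S2: combining each sub XML file with the main XML file to obtain a combined XML file;

S3: converting the combined XML file into a combined Word file.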

Further, the step S1 specifically includes:

Specifically, the application uses Open XML technology to convert Word files into XML files.

The main features of Open XML include:

1. Openness: Open XML is an open standard whose specification can be read and implemented by anyone. This allows different software developers and vendors to use and support the Open XML format freely.

2. Extensibility: Open XML is built on XML, which allows logical structures and content to be extended and customized. Users can define new elements, attributes and styles according to their own needs.

3. Structure and hierarchy: Open XML documents are structured and hierarchical, with different elements and attributes representing the various parts and properties of the document. This structure makes parsing, editing and processing documents more convenient and reliable.

4. Compatibility: the Open XML format can be supported and parsed by different software applications and platforms. Users can therefore share and exchange Open XML documents between different software without losing formatting or content.

Open XML is widely used for the document formats of the Microsoft Office suite, such as .docx (Word document), .xlsx (Excel spreadsheet) and .pptx (PowerPoint presentation). It provides a general way to store and process document data and facilitates cross-platform use and interoperability. On this basis, Microsoft implemented the WordprocessingML document object. The following is excerpted from the description of WordprocessingML on Microsoft's official website; the current version is the Open XML SDK 2.5 API.

Word data structure: used to record all the relevant attributes and operation methods that may be read from a Word file;

The basic logical structure of a WordprocessingML document consists of the <document> and <body> elements, followed by one or more block-level elements, for example <p>, which represents a paragraph. A paragraph contains one or more <r> elements. <r> represents a run, which is a region of text with a common set of properties (for example, formatting). A run contains one or more <t> elements. The <t> element contains a range of text. The important parts of WordprocessingML are as follows:

The Open XML SDK 2.5 API provides strongly typed classes in the DocumentFormat.OpenXml.Wordprocessing namespace that correspond to WordprocessingML elements.

Table 1 below lists some of the important WordprocessingML elements, the WordprocessingML document package part to which each element corresponds (if applicable), and the managed class that represents the element in the Open XML SDK 2.5 API.
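The document, body, paragraph, run and text nesting described above can be seen in a short sketch that parses a hand-written sample with the Python standard library. The sample XML and the function name are illustrative assumptions; only the element names and the namespace come from WordprocessingML.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Hand-written sample: two paragraphs, the first made of two runs.
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/'
    'wordprocessingml/2006/main"><w:body>'
    '<w:p><w:r><w:t>Hello, </w:t></w:r>'
    '<w:r><w:rPr><w:b/></w:rPr><w:t>world</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>'
    '</w:body></w:document>'
)


def paragraph_texts(document_xml):
    """Join the <w:t> text of every run, one string per <w:p> paragraph."""
    body = ET.fromstring(document_xml).find(W + "body")
    return ["".join(t.text or "" for t in p.iter(W + "t"))
            for p in body.iter(W + "p")]


print(paragraph_texts(SAMPLE))  # ['Hello, world', 'Second paragraph']
```

The second run carries its own run properties (bold), which is why one paragraph is often split into several runs even when its text reads as a single sentence.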

TABLE 1

A WordprocessingML document is organized around the concept of stories, which are regions of content in the document. WordprocessingML stories include:

Comment

Endnote

Footer

Footnote

Frame, glossary document

Header

Main story

Subdocument

Text box

Not all stories need to be present in a valid WordprocessingML document. The simplest valid WordprocessingML document requires only one story, the main document story. In WordprocessingML, the main document story is represented by the main document part. At a minimum, to create a valid WordprocessingML document using code, add a main document part to the document.

The following information from ISO/IEC 29500 introduces the WordprocessingML elements that are required in the main document part to complete the minimum document scenario.

The main document story of the simplest WordprocessingML document includes the following XML elements:

document - the root element of the WordprocessingML main document part, which defines the main document story.

body - a container for the set of block-level structures that make up the main story.

p - a paragraph.

r - a run of continuous text.

t - a range of text.
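A minimal package built from just these elements can be sketched as below, using only the Python standard library. The function name is hypothetical; the three part names and the content-type and relationship strings follow the Open Packaging Conventions. This is an illustration of the minimum document structure, not the conversion code of the application.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

PACKAGE_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def minimal_docx(text):
    """Build a .docx whose main document part holds one paragraph of text."""
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body><w:p><w:r>'
        f'<w:t xml:space="preserve">{escape(text)}</w:t>'
        '</w:r></w:p></w:body></w:document>'
    )
    out = io.BytesIO()
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", PACKAGE_RELS)
        package.writestr("word/document.xml", document)
    return out.getvalue()
```

Writing the returned bytes to a file such as hello.docx should give a document that a word processor can open; the text is escaped so that characters like < and & do not break the XML.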

Based on the Open XML structure, Docx4j defines a new model object, WordprocessingMLPackage, and uses JAXB at the bottom layer, which provides a bridge for converting between XML files and Java objects. JAXB (Java Architecture for XML Binding) is an industry standard and a technology that can generate Java classes from an XML Schema. JAXB also provides a way to build a Java object tree from an XML instance document and to write the contents of the Java object tree back to an XML instance document. In addition, JAXB provides a quick and easy way to bind XML schemas to Java representations, so that Java developers can easily work with XML data and processing functions in Java applications.

Further, the step S2 specifically includes:

Further, the construction process of the mapping relation comprises the following steps:

Specifically, the logical structure of the combined XML file is called, each sub XML is loaded into the main XML document area according to that logical structure, and the logical structure is stored in JAXB. All tag object IDs are obtained, including those of OLE objects, image data, jump links (hyperlinks) and the like. The resource management objects (Relationship) contained in the sub XML logical structure are iterated, and the addTargetPart method of the main XML document area logical structure is called to generate a new part. A temporary map object caches the mapping relation between each sub tag object ID in the sub XML logical structure and the main tag object ID of the newly generated part.
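The relationship iteration and the temporary map can be sketched in plain Python. This is a simplified stand-in, not the Docx4j call itself: the relationship tables are modeled as dictionaries from ID to target, adding an entry to the main dictionary takes the place of addTargetPart, and IDs are assumed to have the form rIdN. The function name is hypothetical.

```python
def merge_relationships(main_rels, sub_rels):
    """Register every sub relationship in the main table under a fresh ID.

    main_rels and sub_rels map relationship ID -> target. main_rels is
    updated in place; the returned temporary map records old sub ID -> new
    main ID.
    """
    used = {int(rid[3:]) for rid in main_rels
            if rid.startswith("rId") and rid[3:].isdigit()}
    next_number = max(used, default=0) + 1
    temp_map = {}
    for old_id, target in sub_rels.items():
        new_id = f"rId{next_number}"
        next_number += 1
        main_rels[new_id] = target  # stand-in for generating a new part
        temp_map[old_id] = new_id
    return temp_map
```

A real merge would also have to copy the binary parts themselves and rename targets that collide (for example two different files both named media/image1.png), which this sketch leaves out.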

Further, step S22 specifically includes:

Specifically, the primary keys in the temporary map are sorted; the primary keys represent the order in which the tag object IDs were generated. The nodes with the p tag in the sub XML logical structure are iterated, and the node information is converted into string text. At the same time, the ordered temporary map is iterated, and regular expressions are used to replace the old tag object IDs with the new tag object IDs in batches, generating new node-information string text.
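The batch replacement can be sketched with Python's re module. Note that this is a single-pass variant rather than one substitution per map entry: replacing entry by entry could turn rId1 into rId10 and then rewrite that result again if rId10 is also a key. The attribute names r:id, r:embed and r:link are assumed to be the places where IDs appear; the function name is hypothetical.

```python
import re


def remap_ids(node_text, temp_map):
    """Replace old relationship IDs with new ones in one regex pass."""
    if not temp_map:
        return node_text
    alternatives = "|".join(re.escape(old_id) for old_id in temp_map)
    # Match whole attribute values only, so rId1 never matches inside rId10
    # and plain text that happens to contain an ID is left alone.
    pattern = re.compile(r'(r:(?:id|embed|link)=")(' + alternatives + r')(")')
    return pattern.sub(
        lambda m: m.group(1) + temp_map[m.group(2)] + m.group(3), node_text)
```

For example, with the map {"rId1": "rId10", "rId10": "rId11"}, an attribute r:embed="rId1" becomes r:embed="rId10" and r:id="rId10" becomes r:id="rId11", with no value rewritten twice.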

The application provides a word file combination system based on a linux server, which comprises the following components:

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.

The foregoing embodiment numbers of the present application are merely for the purpose of description and do not represent the advantages or disadvantages of the embodiments. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the terms first, second, third, etc. does not denote any order; these terms are to be interpreted as labels.

The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims

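1. A Word file combination method based on a Linux server, characterized by comprising the following steps:

S1: acquiring a main Word file and a plurality of sub Word files, converting the main Word file into a main XML file, and converting each sub Word file into a corresponding sub XML file;

S2: combining each sub XML file with the main XML file to obtain a combined XML file;

S3: converting the combined XML file into a combined Word file.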

2. The word file format generating method based on the linux server according to claim 1, wherein step S1 specifically includes:

3. The word file format generating method based on the linux server according to claim 2, wherein step S2 specifically includes:

4. The word file format generating method based on the linux server according to claim 3, wherein the construction process of the mapping relation is as follows:

5. The word file format generating method based on the linux server according to claim 3, wherein step S22 is specifically:

6. A word file combination system based on a linux server, comprising: