CN111159995A - Method for generating word document in template mode - Google Patents

Info

Publication number
CN111159995A
Authority
CN
China
Prior art keywords
document
expression
text
xml
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010045710.1A
Other languages
Chinese (zh)
Inventor
Inventor name not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangxi Kinggrid Technology Co ltd
Original Assignee
Jiangxi Kinggrid Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangxi Kinggrid Technology Co ltd
Priority to CN202010045710.1A
Publication of CN111159995A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/116 Details of conversion of file system types or formats

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a method for generating a Word document from a template. The method comprises decompressing a docx document, extracting the word/document.xml file, and parsing document.xml to obtain an XML object; traversing from the root node of document.xml to obtain the w:p paragraph nodes, which contain all the text content of the Word document; and, once all w:p nodes have been traversed and analyzed, generating a new document.xml file and overwriting word/document.xml in the template document with it, which completes the expression replacement. Because docx documents are based on XML and ZIP, the method parses the XML structure, extracts the expressions, and replaces the text, which avoids disturbing the document structure and styles, improves the efficiency of generating formatted documents, and conforms to the Word document standard.

Description

Method for generating word document in template mode
Technical Field
The invention relates to a document processing technology, in particular to a method for generating a word document in a templating mode.
Background
Word documents are widely used at work. In some document-processing scenarios there are large numbers of documents that share the same format, and business staff must edit and review them by hand, which is time-consuming and error-prone. It is therefore desirable to provide a method for templating Word documents: the dynamic content in a formatted document is written as variable expressions (similar to ${variable name} in other template engines), and when a document is generated each variable expression is replaced with the dynamic content, which addresses the time cost and error rate of manual editing.
Microsoft Office Word is the most popular word processing program and an essential productivity tool in daily work.
Apache POI is an open source library from the Apache Software Foundation that provides an API for Java programs to read and write files in Microsoft Office formats, so that a business system can generate and modify Word documents.
The ZIP file format is a file format for data compression and archiving. Microsoft has included built-in support for ZIP since the Windows ME operating system, so ZIP files can be opened and created even if no decompression software is installed on the user's computer; OS X and popular Linux operating systems provide similar support. ZIP is therefore often the most common choice when files are distributed over a network.
XML is a markup language used to mark up electronic documents so that they are structured.
There are two ways to generate a large number of documents that share the same format but differ in content:
The first is for business staff to write the documents by hand; as the number of documents grows, this consumes a great deal of time and easily leads to errors.
The second is for a developer to obtain the business data and generate the document through the API provided by Apache POI. This solves the efficiency problem of document generation, but as the number of business documents grows, the developer must write business code for each kind of business data, staff must spend a large amount of time on document formats, and further time is needed for testing, version release, and so on.
At present, when a business system generates a Word document according to business logic, developers must write the corresponding business code and query the business data to generate the document. As business scenarios multiply, developers must write new business code, and because the Word format is complex, setting Word styles in code is far less convenient than editing them visually in Word. Each new business scenario must also be developed, tested, and deployed before it goes live, which costs additional manpower, time, and maintenance.
Disclosure of Invention
The invention provides a method for generating a Word document from a template. Template expressions marked as ${variable name} are written in a docx document; all expressions in the docx document are obtained by ZIP and XML (word/document.xml) parsing, and their values are substituted to generate a new docx document, which completes the generation of the document. Formatted documents can be written according to actual business requirements; the method is simple to use, easy to integrate, and applicable to business scenarios.
The object of the invention is achieved as follows. A method for generating a Word document from a template comprises the following steps:
1) decompress the docx document, extract the word/document.xml file, and parse document.xml to obtain an XML object;
2) traverse from the root node of document.xml to obtain the w:p paragraph nodes; all text content of the Word document is in the w:p nodes;
3) traverse all w:r/w:t child nodes of each w:p paragraph node, obtain the w:t text contents, and splice them together to obtain the paragraph content;
4) use a regular expression [Figure: regular expression, image not reproduced] to determine whether the paragraph content contains an expression; if not, continue with the next paragraph;
5) traverse the w:t text nodes and begin checking the w:t text content for expressions;
6) determine whether the expression start marker ${ is present; if so, record the position of that w:t within the w:p paragraph; if not, repeat step 6) on the next w:t to look for a start marker;
7) then determine whether the expression end marker } is present; if so, record the position of that w:t within the w:p paragraph; if not, repeat step 7) on the next w:t to look for an end marker;
8) record the start and end positions, collect the text node information from the starting w:t node to the ending w:t node, and obtain the text content from the start node to the end node; if some w:t nodes have not yet been analyzed, return to step 5) to continue searching for expressions;
9) traverse the information collected for the ${} expressions in the paragraph, obtain all w:t nodes of an expression, and obtain its starting w:t node;
10) traverse all w:t nodes of the expression and splice the text contents of all of them;
11) clear the text content of the w:t nodes after the starting w:t node; because the last of these nodes may also hold the start marker of the next expression, clear only the text before the end marker in that node; then write all the content spliced in step 10) into the starting w:t node;
12) construct a new empty string, find the position of the expression start string, splice in the text up to that position, find the position of the expression end string, and extract the variable name; if the whole string has been searched, return to step 9) to continue with the next expression;
13) look up the value for the variable name extracted in step 12) in the parameter map and append the value to the string;
14) determine whether the text after the expression handled in step 12) contains another start marker ${; if so, return to step 12) and continue splicing and parsing until all expressions have been processed;
15) after all w:p nodes have been traversed and analyzed, generate a new document.xml file and overwrite word/document.xml in the template document with it, which completes the expression replacement of the document.
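Steps 1) to 4) can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the regular expression used in the original is given as an image and is not reproduced, so the pattern `EXPRESSION` below is an assumed stand-in, and `SAMPLE_XML`, `paragraph_texts`, and `paragraphs_with_expressions` are hypothetical names introduced for the example.

```python
import re
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, bound to the "w" prefix in document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}

# Assumed pattern: the patent's own regular expression is not reproduced in the text.
EXPRESSION = re.compile(r"\$\{[^{}]*\}")

# Hypothetical document.xml fragment; note that ${name} is split across two w:t nodes.
SAMPLE_XML = f"""<w:document xmlns:w="{W_NS}"><w:body>
<w:p><w:r><w:t>Dear </w:t></w:r><w:r><w:t>${{na</w:t></w:r><w:r><w:t>me}},</w:t></w:r></w:p>
<w:p><w:r><w:t>No variables here.</w:t></w:r></w:p>
</w:body></w:document>"""


def paragraph_texts(document_xml: str) -> list[str]:
    """Return the spliced w:r/w:t text of every w:p paragraph (steps 2 and 3)."""
    root = ET.fromstring(document_xml)
    texts = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        text_nodes = paragraph.findall("w:r/w:t", NS)
        texts.append("".join(node.text or "" for node in text_nodes))
    return texts


def paragraphs_with_expressions(document_xml: str) -> list[str]:
    """Keep only paragraphs whose spliced text contains an expression (step 4)."""
    return [text for text in paragraph_texts(document_xml) if EXPRESSION.search(text)]
```

Splicing the w:t texts before matching is what lets the check in step 4) see an expression even when Word has split it across several runs.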
The advantage of the invention is that only a template document needs to be prepared; a program then replaces the expressions with data, which makes the method suitable for generating Word documents efficiently in batches. Because docx documents are based on XML and ZIP, the method parses the XML structure, extracts the expressions, and replaces the text, which avoids disturbing the document structure and styles, improves the efficiency of generating formatted documents, and conforms to the Word document standard.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 shows the structure of a docx file used in the present invention.
Detailed Description
In existing business systems, working with Word generally depends on an office suite. Because documents are written by many different staff members, the resulting format may differ from the expected Word format, and when a large number of documents is required, low efficiency, errors, and rework occur. To solve these problems, business staff write a formatted document containing variable expressions and store it in the business system; the business system obtains the dynamic data required by the template document and calls the invention to generate the document. The specific scheme is described in detail below.
First, the template manager writes the formatted document, marking the variable content in the document with the ${variable name} syntax as the replacement marker, which completes the template document.
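As an illustration of how such ${variable name} markers might be resolved against a parameter map once a paragraph's text is available as a single string, here is a minimal Python sketch. The function name `render` and the choice to leave unknown or unterminated expressions untouched are assumptions made for this example; the source does not specify that behaviour.

```python
def render(text: str, params: dict[str, str]) -> str:
    """Replace each ${name} in text with params[name], scanning left to right."""
    pieces = []      # parts of the new string, starting out empty
    position = 0     # index from which the next search begins
    while True:
        start = text.find("${", position)
        if start == -1:
            break                      # no further start marker
        end = text.find("}", start + 2)
        if end == -1:
            break                      # unterminated expression: keep the rest verbatim
        pieces.append(text[position:start])    # text before the expression
        name = text[start + 2:end].strip()     # extracted variable name
        # Assumption: a name missing from the map leaves the expression in place.
        pieces.append(params.get(name, text[start:end + 1]))
        position = end + 1                     # continue after the end marker
    pieces.append(text[position:])
    return "".join(pieces)
```

For example, `render("Dear ${name}, your order ${id} shipped.", {"name": "Li", "id": "42"})` returns `"Dear Li, your order 42 shipped."`.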
The invention is further described below with reference to the figures and an embodiment:
1) decompress the docx document, extract the word/document.xml file, and parse document.xml to obtain an XML object;
2) traverse from the root node of document.xml to obtain the w:p paragraph nodes, in preparation for obtaining the variable expressions in each paragraph; all text content of the Word document is in the w:p nodes (the position of ${name} must be obtained by analysis, as in the following example):
[Figure: example w:p paragraph XML, image not reproduced]
3) traverse all w:r/w:t child nodes of each w:p paragraph node, obtain the w:t text contents, and splice them together to obtain the paragraph content;
4) use a regular expression [Figure: regular expression, image not reproduced] to determine whether the paragraph content contains an expression; if not, continue with the next paragraph;
5) traverse the w:t text nodes and begin checking the w:t text content for expressions;
6) determine whether the expression start marker ${ is present; if so, record the position of that w:t within the w:p paragraph; if not, repeat step 6) on the next w:t to look for a start marker;
7) then determine whether the expression end marker } is present; if so, record the position of that w:t within the w:p paragraph; if not, repeat step 7) on the next w:t to look for an end marker;
8) record the start and end positions and collect the text node information from the starting w:t node to the ending w:t node, so that the text content from the start node to the end node can be obtained; if some w:t nodes have not yet been analyzed, return to step 5) to continue searching for expressions;
9) traverse the expression information collected for the paragraph, obtain all w:t nodes of an expression, and obtain its starting w:t node;
10) traverse all w:t nodes of the expression and splice the text contents of all of them;
11) clear the text content of the w:t nodes other than the starting w:t node (one of them may also hold the start marker of the next expression; identify the w:t node that overlaps with the next expression and clear only the text before the end marker), and write all the content spliced in step 10) into the starting w:t node;
12) construct a new empty string, find the position of the expression start string, splice in the text up to that position, find the position of the expression end string, and extract the variable name; if the whole string has been searched, return to step 9) to continue with the next expression;
13) look up the value for the extracted variable name in the parameter map and append the value to the string;
14) determine whether the text after the expression contains another expression start marker; if so, return to step 12) and continue splicing and parsing until all expressions have been processed;
15) after all w:p nodes have been traversed and analyzed, generate a new document.xml file and overwrite the word/document.xml file in the compressed template document with it, which completes the expression replacement and the generation of the document.
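The handling of expressions that Word has split across several w:t nodes (steps 5) to 11)) followed by the value substitution can be sketched as below. This is a simplified reading of the steps, not the patent's own code: it assumes the two characters of the start marker `${` always sit in the same w:t node, it uses an assumed regular expression because the original one is not reproduced, and the names `merge_split_expressions` and `fill_paragraph` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
# Assumed pattern; the regular expression in the original is not reproduced.
EXPRESSION = re.compile(r"\$\{([^{}]*)\}")


def merge_split_expressions(paragraph: ET.Element) -> None:
    """Move each ${...} that spans several w:t nodes into its first w:t node."""
    nodes = paragraph.findall("w:r/w:t", NS)
    i = 0
    while i < len(nodes):
        text = nodes[i].text or ""
        start = text.rfind("${")
        if start == -1 or "}" in text[start:]:
            i += 1                          # nothing is left open in this node
            continue
        j = i + 1
        while j < len(nodes):
            later = nodes[j].text or ""
            end = later.find("}")
            if end == -1:                   # still inside the expression
                text += later
                nodes[j].text = ""
                j += 1
            else:                           # end marker found: keep what follows it
                text += later[: end + 1]
                nodes[j].text = later[end + 1:]
                break
        nodes[i].text = text                # whole expression now lives in the start node
        i = j                               # the end node may open the next expression


def fill_paragraph(paragraph: ET.Element, params: dict[str, str]) -> None:
    """Merge split expressions, then substitute values from the parameter map."""
    merge_split_expressions(paragraph)
    for node in paragraph.findall("w:r/w:t", NS):
        if node.text:
            node.text = EXPRESSION.sub(
                lambda m: params.get(m.group(1).strip(), m.group(0)), node.text
            )
```

Only the text of existing w:t nodes is changed; no run is added or removed, which is how the sketch keeps the run-level formatting of the template intact.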
Because a docx file is a compressed archive, generating the Word document does not depend on the operating system environment or on third-party middleware, so the method is operating-system independent and can be deployed to generate Word documents in any environment.
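A minimal sketch of the final packaging step, using only the Python standard library, is shown below. Python's `zipfile` module cannot overwrite an entry in place, so this sketch copies every entry of the template archive into a new archive and swaps in the new word/document.xml; that copying strategy and the function name `replace_document_xml` are choices made for this illustration, not details taken from the source.

```python
import io
import zipfile


def replace_document_xml(template_docx: bytes, new_document_xml: str) -> bytes:
    """Copy every entry of the template archive, swapping in a new word/document.xml."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_docx)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for item in source.infolist():
            if item.filename == "word/document.xml":
                # Reuse the original entry metadata, but write the new content.
                target.writestr(item, new_document_xml.encode("utf-8"))
            else:
                # All other parts (styles, relationships, media) are copied unchanged.
                target.writestr(item, source.read(item.filename))
    return output.getvalue()
```

Working on bytes in memory keeps the template file itself untouched, so the same template can be reused for each generated document.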
The open source software POI (http://poi.apache.org/) supports parsing Word documents for editing and could in theory provide the same function; the scheme of the invention was adopted in view of execution efficiency, software dependencies, the clear structure of docx documents, and other factors.

Claims (1)

1. A method for generating a Word document from a template, characterized by comprising the following steps:
1) decompress the docx document, extract the word/document.xml file, and parse document.xml to obtain an XML object;
2) traverse from the root node of document.xml to obtain the w:p paragraph nodes; all text content of the Word document is in the w:p nodes;
3) traverse all w:r/w:t child nodes of each w:p paragraph node, obtain the w:t text contents, and splice them together to obtain the paragraph content;
4) use a regular expression [Figure: regular expression, image not reproduced] to determine whether the paragraph content contains an expression; if not, continue with the next paragraph;
5) traverse the w:t text nodes and begin checking the w:t text content for expressions;
6) determine whether the expression start marker ${ is present; if so, record the position of that w:t within the w:p paragraph; if not, repeat step 6) on the next w:t to look for a start marker;
7) then determine whether the expression end marker } is present; if so, record the position of that w:t within the w:p paragraph; if not, repeat step 7) on the next w:t to look for an end marker;
8) record the start and end positions, collect the text node information from the starting w:t node to the ending w:t node, and obtain the text content from the start node to the end node; if some w:t nodes have not yet been analyzed, return to step 5) to continue searching for expressions;
9) traverse the information collected for the ${} expressions in the paragraph, obtain all w:t nodes of an expression, and obtain its starting w:t node;
10) traverse all w:t nodes of the expression and splice the text contents of all of them;
11) clear the text content of the w:t nodes after the starting w:t node; because the last of these nodes may also hold the start marker of the next expression, clear only the text before the end marker in that node; then write all the content spliced in step 10) into the starting w:t node;
12) construct a new empty string, find the position of the expression start string, splice in the text up to that position, find the position of the expression end string, and extract the variable name; if the whole string has been searched, return to step 9) to continue with the next expression;
13) look up the value for the variable name extracted in step 12) in the parameter map and append the value to the string;
14) determine whether the text after the expression handled in step 12) contains another start marker ${; if so, return to step 12) and continue splicing and parsing until all expressions have been processed;
15) after all w:p nodes have been traversed and analyzed, generate a new document.xml file and overwrite word/document.xml in the template document with it, which completes the expression replacement of the document.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010045710.1A CN111159995A (en) 2020-01-16 2020-01-16 Method for generating word document in template mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010045710.1A CN111159995A (en) 2020-01-16 2020-01-16 Method for generating word document in template mode

Publications (1)

Publication Number Publication Date
CN111159995A true CN111159995A (en) 2020-05-15

Family

ID=70563307

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010045710.1A Pending CN111159995A (en) 2020-01-16 2020-01-16 Method for generating word document in template mode

Country Status (1)

Country Link
CN (1) CN111159995A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106104518A (en) * 2014-03-08 2016-11-09 微软技术许可有限责任公司 For the framework extracted according to the data of example
CN107392143A (en) * 2017-07-20 2017-11-24 中国科学院软件研究所 A kind of resume accurate Analysis method based on SVM text classifications
CN107608951A (en) * 2017-09-22 2018-01-19 上海金智晟东电力科技有限公司 Report form generation method and system
CN108052488A (en) * 2017-12-06 2018-05-18 广东技术师范学院天河学院 Paper automatic generation method based on template
CN108763171A (en) * 2018-04-20 2018-11-06 中国船舶重工集团公司第七〇九研究所 A kind of document automation generation method based on format module
CN109388612A (en) * 2018-09-14 2019-02-26 中国科学院光电研究院 A kind of method, equipment, system and the medium of data summarization document structure tree

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
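Yang Qianchen et al.: "XML-based automatic document typesetting technology" *
Yuan Min: "Research on format checking and content proofreading of academic papers" *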

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950247A (en) * 2020-07-08 2020-11-17 北京明略软件系统有限公司 Configuration-based Word document generation method
CN112232032A (en) * 2020-09-04 2021-01-15 科航(苏州)信息科技有限公司 Method for automatically converting content style of docx document
CN112232032B (en) * 2020-09-04 2023-08-18 科航(苏州)信息科技有限公司 Automatic conversion method for content style of docx document
CN112765948A (en) * 2020-12-31 2021-05-07 山西三友和智慧信息技术股份有限公司 Document generation editing method
CN112765948B (en) * 2020-12-31 2024-01-19 山西三友和智慧信息技术股份有限公司 Document generation editing method
CN114239529A (en) * 2021-12-16 2022-03-25 深圳前海环融联易信息科技服务有限公司 Document generation method, device, equipment and medium based on template engine
CN115062252A (en) * 2022-06-15 2022-09-16 江苏未至科技股份有限公司 Method for solving format conflict of webpage generation file when WPS and Word are opened
CN115062252B (en) * 2022-06-15 2023-09-19 江苏未至科技股份有限公司 Method for solving format conflict of webpage generation file when WPS and Word are opened

Similar Documents

Publication Publication Date Title
CN111159995A (en) Method for generating word document in template mode
KR101098718B1 (en) System and method for creating, managing and using code segments
CN108762743B (en) Data table operation code generation method and device
US8589877B2 (en) Modeling and linking documents for packaged software application configuration
KR100692172B1 (en) Universal string analyzer and method thereof
CN110209387B (en) Method and device for generating top-level HDL file and computer readable storage medium
JP2003242136A (en) Syntax information tag imparting support system and method therefor
US20220236971A1 (en) Adapting existing source code snippets to new contexts
US20160224338A1 (en) Analyzing Components Related To A Software Application In A Software Development Environment
CN111913739B (en) Service interface primitive defining method and system
CN112162751A (en) Automatic generation method and system of interface document
CN111124380A (en) Front-end code generation method
Santos OCR evaluation tools for the 21st century
CN109325217B (en) File conversion method, system, device and computer readable storage medium
JPWO2007081017A1 (en) Document processing device
CN111311461A (en) B-S based editor and generation method for structured dynamic medical record form
CN118245050A (en) Front end frame assembly automatic conversion method, system, electronic device and storage medium
CN114116938A (en) Map plotting method and device based on WebGIS
CN112699642A (en) Index extraction method and device for complex medical texts, medium and electronic equipment
CN117725927A (en) Method for identifying and processing clause file of insurance business
CN109597624A (en) A kind of method that SQL is formatted
CN116880826B (en) Visualized code generation method
JP3187317B2 (en) Interactive program generation device
JP4776972B2 (en) Cache generation method, apparatus, program, and recording medium
CN118151936A (en) Code annotation management method, storage medium and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200515