CN111159995A - Method for generating word document in template mode - Google Patents
Method for generating word document in template mode
- Publication number
- CN111159995A (application number CN202010045710.1A)
- Authority
- CN
- China
- Prior art keywords
- document
- expression
- text
- xml
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/11—File system administration, e.g. details of archiving or snapshots
- G06F16/116—Details of conversion of file system types or formats
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Document Processing Apparatus (AREA)
Abstract
The invention discloses a method for generating a Word document from a template. The method decompresses a docx document, extracts the word/document.xml file, and parses document.xml to obtain an XML object; traverses from the root node of document.xml to obtain the w:p paragraph nodes, which contain all the text content of the Word document; and, after all w:p nodes have been traversed and analyzed, generates a new document.xml file that overwrites the word/document.xml file in the template document, completing expression replacement for the document. Because the method works directly on the XML and ZIP structure of the docx document, parsing the XML, extracting the expressions, and replacing the text, it avoids disturbing the document structure and styles, improves the efficiency of generating formatted documents, and conforms to the Word document standard.
Description
Technical Field
The invention relates to a document processing technology, in particular to a method for generating a word document in a templating mode.
Background
As Word documents have become widespread in daily work, some document-processing business scenarios involve large numbers of documents that share the same format. Business personnel must edit and review these formatted documents by hand, which is time-consuming and error-prone. It is therefore desirable to provide a Word templating method in which the dynamic content of a formatted document is written as variable expressions (similar to ${variable name} in other template engines) and each variable expression is replaced with the actual dynamic content when the document is generated, solving the problems of time consumption and frequent errors.
Microsoft Office Word is among the most popular word processing programs and is an essential productivity tool in daily work.
Apache POI is an open-source library of the Apache Software Foundation that provides Java programs with an API for reading and writing Microsoft Office format files, so that a business system can generate and modify Word documents.
The ZIP file format is a file format for data compression and document storage. Microsoft has built in support for the ZIP format since the Windows ME operating system, so compressed files in ZIP format can be opened and created even if no decompression software is installed on the user's computer; OS X and popular Linux operating systems provide similar support. ZIP is therefore often the most common choice when files are transmitted and distributed over a network.
XML is a markup language used to mark up electronic documents so that they are structured.
There are two ways to generate a large number of documents that share the same format but differ in content:
The first is for business personnel to write the documents manually; as the number of documents grows, this consumes a great deal of time and easily introduces errors.
The second is for a developer to obtain the business data and generate the documents through the API provided by Apache POI. This solves the problem of generation efficiency, but as the number of business documents grows, developers must write business code for each kind of business data, business personnel must spend a large amount of time on document formats, and further time is needed for testing, version release, and so on.
At present, for a business system to generate a Word document according to business logic, developers must write the corresponding business code and query the business data. As business scenarios multiply, developers must write new business code, and because the Word format is complex, setting Word styles in code is far less convenient or time-saving than editing them visually in Word. Each new business requirement must also go through development, testing, release, and so on, which costs additional manpower, time, and maintenance.
Disclosure of Invention
The invention provides a method for generating a Word document from a template. Template expressions marked as ${variable name} are written in a docx document; all expressions in the docx document are obtained through ZIP and XML (word/document.xml) analysis, and their values are substituted to generate a new docx document, completing document generation. Formatted documents can be written according to actual business requirements; the method is simple to use, easy to integrate, and applicable to business scenarios.
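As a minimal illustration of the value replacement described above, the following Python sketch substitutes ${variable name} expressions in a plain string from a parameter map. The pattern `EXPR` and the function name `render` are assumptions made for illustration; the source does not reproduce its regular expression, and leaving unknown names untouched is a design choice of this sketch only.

```python
import re

# Assumed pattern: "${", a name without braces, then "}".
EXPR = re.compile(r"\$\{([^{}]+)\}")


def render(text, params):
    """Replace each ${name} in text with params[name]; unknown names are left as written."""
    def lookup(match):
        name = match.group(1).strip()
        # Fall back to the original expression text when the map has no such name.
        return str(params.get(name, match.group(0)))
    return EXPR.sub(lookup, text)


print(render("Dear ${name}, total ${amount}", {"name": "Li", "amount": 42}))
```

In a real docx the text is spread over XML nodes, so the same substitution has to be applied to node contents rather than to one flat string.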
The object of the invention is achieved as follows. A method for generating a Word document from a template comprises the following steps:
1) decompressing the docx document, extracting the word/document.xml file, and parsing document.xml to obtain an XML object;
2) traversing from the root node of the document.xml object to obtain the w:p paragraph nodes, all text content of the Word document being located in the w:p nodes;
3) traversing all w:r/w:t child nodes of each w:p paragraph node, obtaining the w:t text contents, and splicing the texts to obtain the paragraph content;
4) judging by means of a regular expression whether the paragraph content contains an expression, and if not, continuing with the next paragraph;
5) traversing the w:t text nodes and beginning to judge and analyze expressions in the w:t text content;
6) judging whether the expression start mark ${ is present, and if so, marking the position of that w:t within the w:p paragraph; if not, repeating step 6) on the next w:t to search for the start mark;
7) then judging whether the expression end mark } is present, and if so, recording the position of that w:t within the w:p paragraph; if not, repeating step 7) on the next w:t to search for the end mark;
8) recording the start position and the end position, collecting the text node information from the starting w:t text node to the ending w:t text node, and obtaining the text content from the starting node to the ending node; if some w:t nodes have not yet been analyzed, returning to step 5) to continue searching for expressions;
9) traversing the information collected for the paragraph in which the ${ } expressions are located, obtaining all w:t nodes of an expression, and obtaining its starting w:t node;
10) traversing all w:t nodes of the expression and splicing the text contents of all of them;
11) clearing the text content of the w:t nodes after the starting w:t node; because the ending w:t node may also hold the start mark of the next expression, only the text up to the end mark is cleared in that node; and writing all the content spliced in step 10) into the starting w:t node;
12) constructing a new empty string, searching for the position of the expression start string, splicing the string content up to that position, searching for the position of the expression end string, and extracting the variable name; if the whole string has been searched, returning to step 9) to continue with the next expression;
13) obtaining from the parameter map the value corresponding to the variable name extracted in step 12) and splicing the value onto the string content;
14) judging whether the start mark ${ is present in the text after the expression of step 12), and if so, returning to step 12) to continue splicing and analyzing until all expressions have been analyzed;
15) after all w:p nodes have been traversed and analyzed, generating a new document.xml file and overwriting the word/document.xml file in the template document with it, completing expression replacement for the document.
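Steps 1) to 4) above can be sketched with the Python standard library as follows. This is an illustration under stated assumptions, not the patented implementation: the pattern `EXPR` is assumed because the source does not reproduce its regular expression, and `make_demo_docx` is a hypothetical helper that builds a ZIP containing only word/document.xml, which is enough for the sketch but is not a complete docx file.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
# Assumed pattern for ${variable name}.
EXPR = re.compile(r"\$\{[^{}]+\}")


def paragraphs_with_expressions(docx_bytes):
    """Unzip, parse document.xml, join the w:t text of each w:p, keep paragraphs with expressions."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    hits = []
    for para in root.iter("{%s}p" % W_NS):
        text = "".join(t.text or "" for t in para.findall("w:r/w:t", NS))
        if EXPR.search(text):
            hits.append(text)
    return hits


def make_demo_docx():
    """Hypothetical stand-in for a template: a ZIP holding only word/document.xml."""
    xml = (
        '<w:document xmlns:w="%s"><w:body>'
        '<w:p><w:r><w:t>Hello ${na</w:t></w:r><w:r><w:t>me}</w:t></w:r></w:p>'
        '<w:p><w:r><w:t>No expression here</w:t></w:r></w:p>'
        '</w:body></w:document>' % W_NS
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()


print(paragraphs_with_expressions(make_demo_docx()))
```

Only the first demo paragraph is reported, and only because the text of its two w:t nodes is joined before the pattern is tested.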
The advantage of the invention is that only a template document needs to be prepared; a program subsequently replaces the expressions with data, which makes the method suitable for efficiently generating Word documents in batches. Because the method works directly on the XML and ZIP structure of the docx document, parsing the XML, extracting the expressions, and replacing the text, it avoids disturbing the document structure and styles, improves the efficiency of generating formatted documents, and conforms to the Word document standard.
Drawings
FIG. 1 is a flow chart of the steps of the present invention;
FIG. 2 is a diagram showing the structure of a docx file in the present invention.
Detailed Description
In existing business systems, producing Word documents generally depends on an office suite. Because many business personnel write the documents, the resulting format may differ from the expected format, and when large numbers of documents are needed, problems of efficiency, error rate, and rework arise. To solve these problems, business personnel write a formatted document containing variable expressions and store it in the business system; the business system obtains the dynamic data required by the template document and calls the invention to generate the document. The specific scheme is described in detail below.
First, the template manager writes the formatted document; for the variable content in the document, the writer uses ${variable name} as the replacement syntax mark, which completes the template document.
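Word may store a single typed expression across several w:r runs, which is why the analysis searches for the start and end marks over more than one w:t node. The Python sketch below illustrates this with a hypothetical paragraph fragment; the fragment, the variable name `customerName`, and the pattern `EXPR` are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
EXPR = re.compile(r"\$\{[^{}]+\}")  # assumed pattern for ${variable name}

# Hypothetical paragraph in which one expression is split over two runs.
FRAGMENT = (
    '<w:p xmlns:w="%s">'
    '<w:r><w:t>Customer: ${custo</w:t></w:r>'
    '<w:r><w:t>merName}</w:t></w:r>'
    '</w:p>' % W_NS
)


def run_texts(paragraph_xml):
    """Return the text of every w:r/w:t node of one paragraph, in document order."""
    para = ET.fromstring(paragraph_xml)
    return [t.text or "" for t in para.findall("w:r/w:t", NS)]


pieces = run_texts(FRAGMENT)
# No single node holds a complete expression, but the joined text does.
found_in_single_run = any(EXPR.search(piece) for piece in pieces)
found_in_joined_text = EXPR.search("".join(pieces)) is not None
print(pieces, found_in_single_run, found_in_joined_text)
```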
The invention is further described below with reference to the figures and embodiments:
1) decompressing the docx document, extracting the word/document.xml file, and parsing document.xml to obtain an XML object;
2) traversing from the root node of document.xml to obtain the w:p paragraph nodes in preparation for finding the variable expressions in each paragraph, all text content of the Word document being located in the w:p nodes (the position of each ${name} is obtained through the following analysis);
3) traversing all w:r/w:t child nodes of each w:p paragraph node, obtaining the w:t text contents, and splicing the texts to obtain the paragraph content;
4) judging by means of a regular expression whether the paragraph content contains an expression, and if not, continuing with the next paragraph;
5) traversing the w:t text nodes and beginning to judge and analyze expressions in the w:t text content;
6) judging whether the expression start mark ${ is present, and if so, marking the position of that w:t within the w:p paragraph; if not, repeating step 6) on the next w:t to search for the start mark;
7) then judging whether the expression end mark } is present, and if so, recording the position of that w:t within the w:p paragraph; if not, repeating step 7) on the next w:t to search for the end mark;
8) recording the start position and the end position and collecting the text node information from the starting w:t text node to the ending w:t text node, so that the text content from the starting node to the ending node can be obtained; if some w:t nodes have not yet been analyzed, returning to step 5) to continue searching for expressions;
9) traversing the expression information collected for the paragraph, obtaining all w:t nodes of an expression, and obtaining its starting w:t node;
10) traversing all w:t nodes of the expression and splicing the text contents of all of them;
11) clearing the text content of the w:t nodes other than the starting w:t node (the ending w:t node may coincide with the node holding the start mark of the next expression, in which case only the text up to the end mark is cleared), and writing all the content spliced in step 10) into the starting w:t node;
12) constructing a new empty string, searching for the position of the expression start string, splicing the string content up to that position, searching for the position of the expression end string, and extracting the variable name; if the whole string has been searched, returning to step 9) to continue with the next expression;
13) obtaining from the parameter map the value corresponding to the extracted variable name and splicing the value onto the string content;
14) judging whether an expression start mark is present in the text after the expression, and if so, returning to step 12) to continue splicing and analyzing until all expressions have been analyzed;
15) after all w:p nodes have been traversed and analyzed, generating a new document.xml file and overwriting the word/document.xml file in the compressed template document with it, which completes expression replacement and document generation.
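The effect of steps 5) to 14) on one paragraph can be sketched in Python as follows. Note that this sketch swaps in a different technique from the node-by-node start and end scan described above: it matches expressions on the joined paragraph text and maps the character offsets of each match back to the w:t nodes. As in step 11), the result is written into the node where the expression starts. The names `fill_paragraph` and `EXPR`, the demo paragraph, and the handling of unknown names are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
NS = {"w": W_NS}
EXPR = re.compile(r"\$\{([^{}]+)\}")  # assumed pattern for ${variable name}


def fill_paragraph(para, params):
    """Replace every ${name} in one w:p element, even when it spans several w:t nodes."""
    nodes = para.findall("w:r/w:t", NS)
    texts = [node.text or "" for node in nodes]
    lengths = [len(text) for text in texts]
    offsets, position = [], 0
    for length in lengths:
        offsets.append(position)  # offset of each node within the joined text
        position += length
    joined = "".join(texts)
    # Work right to left so the offsets of earlier matches stay valid.
    for match in reversed(list(EXPR.finditer(joined))):
        name = match.group(1).strip()
        value = str(params.get(name, match.group(0)))  # unknown names are kept
        for i, node_start in enumerate(offsets):
            lo = max(match.start(), node_start)
            hi = min(match.end(), node_start + lengths[i])
            if lo >= hi:
                continue  # this node holds no part of the expression
            # The node containing the start mark receives the value; the others lose their part.
            insert = value if node_start <= match.start() else ""
            texts[i] = texts[i][:lo - node_start] + insert + texts[i][hi - node_start:]
    for node, text in zip(nodes, texts):
        node.text = text
        node.set(XML_SPACE, "preserve")  # keep leading and trailing spaces


# Hypothetical paragraph: two expressions, each split across runs.
PARA_XML = (
    '<w:p xmlns:w="%s">'
    '<w:r><w:t>Dear ${na</w:t></w:r><w:r><w:t>me}, you owe $</w:t></w:r>'
    '<w:r><w:t>{amo</w:t></w:r><w:r><w:t>unt} yuan</w:t></w:r>'
    '</w:p>' % W_NS
)
demo = ET.fromstring(PARA_XML)
fill_paragraph(demo, {"name": "Li", "amount": 42})
result = [t.text for t in demo.findall("w:r/w:t", NS)]
print(result)
```

Because each value lands in the run where its expression begins, it takes on that run's formatting, and the run structure of the paragraph is left unchanged.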
Because the docx file is a compressed archive, generating the Word document does not depend on the operating system environment or on third-party middleware; the method is therefore independent of the operating system and can be deployed to generate Word documents in any environment.
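The final overwrite of word/document.xml needs nothing beyond a ZIP library. The Python sketch below copies every entry of the template archive into a new archive and substitutes the new document.xml; the function name `replace_document_xml` is hypothetical, and working on in-memory bytes is a choice made for this sketch.

```python
import io
import zipfile


def replace_document_xml(template_bytes, new_document_xml):
    """Return a copy of the template archive whose word/document.xml is new_document_xml."""
    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(template_bytes)) as source:
        with zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
            for info in source.infolist():
                if info.filename == "word/document.xml":
                    data = new_document_xml
                else:
                    data = source.read(info.filename)  # styles, relationships, media, etc.
                target.writestr(info.filename, data)
    return output.getvalue()
```

All other parts of the template are copied unchanged, which is how the styles and structure of the template carry over to the generated document. If the new XML is produced with `xml.etree.ElementTree`, namespace prefixes may be rewritten on serialization, so the serialized output is worth checking before it is written back.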
The open-source software POI (http://poi.apache.org/) supports parsing and editing Word documents and could in theory also implement this function; the scheme of the invention is adopted in consideration of execution efficiency, software dependencies, the clear structure of docx documents, and other factors.
Claims (1)
1. A method for generating a Word document from a template, characterized by comprising the following steps:
1) decompressing the docx document, extracting the word/document.xml file, and parsing document.xml to obtain an XML object;
2) traversing from the root node of the document.xml object to obtain the w:p paragraph nodes, all text content of the Word document being located in the w:p nodes;
3) traversing all w:r/w:t child nodes of each w:p paragraph node, obtaining the w:t text contents, and splicing the texts to obtain the paragraph content;
4) judging by means of a regular expression whether the paragraph content contains an expression, and if not, continuing with the next paragraph;
5) traversing the w:t text nodes and beginning to judge and analyze expressions in the w:t text content;
6) judging whether the expression start mark ${ is present, and if so, marking the position of that w:t within the w:p paragraph; if not, repeating step 6) on the next w:t to search for the start mark;
7) then judging whether the expression end mark } is present, and if so, recording the position of that w:t within the w:p paragraph; if not, repeating step 7) on the next w:t to search for the end mark;
8) recording the start position and the end position, collecting the text node information from the starting w:t text node to the ending w:t text node, and obtaining the text content from the starting node to the ending node; if some w:t nodes have not yet been analyzed, returning to step 5) to continue searching for expressions;
9) traversing the information collected for the paragraph in which the ${ } expressions are located, obtaining all w:t nodes of an expression, and obtaining its starting w:t node;
10) traversing all w:t nodes of the expression and splicing the text contents of all of them;
11) clearing the text content of the w:t nodes after the starting w:t node; because the ending w:t node may also hold the start mark of the next expression, only the text up to the end mark is cleared in that node; and writing all the content spliced in step 10) into the starting w:t node;
12) constructing a new empty string, searching for the position of the expression start string, splicing the string content up to that position, searching for the position of the expression end string, and extracting the variable name; if the whole string has been searched, returning to step 9) to continue with the next expression;
13) obtaining from the parameter map the value corresponding to the variable name extracted in step 12) and splicing the value onto the string content;
14) judging whether the start mark ${ is present in the text after the expression of step 12), and if so, returning to step 12) to continue splicing and analyzing until all expressions have been analyzed;
15) after all w:p nodes have been traversed and analyzed, generating a new document.xml file and overwriting the word/document.xml file in the template document with it, completing expression replacement for the document.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010045710.1A CN111159995A (en) | 2020-01-16 | 2020-01-16 | Method for generating word document in template mode |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010045710.1A CN111159995A (en) | 2020-01-16 | 2020-01-16 | Method for generating word document in template mode |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111159995A (en) | 2020-05-15 |
Family
ID=70563307
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010045710.1A Pending CN111159995A (en) | 2020-01-16 | 2020-01-16 | Method for generating word document in template mode |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111159995A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111950247A (en) * | 2020-07-08 | 2020-11-17 | 北京明略软件系统有限公司 | Configuration-based Word document generation method |
CN112232032A (en) * | 2020-09-04 | 2021-01-15 | 科航(苏州)信息科技有限公司 | Method for automatically converting content style of docx document |
CN112765948A (en) * | 2020-12-31 | 2021-05-07 | 山西三友和智慧信息技术股份有限公司 | Document generation editing method |
CN114239529A (en) * | 2021-12-16 | 2022-03-25 | 深圳前海环融联易信息科技服务有限公司 | Document generation method, device, equipment and medium based on template engine |
CN115062252A (en) * | 2022-06-15 | 2022-09-16 | 江苏未至科技股份有限公司 | Method for solving format conflict of webpage generation file when WPS and Word are opened |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106104518A (en) * | 2014-03-08 | 2016-11-09 | 微软技术许可有限责任公司 | For the framework extracted according to the data of example |
CN107392143A (en) * | 2017-07-20 | 2017-11-24 | 中国科学院软件研究所 | A kind of resume accurate Analysis method based on SVM text classifications |
CN107608951A (en) * | 2017-09-22 | 2018-01-19 | 上海金智晟东电力科技有限公司 | Report form generation method and system |
CN108052488A (en) * | 2017-12-06 | 2018-05-18 | 广东技术师范学院天河学院 | Paper automatic generation method based on template |
CN108763171A (en) * | 2018-04-20 | 2018-11-06 | 中国船舶重工集团公司第七〇九研究所 | A kind of document automation generation method based on format module |
CN109388612A (en) * | 2018-09-14 | 2019-02-26 | 中国科学院光电研究院 | A kind of method, equipment, system and the medium of data summarization document structure tree |
-
2020
- 2020-01-16 CN CN202010045710.1A patent/CN111159995A/en active Pending
Non-Patent Citations (2)
Title |
---|
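杨倩晨 等 (Yang Qianchen et al.): "基于XML的文档自动排版技术" (XML-based automatic document typesetting technology) * |
袁敏 (Yuan Min): "学术论文格式检查和内容校对的研究" (Research on format checking and content proofreading of academic papers) * |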
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111950247A (en) * | 2020-07-08 | 2020-11-17 | 北京明略软件系统有限公司 | Configuration-based Word document generation method |
CN112232032A (en) * | 2020-09-04 | 2021-01-15 | 科航(苏州)信息科技有限公司 | Method for automatically converting content style of docx document |
CN112232032B (en) * | 2020-09-04 | 2023-08-18 | 科航(苏州)信息科技有限公司 | Automatic conversion method for content style of docx document |
CN112765948A (en) * | 2020-12-31 | 2021-05-07 | 山西三友和智慧信息技术股份有限公司 | Document generation editing method |
CN112765948B (en) * | 2020-12-31 | 2024-01-19 | 山西三友和智慧信息技术股份有限公司 | Document generation editing method |
CN114239529A (en) * | 2021-12-16 | 2022-03-25 | 深圳前海环融联易信息科技服务有限公司 | Document generation method, device, equipment and medium based on template engine |
CN115062252A (en) * | 2022-06-15 | 2022-09-16 | 江苏未至科技股份有限公司 | Method for solving format conflict of webpage generation file when WPS and Word are opened |
CN115062252B (en) * | 2022-06-15 | 2023-09-19 | 江苏未至科技股份有限公司 | Method for solving format conflict of webpage generation file when WPS and Word are opened |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111159995A (en) | Method for generating word document in template mode | |
KR101098718B1 (en) | System and method for creating, managing and using code segments | |
CN108762743B (en) | Data table operation code generation method and device | |
US8589877B2 (en) | Modeling and linking documents for packaged software application configuration | |
KR100692172B1 (en) | Universal string analyzer and method thereof | |
CN110209387B (en) | Method and device for generating top-level HDL file and computer readable storage medium | |
JP2003242136A (en) | Syntax information tag imparting support system and method therefor | |
US20220236971A1 (en) | Adapting existing source code snippets to new contexts | |
US20160224338A1 (en) | Analyzing Components Related To A Software Application In A Software Development Environment | |
CN111913739B (en) | Service interface primitive defining method and system | |
CN112162751A (en) | Automatic generation method and system of interface document | |
CN111124380A (en) | Front-end code generation method | |
Santos | OCR evaluation tools for the 21st century | |
CN109325217B (en) | File conversion method, system, device and computer readable storage medium | |
JPWO2007081017A1 (en) | Document processing device | |
CN111311461A (en) | B-S based editor and generation method for structured dynamic medical record form | |
CN118245050A (en) | Front end frame assembly automatic conversion method, system, electronic device and storage medium | |
CN114116938A (en) | Map plotting method and device based on WebGIS | |
CN112699642A (en) | Index extraction method and device for complex medical texts, medium and electronic equipment | |
CN117725927A (en) | Method for identifying and processing clause file of insurance business | |
CN109597624A (en) | A kind of method that SQL is formatted | |
CN116880826B (en) | Visualized code generation method | |
JP3187317B2 (en) | Interactive program generation device | |
JP4776972B2 (en) | Cache generation method, apparatus, program, and recording medium | |
CN118151936A (en) | Code annotation management method, storage medium and computer equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication | |
Application publication date: 20200515 |