CN112668282A - Method and system for converting format of equipment procedure document
- Publication number
- CN112668282A (application number CN202011580075.3A)
- Authority
- CN
- China
- Prior art keywords
- document
- data
- file
- jsp
- model
- Legal status
- Granted
Landscapes
- Document Processing Apparatus (AREA)
Abstract
The invention relates to a method and a system for converting the format of equipment procedure documents. The method comprises the following steps: S1, reading at least one equipment procedure document; S2, parsing the read equipment procedure document; S3, filtering invalid markup data; S4, training on the streaming document in-memory model data; S5, generating a JSP file model; and S6, writing out the JSP file model.
Description
Technical Field
The invention belongs to the technical field of document conversion, and in particular relates to a method and a system for converting the format of equipment procedure documents.
Background
Equipment procedure documents set out the requirements and procedures that plant operators must follow to operate equipment correctly. Different types of equipment have different data structures, so their operating requirements differ, and so do the procedure documents written for them. Converting the thousands of equipment procedure documents in a power plant into entry pages for each equipment type in an information system has therefore become a major obstacle to improving the development efficiency of such systems.
In the prior art, converting a Word document into an HTML page causes several problems: table cell borders are lost, character scaling no longer takes effect, exact line heights are changed to minimum line heights, and text is misplaced when the document contains pictures. In addition, the generated HTML page cannot be converted further into a JSP entry page. Common open-source tools only process the content of a document, and no tool exists for converting equipment procedure documents (Word) into JSP entry pages. As a result, every JSP entry page must be designed by a system developer working from the Word document, which is very inefficient. This is a shortcoming of the prior art.
In view of this, the present invention provides a method and system for converting the format of equipment procedure documents to overcome the above defects of the prior art.
Disclosure of Invention
The object of the present invention is to provide a method and a system for converting the format of equipment procedure documents that solve the technical problems described above.
To achieve this object, the invention provides the following technical solution:
a method for converting the format of equipment procedure documents, comprising the following steps:
S1: reading at least one equipment procedure document;
S2: parsing the read equipment procedure document, comprising:
obtaining the tables, text, pictures, formulas and special characters in the document data, generating raw markup data, and storing the pictures in the document data, classified, in a preset first folder;
S3: filtering invalid markup data, comprising:
filtering the raw markup data to remove invalid markup data and generate streaming document in-memory model data;
S4: training on the streaming document in-memory model data, comprising:
inputting the streaming document in-memory model data into a pre-trained rule engine model for training, identifying and parsing each document, obtaining the category of each element and its position in the source document, recording the parsed position and category parameters, and performing data conversion;
S5: generating a JSP file model, comprising:
generating, from the recognition result output by the rule engine model, a JSP file model corresponding to each source file;
S6: writing out the JSP file model, comprising:
writing the JSP file model to disk, generating a standard JSP entry page, a style file and a script file, and giving prompt information that indicates either successful conversion or content whose conversion was abnormal.
Preferably, in step S1:
it is determined whether a file in the specified format has been detected;
and it is determined whether the read file exceeds the default file size. If it does, the file is compressed with ZIP. This avoids file reading errors and upload failures caused by oversized files.
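The following Java sketch illustrates the two checks in step S1. It accepts only Word files and ZIP-compresses any file that exceeds the size limit before upload. The class name `UploadPreCheck`, the accepted extensions and the 10 MiB limit are assumptions, because the patent specifies none of them. A real system might also inspect the file content rather than relying on the extension alone.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

/** Hypothetical pre-upload check for step S1: accept only Word files, ZIP-compress oversized ones. */
final class UploadPreCheck {
    // Assumed "specified formats"; the patent does not list them.
    static final Set<String> ALLOWED_EXTENSIONS = Set.of("doc", "docx");
    // Assumed default capacity limit (10 MiB); the patent gives no figure.
    static final long DEFAULT_MAX_BYTES = 10L * 1024 * 1024;

    /** Format check based on the file extension only. */
    static boolean isSupportedFormat(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false;
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return ALLOWED_EXTENSIONS.contains(ext);
    }

    /** Returns the bytes to upload: the original content if small enough, else a one-entry ZIP archive. */
    static byte[] prepareForUpload(String fileName, byte[] content, long maxBytes) {
        if (!isSupportedFormat(fileName)) {
            throw new IllegalArgumentException("Unsupported file format: " + fileName);
        }
        if (content.length <= maxBytes) {
            return content;
        }
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            zip.putNextEntry(new ZipEntry(fileName));
            zip.write(content);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }
}
```

A caller would typically pass `UploadPreCheck.DEFAULT_MAX_BYTES` as the limit. The server then has to recognize the ZIP signature and unpack the file before parsing.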
Preferably, in step S3:
the document tree structure is traversed one document at a time;
the data in the streaming in-memory object is extracted page by page to construct the streaming document in-memory model data;
and the document pictures are saved to a preset path. In this way, invalid markup data is filtered quickly and accurately.
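The page-by-page construction of the streaming document in-memory model in step S3 can be sketched as a depth-first traversal that groups leaf elements by page. The `DocNode` and `PageModel` types and the picture naming scheme are hypothetical, because the patent does not define the structure of the model.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical parsed-document node; the patent does not define the tree structure. */
record DocNode(String category, int page, String content, List<DocNode> children) {
    static DocNode leaf(String category, int page, String content) {
        return new DocNode(category, page, content, List.of());
    }
}

/** One page of the assumed streaming document in-memory model. */
record PageModel(int page, List<DocNode> elements) {}

final class StreamModelBuilder {
    /** Traverses one document depth-first and groups its leaf elements by page, in page order. */
    static List<PageModel> build(DocNode document) {
        Map<Integer, List<DocNode>> byPage = new TreeMap<>();
        collect(document, byPage);
        List<PageModel> pages = new ArrayList<>();
        byPage.forEach((page, elements) -> pages.add(new PageModel(page, List.copyOf(elements))));
        return pages;
    }

    private static void collect(DocNode node, Map<Integer, List<DocNode>> byPage) {
        if (node.children().isEmpty()) {
            byPage.computeIfAbsent(node.page(), p -> new ArrayList<>()).add(node);
            return;
        }
        for (DocNode child : node.children()) {
            collect(child, byPage);
        }
    }

    /** Assumed naming scheme for saving a picture under the preset path. */
    static String picturePath(String presetDir, String docId, int page, int index, String ext) {
        return presetDir + "/" + docId + "/p" + page + "_" + index + "." + ext;
    }
}
```

Only leaves are collected here. A fuller model would also keep container nodes such as tables so that row and cell structure survives into the JSP page.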
Preferably, in step S4, the rule engine model is obtained by converting, in advance, a plurality of equipment procedure documents containing pictures, tables, formulas and special characters into JSP files and training on them.
Preferably, in step S6, the prompt information includes the position of the abnormally converted content in the source document and its position in the converted JSP page.
The invention also provides a system for converting the format of equipment procedure documents, comprising:
a document uploading module, which uploads the document data of at least one equipment procedure to an application server;
a document analysis module, which parses the read equipment procedure document, obtains the tables, text, pictures, formulas and special characters in the document data, generates raw markup data, and stores the pictures in the document data, classified, in a preset first folder;
a data filtering module, which filters the raw markup data to remove invalid markup data and generate streaming document in-memory model data;
a data conversion module, which trains on the streaming document in-memory model data. It inputs the data into a pre-trained rule engine model for training, identifies and parses each document, obtains the category of each element and its position in the source document, records the parsed position and category parameters, and performs data conversion;
a file generation module, which generates, from the recognition result output by the rule engine model, a JSP file model corresponding to each source file;
and an information display module, which writes the JSP file model to disk, generates a standard JSP entry page, a style file and a script file, and gives prompt information that indicates either successful conversion or content whose conversion was abnormal.
Preferably, in the document uploading module:
it is determined whether a file in the specified format has been detected;
and it is determined whether the read file exceeds the default file size. If it does, the file is compressed with ZIP. This avoids file reading errors and upload failures caused by oversized files.
Preferably, in the data filtering module:
the document tree structure is traversed one document at a time;
the data in the streaming in-memory object is extracted page by page to construct the streaming document in-memory model data;
and the document pictures are saved to a preset path. In this way, invalid markup data is filtered quickly and accurately.
Preferably, in the data conversion module, the rule engine model is obtained by converting, in advance, a plurality of equipment procedure documents containing pictures, tables, formulas and special characters into JSP files and training on them.
Preferably, in the information display module, the prompt information includes the position of the abnormally converted content in the source document and its position in the converted JSP page.
In the present application, the document is a Microsoft Office Word document.
The invention has the following advantages.

First, it solves the problems that arise when an ordinary Word document is converted into an HTML page: lost cell borders, ineffective character scaling, exact line heights turning into minimum line heights, and misplaced text when the document contains pictures.

Second, it uses the rule engine model to identify and parse the in-memory model data of each document. It obtains the category, position and style data of each element in the source document and records the parsed position and category parameters. It then converts the data into a JSP (JavaServer Pages) file model according to the machine-trained model, which improves conversion accuracy.

Third, it frees developers from the tedious, repetitive work of copying, pasting and redrawing tables. Developers can concentrate on rule model training and JSP page verification, which improves development efficiency and reduces development cost.
In addition, the invention has a reliable design principle, a simple structure and very broad application prospects.
Therefore, compared with the prior art, the invention has prominent substantive features and represents notable progress, and the beneficial effects of its implementation are evident.
Drawings
FIG. 1 is a flow chart of the method for converting the format of equipment procedure documents provided by the present invention.
FIG. 2 is a functional block diagram of the system for converting the format of equipment procedure documents according to the present invention.
Reference numerals in FIG. 2: 1, document uploading module; 2, document analysis module; 3, data filtering module; 4, data conversion module; 5, file generation module; 6, information display module.
Detailed Description
The present invention is described in detail below through specific embodiments with reference to the accompanying drawings. The embodiments illustrate the invention, and the invention is not limited to them.
Example 1:
As shown in FIG. 1, this embodiment provides a method for converting the format of equipment procedure documents, comprising the following steps:
Step S1: upload one or more equipment procedure documents to the system.
In one possible implementation, documents are uploaded using ZIP compression. A document may be a Microsoft Office Word document.
Step S2: parse the source file with Jacob to obtain the tables, text, pictures, formulas, special characters and other content in the file, generate raw markup data, and store the pictures in the file, classified, in the preset first folder.
The elements of each page of the document, such as text, images, paths and shadows, may be rendered into a List collection by iterating over all pages.
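Step S2 stores pictures in the preset first folder "in a classified manner" but does not say how they are classified. One possible reading is sketched below in Java. Each extracted picture is classified by its file signature (PNG, JPEG, GIF or BMP) and saved in a per-type subfolder with a sequential ID. The classification criterion, the folder layout and the class name `PictureClassifier` are assumptions.

```java
/** Hypothetical helper for step S2: classify extracted pictures by type and assign storage paths. */
final class PictureClassifier {
    private final String firstFolder;   // the "preset first folder"; its value is an assumption
    private int nextId = 1;             // sequential picture ID within one document

    PictureClassifier(String firstFolder) {
        this.firstFolder = firstFolder;
    }

    /** Detects the image type from its leading signature ("magic") bytes. */
    static String detectType(byte[] data) {
        if (startsWith(data, 0x89, 'P', 'N', 'G')) return "png";
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) return "jpg";
        if (startsWith(data, 'G', 'I', 'F', '8')) return "gif";
        if (startsWith(data, 'B', 'M')) return "bmp";
        return "other";   // e.g. EMF/WMF drawings, which a real converter would need to rasterize
    }

    /** Returns the storage path: one subfolder per picture type, sequential IDs across the document. */
    String storagePath(byte[] data) {
        String type = detectType(data);
        String ext = type.equals("other") ? "bin" : type;
        return firstFolder + "/" + type + "/img_" + (nextId++) + "." + ext;
    }

    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) return false;   // compare as unsigned bytes
        }
        return true;
    }
}
```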
Step S3: filter invalid markup data out of the data from S2 to generate streaming document in-memory model data.
Invalid data is filtered using the Jsoup component.
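The patent does not list which markup counts as invalid. If the input is HTML exported from Word, typical candidates are Office namespace tags such as `<o:p>`, conditional comments, and `mso-` style properties. The sketch below removes these with JDK regular expressions so that it runs without dependencies. It is only a stand-in for the Jsoup-based filtering named above. A production filter should use a real HTML parser such as Jsoup, because regular expressions do not handle arbitrary HTML reliably.

```java
import java.util.regex.Pattern;

/** Regex stand-in for the Jsoup-based filtering of step S3; the patterns are assumptions. */
final class InvalidMarkupFilter {
    // Comments, including Word's conditional comments such as <!--[if gte mso 9]>...<![endif]-->.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Office namespace tags such as <o:p>, </o:p>, <w:WordDocument>, <v:shape ...>.
    private static final Pattern NAMESPACED_TAG = Pattern.compile("</?[a-zA-Z]+:[^>]*>");
    // Word-only inline style declarations such as "mso-margin-top-alt:auto;".
    private static final Pattern MSO_STYLE = Pattern.compile("mso-[a-zA-Z-]+\\s*:\\s*[^;\"']*;?");
    // style="" attributes left empty after the mso- declarations are removed.
    private static final Pattern EMPTY_STYLE = Pattern.compile("\\s+style=\"\\s*\"");

    static String filter(String html) {
        String out = COMMENT.matcher(html).replaceAll("");
        out = NAMESPACED_TAG.matcher(out).replaceAll("");
        out = MSO_STYLE.matcher(out).replaceAll("");
        return EMPTY_STYLE.matcher(out).replaceAll("");
    }
}
```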
Step S4: input the streaming document in-memory model data filtered in step S3 into the pre-trained rule engine model. The model identifies and parses each document, obtains the category of each element and its position in the source document, records the parsed position and category parameters, and performs data conversion according to preset rules.
The rule engine model is obtained by converting, in advance, a plurality of equipment procedure documents containing pictures, tables, formulas and special characters into JSP files and training on them.
Specifically, in one embodiment, a Drools rule engine model may be used to identify the document in-memory model data.
The pre-trained rule engine model may be a binary classification model obtained by pre-training on rule tasks. After a document to be recognized is input into the pre-trained rule engine model, an output of true means that the document has been successfully converted into a JSP file model. An output of false means that the document contains abnormal or unconvertible content.
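The true/false output can be illustrated with a plain-Java stand-in for the rule engine. Each rule declares which elements it can convert, and the document counts as converted only if some rule matches every element. Elements that no rule matches keep their source positions, which can be used for the prompt information. This is not the Drools API, and the `Element`, `ConversionRule` and `RuleResult` types are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** A parsed element with its category and position in the source document (fields are assumptions). */
record Element(String category, int page, int index, String content) {}

/** One conversion rule: the elements it can handle. A stand-in for a Drools rule, not Drools itself. */
record ConversionRule(String name, Predicate<Element> matches) {}

/** The binary decision plus the elements that no rule could convert. */
record RuleResult(boolean converted, List<Element> unconvertible) {}

final class BinaryRuleEngine {
    private final List<ConversionRule> rules;

    BinaryRuleEngine(List<ConversionRule> rules) {
        this.rules = List.copyOf(rules);
    }

    /** Outputs true only if every element is matched by at least one rule. */
    RuleResult evaluate(List<Element> elements) {
        List<Element> unconvertible = new ArrayList<>();
        for (Element e : elements) {
            boolean handled = rules.stream().anyMatch(r -> r.matches().test(e));
            if (!handled) {
                unconvertible.add(e);   // keep its page/index for the prompt information
            }
        }
        return new RuleResult(unconvertible.isEmpty(), unconvertible);
    }
}
```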
The rule engine model is trained as follows:
obtain, as training samples, the in-memory document model data converted from a plurality of equipment procedure documents containing various pictures, tables, formulas and special characters. If a standard JSP file model can be generated after a sample is parsed, filtered and converted according to the specified rule tasks, the ground-truth result for that sample is 1; otherwise it is 0;
input the training samples into the rule engine model to be trained;
the rule engine model to be trained outputs a prediction (1 or 0) of whether conversion of each sample succeeds;
calculate a success rate from the ground-truth result of each training sample, the prediction for the output JSP file model, and a preset loss function;
determine from the success rate whether training of the rule engine model is complete (OVER). If it is, the rule engine model to be trained is taken as the trained rule engine model;
if it is not, adjust the rule tasks of the rule engine model to be trained and return to the step of inputting the training samples into the model.
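The training loop above can be sketched as follows. The sketch assumes that the "success rate" is one minus the mean 0/1 loss between predicted and ground-truth labels, and that "OVER" means the success rate has reached a chosen threshold. The `adjust` hook stands for the unspecified adjustment of rule tasks, whether manual or automatic. All names here are hypothetical.

```java
import java.util.List;
import java.util.function.ToIntFunction;
import java.util.function.UnaryOperator;

/** A training sample; label is the ground truth (1 = a standard JSP file model can be generated). */
record Sample<D>(D document, int label) {}

final class RuleTrainer {
    /** Success rate = 1 - mean 0/1 loss between the model's prediction (1 or 0) and the label. */
    static <D> double successRate(ToIntFunction<D> model, List<Sample<D>> samples) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("no training samples");
        }
        double totalLoss = 0;
        for (Sample<D> s : samples) {
            totalLoss += model.applyAsInt(s.document()) == s.label() ? 0 : 1;
        }
        return 1.0 - totalLoss / samples.size();
    }

    /**
     * Evaluate-then-adjust loop: stop ("OVER") once the success rate reaches the threshold;
     * otherwise apply the hypothetical adjust hook (edit the rule tasks) and evaluate again.
     */
    static <D> ToIntFunction<D> train(ToIntFunction<D> model, List<Sample<D>> samples,
                                      UnaryOperator<ToIntFunction<D>> adjust,
                                      double threshold, int maxRounds) {
        ToIntFunction<D> current = model;
        for (int round = 0; ; round++) {
            if (successRate(current, samples) >= threshold) {
                return current;   // OVER: accept as the trained rule engine model
            }
            if (round == maxRounds) {
                throw new IllegalStateException("threshold not reached after " + maxRounds + " adjustments");
            }
            current = adjust.apply(current);
        }
    }
}
```

The explicit `maxRounds` bound is a design choice of the sketch. It prevents an endless loop when no rule adjustment can reach the threshold.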
Step S5: generate, from the recognition result output by the rule engine model, a JSP file model corresponding to each source file.
Step S6: write the JSP file model to disk, generate a standard JSP entry page, a style file and a script file, and have the system give prompt information on successfully converted content and abnormal content.
In one implementation, S6 may display on the interface a prompt box containing prompt information about the document conversion operation. This information may include conversion success, the position of abnormal content in the source document, and the position of abnormal content in the JSP page.
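Writing out the JSP file model in step S6 might look like the following sketch. It writes a JSP page, a style file and a script file, then builds a prompt message that maps each abnormal element's source position to its line in the generated JSP. The `JspFileModel` and `AbnormalItem` types and the one-fragment-per-line layout are assumptions. Only the `<%@ page %>` directive is standard JSP syntax.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical JSP file model: one single-line HTML fragment per body element, plus style and script text. */
record JspFileModel(String pageName, List<String> bodyFragments, String css, String js) {}

/** Content whose conversion was abnormal: its position in the source and the index of its fragment. */
record AbnormalItem(String sourcePosition, int fragmentIndex) {}

final class JspWriter {
    /** Writes the .jsp, .css and .js files named after pageName into outDir; returns the prompt message. */
    static String writeOut(JspFileModel model, Path outDir, List<AbnormalItem> abnormal) {
        List<String> jsp = new ArrayList<>();
        jsp.add("<%@ page contentType=\"text/html;charset=UTF-8\" language=\"java\" %>");
        jsp.add("<html><head>");
        jsp.add("<link rel=\"stylesheet\" href=\"" + model.pageName() + ".css\">");
        jsp.add("<script src=\"" + model.pageName() + ".js\"></script>");
        jsp.add("</head><body>");
        int firstBodyLine = jsp.size() + 1;   // 1-based JSP line number of fragment 0
        jsp.addAll(model.bodyFragments());    // assumes each fragment occupies exactly one line
        jsp.add("</body></html>");
        try {
            Files.createDirectories(outDir);
            Files.write(outDir.resolve(model.pageName() + ".jsp"), jsp, StandardCharsets.UTF_8);
            Files.writeString(outDir.resolve(model.pageName() + ".css"), model.css(), StandardCharsets.UTF_8);
            Files.writeString(outDir.resolve(model.pageName() + ".js"), model.js(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (abnormal.isEmpty()) {
            return "Conversion succeeded: " + model.pageName() + ".jsp";
        }
        List<String> parts = new ArrayList<>();
        for (AbnormalItem a : abnormal) {
            parts.add("source " + a.sourcePosition() + " -> JSP line " + (firstBodyLine + a.fragmentIndex()));
        }
        return "Conversion abnormal: " + String.join("; ", parts);
    }
}
```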
Example 2:
As shown in FIG. 2, this embodiment provides a system for converting the format of equipment procedure documents, comprising:
a document uploading module 1, which uploads the data of at least one equipment procedure document to the application server;
this module determines whether a file in the specified format has been detected,
and whether the read file exceeds the default file size. If it does, the file is compressed with ZIP. This avoids file reading errors and upload failures caused by oversized files.
a document analysis module 2, which parses the read equipment procedure document, obtains the tables, text, pictures, formulas and special characters in the document data, generates raw markup data, and stores the pictures in the document data, classified, in the preset first folder;
a data filtering module 3, which filters the raw markup data to remove invalid markup data and generate streaming document in-memory model data;
in this module, the document tree structure is traversed one document at a time;
the data in the streaming in-memory object is extracted page by page to construct the streaming document in-memory model data;
and the document pictures are saved to a preset path, so that invalid markup data is filtered quickly and accurately.
a data conversion module 4, which trains on the streaming document in-memory model data. It inputs the data into the pre-trained rule engine model for training, identifies and parses each document, obtains the category of each element and its position in the source document, records the parsed position and category parameters, and performs data conversion;
in this module, the rule engine model is obtained by converting, in advance, a plurality of equipment procedure documents containing pictures, tables, formulas and special characters into JSP files and training on them.
a file generation module 5, which generates, from the recognition result output by the rule engine model, a JSP file model corresponding to each source file;
and an information display module 6, which writes the JSP file model to disk, generates a standard JSP entry page, a style file and a script file, and gives prompt information that indicates either successful conversion or content whose conversion was abnormal. The prompt information includes the position of the abnormal content in the source document and its position in the converted JSP page.
After the standard entry page has been generated, a timer thread clears the uploaded equipment procedure document data so that uploaded documents do not occupy excessive disk space.
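A timer-based cleanup of uploaded documents could use the JDK's `ScheduledExecutorService`, as in the sketch below. The patent only says that a timing thread clears the uploaded data. The age-based retention policy, the daemon thread and the class name `UploadCleaner` are assumptions.

```java
import java.io.File;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical timed cleanup of the upload directory. */
final class UploadCleaner {
    /** Deletes regular files in uploadDir last modified before cutoffMillis; returns how many were removed. */
    static int purgeOlderThan(File uploadDir, long cutoffMillis) {
        File[] files = uploadDir.listFiles(File::isFile);
        if (files == null) {
            return 0;   // directory missing or unreadable
        }
        int removed = 0;
        for (File f : files) {
            if (f.lastModified() < cutoffMillis && f.delete()) {
                removed++;
            }
        }
        return removed;
    }

    /** Runs the purge at a fixed rate on a daemon thread; retention and period are assumed values. */
    static ScheduledExecutorService start(File uploadDir, long retentionMillis, long periodMinutes) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "upload-cleaner");
            t.setDaemon(true);   // do not keep the JVM alive just for cleanup
            return t;
        });
        scheduler.scheduleAtFixedRate(
            () -> purgeOlderThan(uploadDir, System.currentTimeMillis() - retentionMillis),
            periodMinutes, periodMinutes, TimeUnit.MINUTES);
        return scheduler;
    }
}
```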
The rule engine model in this application is based on Drools. Drools calls its implementation of the Rete algorithm ReteOO, indicating that it has enhanced and optimized the Rete algorithm for object-oriented systems.
The rule engine model of this application mainly comprises a global rule engine and a conversion rule engine.
The global rule engine describes global rule matching and conversion. It mainly comprises a JSP output path rule task, a picture name ID generation rule task, an input box/drop-down list/date component generation rule task, and other global rule tasks;
the conversion rule engine mainly converts the parsed in-memory document data into display and entry rule tasks in the JSP file model data. Display rule tasks handle non-input content, for example the automatic conversion of table widths, picture paths and titles. Entry rule tasks define the rules for input content such as date fields, input boxes and drop-down lists.
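The display and entry rule tasks can be illustrated with ordinary Java methods instead of Drools rules. Entry rules map date, input-box and drop-down fields to HTML form controls. Display rules convert table widths, picture paths and heading levels. The method names, the twip-to-percentage width rule and the heading cap are hypothetical. Text is assumed to be HTML-escaped upstream.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical plain-Java versions of the entry and display rule tasks (not Drools DRL). */
final class ConversionRules {
    // ---- entry rules: content the operator fills in ----
    static String entryComponent(String kind, String name, List<String> options) {
        return switch (kind) {
            case "date" -> "<input type=\"date\" name=\"" + name + "\">";
            case "input" -> "<input type=\"text\" name=\"" + name + "\">";
            case "dropdown" -> {
                StringBuilder sb = new StringBuilder("<select name=\"" + name + "\">");
                for (String o : options) {
                    sb.append("<option>").append(o).append("</option>");
                }
                yield sb.append("</select>").toString();
            }
            default -> throw new IllegalArgumentException("No entry rule for " + kind);
        };
    }

    // ---- display rules: non-input content ----
    /** Converts a Word table width in twips (1/1440 inch) to a percentage of the text width, capped at 100%. */
    static String tableWidth(int widthTwips, int textWidthTwips) {
        double pct = Math.min(100.0, 100.0 * widthTwips / textWidthTwips);
        return String.format(Locale.ROOT, "%.1f%%", pct);
    }

    /** Rewrites a local picture path (Windows or Unix separators) to a path under a web folder. */
    static String picturePath(String localPath, String webFolder) {
        int cut = Math.max(localPath.lastIndexOf('/'), localPath.lastIndexOf('\\'));
        return webFolder + "/" + localPath.substring(cut + 1);
    }

    /** Maps a Word heading level to an HTML heading, clamped to h1..h6. */
    static String title(int level, String text) {
        int h = Math.max(1, Math.min(level, 6));
        return "<h" + h + ">" + text + "</h" + h + ">";
    }
}
```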
In the present application:
Equipment procedure: a technical specification of the operating skills that operators must master to keep instruments and equipment running safely and in good working order. Based on the structure and operating characteristics of the equipment and the requirements of safe operation, an equipment operating procedure specifies the matters, procedures and actions that operators must observe throughout the operation.
Jacob (Java COM Bridge): a component that allows Java applications to call COM components and Win32 libraries.
Jsoup: a Java HTML parser that can parse a URL or HTML text directly. It provides a convenient API for extracting and manipulating data with DOM methods, CSS selectors and jQuery-like methods.
Drools: an open-source business rule engine based on the Rete algorithm. It makes enterprise policies easy to access, adjust and manage, conforms to industry standards, and is fast and efficient.
The above discloses only preferred embodiments of the present invention, and the invention is not limited to them. Any non-inventive changes that a person skilled in the art can make, and any improvements and modifications made without departing from the principle of the invention, shall fall within the protection scope of the invention.
Claims (10)
1. A method for converting the format of equipment procedure documents, comprising the following steps:
S1: reading at least one equipment procedure document;
S2: parsing the read equipment procedure document, comprising:
obtaining the tables, text, pictures, formulas and special characters in the document data, generating raw markup data, and storing the pictures in the document data, classified, in a preset first folder;
S3: filtering invalid markup data, comprising:
filtering the raw markup data to remove invalid markup data and generate streaming document in-memory model data;
S4: training on the streaming document in-memory model data, comprising:
inputting the streaming document in-memory model data into a pre-trained rule engine model for training, identifying and parsing each document, obtaining the category of each element and its position in the source document, recording the parsed position and category parameters, and performing data conversion;
S5: generating a JSP file model, comprising:
generating, from the recognition result output by the rule engine model, a JSP file model corresponding to each source file;
S6: writing out the JSP file model, comprising:
writing the JSP file model to disk, generating a standard JSP entry page, a style file and a script file, and giving prompt information that indicates either successful conversion or content whose conversion was abnormal.
2. The method for converting the format of equipment procedure documents according to claim 1, wherein, in step S1:
it is determined whether a file in the specified format has been detected;
and it is determined whether the read file exceeds the default file size, and if so, the file is compressed with ZIP.
3. The method for converting the format of equipment procedure documents according to claim 2, wherein, in step S3:
the document tree structure is traversed one document at a time;
the data in the streaming in-memory object is extracted page by page to construct the streaming document in-memory model data;
and the document pictures are saved to a preset path.
4. The method for converting the format of equipment procedure documents according to claim 3, wherein, in step S4, the rule engine model is obtained by converting, in advance, a plurality of equipment procedure documents containing pictures, tables, formulas and special characters into JSP files and training on them.
5. The method for converting the format of equipment procedure documents according to claim 4, wherein, in step S6, the prompt information includes the position of the abnormally converted content in the source document and its position in the converted JSP page.
6. A system for converting the format of equipment procedure documents, comprising:
a document uploading module, which uploads the document data of at least one equipment procedure to an application server;
a document analysis module, which parses the read equipment procedure document, obtains the tables, text, pictures, formulas and special characters in the document data, generates raw markup data, and stores the pictures in the document data, classified, in a preset first folder;
a data filtering module, which filters the raw markup data to remove invalid markup data and generate streaming document in-memory model data;
a data conversion module, which trains on the streaming document in-memory model data by inputting it into a pre-trained rule engine model for training, identifying and parsing each document, obtaining the category of each element and its position in the source document, recording the parsed position and category parameters, and performing data conversion;
a file generation module, which generates, from the recognition result output by the rule engine model, a JSP file model corresponding to each source file;
and an information display module, which writes the JSP file model to disk, generates a standard JSP entry page, a style file and a script file, and gives prompt information that indicates either successful conversion or content whose conversion was abnormal.
7. The system for converting the format of equipment procedure documents according to claim 6, wherein, in the document uploading module:
it is determined whether a file in the specified format has been detected;
and it is determined whether the read file exceeds the default file size, and if so, the file is compressed with ZIP.
8. The system for converting the format of equipment procedure documents according to claim 7, wherein, in the data filtering module:
the document tree structure is traversed one document at a time;
the data in the streaming in-memory object is extracted page by page to construct the streaming document in-memory model data;
and the document pictures are saved to a preset path.
9. The system for converting the format of equipment procedure documents according to claim 8, wherein, in the data conversion module, the rule engine model is obtained by converting, in advance, a plurality of equipment procedure documents containing pictures, tables, formulas and special characters into JSP files and training on them.
10. The system for converting the format of equipment procedure documents according to claim 9, wherein, in the information display module, the prompt information includes the position of the abnormally converted content in the source document and its position in the converted JSP page.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011580075.3A (granted as CN112668282B) | 2020-12-28 | 2020-12-28 | Method and system for converting format of equipment procedure document |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011580075.3A (granted as CN112668282B) | 2020-12-28 | 2020-12-28 | Method and system for converting format of equipment procedure document |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112668282A | 2021-04-16 |
CN112668282B | 2023-02-03 |
Family ID: 75410959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011580075.3A (active; granted as CN112668282B) | Method and system for converting format of equipment procedure document | 2020-12-28 | 2020-12-28 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112668282B (en) |
- 2020-12-28: CN application CN202011580075.3A filed (granted as CN112668282B; active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1773508A (en) * | 2005-11-15 | 2006-05-17 | 李利鹏 | Method for converting source file to target web document |
CN102130843A (en) * | 2010-01-20 | 2011-07-20 | 北京开普互联科技有限公司 | Intelligent-document-platform-based multi-channel information acquisition and exchange method |
CN103136173A (en) * | 2011-11-29 | 2013-06-05 | 北京建龙重工集团有限公司 | Method converting mass word or excel format form documents into webpages |
CN107239271A (en) * | 2016-03-29 | 2017-10-10 | 滴滴(中国)科技有限公司 | Develop document structure tree method and device |
CN111382437A (en) * | 2020-03-03 | 2020-07-07 | 思客云(北京)软件技术有限公司 | Defect detection method, device and computer readable storage medium based on configuration analysis engine |
Non-Patent Citations (1)
Title |
---|
ZORRO1X1: "将word文档按原格式放在JSP页面里" ("Placing a Word document in a JSP page with its original formatting"), CSDN Blog * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113779235A (en) * | 2021-09-13 | 2021-12-10 | 北京市律典通科技有限公司 | Word document outline recognition processing method and device |
CN113779235B (en) * | 2021-09-13 | 2024-02-02 | 北京市律典通科技有限公司 | Word document outline recognition processing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN112668282B (en) | 2023-02-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |
 | CP01 | Change in the name or title of a patent holder | Address after: Yinhe building, 2008 Xinluo street, high tech Industrial Development Zone, Jinan City, Shandong Province; Patentee after: Shandong luruan Digital Technology Co.,Ltd.; Address before: Yinhe building, 2008 Xinluo street, high tech Industrial Development Zone, Jinan City, Shandong Province; Patentee before: SHANDONG LUNENG SOFTWARE TECHNOLOGY Co.,Ltd. |