CN112699641A - Method for quickly converting batch copy of WORD content to DM based on S1000D standard - Google Patents

Info

Publication number
CN112699641A
Authority
CN
China
Prior art keywords
tag
node
html
standard
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110316627.8A
Other languages
Chinese (zh)
Other versions
CN112699641B (en)
Inventor
孙国防
蒋巍
孙浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Guorui Xinwei Software Co ltd
Original Assignee
Nanjing Guorui Xinwei Software Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Guorui Xinwei Software Co ltd
Priority to CN202110316627.8A
Publication of CN112699641A
Application granted
Publication of CN112699641B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/186Templates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention relates to a method for quickly converting batch-copied Word content to a DM based on the S1000D standard, belonging to the field of data format content conversion, and comprising the following steps: converting the document to HTML; judging whether the HTML was copied from a Word document and, if so, checking whether the Word paste command is registered, and if not, calling the ordinary conversion operation; judging whether the current cursor is positioned in an editing-area node; acquiring the object of the paste command; starting multi-threaded processing; deciding whether to end each thread according to its processing time; acquiring template information; and setting the necessary parameters in the Transformer conversion object and converting the object into an XML file of an S1000D standard DM. With this method, content in the Word document such as titles, emphasis, superscripts and subscripts, ordered lists, unordered lists, tables, pictures, icons and text can be automatically converted through the template engine into the corresponding S1000D content, which improves editing efficiency.

Description

Method for quickly converting batch copy of WORD content to DM based on S1000D standard
Technical Field
The invention relates to a method for quickly converting batch-copied Word content to a DM based on the S1000D standard, belonging to the technical field of intelligent data processing.
Background
An S1000D standard DM is defined in XML format. When an S1000D standard manual is compiled, all DMs must ultimately be saved as XML files. Before IETM manuals became widespread domestically, most users wrote their manual data as Word documents.
To convert the original manual content into S1000D standard IETM manual data, the conventional method is to transcribe the Word content into XML format by copying and pasting while writing each DM. This method cannot fully reuse the original data, and both its compiling efficiency and its accuracy are low.
Disclosure of Invention
To solve this technical problem, the invention provides a method for quickly converting batch-copied Word content to a DM based on the S1000D standard, comprising the following steps:
Step 1: converting the document to HTML: converting the full text of the document into HTML markup text and outputting the HTML markup text;
Step 2: judging whether the HTML markup text obtained in step 1 comes from a Word document; if so, entering step 3, and if not, calling the ordinary conversion operation;
Step 3: checking whether the Word paste command has been registered for the HTML markup text; if not, registering the Word paste command in the cached command set and then entering step 4, and if so, entering step 4 directly;
Step 4: judging whether the current cursor is positioned in an editing-area node; if not, issuing a warning prompt, and if so, entering step 5;
Step 5: acquiring the Word object to be pasted, executing the paste command, and initializing the XSLT style template;
Step 6: starting a multi-threaded operation that trims, adjusts, or deletes unnecessary content in the pasted HTML markup text, in preparation for conversion to S1000D nodes;
Step 7: if the current thread finishes processing the source data within 1 second, no log dialog box pops up; if the current thread has not finished within 1 second, the current thread is ended and an operation log dialog box pops up;
Step 8: acquiring the style template according to the XSLT style file path: if style template information exists in the cache, it is taken from the cache by default; if it does not exist in the cache, it is stored in the cache and thereafter taken directly from the cache whenever it is needed;
Step 9: acquiring a Transformer conversion object through the style template, setting the necessary parameters in the Transformer conversion object, and converting the input XML into an XML file of an S1000D standard DM.
Further, in step 1, a copy/paste command is triggered through the keyboard shortcuts CTRL+C / CTRL+V, the document content is copied to the clipboard, the document data format is set to HTML, and the HTML-format content in the clipboard is obtained through a Transformer conversion object.
Further, in step 2, whether the HTML markup text was copied from a Word document is judged according to the xmlns:w="urn:schemas-microsoft-com:office:word" information in the underlying HTML data.
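The check described for step 2 can be sketched as follows. This is a minimal illustration in Python rather than the tool's actual code; the function name `is_from_word` is hypothetical, and the sketch assumes the namespace URI that Word writes is `urn:schemas-microsoft-com:office:word`.

```python
import re

# Namespace URI assumed to be declared as xmlns:w in HTML copied from Word.
WORD_NS = "urn:schemas-microsoft-com:office:word"

def is_from_word(html_text: str) -> bool:
    """Return True if the clipboard HTML declares the Word namespace (xmlns:w)."""
    pattern = r'xmlns:w\s*=\s*["\']?' + re.escape(WORD_NS)
    return re.search(pattern, html_text, flags=re.IGNORECASE) is not None

word_html = '<html xmlns:w="urn:schemas-microsoft-com:office:word"><body><p>x</p></body></html>'
plain_html = "<html><body><p>x</p></body></html>"
print(is_from_word(word_html), is_from_word(plain_html))
```

Content that fails this check would fall through to the ordinary conversion operation mentioned in step 2.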
Further, the specific execution process of the multi-threaded operation in step 6 is as follows:
Step 6.1: parsing the string of the HTML markup text into a Document object, deleting namespaces and comments from the HTML markup text, downloading picture content to a local temporary folder, saving ordinary pictures in PNG format and Visio pictures in compressed EMF format, and naming the files in the form "image" + number;
Step 6.2: deleting the redundant underlying meta tag content in the HTML markup text, and deleting styles related to global fonts;
Step 6.3: deleting the plain-text content styles, the ordered and unordered list styles, and the text layout formatting of the original Word document, and changing the language to zh-CN;
Step 6.4: deleting link styles throughout the underlying markup of the original Word document;
Step 6.5: converting column widths to percentages according to the column widths of the underlying table in the original Word document, and adding a processing instruction in preparation for representing the table with S1000D standard tags;
Step 6.6: modifying picture tags: changing the underlying <img> tag in the original Word document to <figure> and adding the attribute class='figcaption' to mark it as a picture; if the attribute is not added, the element is treated as an icon by default;
Step 6.7: determining, from the attribute class='MsoNormal' of the underlying <p> tag in the original Word document, whether the <p> tag belongs to an ordered or an unordered list; if ordered, converting the <p> tag into the corresponding ordered <ol><li> tags, and if unordered, into the corresponding unordered <ul><li> tags;
Step 6.8: processing figure captions and table captions: filling in the title of the picture or table according to class='figcaption' or class='caption' in the picture or table tag of the original Word document;
Step 6.9: preprocessing title tags: converting each underlying title tag in the full text of the original Word document, for example a first-level title tag <h1></h1>, into the corresponding <div class='section1'><h1></h1></div> tag, which is later converted into the corresponding S1000D node or step tag according to class='section1';
Step 6.10: deleting the underlying empty text tags, span tags and styles in the original Word document; moving the content inside <b>, <big>, <cite>, <em>, <i>, <small>, <strong> and <u> tags into the enclosing <p> tag, and deleting those tags and their styles;
Step 6.11: generating, from steps 6.1 to 6.10, the tag content that serves as the input XML for conversion into S1000D tags;
Step 6.12: parsing the input XML generated in step 6.11 into a "Document" object, in preparation for conversion into standard S1000D content.
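Two of the clean-up operations above, removing comments (step 6.1) and unwrapping inline formatting tags so that only their text remains in the enclosing paragraph (step 6.10), can be sketched with regular expressions. This is only an illustrative Python sketch; the function name `strip_inline_markup` is hypothetical, and a production tool would more likely operate on the parsed Document object than on the raw string.

```python
import re

# Inline formatting tags listed in step 6.10.
INLINE_TAGS = ("b", "big", "cite", "em", "i", "small", "strong", "u")

_COMMENT = re.compile(r"<!--.*?-->", flags=re.DOTALL)
# Matches opening or closing inline tags, with optional attributes (e.g. style).
_INLINE = re.compile(
    r"</?(?:%s)(?:\s[^>]*)?>" % "|".join(INLINE_TAGS), flags=re.IGNORECASE
)

def strip_inline_markup(html_text: str) -> str:
    """Delete comments and unwrap inline tags, keeping their text content."""
    without_comments = _COMMENT.sub("", html_text)
    return _INLINE.sub("", without_comments)

sample = '<p><!-- note --><b>Low</b> and <i style="x">medium</i> altitude</p>'
print(strip_inline_markup(sample))  # <p>Low and medium altitude</p>
```

Tags such as `<body>`, `<img>` and `<ul>` are left untouched because the pattern requires whitespace or `>` immediately after the tag name.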
Further, in step 7 the current thread is ended through thread.join(1000), where 1000 means 1 second.
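The one-second limit of step 7 can be illustrated with a timed join. The sketch below uses Python, where `Thread.join` takes a timeout in seconds (the 1000 in the description reads like a millisecond value, as in Java's `Thread.join(long)`). The helper name `run_with_timeout` is hypothetical, and because Python cannot forcibly stop a thread, the sketch only reports whether the worker finished in time.

```python
import threading
import time

def run_with_timeout(task, timeout_seconds=1.0):
    """Run task in a worker thread; return True if it finished within the timeout."""
    worker = threading.Thread(target=task, daemon=True)
    worker.start()
    worker.join(timeout_seconds)  # wait at most timeout_seconds for the worker
    return not worker.is_alive()

# A fast task finishes in time; a slow task exceeds a (shortened) timeout.
fast_ok = run_with_timeout(lambda: None)
slow_ok = run_with_timeout(lambda: time.sleep(0.5), timeout_seconds=0.05)
print(fast_ok, slow_ok)
```

In the described method, a `False` result would correspond to the case in which the operation log dialog box is shown.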
Further, the specific process in step 9 of acquiring the style template through the singleton pattern and setting the necessary parameters in the "Document" object includes:
when hierarchy or step nodes are converted, the current DM type is set in the "Document" object. If the current DM type is descriptive, the HTML tag <div class="section1"> is converted into the S1000D standard levelled paragraph tag <levelledPara>, and the HTML tag <div class="section2"> is converted into the S1000D standard sub-level paragraph tag <levelledPara>; the levelled paragraph style is represented by 1, 1.1, 1.1.1, and the parent-child or sibling relationship of the nodes is determined according to the class attribute value. If the current DM type is procedural, the HTML tag <div class="section1"> is converted into the step node tag <proceduralStep>, and <div class="section2"> is converted into the sub-level node tag <levelledPara> or <proceduralStep>; the level style is represented by 1, 1.1, 1.1.1, and the parent-child or sibling relationship of the nodes is determined according to the class attribute value;
when title nodes are converted, the HTML tags <h1> to <h6> are converted into the <title> tag if the current DM type is descriptive, and into the <para> tag of S1000D standard XML if the current DM type is procedural;
when paragraph nodes are converted, the HTML <p> tag is converted into the corresponding <para> tag of S1000D standard XML; if analysis of the node context shows that the parent of the current node is a warning, caution, or note tag, the HTML <p> tag is converted into the <warningAndCautionPara> tag of S1000D standard XML;
when ordered or unordered lists are converted, the ordered HTML tags <ol><li> are converted into the <sequentialList><listItem> tags of S1000D standard XML, with the levelled style represented by 1, 1.1, 1.1.1; the unordered HTML tags <ul><li> are converted into the <randomList><listItem> tags of S1000D standard XML, with the unordered style represented by "·";
when icons are converted, if the source HTML contains no <div class="figure"> tag and only <img alt="…" src="xxx.png"></img> appears, the element is converted by default into the S1000D standard XML icon tag <symbol src=""/>;
when tables are converted, the source table node and its child nodes are converted into S1000D nodes according to the table style template, using the correspondence tr <-> row, td <-> entry, colgroup <-> colspec and caption <-> title; if a cell of the source table spans columns, the current start column and end column are calculated through a processing instruction, and the results are used as the values of the cell attributes namest and nameend;
when text is processed, if the processing instruction <?toxml-text?> is present, the content is copied as text by default; if the processing instruction <?toxml-text?> is absent, the content is copied as a para node.
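The context-dependent paragraph rule above can be sketched as a small function. This is a hedged Python illustration that follows the rule as stated in this description (warning, caution, and note parents all yield `<warningAndCautionPara>`); the function name `convert_paragraph` and the sample sentence are hypothetical.

```python
import xml.etree.ElementTree as ET

# Parent elements for which the description prescribes <warningAndCautionPara>.
SPECIAL_PARENTS = {"warning", "caution", "note"}

def convert_paragraph(p_element, parent_tag):
    """Convert an HTML <p> element to the S1000D paragraph element for its context."""
    tag = "warningAndCautionPara" if parent_tag in SPECIAL_PARENTS else "para"
    converted = ET.Element(tag)
    converted.text = p_element.text
    return converted

p = ET.fromstring("<p>Disconnect power before servicing.</p>")  # made-up sample text
in_section = ET.tostring(convert_paragraph(p, "levelledPara"), encoding="unicode")
in_warning = ET.tostring(convert_paragraph(p, "warning"), encoding="unicode")
print(in_section)
print(in_warning)
```

Passing the parent tag explicitly keeps the sketch independent of how the editor tracks the cursor context.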
Further, in step 9, after conversion into standard S1000D nodes according to the template style file, whether the converted node can be inserted is judged according to the context constraints of the node at the current cursor position; if it can be inserted, the node converted according to the style template is pasted at the corresponding position, and if it cannot be inserted, a prompt message is displayed.
The beneficial effects of the invention are as follows: users only need to copy Word content in batches and paste it into the editing tool, and XML files meeting the standard's requirements are generated quickly. During generation, content in the Word document such as titles, emphasis, superscripts and subscripts, ordered lists, unordered lists, tables, pictures, icons and text is automatically converted through the template engine into the corresponding S1000D content such as chapter numbers, bold, italic, underline, superscripts and subscripts, ordered lists, unordered lists, tables, graphics and text, which improves editing efficiency.
Drawings
FIG. 1 is a schematic diagram of the conversion steps of the present invention;
FIG. 2 shows an original Word file according to an embodiment of the present invention;
FIG. 3 shows the data after conversion to S1000D standard XML according to an embodiment of the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
As shown in FIG. 1, the method for quickly converting batch-copied Word content to a DM based on the S1000D standard includes the following steps:
Step 1: converting the document to HTML: the full text of the document is converted into HTML markup text, and the HTML markup text is output. A system clipboard command is triggered through the keyboard shortcuts CTRL+C / CTRL+V, the data format is set to HTML, and the HTML-format content in the clipboard is obtained through a Transformer conversion object.
Step 2: judging whether the HTML markup text obtained in step 1 comes from a Word document; if so, entering step 3, and if not, calling the ordinary conversion operation. Whether the HTML markup text was copied from a Word document or from another kind of document is determined from the xmlns:w="urn:schemas-microsoft-com:office:word" information in the underlying HTML data.
Step 3: checking whether the Word paste command has been registered for the HTML markup text; if not, registering the Word paste command in the cached command set and then entering step 4, and if so, entering step 4 directly;
Step 4: judging whether the current cursor is positioned in an editing-area node; if not, issuing a warning prompt, and if so, proceeding to step 5;
Step 5: acquiring the Word object to be pasted, executing the paste command, and initializing the XSLT style template;
Step 6: starting a multi-threaded operation that trims, adjusts, or deletes unnecessary content in the HTML markup text corresponding to the pasted Word content, in preparation for conversion to S1000D nodes; the specific execution process of the multi-threaded operation is as follows:
Step 6.1: parsing the string of the HTML markup text into a Document object, deleting namespaces and comments from the HTML markup text, downloading picture content to a local temporary folder, saving ordinary pictures in PNG format and Visio pictures in compressed EMF format, and naming the files in the form "image" + number;
Step 6.2: deleting the redundant underlying meta tag content in the HTML markup text, and deleting styles related to global fonts;
Step 6.3: deleting the plain-text content styles, the ordered and unordered list styles, and the text layout formatting of the original Word document, and changing the language to zh-CN;
Step 6.4: deleting link styles throughout the underlying markup of the original Word document;
Step 6.5: converting column widths to percentages according to the column widths of the underlying table in the original Word document, and adding a processing instruction in preparation for representing the table with S1000D standard tags;
Step 6.6: modifying picture tags: changing the underlying <img> tag in the original Word document to <figure> and adding the attribute class='figcaption' to mark it as a picture; if the attribute is not added, the element is treated as an icon by default;
Step 6.7: determining, from the attribute class='MsoNormal' of the underlying <p> tag in the original Word document, whether the <p> tag belongs to an ordered or an unordered list; if ordered, converting the <p> tag into the corresponding ordered <ol><li> tags, and if unordered, into the corresponding unordered <ul><li> tags;
Step 6.8: processing figure captions and table captions: filling in the title of the picture or table according to class='figcaption' or class='caption' in the picture or table tag of the original Word document;
Step 6.9: preprocessing title tags: converting each underlying title tag in the full text of the original Word document, for example a first-level title tag <h1></h1>, into the corresponding <div class='section1'><h1></h1></div> tag, which is later converted into the corresponding S1000D node or step tag according to class='section1';
Step 6.10: deleting the underlying empty text tags, span tags and styles in the original Word document; moving the content inside <b>, <big>, <cite>, <em>, <i>, <small>, <strong> and <u> tags into the enclosing <p> tag, and deleting those tags and their styles;
Step 6.11: generating, from steps 6.1 to 6.10, the tag content that serves as the input XML for conversion into S1000D tags;
Step 6.12: parsing the input XML generated in step 6.11 into a "Document" object, in preparation for conversion into standard S1000D content.
Step 7: if the current thread finishes processing the source data within 1 second, no log dialog box pops up; if the current thread has not finished within 1 second, the current thread is ended and an operation log dialog box pops up. The current thread is ended through thread.join(1000), where 1000 means 1 second.
Step 8: acquiring the style template according to the XSLT style file path: if template information exists in the cache, it is taken from the cache by default; if it does not exist, it is stored in the cache and thereafter taken directly from the cache whenever it is needed;
Step 9: acquiring a Transformer conversion object through the template, setting the necessary parameters in the Transformer conversion object, and converting the input XML into an XML file of an S1000D standard DM.
An example of a specific application of the present invention is given below, with reference to fig. 2 and 3:
Step 1: first, the content to be copied, namely the content in FIG. 2, is selected and copied and pasted with the keyboard shortcuts CTRL+C / CTRL+V, with the cursor positioned at the node in the editing area where the content is to be pasted.
Step 2: whether the copied data comes from a Word document is determined according to the xmlns:w="urn:schemas-microsoft-com:office:word" information in the underlying HTML data.
Step 3: if the content was copied from a Word document, it is first checked whether a Word paste command has been registered; if not, the paste command is registered in the command set, so that later copy and paste operations can look up the current operation command directly from the cache.
Step 4: before the command is executed, a pre-check judges whether the current cursor is positioned in an editing-area node; if the cursor is not positioned in such a node, a warning prompt is issued.
Step 5: the HTML content corresponding to the Word source data shown in FIG. 2 is obtained through the Transformer conversion object.
Step 6: the underlying HTML content is then trimmed, adjusted, or stripped of unnecessary content in preparation for conversion to S1000D standard nodes; a dialog box pops up and displays the specific operation log information, which reduces repeated operations, saves steps, and improves command accuracy.
Step 7: whether the dialog box is displayed depends on how long the Word content takes to process. If the current thread finishes processing the source data within 1 second, no log dialog box pops up by default; if it has not finished after 1 second, the current thread is ended through thread.join(1000) and an operation log dialog box pops up.
Step 8: the style template is obtained through the XSLT style file path, mainly by means of the singleton pattern: if template information exists in the cache, it is taken from the cache by default; if it does not exist, it is stored in the cache and thereafter taken directly from the cache when needed, which improves efficiency because the template is not parsed again.
Step 9: the Transformer conversion object is obtained through the template, and the necessary parameters are set in the object.
Step 9.1: the current DM type is set in this object; different DM types produce different results when the style is applied.
Step 9.2: the current cursor node is set in the object, so that when pasting it can be judged from the node at the current cursor position whether the converted node can be inserted. Setting these parameters makes the conversion standardized and unambiguous and prepares for the subsequent conversion.
Conversion is performed according to the template style file; the specific conversion steps are as follows:
The conversion of hierarchy or step nodes is as follows: according to the DM type set in step 9, if the current DM type is descriptive, <div class="section1"> is converted to <levelledPara>; if the current DM type is procedural, <div class="section1"> is converted to <proceduralStep>; <div class="section2"> is converted to the sub-level <levelledPara> or <proceduralStep>. The hierarchy style is represented by 1, 1.1, 1.1.1, and the parent-child or sibling relationship of the nodes is determined according to the class attribute value.
The conversion logic for title nodes is as follows: for the source data <h1>, <h2>, ..., <h6>, if the current DM type is descriptive, the tag is converted to <title>; if the current DM type is procedural, it is converted to <para>. Referring to FIG. 3, for example, given the source data:
<div class="section1">
<h1>Description of the overall machine function</h1>
</div>
If the current DM type is descriptive, it is converted to:
<levelledPara>
<title>1. Description of the overall machine function</title>
</levelledPara>
If the current DM type is procedural, it is converted to:
<proceduralStep>
<para>1. Description of the overall machine function</para>
</proceduralStep>
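The title example above can be reproduced with a short sketch. It is written in Python for illustration only; the function name `convert_section` and the DM type strings `"descriptive"` and `"procedural"` are assumptions, and the sketch handles only a first-level `<div class="section1"><h1>` pair, not nested levels.

```python
import xml.etree.ElementTree as ET

def convert_section(div, dm_type, number):
    """Convert <div class="section1"><h1>..</h1></div> according to the DM type."""
    heading = div.find("h1")
    text = f"{number}. {heading.text.strip()}"
    if dm_type == "descriptive":
        wrapper, child = ET.Element("levelledPara"), "title"
    else:  # any other value is treated as procedural in this sketch
        wrapper, child = ET.Element("proceduralStep"), "para"
    ET.SubElement(wrapper, child).text = text
    return wrapper

source = ET.fromstring(
    '<div class="section1"><h1>Description of the overall machine function</h1></div>'
)
descriptive = ET.tostring(convert_section(source, "descriptive", 1), encoding="unicode")
procedural = ET.tostring(convert_section(source, "procedural", 1), encoding="unicode")
print(descriptive)
print(procedural)
```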
The conversion of paragraph nodes is as follows: the <p> tag is converted to the corresponding <para>; if the node at which the cursor is positioned in the editing area is <warning>, <caution>, or <note>, the <p> tag is converted to the corresponding <warningAndCautionPara> node.
The conversion of ordered and unordered lists is as follows: <ol><li> is converted into the corresponding <sequentialList><listItem> nodes, with the hierarchical style represented by 1, 1.1, 1.1.1; <ul><li> is converted into the corresponding <randomList><listItem> nodes, with the unordered style represented by "·". For example, ordered source data:
<ol><li>
<p>low and medium altitude warning;</p>
</li></ol>
Converted to the corresponding S1000D ordered list tags:
<sequentialList><listItem>
<para>low and medium altitude warning;</para>
</listItem></sequentialList>
Unordered list source data:
<ul><li>
<p>low and medium altitude warning;</p>
</li></ul>
Converted to the corresponding S1000D unordered list tags:
<randomList><listItem>
<para>low and medium altitude warning;</para>
</listItem></randomList>
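The two list examples above follow the same pattern and can be sketched with one function. This Python sketch is illustrative only; `convert_list` and `LIST_TAGS` are hypothetical names, and nested lists are not handled.

```python
import xml.etree.ElementTree as ET

# HTML list tag -> S1000D list tag, as given in the description.
LIST_TAGS = {"ol": "sequentialList", "ul": "randomList"}

def convert_list(html_list):
    """Convert a flat <ol>/<ul> of <li><p>..</p></li> items to S1000D list markup."""
    s1000d_list = ET.Element(LIST_TAGS[html_list.tag])
    for li in html_list.findall("li"):
        item = ET.SubElement(s1000d_list, "listItem")
        for p in li.findall("p"):
            ET.SubElement(item, "para").text = p.text
    return s1000d_list

ordered = ET.fromstring("<ol><li><p>low and medium altitude warning;</p></li></ol>")
unordered = ET.fromstring("<ul><li><p>low and medium altitude warning;</p></li></ul>")
ordered_xml = ET.tostring(convert_list(ordered), encoding="unicode")
unordered_xml = ET.tostring(convert_list(unordered), encoding="unicode")
print(ordered_xml)
print(unordered_xml)
```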
The conversion of pictures is as follows: the source data <div class="figure"> is converted into a picture according to class="figure", and the corresponding S1000D node is <figure>; <p class="figcaption"> is converted into the title of the picture according to class="figcaption"; and <img alt="" src="xxx.png"> is converted into the corresponding <graphic> node. For example, picture source data (the picture data in FIG. 2):
<div class="figure">
<p class="figcaption">Composition of the antenna vehicle</p>
<img alt="…" src="xxx.png"></img>
</div>
The corresponding S1000D picture node after conversion is:
<figure id="fig-0001">
<title>Composition of the antenna vehicle</title>
<graphic infoEntityIdent=" "/>
</figure>
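The picture conversion can be sketched as follows. This Python sketch assumes the caption paragraph carries `class="figcaption"`; the function name `convert_figure` and the entity identifier `ICN-EXAMPLE-0001` are placeholders invented for illustration, since the description leaves the `infoEntityIdent` value blank.

```python
import xml.etree.ElementTree as ET

def convert_figure(div, figure_id, entity_ident):
    """Convert <div class="figure"> with a caption and an <img> to an S1000D <figure>."""
    figure = ET.Element("figure", {"id": figure_id})
    caption = div.find("p[@class='figcaption']")  # assumed caption class name
    if caption is not None:
        ET.SubElement(figure, "title").text = caption.text
    if div.find("img") is not None:
        # The graphic refers to the picture by an entity identifier, not by src.
        ET.SubElement(figure, "graphic", {"infoEntityIdent": entity_ident})
    return figure

source = ET.fromstring(
    '<div class="figure"><p class="figcaption">Composition of the antenna vehicle</p>'
    '<img alt="..." src="xxx.png"></img></div>'
)
result = convert_figure(source, "fig-0001", "ICN-EXAMPLE-0001")
print(ET.tostring(result, encoding="unicode"))
```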
The conversion of icons is as follows: if the source data contains no <div class="figure"> and only <img alt="…" src="xxx.png"></img> appears, the element is converted by default into the icon <symbol src=""/>, with "…" as the filled-in content.
For table conversion, according to the table style template, the source table node and its child nodes correspond to S1000D nodes as follows:
tr <-> row
td <-> entry
colgroup <-> colspec
caption <-> title
If a cell of the source table spans columns, the current start column and end column must be calculated through a processing instruction, and the results are used as the values of the cell attributes namest and nameend.
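The start and end column calculation for spanning cells can be sketched as a running column counter. This Python sketch assumes the span width of each cell is known (for example from an HTML `colspan` attribute) and that columns are named `col1`, `col2`, and so on; both the function name `span_columns` and the naming scheme are assumptions, and row spans are not considered.

```python
def span_columns(colspans):
    """Return (namest, nameend) for each cell of a row, given each cell's colspan."""
    spans = []
    next_col = 1  # 1-based index of the next free column
    for colspan in colspans:
        start, end = next_col, next_col + colspan - 1
        spans.append((f"col{start}", f"col{end}"))
        next_col = end + 1
    return spans

# A row whose second cell spans two columns.
print(span_columns([1, 2, 1]))
# [('col1', 'col1'), ('col2', 'col3'), ('col4', 'col4')]
```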
For example, table source data:
<table>
<caption>Composition of the servo cabinet</caption>
<colgroup width="25%"></colgroup><colgroup width="25%"></colgroup>
<tbody>
<tr>
<td><p><small>1</small></p></td>
<td><p><small>shaft angle transformation extension</small></p></td>
</tr>
</tbody>
</table>
Conversion to the corresponding S1000D format is as follows:
<table>
<title>Composition of the servo cabinet</title>
<colspec width="25%"/><colspec width="25%"/>
<tbody>
<row>
<entry><para><small>1</small></para></entry>
<entry><para><small>shaft angle transformation extension</small></para></entry>
</row>
</tbody>
</table>
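The tag correspondence used in the table example can be sketched as a recursive rename that keeps attributes and text. This Python sketch is illustrative only; `convert_table` and `TAG_MAP` are hypothetical names, and the `p` to `para` entry is borrowed from the paragraph rule so that cell content is converted as well.

```python
import xml.etree.ElementTree as ET

# Source tag -> S1000D tag; unlisted tags (table, tbody, small) are kept as-is.
TAG_MAP = {"tr": "row", "td": "entry", "colgroup": "colspec", "caption": "title", "p": "para"}

def convert_table(node):
    """Recursively copy node, renaming tags according to TAG_MAP."""
    converted = ET.Element(TAG_MAP.get(node.tag, node.tag), dict(node.attrib))
    converted.text = node.text
    converted.tail = node.tail
    for child in node:
        converted.append(convert_table(child))
    return converted

source = ET.fromstring(
    '<table><caption>Composition of the servo cabinet</caption>'
    '<colgroup width="25%"></colgroup>'
    '<tbody><tr><td><p>1</p></td></tr></tbody></table>'
)
result = convert_table(source)
print([element.tag for element in result.iter()])
# ['table', 'title', 'colspec', 'tbody', 'row', 'entry', 'para']
```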
For text processing, a processing instruction <?toxml-text?> is added when the source data is processed; if the processing instruction is present, the content is copied as text by default, and if it is absent, the content is copied as a para node. After conversion into standard S1000D nodes, whether the converted node can be inserted is judged according to the context constraints of the node at the current cursor position; if it can be inserted, the node converted according to the style template is pasted at the corresponding position, and if it cannot be inserted, a prompt is displayed. The data shown in FIG. 3 is finally obtained.
The locally cached pictures are then collected and uploaded to the server in batches using multithreading, so that they can be reused later. The method is compatible with different servers and is suitable for conversion under a variety of conditions.
Therefore, the invention can automatically convert content in the Word document such as titles, emphasis, superscripts and subscripts, ordered lists, unordered lists, tables, pictures, icons and text through the template engine into the corresponding S1000D content such as chapter numbers, bold, italic, underline, superscripts and subscripts, ordered lists, unordered lists, tables, graphics and text, thereby greatly improving editing efficiency.
In light of the foregoing description of the preferred embodiment of the present invention, those skilled in the art can make many modifications and variations without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification and must be determined according to the scope of the claims.

Claims (7)

  1. A method for quickly converting batch-copied WORD content to a DM (data module) based on the S1000D standard, characterized in that the method comprises the following steps:
    step 1: converting the document to html: converting the full text of the document into html markup text and outputting the html markup text;
    step 2: determining whether the html markup text obtained in step 1 comes from a word document; if so, proceeding to step 3; if not, invoking the ordinary conversion operation;
    step 3: checking whether the word paste command for the html markup text is already registered; if not, registering the word paste command in the cached command set and then proceeding to step 4; if so, proceeding directly to step 4;
    step 4: determining whether the current cursor is located in an editing-area node; if not, issuing a warning prompt; if so, proceeding to step 5;
    step 5: obtaining the word object to be pasted, executing the paste command, and initializing the xslt style template;
    step 6: starting a multi-threaded operation that trims, adjusts, or deletes unnecessary content of the pasted html markup text in preparation for conversion to S1000D nodes;
    step 7: if the current thread finishes processing the source data within 1 second, no log dialog box is shown; if the current thread has not finished within 1 second, the current thread is ended and an operation log dialog box is shown;
    step 8: obtaining the style template according to the xslt style file path: if style template information exists in the cache, obtaining the default style template information from the cache; if no style template information exists in the cache, storing the style template information in the cache and obtaining it directly from the cache whenever it is needed;
    step 9: obtaining a Transformer conversion object from the style template, setting the necessary parameters in the Transformer conversion object, and converting the Xml file into an Xml file of an S1000D standard DM.
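The caching behaviour of step 8 can be illustrated with a minimal sketch. It is written in Python for brevity and is not the claimed implementation: the class name `TemplateCache`, the stand-in `fake_loader`, and the path `styles/descriptive.xslt` are all hypothetical, and no real XSLT is compiled here.

```python
from typing import Callable, Dict


class TemplateCache:
    """Caches one loaded style template per stylesheet path (illustrative)."""

    def __init__(self, loader: Callable[[str], object]) -> None:
        self._loader = loader              # hypothetical: loads the XSLT at a path
        self._cache: Dict[str, object] = {}
        self.loads = 0                     # counts real loads, for illustration only

    def get(self, xslt_path: str) -> object:
        # Cache hit: reuse the stored template.
        if xslt_path in self._cache:
            return self._cache[xslt_path]
        # Cache miss: load once, store, and serve from the cache afterwards.
        template = self._loader(xslt_path)
        self.loads += 1
        self._cache[xslt_path] = template
        return template


def fake_loader(path: str) -> dict:
    # Stand-in for a real XSLT compiler; it only returns a marker object.
    return {"path": path}


cache = TemplateCache(fake_loader)
first = cache.get("styles/descriptive.xslt")    # hypothetical path, loaded once
second = cache.get("styles/descriptive.xslt")   # served from the cache
```

In this sketch the second lookup returns the same object as the first, so the stylesheet is loaded only once per path.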
  2. The method for quickly converting batch-copied WORD content to a DM based on the S1000D standard as claimed in claim 1, wherein: in step 1, a copy/paste command is triggered by the keyboard shortcuts 'CTRL + C/CTRL + V', the document content is copied to the clipboard, the document data format is set to html, and the html-format content in the clipboard is obtained through a Transformer conversion object.
  3. The method for quickly converting batch-copied WORD content to a DM based on the S1000D standard as claimed in claim 1, wherein: in step 2, whether the html markup text was copied from a WORD document is determined from the information 'xmlns:w="urn:schemas-microsoft-com:office:word"' in the underlying html data.
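The source check of step 2 can be sketched as follows, assuming the clipboard html carries the namespace declaration `xmlns:w="urn:schemas-microsoft-com:office:word"` that Word normally emits. The function name `is_from_word` is hypothetical, and a regular-expression match is only one possible way to perform the check.

```python
import re

WORD_NS = "urn:schemas-microsoft-com:office:word"

# Matches xmlns:w="..." or xmlns:w='...' with optional spaces around '='.
_NS_DECL = re.compile(r"""xmlns:w\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def is_from_word(html_text: str) -> bool:
    """Return True if the html declares the Word namespace on prefix w."""
    match = _NS_DECL.search(html_text)
    return bool(match) and match.group(1).strip().lower() == WORD_NS
```

Text for which the check fails would fall through to the ordinary conversion operation named in step 2.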
  4. The method for quickly converting batch-copied WORD content to a DM based on the S1000D standard as claimed in claim 1, wherein: the multi-threaded operation in step 6 is executed as follows:
    step 6.1: converting the string in the html markup text into a Document object for parsing, deleting namespaces and comment content from the html markup text, downloading picture content to a local temporary folder, storing ordinary pictures in png format and VISIO pictures in EMF compressed format, and numbering the files in image + format;
    step 6.2: deleting the redundant underlying meta tag content in the html markup text, and deleting styles related to the global font;
    step 6.3: deleting the plain-text content styles, the ordered-list and unordered-list styles, and the text layout formats of the original word document, and changing the language to zh-CN;
    step 6.4: deleting link styles throughout the underlying original word document;
    step 6.5: modifying the percentage-based column widths according to the column widths of the underlying table in the original word document, and adding a processing instruction to preprocess the table for representation with S1000D standard tags;
    step 6.6: modifying picture tags: changing the underlying <img> tag in the original word document to <figure> and adding the attribute class='figchoice' to mark it as a picture; if the attribute is not added, the element defaults to an icon;
    step 6.7: determining, from the attribute class='MsoNormal' of the underlying <p> tag in the original word document, whether the <p> tag is an ordered list or an unordered list; if it is an ordered list, converting the <p> tag into the corresponding ordered "<ol><li>" tag; if not, converting the <p> tag into the corresponding unordered "<ul><li>" tag;
    step 6.8: processing figure legends or table captions: filling in the title of the picture or table according to the class='figchoice' or class='choice' attribute in the picture or table tag of the original word document;
    step 6.9: preprocessing title tags: according to the underlying tags throughout the original word document, for example a first-level title tag "<h1></h1>", converting the tag into the corresponding "<div class='section1'><h1></h1></div>" tag, and then converting that tag into the corresponding S1000D node or step tag according to class='section1';
    step 6.10: deleting the underlying empty text tags, span tags, and styles in the original word document; moving the content of the <b>, <big>, <cite>, <em>, <i>, <small>, <strong>, and <u> tags into the enclosing <p> tag, and deleting the <b>, <big>, <cite>, <em>, <i>, <small>, <strong>, and <u> tags and their styles;
    step 6.11: generating, according to steps 6.1 to 6.10, the corresponding tag content as the input raw Xml content to be converted into S1000D tags;
    step 6.12: parsing the input raw Xml content generated in step 6.11 into a "Document" object in preparation for conversion into standard S1000D content.
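Two of the cleanup operations above, the picture tagging of step 6.6 and the inline-tag removal of step 6.10, can be sketched as follows. This is a Python illustration under the assumption that the pasted html has already been made well-formed Xml; the function names `mark_figures` and `unwrap_inline` are hypothetical, and the sketch does not claim to reproduce the patented implementation.

```python
import xml.etree.ElementTree as ET

# Inline formatting tags listed in step 6.10.
INLINE = {"b", "big", "cite", "em", "i", "small", "strong", "u"}


def mark_figures(root: ET.Element) -> None:
    """Step 6.6 sketch: rename <img> to <figure> and mark it as a picture."""
    for img in list(root.iter("img")):
        img.tag = "figure"
        img.set("class", "figchoice")


def _append_text(parent: ET.Element, index: int, text) -> None:
    """Append text immediately before child slot `index` of parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def unwrap_inline(parent: ET.Element) -> None:
    """Step 6.10 sketch: replace inline tags by their content, keeping text order."""
    i = 0
    while i < len(parent):
        child = parent[i]
        unwrap_inline(child)              # clean descendants first
        if child.tag not in INLINE:
            i += 1
            continue
        kids = list(child)
        parent.remove(child)
        _append_text(parent, i, child.text)
        for offset, kid in enumerate(kids):
            parent.insert(i + offset, kid)
        i += len(kids)
        _append_text(parent, i, child.tail)


para = ET.fromstring("<p>Check <b>valve <i>A</i></b> first.</p>")
unwrap_inline(para)
cleaned = ET.tostring(para, encoding="unicode")
```

Here `cleaned` holds the paragraph with the `<b>` and `<i>` wrappers removed and their text merged into the enclosing `<p>`.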
  5. The method for quickly converting batch-copied WORD content to a DM based on the S1000D standard as claimed in claim 1, wherein: in step 7, the current thread is ended by join(1000).
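The one-second limit of step 7 can be sketched with a timed join. The claim's `join(1000)` reads like a millisecond-based call; the Python analogue below uses `Thread.join` with a timeout in seconds. The function name `run_with_deadline` is hypothetical, and the decision to show a log dialog is reduced to a boolean.

```python
import threading
import time


def run_with_deadline(task, timeout_s: float = 1.0) -> bool:
    """Run task in a worker thread; return True if it finished in time."""
    worker = threading.Thread(target=task, daemon=True)
    worker.start()
    worker.join(timeout_s)            # waits at most timeout_s seconds
    return not worker.is_alive()


# A fast task finishes within the limit, so no log dialog would be needed.
quick_done = run_with_deadline(lambda: None)

# A slow task exceeds a deliberately short limit, so the log dialog would be shown.
slow_done = run_with_deadline(lambda: time.sleep(0.3), timeout_s=0.05)
show_log_dialog = not slow_done
```

Note that a timed join only stops the waiting; the worker itself keeps running unless it is stopped by other means.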
  6. The method for quickly converting batch-copied WORD content to a DM based on the S1000D standard as claimed in claim 1, wherein: the specific process in step 9 of obtaining the style template through the singleton pattern and setting the necessary parameters in the "Document" object comprises:
    when hierarchy or step node conversion is performed, the current DM type is set in the "Document" object; if the current DM type is descriptive, the html tag <div class="section1"> is converted into the S1000D standard levelled paragraph tag <levelledPara>, the html tag <div class="section2"> is converted into the S1000D standard sub-level paragraph tag <levelledPara>, the levelled paragraph style is represented as 1, 1.1, 1.1.1, and the parent-child or sibling relationship of the nodes is determined from the class attribute value; if the current DM type is procedural, the html tag <div class="section1"> is converted into the level step node tag <proceduralStep>, <div class="section2"> is converted into the sub-level step node tag <levelledPara> or <proceduralStep>, the level style is represented as 1, 1.1, 1.1.1, and the parent-child or sibling relationship of the nodes is determined from the class attribute value;
    when title node conversion is performed, if the current DM type is descriptive, the html tags <h1> to <h6> are converted into a <title> tag; if the current DM type is procedural, the html tags <h1> to <h6> are converted into a <para> tag of S1000D standard Xml;
    when paragraph node conversion is performed, the html <p> tag is converted into the corresponding <para> tag of S1000D standard Xml; if analysis of the node context shows that the parent node of the current node is a warning, caution, or note tag, the html <p> tag is converted into the <warningAndCautionPara> tag of S1000D standard Xml;
    when ordered/unordered list conversion is performed, the ordered html tag <ol><li> is converted into the <sequentialList><listItem> tag of S1000D standard Xml, with the level style represented as 1, 1.1, 1.1.1; the unordered html tag <ul><li> is converted into the <randomList><listItem> tag of S1000D standard Xml, with the unordered style represented by a "·" marker;
    when icon conversion is performed, if the source html contains no <div class='figure'> tag and only <img alt='…' src='xxx.png'></img> appears, the element is converted by default into the S1000D standard Xml icon tag <symbol src=''/>;
    when table conversion is performed, the source data table node and its child nodes are converted into S1000D nodes according to the table style template, using the correspondences tr <--> row, td <--> entry, cluster <--> colspec, and caption <--> title; if the source data table contains cells that span columns, the current start column and end column are calculated through a processing instruction, and the results are used as the values of the cell attributes namest and nameend;
    when text operation is performed, if the processing instruction <?toxml-text?> exists, the text is copied as default text; if the processing instruction <?toxml-text?> does not exist, the text is copied as a para node.
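The start-column and end-column calculation for spanning cells can be illustrated with a small worked sketch. It is written in Python and rests on stated assumptions: each cell's column span is already known, columns are named `col1`, `col2`, and so on (a hypothetical naming scheme), and row spans are ignored. The function name `span_names` is likewise hypothetical.

```python
from typing import List, Optional, Tuple


def span_names(colspans: List[int]) -> List[Optional[Tuple[str, str]]]:
    """For each cell in a row, return (namest, nameend), or None if it spans one column."""
    result: List[Optional[Tuple[str, str]]] = []
    column = 1                        # 1-based index of the next free column
    for span in colspans:
        if span < 1:
            raise ValueError("colspan must be at least 1")
        start, end = column, column + span - 1
        # Only spanning cells need namest/nameend attributes.
        result.append((f"col{start}", f"col{end}") if span > 1 else None)
        column = end + 1
    return result


# A row whose second cell covers columns 2 and 3 of a four-column table.
row = span_names([1, 2, 1])
```

For this row, only the second cell receives a pair of column names, covering `col2` through `col3`.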
  7. The method for quickly converting batch-copied WORD content to a DM based on the S1000D standard as claimed in claim 1, wherein: in step 9, after conversion into standard S1000D nodes according to the style template file, whether the converted node can be inserted is determined from the context constraints of the node at the current cursor position; if it can be inserted, the node converted according to the style template is pasted at the corresponding position; if it cannot be inserted, a prompt message is shown.
CN202110316627.8A 2021-03-25 2021-03-25 Method for quickly converting batch copy of WORD content to DM based on S1000D standard Active CN112699641B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110316627.8A CN112699641B (en) 2021-03-25 2021-03-25 Method for quickly converting batch copy of WORD content to DM based on S1000D standard

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110316627.8A CN112699641B (en) 2021-03-25 2021-03-25 Method for quickly converting batch copy of WORD content to DM based on S1000D standard

Publications (2)

Publication Number Publication Date
CN112699641A true CN112699641A (en) 2021-04-23
CN112699641B CN112699641B (en) 2021-07-20

Family

ID=75515678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110316627.8A Active CN112699641B (en) 2021-03-25 2021-03-25 Method for quickly converting batch copy of WORD content to DM based on S1000D standard

Country Status (1)

Country Link
CN (1) CN112699641B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298575A (en) * 2010-06-28 2011-12-28 北大方正集团有限公司 Method and system for copying and pasting Word file content with format
CN102207975A (en) * 2011-06-24 2011-10-05 天津大学 Method for manufacturing and displaying extensive makeup language (xml) data module based on ietm standard
CN104391655A (en) * 2014-11-17 2015-03-04 浪潮电子信息产业股份有限公司 Method for automatically copying files to multiple U disks
CN105786921A (en) * 2014-12-26 2016-07-20 北京航天测控技术有限公司 Data module conversion method and device for non-structured document
CN110083805A (en) * 2018-01-25 2019-08-02 北京大学 A kind of method and system that Word file is converted to EPUB file
CN108363760A (en) * 2018-02-02 2018-08-03 东南大学 IETM display datas based on B/S models generate and Off-line control method
CN111666747A (en) * 2020-05-29 2020-09-15 中国工程物理研究院计算机应用研究所 Method for generating WORD document into description class data module conforming to S1000D standard

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297425A (en) * 2021-06-22 2021-08-24 超凡知识产权服务股份有限公司 Document conversion method, device, server and storage medium
CN113297425B (en) * 2021-06-22 2023-09-12 超凡知识产权服务股份有限公司 Document conversion method, device, server and storage medium
CN115688690A (en) * 2022-11-16 2023-02-03 金航数码科技有限责任公司 Dynamic conversion method for converting Word document content into XML fragment conforming to S1000D standard
CN115688690B (en) * 2022-11-16 2023-10-03 金航数码科技有限责任公司 Dynamic conversion method for converting Word document content into XML fragment conforming to S1000D standard
CN115756437A (en) * 2022-11-30 2023-03-07 金航数码科技有限责任公司 Visual XML data compiling method and system based on SCHEMA file
CN115756437B (en) * 2022-11-30 2023-10-03 金航数码科技有限责任公司 Visual XML data compiling method and system based on SCHEMA file

Also Published As

Publication number Publication date
CN112699641B (en) 2021-07-20

Similar Documents

Publication Publication Date Title
CN112699641B (en) Method for quickly converting batch copy of WORD content to DM based on S1000D standard
US8407585B2 (en) Context-aware content conversion and interpretation-specific views
US7627592B2 (en) Systems and methods for converting a formatted document to a web page
EP1920350B1 (en) Markup based extensibility for user interfaces
US9110877B2 (en) Method and apparatus for utilizing an extensible markup language schema for managing specific types of content in an electronic document
US7721195B2 (en) RTF template and XSL/FO conversion: a new way to create computer reports
RU2348064C2 (en) Method and system of extending functional capacity of insertion for computer software applications
US8484552B2 (en) Extensible stylesheet designs using meta-tag information
RU2422889C2 (en) Defining fields for presented files and extensible markup language scheme for bibliography and citation
US7617449B2 (en) Method and system for mapping content between a starting template and a target template
US20040015782A1 (en) Templating method for automated generation of print product catalogs
US20060075337A1 (en) Method, system, and computer-readable medium for creating, inserting, and reusing document parts in an electronic document
US20110078165A1 (en) Document-fragment transclusion
JPWO2007081017A1 (en) Document processing device
JP4566196B2 (en) Document processing method and apparatus
Racine Energy, economics, replication & reproduction
US20060095838A1 (en) Object-oriented processing of tab text
JPWO2007052680A1 (en) Document processing apparatus and document processing method
KR101251686B1 (en) Determining fields for presentable files and extensible markup language schemas for bibliographies and citations
JP2003345783A (en) Document preparing method
Racine Energy, Economics & Replication
Mittelbach et al. Enhancing LATEX to automatically produce tagged and accessible PDF
JP2004038496A (en) Xml document preparing system
CN116226035A (en) Method and device for converting OpenXML document into Web form
JP2000339307A (en) Typesetting device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant