CN113505572A - Method, device, equipment and medium for converting typesetting file into XML data - Google Patents

Method, device, equipment and medium for converting typesetting file into XML data

Info

Publication number
CN113505572A
Authority
CN
China
Prior art keywords
file
xml
word
initial
converting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110573661.3A
Other languages
Chinese (zh)
Other versions
CN113505572B (en
Inventor
谭伟
王婷
王全鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Founder Electronics Co Ltd
Original Assignee
Beijing Founder Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/151Transformation
    • G06F40/154Tree transformation for tree-structured or markup documents, e.g. XSLT, XSL-FO or stylesheets

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application provides a method, an apparatus, a device, and a medium for converting a typesetting file into XML data. The method comprises: converting the typesetting file to be converted into a Word file and an addendum resource package; converting the Word file into an initial XML file and a resource mapping file through a Word structuring engine; and appending the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata, to generate a target XML file. In this technical solution, the electronic device converts the typesetting file into the Word file and the addendum resource package, and appends the addendum resource package to the initial XML file generated from the Word file, thereby obtaining the target XML file and effectively improving the integrity of the typesetting file's information.

Description

Method, device, equipment and medium for converting typesetting file into XML data
Technical Field
The present application relates to the field of document processing technologies, and in particular, to a method, an apparatus, a device, and a medium for converting a typesetting file into Extensible Markup Language (XML) data.
Background
A typesetting file format is an electronic document format with a fixed-layout presentation. The presentation of a typesetting file is device-independent: its layout looks the same whether it is read on screen or printed on different devices. However, because a typesetting file is relatively inconvenient to edit, it needs to be converted into XML data before it can be archived, re-typeset, or used to generate an online periodical.
At present, a typesetting file is mainly converted into XML data by first converting it into a Word file and then converting the Word file into XML data.
However, in this scheme, content of the typesetting file may be lost when the Word file is converted into XML data, resulting in poor information integrity.
Disclosure of Invention
The application provides a method, a device, equipment and a medium for converting a typesetting file into XML data, which aim to solve the problem that the information integrity is poor because the information of the typesetting file is likely to be lost when a Word file is converted into the XML data.
In a first aspect, an embodiment of the present application provides a method for converting a typesetting file into XML data, including:
converting a typesetting file to be converted into a Word file and an addendum resource package, wherein the Word file comprises at least one picture object, at least one table object and at least one formula object, the addendum resource package comprises a plate-type information set and periodical metadata, and each object in the Word file and the plate-type information corresponding to that object in the plate-type information set are assigned the same initial id;
converting the Word file into an initial XML file and a resource mapping file through a Word structuring engine, wherein the resource mapping file comprises a mapping relation between the initial id and the generated target id;
and appending the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata, to generate a target XML file.
In a possible design of the first aspect, the converting the Word file into an initial XML file and a resource mapping file by a Word structuring engine includes:
converting the initial id in the Word file into the target id through the Word structuring engine to generate the initial XML file;
and acquiring the mapping relation between the initial id and the target id, and generating the resource mapping file.
In another possible design of the first aspect, the appending the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata to generate a target XML file includes:
appending the plate-type information set to the initial XML file according to the resource mapping file;
appending the periodical metadata to the initial XML file according to the preset position of the periodical metadata;
and generating the target XML file.
Optionally, the plate-type information set includes plate-type information corresponding to each picture object, plate-type information corresponding to each table object, and plate-type information corresponding to each formula object.
Optionally, the plate-type information corresponding to each picture object includes size data of the picture object and picture substitute image data;
the plate-type information corresponding to each formula object includes size data of the formula object and formula substitute image data;
the plate-type information corresponding to each table object includes size data of the table object and table substitute image data.
Optionally, the periodical metadata includes page number information, a chapter digital object identifier (DOI), and publisher information.
Optionally, the substitute image data includes high-definition substitute image data and non-high-definition substitute image data, where the pixel resolution of a high-definition substitute image is higher than a first preset pixel threshold, the pixel resolution of a non-high-definition substitute image is lower than a second preset pixel threshold, and the first preset pixel threshold is larger than the second preset pixel threshold;
wherein the substitute image data includes at least one of the picture substitute image data, the formula substitute image data, and the table substitute image data.
In a second aspect, an embodiment of the present application provides an apparatus for converting a typesetting file into XML data, including:
a conversion module, configured to convert a typesetting file to be converted into a Word file and an addendum resource package, wherein the Word file comprises at least one picture object, at least one table object and at least one formula object, the addendum resource package comprises a plate-type information set and periodical metadata, and each object in the Word file and the plate-type information corresponding to that object in the plate-type information set are assigned the same initial id;
the conversion module being further configured to convert the Word file into an initial XML file and a resource mapping file through a Word structuring engine, wherein the resource mapping file comprises a mapping relation between the initial id and the generated target id;
and a supplement module, configured to append the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata, to generate a target XML file.
In a possible design of the second aspect, the conversion module is further configured to:
convert the initial id in the Word file into the target id through the Word structuring engine to generate the initial XML file;
and acquire the mapping relation between the initial id and the target id, and generate the resource mapping file.
In another possible design of the second aspect, the supplement module is further configured to:
append the plate-type information set to the initial XML file according to the resource mapping file;
append the periodical metadata to the initial XML file according to the preset position of the periodical metadata;
and generate the target XML file.
Optionally, the plate-type information set includes plate-type information corresponding to each picture object, plate-type information corresponding to each table object, and plate-type information corresponding to each formula object.
Optionally, the plate-type information corresponding to each picture object includes size data of the picture object and picture substitute image data;
the plate-type information corresponding to each formula object includes size data of the formula object and formula substitute image data;
the plate-type information corresponding to each table object includes size data of the table object and table substitute image data.
Optionally, the periodical metadata includes page number information, a chapter digital object identifier (DOI), and publisher information.
Optionally, the substitute image data includes high-definition substitute image data and non-high-definition substitute image data, where the pixel resolution of a high-definition substitute image is higher than a first preset pixel threshold, the pixel resolution of a non-high-definition substitute image is lower than a second preset pixel threshold, and the first preset pixel threshold is larger than the second preset pixel threshold;
wherein the substitute image data includes at least one of the picture substitute image data, the formula substitute image data, and the table substitute image data.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor, a memory and computer program instructions stored on the memory and executable on the processor for implementing the method of the first aspect and each possible design when the processor executes the computer program instructions.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the method provided by the first aspect and each possible design.
In a fifth aspect, embodiments of the present application provide a computer program product comprising a computer program that, when executed by a processor, is configured to implement the method provided by the first aspect and each possible design.
In the method, apparatus, device, and medium for converting a typesetting file into XML data provided by the embodiments of the present application, the electronic device converts the typesetting file to be converted into a Word file and an addendum resource package, converts the Word file into an initial XML file and a resource mapping file through a Word structuring engine, and finally appends the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata to generate a target XML file. Because the plate-type information set and the periodical metadata of the typesetting file are placed in the addendum resource package rather than in the Word file, the size of the Word file can be effectively reduced. Because the addendum resource package is subsequently appended to the initial XML file, loss of typesetting-file content during conversion can be avoided, which improves the integrity of the converted content and further ensures the accuracy of the conversion.
Drawings
Fig. 1 is a schematic view of an application scenario of a method for converting a typesetting file into XML data according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a first embodiment of a method for converting a typesetting file into XML data according to an embodiment of the present application;
Fig. 3 is a schematic diagram of an addendum resource package according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of a second embodiment of a method for converting a typesetting file into XML data according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an embodiment of an apparatus for converting a typesetting file into XML data according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
The above drawings show specific embodiments of the present application, which are described in more detail below. The drawings and the written description are not intended to limit the scope of the concepts of the present application in any way, but to illustrate those concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Before introducing the embodiments of the present application, an application scenario of the embodiments of the present application is explained first:
A scientific paper is the electronic or written presentation, prepared according to the requirements of a given scientific journal, of the results and conclusions that scientists or other researchers obtain through scientific experiments (or trials): by scientifically analyzing, synthesizing, studying, and explaining phenomena (or problems) in natural science, engineering, and the humanities and arts, and by further studying, summarizing, and innovating on those phenomena and problems. Its main function is to record and summarize scientific achievements and to advance research work. It is an important means of scientific research and a tool for researchers to exchange academic ideas and scientific achievements.
The content of a scientific paper is typeset according to the requirements of the target scientific journal to generate a typesetting file. A typesetting file has a fixed layout that does not shift. While it is in use, its presentation does not change with the software and hardware environment or the operator, and it is fully consistent with the paper document in format, layout, font size, and so on. However, because a typesetting file is relatively inconvenient to edit, it needs to be converted into XML data before it can be archived, re-typeset, or used to generate an online periodical.
At present, there are several ways to convert a typesetting file into XML data:
First, the typesetting file is converted directly into XML data, but this approach is complex and difficult to develop.
Second, to address those problems and simplify the conversion, the typesetting file is first converted into a Word file. A Word file is a flow document: its text and elements such as pictures, tables, and formulas have no fixed positions, and their positions shift considerably as elements are inserted or the layout changes. This makes a Word file convenient to edit, but because the layout changes so much during editing, it is not suitable for printing. After the typesetting file has been converted into a Word file, the Word file is converted into XML data.
However, in the second scheme, information in the typesetting file may be lost when the Word file is converted into XML data, resulting in poor information integrity.
In view of the above problems, the inventive concept of the present application is as follows. In the current scheme, the integrity of the typesetting-file information cannot be guaranteed because a Word file cannot carry all elements of the typesetting file, such as periodical metadata and some pictures, tables, and formulas. On this basis, the inventors found that if the elements that the Word file cannot carry are obtained and appended to the converted file after the Word file has been converted, the integrity problem of the prior art can be solved, and the accuracy of typesetting-file conversion is thereby improved.
For example, the method for converting a typesetting file into XML data provided in the embodiments of the present application, which addresses the above technical problem, may be applied to the application scenario shown in Fig. 1. As shown in Fig. 1, the application scenario may include a terminal device and a server, and may further include a data storage device connected to the server.
For example, in the application scenario shown in Fig. 1, the server may obtain, over a network, the typesetting file to be converted that a user sends through the terminal device, and store it in the data storage device so that it can be used directly in subsequent processing.
In the embodiment of the application, the data storage device may store the typesetting file to be converted, may also store the initial XML file, the resource mapping file, and may also store the generated target XML file. The server may process the typesetting file to be converted in the data storage device, thereby generating a target XML file.
It should be noted that fig. 1 is only a schematic diagram of an application scenario provided in this embodiment of the present application, and this embodiment of the present application does not limit the devices included in fig. 1, nor does it limit the positional relationship between the devices in fig. 1, for example, in fig. 1, the data storage device may be an external memory with respect to the server, and in other cases, the data storage device may also be placed in the server.
In practical applications, since the terminal device is also a processing device with data processing capability, the server in the application scenario shown in fig. 1 may also be implemented by the terminal device. In the embodiments of the present application, the terminal device and the server for data processing may be collectively referred to as an electronic device.
In summary, the electronic device may be any device with data processing capability, such as a computer or another intelligent terminal, or may be a cloud service or another entity with a processing function, such as a server; the present application does not limit this.
The technical solution of the present application will be described in detail below with reference to specific examples.
It should be noted that the following specific embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 2 is a schematic flowchart of a first embodiment of a method for converting a typesetting file into XML data according to an embodiment of the present application. As shown in Fig. 2, the method may include the following steps:
S101: Convert the typesetting file to be converted into a Word file and an addendum resource package.
In this step, to ensure that the picture objects, table objects, formula objects, and periodical metadata of the typesetting file are not lost during conversion, the electronic device may obtain the typesetting file to be converted over a network or from a data storage device that stores it, and convert it into a Word file and an addendum resource package.
Illustratively, a user typesets the manuscript of a scientific paper in a selected typesetting mode to generate a typesetting file and stores it in the data storage device; the electronic device then obtains the typesetting file to be converted from the data storage device. The typesetting file may include paper content such as text, picture objects, table objects, and formula objects.
In a specific implementation, the electronic device may convert the typesetting file into a Word file and an addendum resource package using any existing technique. While the Word file is generated, an identifier (id) generator in the electronic device assigns a distinct initial id to each picture object, table object, and formula object.
The id generator may generate the initial ids according to a regular pattern, randomly, or in another manner, as chosen according to actual requirements; the embodiments of the present application do not specifically limit this.
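As a minimal sketch of the two generation strategies mentioned above, the following Python code produces ids in the `FX_<kind>_ID<8 hex digits>` shape seen in this document's examples. The starting value, the use of 32 random bits, and the function names are assumptions made for illustration; the application does not specify how the id generator works.

```python
import secrets
from itertools import count


def sequential_ids(kind, start=0x80000000):
    """Regular scheme: yield FX_<kind>_ID<8 hex digits>, counting up from `start`.

    The start value is a hypothetical choice, not taken from the application.
    """
    for number in count(start):
        yield f"FX_{kind}_ID{number:08X}"


def random_id(kind, used):
    """Random scheme: draw 32 random bits until the id is not already in `used`."""
    while True:
        candidate = f"FX_{kind}_ID{secrets.randbits(32):08X}"
        if candidate not in used:
            used.add(candidate)  # remember the id so it is never issued twice
            return candidate


picture_ids = sequential_ids("GRP")
first_picture_id = next(picture_ids)
used_ids = {first_picture_id}
formula_id = random_id("MAT", used_ids)
```

Either scheme satisfies the only requirement stated above, namely that each object receives a distinct initial id.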
Illustratively, the Word file comprises at least one picture object, at least one table object and at least one formula object. The Word file may further include group pictures (combinations of several picture objects), Object Linking and Embedding (OLE) objects, combination blocks (any combination of picture objects, table objects, and formula objects), and extra-large characters (used to display rare characters), as well as other paper content; the present scheme does not limit this.
Illustratively, a Word file may be as follows.
Figure BDA0003083467560000081
Figure BDA0003083467560000091
Here, FX_MAT_ID80002BAF and FX_MAT_ID80002BC1 are initial ids. It should be understood that only part of the content of the Word file and some of the initial ids are given above. In practical applications, the Word file and the initial ids may have other contents and forms, which may be determined according to actual requirements and are not described here again.
The addendum resource package comprises a plate-type information set and periodical metadata, wherein the plate-type information set comprises the plate-type information corresponding to each picture object, each table object, and each formula object.
Each object in the Word file and its corresponding plate-type information in the plate-type information set are assigned the same initial id.
Further, the plate-type information corresponding to each picture object comprises size data of the picture object and picture substitute image data; the plate-type information corresponding to each formula object comprises size data of the formula object and formula substitute image data; the plate-type information corresponding to each table object comprises size data of the table object and table substitute image data.
The substitute image data comprises high-definition substitute image data and non-high-definition substitute image data. The pixel resolution of a high-definition substitute image is higher than a first preset pixel threshold, the pixel resolution of a non-high-definition substitute image is lower than a second preset pixel threshold, and the first threshold is larger than the second. Both thresholds can be set according to actual requirements and are not specifically limited in this scheme.
The substitute image data includes at least one of picture substitute image data, formula substitute image data, and table substitute image data. A substitute image is a screenshot of a picture object, formula object, or table object in the typesetting file. It effectively works around the defects of vector rendering of complex objects such as formulas and tables in electronic readers, and preserves the complete appearance of the complex object.
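The two-threshold rule can be sketched as follows. This is only an illustration: it assumes resolution is measured as a total pixel count, the threshold values are hypothetical, and the "unclassified" result for values between the two thresholds is an assumption, since the application does not say how such images are treated.

```python
def classify_substitute_image(pixel_count, first_preset, second_preset):
    """Classify a substitute image by the two preset pixel thresholds.

    The description requires first_preset > second_preset. The band between
    the two thresholds is not defined there, so it is reported as
    "unclassified" here (an assumption of this sketch).
    """
    if first_preset <= second_preset:
        raise ValueError("first preset must be larger than second preset")
    if pixel_count > first_preset:
        return "high-definition"
    if pixel_count < second_preset:
        return "non-high-definition"
    return "unclassified"


# Hypothetical thresholds: 2,000,000 and 500,000 pixels.
hd_label = classify_substitute_image(1920 * 1080, 2_000_000, 500_000)
low_label = classify_substitute_image(640 * 480, 2_000_000, 500_000)
mid_label = classify_substitute_image(1280 * 720, 2_000_000, 500_000)
```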
Illustratively, the size data of the picture object may be as follows.
<ObjectBasicFmt id="FX_GRP_ID80004E8B" ref-alter-id="AG1" height="67.98436737" width="136.00296021" type="image"/>
For example, the picture substitute image data may be as follows.
Figure BDA0003083467560000101
Illustratively, the size data of the formula objects may be as follows.
<ObjectBasicFmt id="FX_MAT_ID80002BAF" ref-alter-id="AG2" height="4.99533367" width="25.23066711" type="math"/>
<ObjectBasicFmt id="FX_MAT_ID80002BC1" ref-alter-id="AG3" height="3.80999994" width="4.82600021" type="math"/>
For example, the formula substitute image data may be as follows.
Figure BDA0003083467560000102
Illustratively, the size data of the table object may be as follows.
Figure BDA0003083467560000111
Illustratively, the table substitute image data may be as follows.
Figure BDA0003083467560000112
It will be appreciated that only part of the plate-type information set is shown above. In practical applications, the plate-type information set may have other contents and forms, which may be determined according to actual requirements and are not described here again.
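For illustration, size data of this shape can be read with Python's standard `xml.etree.ElementTree` module. The sketch below indexes the `ObjectBasicFmt` entries by their initial id. The `PlateTypeInfo` wrapper element and the function name are assumptions, since the document shows only individual entries and not the enclosing file structure.

```python
import xml.etree.ElementTree as ET

# The <PlateTypeInfo> root is a hypothetical wrapper; the entries are the
# size-data examples given in the description.
SAMPLE = """<PlateTypeInfo>
  <ObjectBasicFmt id="FX_GRP_ID80004E8B" ref-alter-id="AG1" height="67.98436737" width="136.00296021" type="image"/>
  <ObjectBasicFmt id="FX_MAT_ID80002BAF" ref-alter-id="AG2" height="4.99533367" width="25.23066711" type="math"/>
  <ObjectBasicFmt id="FX_MAT_ID80002BC1" ref-alter-id="AG3" height="3.80999994" width="4.82600021" type="math"/>
</PlateTypeInfo>"""


def load_size_data(xml_text):
    """Return {initial id: size record} for every ObjectBasicFmt element."""
    root = ET.fromstring(xml_text)
    sizes = {}
    for entry in root.iter("ObjectBasicFmt"):
        sizes[entry.get("id")] = {
            "type": entry.get("type"),
            "height": float(entry.get("height")),
            "width": float(entry.get("width")),
            "ref_alter_id": entry.get("ref-alter-id"),
        }
    return sizes


size_by_initial_id = load_size_data(SAMPLE)
```

Keying the records by initial id matches the rule that an object and its plate-type information share the same initial id.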
The periodical metadata includes page number information, a digital object identifier (DOI), and publisher information.
Illustratively, the periodical metadata may be as follows.
Figure BDA0003083467560000113
Figure BDA0003083467560000121
Figure BDA0003083467560000131
It is to be understood that only part of the periodical metadata is shown above. In practical applications, the periodical metadata may have other contents and forms, which may be determined according to actual requirements and are not described here again.
Fig. 3 is a schematic diagram of an addendum resource package according to an embodiment of the present application. As shown in Fig. 3, the addendum resource package comprises three items in total: folder 1, folder 2, and the file 1.XML.
The XML file stores the plate-type information set. Folder 1 stores the original image of each picture object; if a picture object does not need to be modified, its original image serves as its high-definition substitute image. Folder 2 stores the high-definition and non-high-definition substitute images of the pictures (if a picture object does not need to be modified, only its non-high-definition substitute image needs to be stored), of the formulas, and of the tables.
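A simple structural check of such a package might look like the following sketch. It assumes the package is an unpacked directory whose entries are literally named `1`, `2`, and `1.XML`; the actual on-disk names and packaging format are not given in the application, so these names are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical entry names, based on "folder 1, folder 2, and the file 1.XML".
EXPECTED_ENTRIES = {"1": "dir", "2": "dir", "1.XML": "file"}


def missing_entries(package_root):
    """Return the expected entries that are absent from the package directory."""
    root = Path(package_root)
    missing = []
    for name, kind in EXPECTED_ENTRIES.items():
        path = root / name
        present = path.is_dir() if kind == "dir" else path.is_file()
        if not present:
            missing.append(name)
    return missing


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "1").mkdir()  # originals of the picture objects
    (Path(tmp) / "1.XML").write_text("<PlateTypeInfo/>", encoding="utf-8")
    incomplete = missing_entries(tmp)  # folder 2 has not been created yet
    (Path(tmp) / "2").mkdir()  # substitute images
    complete = missing_entries(tmp)
```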
S102: Convert the Word file into an initial XML file and a resource mapping file through a Word structuring engine.
In this step, after obtaining the Word file, the electronic device may process it. Through the Word structuring engine, the electronic device can convert the initial ids in the Word file into target ids that conform to the Journal Article Tag Suite (JATS) standard, and convert the paper content in the Word file, to generate an initial XML file.
Further, the electronic device obtains the mapping relationship between the initial ids and the target ids and generates a resource mapping file. In other words, the resource mapping file records the mapping between each initial id and the target id generated for it.
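The id conversion and the recording of the mapping can be sketched as below. The numbering rule (a per-kind counter with the prefixes `Graphic` and `M`) is inferred from the example target ids in this document and is an assumption, as is the `T` prefix for tables; the application does not state how the Word structuring engine derives target ids.

```python
from itertools import count

# Hypothetical prefix rules. "Graphic" and "M" follow the example target ids
# in the description; "T" for tables is purely illustrative.
PREFIX_BY_KIND = {"GRP": "Graphic", "MAT": "M", "TAB": "T"}


def assign_target_ids(initial_ids):
    """Map each initial id to a target id, numbering each object kind from 1."""
    counters = {}
    mapping = {}
    for initial_id in initial_ids:
        kind = initial_id.split("_")[1]  # "FX_MAT_ID80002BAF" -> "MAT"
        counter = counters.setdefault(kind, count(1))
        mapping[initial_id] = f"{PREFIX_BY_KIND[kind]}{next(counter)}"
    return mapping


def to_mapping_lines(mapping):
    """Serialize the mapping as ComponentResourceRel entries."""
    return [
        f'<ComponentResourceRel fxid="{initial_id}" id="{target_id}"/>'
        for initial_id, target_id in mapping.items()
    ]


id_mapping = assign_target_ids(
    ["FX_GRP_ID80004E8B", "FX_MAT_ID80002BAF", "FX_MAT_ID80002BC1"]
)
mapping_lines = to_mapping_lines(id_mapping)
```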
Illustratively, the initial XML file may be as follows.
Figure BDA0003083467560000141
Figure BDA0003083467560000151
It will be appreciated that only part of the content of the initial XML file is given above. In practical applications, the initial XML file may have other contents and forms, which may be determined according to actual requirements and are not described here again.
Illustratively, the resource mapping file may be as follows.
<ComponentResourceRel fxid="FX_GRP_ID80004E8B" id="Graphic1"/>
<ComponentResourceRel fxid="FX_MAT_ID80002BAF" id="M1"/>
<ComponentResourceRel fxid="FX_MAT_ID80002BC1" id="M2"/>
<ComponentResourceRel fxid="FX_MAT_ID80002BD1" id="M3"/>
<ComponentResourceRel fxid="FX_MAT_ID80002BE4" id="M4"/>
<ComponentResourceRel fxid="FX_MAT_ID80002BF3" id="M5"/>
<ComponentResourceRel fxid="FX_MAT_ID80002C02" id="M6"/>
<ComponentResourceRel fxid="FX_MAT_ID80002C11" id="M7"/>
<ComponentResourceRel fxid="FX_MAT_ID80002C20" id="M8"/>
<ComponentResourceRel fxid="FX_MAT_ID80002C2F" id="M9"/>
<ComponentResourceRel fxid="FX_MAT_ID80002C3E" id="M10"/>
<ComponentResourceRel fxid="FX_MAT_ID80002C4D" id="M11"/>
<ComponentResourceRel fxid="FX_MAT_ID80002C5C" id="M12"/>
<ComponentResourceRel fxid="FX_MAT_ID80002C6E" id="M13"/>
<ComponentResourceRel fxid="FX_MAT_ID80002C82" id="M14"/>
<ComponentResourceRel fxid="FX_MAT_ID80002C91" id="M15"/>
<ComponentResourceRel fxid="FX_MAT_ID80002CA0" id="M16"/>
Here, FX_GRP_ID80004E8B is an initial id and Graphic1 is the corresponding target id. It is to be understood that only part of the content of the resource mapping file is shown above. In practical applications, the resource mapping file may have other contents and forms, which may be determined according to actual requirements and are not described here again.
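A resource mapping file of this shape can be loaded into a lookup table from target id to initial id, which is the direction needed when supplementing the initial XML file. The sketch below uses Python's standard `xml.etree.ElementTree`. The `ResourceMap` root element is an assumption, because the document shows only the individual `ComponentResourceRel` entries.

```python
import xml.etree.ElementTree as ET

# <ResourceMap> is a hypothetical wrapper around entries from the example.
MAPPING_SAMPLE = """<ResourceMap>
  <ComponentResourceRel fxid="FX_GRP_ID80004E8B" id="Graphic1"/>
  <ComponentResourceRel fxid="FX_MAT_ID80002BAF" id="M1"/>
  <ComponentResourceRel fxid="FX_MAT_ID80002BC1" id="M2"/>
</ResourceMap>"""


def load_target_to_initial(xml_text):
    """Return {target id: initial id} from the ComponentResourceRel entries."""
    root = ET.fromstring(xml_text)
    return {
        rel.get("id"): rel.get("fxid")
        for rel in root.iter("ComponentResourceRel")
    }


target_to_initial = load_target_to_initial(MAPPING_SAMPLE)
```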
S103: Append the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata, to generate a target XML file.
In this step, because the initial XML file contains neither the plate information set nor the periodical metadata, it needs to be supplemented so that its information is complete.
In a particular embodiment, the electronic device appends the plate information set to the initial XML file according to the resource mapping file. Specifically, the electronic device obtains a target id from the initial XML file and determines the corresponding initial id from the mapping relationship between initial ids and target ids in the resource mapping file. It then obtains the substitute map in the plate information corresponding to that initial id and appends the substitute map to the initial XML file.
The substitute map may be a picture substitute map, a formula substitute map, or a table substitute map, as determined by the mapping relationship; this scheme does not specifically limit it.
Furthermore, the periodical metadata is appended to the initial XML file at the preset position of the periodical metadata, and the target XML file is generated.
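The two parts of step S103 can be sketched in Python as follows. This is only an illustration under stated assumptions: the element names (`article`, `front`, `body`, `fig`, `formula`, `substitute`), the image paths, the metadata values, and the choice of `front` as the preset position are all hypothetical, since the publication shows the initial XML file only as images.

```python
import xml.etree.ElementTree as ET

# Hypothetical initial XML file; element names are illustrative only.
INITIAL_XML = """<article>
  <front/>
  <body>
    <fig id="Graphic1"/>
    <formula id="M1"/>
  </body>
</article>"""

# target id -> initial id, as recorded in the resource mapping file.
TARGET_TO_INITIAL = {"Graphic1": "FX_GRP_ID80004E8B", "M1": "FX_MAT_ID80002BAF"}

# initial id -> substitute map path from the plate information set (made-up paths).
PLATE_INFO = {
    "FX_GRP_ID80004E8B": "images/FX_GRP_ID80004E8B.png",
    "FX_MAT_ID80002BAF": "images/FX_MAT_ID80002BAF.png",
}

# Made-up periodical metadata values.
PERIODICAL_METADATA = {"doi": "10.0000/example", "publisher": "Example Press"}


def build_target_xml(initial_xml, mapping, plates, metadata, metadata_path="front"):
    """Append substitute maps and periodical metadata to the initial XML."""
    root = ET.fromstring(initial_xml)
    # Part 1: resolve each target id to its initial id and attach the
    # substitute map found under that initial id. list() takes a snapshot so
    # that adding children does not disturb the traversal.
    for elem in list(root.iter()):
        initial_id = mapping.get(elem.get("id", ""))
        if initial_id in plates:
            ET.SubElement(elem, "substitute", href=plates[initial_id])
    # Part 2: insert the periodical metadata at the preset position.
    anchor = root.find(metadata_path)
    if anchor is None:
        raise ValueError(f"preset position {metadata_path!r} not found")
    for name, value in metadata.items():
        ET.SubElement(anchor, name).text = value
    return root


target_root = build_target_xml(
    INITIAL_XML, TARGET_TO_INITIAL, PLATE_INFO, PERIODICAL_METADATA
)
print(ET.tostring(target_root, encoding="unicode"))
```

The lookup runs from target id to initial id because the initial XML file carries only target ids, while the plate information set is keyed by initial ids.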
Fig. 4 is a flowchart of a second embodiment of the method for converting a typesetting file into XML data according to an embodiment of the present application. As shown in fig. 4, the electronic device first converts the typesetting file into a Word file and an addendum resource package, and then converts the Word file into an initial XML file and a resource mapping file through a Word structuring engine. Next, the electronic device appends the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata to generate a target XML file. The target XML file meets the archiving requirements of the data center and contains sufficient typesetting file information, so it can be used for network generation or sent to the data center for archiving. In addition, if the plate effect of the target XML file needs to be modified, or the data needs to be synthesized again, the target XML file can be converted back into a typesetting file, which facilitates subsequent processing of the typesetting file.
In the method for converting a typesetting file into XML data provided in this embodiment, the electronic device converts the typesetting file to be converted into a Word file and an addendum resource package, converts the Word file into an initial XML file and a resource mapping file through a Word structuring engine, and finally appends the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata to generate a target XML file. Because the plate information set and the periodical metadata of the typesetting file are placed in the addendum resource package rather than in the Word file, the size of the Word file is effectively reduced. At the same time, because the addendum resource package is subsequently appended to the initial XML file, content of the typesetting file is not lost during conversion, which improves the integrity of the converted content and in turn the accuracy of the conversion.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Fig. 5 is a schematic structural diagram of an embodiment of a device for converting a typesetting file into XML data according to an embodiment of the present application. As shown in fig. 5, the device for converting a typesetting file into XML data may include:
the conversion module 51 is configured to convert the typesetting file to be converted into a Word file and an addendum resource package, where the Word file includes at least one picture object, at least one table object, and at least one formula object, the addendum resource package includes a plate information set and periodical metadata, and each object in the Word file and plate information corresponding to an object in the plate information set have the same initial id;
the conversion module 51 is further configured to convert the Word file into an initial XML file and a resource mapping file through a Word structuring engine, where the resource mapping file includes a mapping relationship between an initial id and a generated target id;
and an appending module 52, configured to append the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata, so as to generate a target XML file.
In a possible design of this embodiment, the conversion module 51 is further configured to:
converting the initial id in the Word file into a target id through a Word structured engine to generate an initial XML file;
and acquiring the mapping relation between the initial id and the target id to generate a resource mapping file.
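The id conversion and mapping generation described above can be sketched in Python as follows. The naming rule used here (initial ids starting with `FX_GRP_` become `Graphic1`, `Graphic2`, and so on, and those starting with `FX_MAT_` become `M1`, `M2`, and so on) is an assumption inferred only from the resource mapping excerpt; the actual rule applied by the Word structured engine is not disclosed. The function names are likewise hypothetical.

```python
from collections import Counter
from xml.sax.saxutils import quoteattr

# Assumed rule, inferred from the excerpt only: the prefix of the initial id
# selects the prefix of the generated target id.
PREFIX_RULES = {"FX_GRP_": "Graphic", "FX_MAT_": "M"}


def assign_target_ids(initial_ids):
    """Return a dictionary that maps each initial id to a generated target id."""
    counters = Counter()
    mapping = {}
    for initial_id in initial_ids:
        for initial_prefix, target_prefix in PREFIX_RULES.items():
            if initial_id.startswith(initial_prefix):
                counters[target_prefix] += 1
                mapping[initial_id] = f"{target_prefix}{counters[target_prefix]}"
                break
        else:
            # No rule matched this initial id.
            raise ValueError(f"no target-id rule for {initial_id!r}")
    return mapping


def to_mapping_lines(mapping):
    """Serialize the mapping as ComponentResourceRel entries."""
    return [
        f"<ComponentResourceRel fxid={quoteattr(initial_id)} id={quoteattr(target_id)}/>"
        for initial_id, target_id in mapping.items()
    ]


id_mapping = assign_target_ids(
    ["FX_GRP_ID80004E8B", "FX_MAT_ID80002BAF", "FX_MAT_ID80002BC1"]
)
for line in to_mapping_lines(id_mapping):
    print(line)
```

Recording the mapping at the moment the target ids are generated is what later allows the plate information, which is keyed by initial id, to be matched against the initial XML file.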
In another possible design of this embodiment, the appending module 52 is further configured to:
appending the plate information set to the initial XML file according to the resource mapping file;
appending the periodical metadata to the initial XML file according to the preset position of the periodical metadata;
and generating a target XML file.
Optionally, the plate information set includes plate information corresponding to each picture object, plate information corresponding to each table object, and plate information corresponding to each formula object.
Optionally, the plate information corresponding to each picture object includes size data of the picture object and picture substitute map data;
the plate information corresponding to each formula object includes size data of the formula object and formula substitute map data;
the plate information corresponding to each table object includes size data of the table object and table substitute map data.
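One possible in-memory representation of the plate information set is sketched below in Python. The class name `PlateInfo`, its field names, and the sample sizes and paths are hypothetical; the publication states only that each entry carries size data and substitute map data and shares an initial id with its object in the Word file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlateInfo:
    """Plate information for one object (all field names are assumptions)."""
    initial_id: str      # shared with the matching object in the Word file
    kind: str            # "picture", "table" or "formula"
    width: float         # size data; the unit is not specified in the source
    height: float
    substitute_map: str  # path to the substitute map image


def index_by_initial_id(items):
    """Build the plate information set as a dictionary keyed by initial id."""
    index = {}
    for item in items:
        if item.initial_id in index:
            raise ValueError(f"duplicate initial id {item.initial_id!r}")
        index[item.initial_id] = item
    return index


plate_set = index_by_initial_id([
    PlateInfo("FX_GRP_ID80004E8B", "picture", 80.0, 60.0, "images/g1.png"),
    PlateInfo("FX_MAT_ID80002BAF", "formula", 40.0, 12.0, "images/m1.png"),
])
print(plate_set["FX_MAT_ID80002BAF"].kind)  # formula
```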
Optionally, the periodical metadata includes page number information, a chapter digital object identifier (DOI), and publisher information.
Optionally, the substitute map data includes high-definition substitute map data and non-high-definition substitute map data, pixels of the high-definition substitute map are higher than a first preset pixel, pixels of the non-high-definition substitute map are lower than a second preset pixel, and the first preset pixel is larger than the second preset pixel;
wherein the substitute map data includes at least one of picture substitute map data, formula substitute map data, and table substitute map data.
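The relationship between the two thresholds can be illustrated with a short Python sketch. It assumes that "pixels" means a single numeric pixel measure per image, and the threshold values in the example are invented; the publication does not give concrete values or say how images between the two thresholds are treated, so the sketch labels them "unclassified".

```python
def classify_substitute_map(pixels: int, first_preset: int, second_preset: int) -> str:
    """Classify a substitute map by comparing its pixels with two presets."""
    # The source requires the first preset to be larger than the second.
    if first_preset <= second_preset:
        raise ValueError("first_preset must be larger than second_preset")
    if pixels > first_preset:
        return "high-definition"
    if pixels < second_preset:
        return "non-high-definition"
    # Values between the presets are not covered by the source text.
    return "unclassified"


# Hypothetical thresholds, chosen only for illustration.
print(classify_substitute_map(4_000_000, 2_000_000, 500_000))  # high-definition
print(classify_substitute_map(100_000, 2_000_000, 500_000))    # non-high-definition
```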
The device for converting the typesetting file into the XML data provided by the embodiment of the application can be used for executing the method for converting the typesetting file into the XML data in the embodiment, and the implementation principle and the technical effect are similar, so that the detailed description is omitted.
It should be noted that the division of the above device into modules is only a logical division; in an actual implementation, the modules may be wholly or partially integrated into one physical entity, or may be physically separate. The modules may all be implemented as software invoked by a processing element, or all be implemented in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described here may be an integrated circuit with signal processing capability. In implementation, each step of the above method, or each of the above modules, may be carried out by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in fig. 6, the electronic device may include: a processor 61, a memory 62, and computer program instructions stored on the memory 62 and executable on the processor 61, wherein the processor 61, when executing the computer program instructions, implements the method for converting a typesetting file into XML data provided in any of the previous embodiments.
Optionally, the electronic device may further include an interface for interacting with other devices.
Optionally, the above devices of the electronic device may be connected by a system bus.
The memory 62 may be a separate memory unit or a memory unit integrated into the processor. The number of processors is one or more.
It should be understood that the Processor 61 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor, or in a combination of the hardware and software modules in the processor.
The system bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The memory may include Random Access Memory (RAM) and may also include non-volatile memory, such as at least one disk memory.
All or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The aforementioned program may be stored in a readable memory. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned memory (storage medium) includes: read-only memory (ROM), RAM, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disk, and any combination thereof.
The electronic device provided in the embodiment of the present application may be configured to execute the method for converting a typesetting file into XML data provided in any of the above method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
An embodiment of the present application provides a computer-readable storage medium, in which computer instructions are stored, and when the computer instructions are executed on a computer, the computer is enabled to execute the method for converting the typesetting file into XML data.
The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
Alternatively, a readable storage medium may be coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
An embodiment of the present application further provides a computer program product that includes a computer program stored in a computer-readable storage medium. At least one processor can read the computer program from the computer-readable storage medium, and when the computer program is executed by the at least one processor, the method for converting a typesetting file into XML data is implemented.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (11)

1. A method for converting a typesetting file into XML data, comprising:
converting a typesetting file to be converted into a Word file and an addendum resource package, wherein the Word file comprises at least one picture object, at least one table object and at least one formula object, the addendum resource package comprises a plate information set and periodical metadata, and each object in the Word file has the same initial identification number (id) as the plate information corresponding to that object in the plate information set;
converting the Word file into an initial extensible markup language (XML) file and a resource mapping file through a Word structured engine, wherein the resource mapping file comprises a mapping relationship between the initial id and a generated target id;
and appending the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata to generate a target XML file.
2. The method of claim 1, wherein said converting the Word file into an initial extensible markup language (XML) file and a resource mapping file through a Word structured engine comprises:
converting the initial id in the Word file into the target id through the Word structured engine to generate the initial XML file;
and acquiring the mapping relation between the initial id and the target id, and generating the resource mapping file.
3. The method of claim 1, wherein said appending the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata to generate a target XML file comprises:
appending the plate information set to the initial XML file according to the resource mapping file;
appending the periodical metadata to the initial XML file according to the preset position of the periodical metadata;
and generating the target XML file.
4. The method according to any one of claims 1 to 3, wherein the plate information set includes plate information corresponding to each picture object, plate information corresponding to each table object, and plate information corresponding to each formula object.
5. The method of claim 4, wherein the plate information corresponding to each picture object comprises size data of the picture object and picture substitute map data;
the plate information corresponding to each formula object comprises size data of the formula object and formula substitute map data;
the plate information corresponding to each table object comprises size data of the table object and table substitute map data.
6. The method according to any one of claims 1 to 3, wherein the periodical metadata comprises page number information, a chapter digital object identifier (DOI), and publisher information.
7. The method of claim 5, wherein the substitute map data comprises high-definition substitute map data and non-high-definition substitute map data, wherein pixels of the high-definition substitute map are higher than a first preset pixel, pixels of the non-high-definition substitute map are lower than a second preset pixel, and the first preset pixel is larger than the second preset pixel;
wherein the substitute map data includes at least one of the picture substitute map data, the formula substitute map data, and the table substitute map data.
8. An apparatus for converting a typesetting file into XML data, comprising:
a conversion module, configured to convert a typesetting file to be converted into a Word file and an addendum resource package, wherein the Word file comprises at least one picture object, at least one table object and at least one formula object, the addendum resource package comprises a plate information set and periodical metadata, and each object in the Word file has the same initial identification number (id) as the plate information corresponding to that object in the plate information set;
the conversion module being further configured to convert the Word file into an initial extensible markup language (XML) file and a resource mapping file through a Word structured engine, wherein the resource mapping file comprises a mapping relationship between the initial id and a generated target id;
and an appending module, configured to append the addendum resource package to the initial XML file according to the resource mapping file and the preset position of the periodical metadata to generate a target XML file.
9. An electronic device, comprising: a processor, a memory, and computer program instructions stored on the memory and executable on the processor, wherein the processor, when executing the computer program instructions, implements the method for converting a typesetting file into XML data according to any one of claims 1 to 7.
10. A computer-readable storage medium having stored thereon computer-executable instructions which, when executed by a processor, implement the method for converting a typesetting file into XML data according to any one of claims 1 to 7.
11. A computer program product comprising a computer program which, when executed by a processor, implements the method for converting a typesetting file into XML data according to any one of claims 1 to 7.
CN202110573661.3A 2021-05-25 2021-05-25 Method, device, equipment and medium for converting typesetting file into XML data Active CN113505572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110573661.3A CN113505572B (en) 2021-05-25 2021-05-25 Method, device, equipment and medium for converting typesetting file into XML data

Publications (2)

Publication Number Publication Date
CN113505572A true CN113505572A (en) 2021-10-15
CN113505572B CN113505572B (en) 2024-02-13

Family

ID=78008672

Country Status (1)

Country Link
CN (1) CN113505572B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant