WO2009000141A1 - Method, system and device for representing logical structure information of a layout file - Google Patents

Method, system and device for representing logical structure information of a layout file

Info

Publication number
WO2009000141A1
WO2009000141A1 PCT/CN2008/000910 CN2008000910W WO2009000141A1 WO 2009000141 A1 WO2009000141 A1 WO 2009000141A1 CN 2008000910 W CN2008000910 W CN 2008000910W WO 2009000141 A1 WO2009000141 A1 WO 2009000141A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
file
logical structure
structure information
description
Prior art date
Application number
PCT/CN2008/000910
Other languages
English (en)
Chinese (zh)
Inventor
Jing Qu
Zhensheng He
Yi Wang
Li Zhang
Original Assignee
Peking University Founder Group Co., Ltd.
Beijing Founder Apabi Technology Ltd.
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University Founder Group Co., Ltd., Beijing Founder Apabi Technology Ltd., Peking University filed Critical Peking University Founder Group Co., Ltd.
Publication of WO2009000141A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/103 Formatting, i.e. changing of presentation of documents

Definitions

  • The present invention relates to a method and system for representing structural information of electronic documents, and more particularly to a method, system and apparatus for representing logical structure information of a layout file.
  • Background art
  • Layout file technology converts original files of various formats into a unified format, faithfully preserves the layout and the text, graphics, formulas and colors of the original file during conversion, and produces consistent display results on different terminal devices and in different reading software.
  • A layout file uses absolute description: in a defined coordinate system, the position and size of each primitive (such as characters, pictures, tables, etc.) are explicitly recorded, so that the printed document matches what is shown on screen, and display is consistent in any computing environment (such as Windows or the operating system of a PDA, a smart phone, etc.), ensuring that the original appearance of the document is faithfully reproduced.
  • Current layout file formats mainly include PDF (Portable Document Format) from Adobe, XPS (XML Paper Specification) from Microsoft Corporation, and CEB (Chinese e-Paper Basic) from Beijing Founder Apabi Technology Co., Ltd. Electronic files in other formats (such as WPS, Microsoft Word, etc.) can also be easily converted into layout files.
  • The logical structure information of a document is the logical meaning of each part of the document, and the relationships between the parts, under a given interpretation; for example, hierarchical information about document content such as the title, body text, paragraphs and tables.
  • The logical structure information of a document comprises the logical units of the document and the hierarchical relationships between them. Each logical unit corresponds to a certain part of the document and is an abstract concept that humans can understand; the relationships between logical units represent logical combinations of these concepts. As shown in Figure 1, the logical units of an article may include a title, author, abstract, body, etc. These logical units form a tree structure, and each corresponds to one or more text blocks.
  • This type of logical structure information is not included in a large number of layout files.
  • Adobe's Tagged PDF technology represents the logical structure information of the document inside the layout file. It delimits logical units by adding special markers to the content description instruction stream of the layout file: as shown in Figure 2, tags are added to the content data stream, and Tag... and End Tag are used to delimit one logical unit.
  • This method has several drawbacks in practice. First, modifying, adding or deleting logical structure information requires modifying the content instruction stream of the layout file, a process that is complicated and error-prone.
  • Second, the granularity at which the instruction stream can be divided (one unit of granularity can be regarded as one logical unit) is limited: the minimum granularity is the entire content of one output instruction, so a content fragment may not be divisible any further.
  • Embodiments of the present invention provide a method, system and apparatus for representing logical structure information of a layout file, to solve the problems in the prior art that logical structure information in a layout file is handled inflexibly, is inconvenient to add or modify, and cannot meet users' needs.
  • An embodiment of the present invention provides a method for expressing logical structure information of a layout file, including the following steps:
  • The step of obtaining the logical structure information of the layout file includes:
  • The step of obtaining the content reference sequence of the layout file includes:
  • The content of the layout file is read, and the content reference sequence is generated according to the order in which the primitives in the content of the layout file appear in the content data stream, or according to the traversal order of the document tree.
  • The step of dividing the content reference sequence into a plurality of content reference sub-sequences includes:
  • The content reference sequence is divided into a plurality of content reference sub-sequences according to the offset positions, in the content reference sequence, of the primitives in the content of the layout file, or according to the primitive symbols in the content reference sequence.
  • The step of associating the content division description file with the logical unit description file includes: associating the content division description file with the logical unit description file by the numbers of the content reference sub-sequences.
  • The content division description file or the logical unit description file is a separate file on the storage device or a data block in the layout file.
  • The content division description file or the logical unit description file is described in a structured markup language.
  • An embodiment of the present invention further provides a system for representing logical structure information of a layout file, including: a logical structure information acquiring system, configured to obtain the logical structure information of the layout file; a logical structure description generating module, configured to acquire a content reference sequence, divide the content reference sequence into a plurality of content reference sub-sequences according to the logical structure information, and generate a content division description file and a logical unit description file;
  • a logical structure description parsing module, configured to parse and associate the content division description file and the logical unit description file.
  • The foregoing logical structure description generating module includes:
  • a content reference sequence generating module, configured to read the content of the layout file and generate a content reference sequence;
  • a content division description generating module, configured to divide the content reference sequence into a plurality of content reference sub-sequences according to the logical structure information and generate a content division description file;
  • a logical unit description generating module, configured to generate a logical unit description file according to the logical structure information.
  • The foregoing logical structure description generating module further includes: a storage device, configured to store the content reference sequence generated by the content reference sequence generating module, or the plurality of content reference sub-sequences divided by the content division description generating module.
  • The foregoing logical structure description parsing module further includes:
  • a content reference sequence generating module, configured to read the content of the layout file and generate a content reference sequence;
  • a content division description parsing module, configured to divide the content reference sequence into a plurality of content reference sub-sequences and generate a content division description file.
  • The above logical structure description parsing module further includes:
  • a logical unit description parsing module, configured to read and parse the data in the logical unit description file;
  • a mapping module, configured to associate the content division description file with the logical unit description file.
  • An embodiment of the present invention provides a device for representing logical structure information of a layout file, including: a logical structure information acquiring module, configured to obtain the logical structure information of the layout file;
  • a logical structure description generating module, configured to acquire a content reference sequence, divide the content reference sequence into a plurality of content reference sub-sequences according to the logical structure information, generate a content division description file according to the plurality of content reference sub-sequences, and generate a logical unit description file according to the logical structure information;
  • a logical structure description parsing module, configured to parse and associate the content division description file and the logical unit description file.
  • The above technical solution divides the content reference sequence of the layout file into a plurality of content reference sub-sequences, generates a corresponding content division description file, generates a logical unit description file, and then associates the content division description file with the logical unit description file.
  • As a result, the logical structure information and the layout file are separated from each other; any content in the layout file can be described and extracted separately, and can be described according to different document logical structure models; the described range is more accurate and the representation of logical structure information is more flexible; and multiple descriptions of document logical structure information can be added to the same layout file.
  • FIG. 1 is a schematic diagram of the logical structure information of an existing layout file;
  • FIG. 2 is a schematic diagram of how the existing Adobe Tagged PDF technology represents the logical structure information of a document in a layout file;
  • FIG. 3 is a schematic diagram of a method for representing logical structure information of a layout file according to an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of the relationship between the logical structure information of a layout file and the layout file according to an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of a layout file and its content reference sequence according to an embodiment of the present invention;
  • FIG. 6 is a schematic diagram of the offset positions of the content reference sequence shown in FIG. 5;
  • FIG. 7 is a content division description file according to the layout file content shown in FIG. 5;
  • FIG. 8 is another content division description file according to the layout file content shown in FIG. 5;
  • FIG. 9 is a logical unit description file of the layout file according to FIG. … or FIG. 8;
  • FIG. 10 is another logical unit description file according to the layout file shown in FIG. 6, FIG. 7, or FIG. 8;
  • FIG. 11 is another logical unit description file according to the layout file shown in FIG. 6, FIG. 7, or FIG. 8;
  • FIG. 13 is a schematic structural diagram of the logical structure description generating module in a logical structure information representation system of a layout file according to an embodiment of the present invention;
  • FIG. 14 is a schematic structural diagram of the logical structure description parsing module in a logical structure information representation system of a layout file according to an embodiment of the present invention.
  • Detailed description
  • The method for representing the logical structure information of a layout file includes the following steps:
  • Step 31: Obtain the logical structure information and the content reference sequence of the layout file;
  • Step 32: Divide the content reference sequence into a plurality of content reference sub-sequences according to the logical structure information, and generate a content division description file;
  • Step 33: Generate a logical unit description file according to the logical structure information;
  • Step 34: Associate the content division description file with the logical unit description file.
  • The method divides the content reference sequence of the layout file into a plurality of content reference sub-sequences, generates a corresponding content division description file, generates a logical unit description file, and then associates the two, so that the logical structure information and the layout file are separated from each other and any content in the layout file can be described separately.
  • The described range is more accurate and the representation of logical structure information is more flexible; multiple descriptions of document logical structure information can be added to the same layout file; and adding or modifying the logical structure information of the document does not require modifying the content description of the layout file, which reduces the possibility of error. This flexible representation can also describe the large number of existing layout files, improving compatibility without affecting existing systems.
  • The logical structure information of the layout file may be obtained by analyzing an electronic document that already contains logical structure information, by marking up the layout file with a computer application, or by a document analysis and document understanding processing system.
  • For an electronic document that already contains logical structure information, the document's own processing system can be used to extract it; for example, for Microsoft Word documents, Office automation objects can be used to obtain the logical structure information.
  • Alternatively, the user can mark the logical units of the layout file through a computer application with a graphical interface, or the logical structure information can be obtained by a processing system based on document analysis and document understanding.
  • To obtain the content reference sequence, the content of the layout file may be read first, and the content reference sequence then generated according to the order in which the primitives (such as characters, pictures, tables, etc.) in the content of the layout file appear in the content data stream, or according to the traversal order of the document tree.
  • A content reference sequence is an ordered collection of information about multiple primitives in a layout file.
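  • A minimal sketch of how such a sequence could be built, assuming the primitives are read in content-stream order. The `Primitive` class, its fields, and `build_content_reference_sequence` are hypothetical names chosen for illustration; the source does not define a data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Primitive:
    """One primitive (character, picture, table, ...) of a layout file."""
    kind: str    # hypothetical type label, e.g. "char" or "image"
    value: str   # the character itself, or an identifier for other primitives
    x: float     # absolute position in the page coordinate system
    y: float


def build_content_reference_sequence(stream):
    """Pair each primitive with its offset, in the order it appears in the stream."""
    return list(enumerate(stream))


# Hypothetical content data stream: five characters laid out on one line.
stream = [Primitive("char", ch, 10.0 + 12.0 * i, 700.0) for i, ch in enumerate("Title")]
sequence = build_content_reference_sequence(stream)
print([(offset, p.value) for offset, p in sequence])
```

  • The offsets produced here are what a later division step would refer to; a traversal of a document tree could feed the same function instead of a flat stream.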
  • For the layout file 43 shown in FIG. 4, the CEB file Sample.ceb, a logical unit description file 41 and a content division description file 42 are generated according to the logical structure information acquired above.
  • The layout file is described in the XML language.
  • The logical unit description file 41 and the content division description file 42 may also be described in other structured markup languages, such as SGML.
  • The content reference sequence may be divided into multiple content reference sub-sequences according to the offset positions of the primitives in the content reference sequence or according to the primitive symbols in the content reference sequence, and each content reference sub-sequence is assigned a number. This number can be saved in the content division description file.
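  • The offset-based division can be sketched as follows. This is an illustration under assumptions, not the patent's implementation: the function name `divide`, the `(start_offset, length)` pair layout, and the sequential numbering scheme are all hypothetical.

```python
def divide(sequence, ranges, first_pid=1):
    """Split `sequence` into sub-sequences given (start_offset, length) pairs.

    Returns a dict mapping an assigned number to the matching sub-sequence.
    """
    subsequences = {}
    for pid, (start, length) in enumerate(ranges, start=first_pid):
        if start < 0 or length <= 0 or start + length > len(sequence):
            raise ValueError(f"range ({start}, {length}) is outside the sequence")
        subsequences[pid] = sequence[start:start + length]
    return subsequences


# A list of characters stands in for a content reference sequence.
sequence = list("Title Body text")
parts = divide(sequence, [(0, 5), (6, 9)])
print({pid: "".join(chars) for pid, chars in parts.items()})
```

  • Because a range may start and end at any primitive, the division is not tied to the boundaries of the output instructions in the content stream.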
  • As shown in FIG. 5, a layout file 51 has a document content data stream description 52, which contains text primitives.
  • FIG. 6 is a specific embodiment of the logical structure according to the layout file 51 of FIG. 5.
  • In FIG. 6, 61 is the content reference sequence of the layout file, arranged in the order in which the primitives appear in the content description 52.
  • 62 represents the offset positions of the primitives in the content reference sequence.
  • 71 or 81 is a content division description file. The description file divides the content reference sequence by specifying, for each content reference sub-sequence, its starting offset position in the content reference sequence and its length.
  • Each division is given a unique number, PID. As shown in FIG. 7, number 8 corresponds to the "before the bed, moonlight," sub-sequence, and number 9 corresponds to the "suspicious ground frost, head to see the moon," sub-sequence.
  • The content division description files of FIG. 7 and FIG. 8 can exist at the same time.
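  • As a rough illustration of two content divisions coexisting for one layout file, the sketch below serializes two sets of ranges as XML. The element name `ContentDivision`, the element name `Part`, and the attribute names `Start` and `Length` are assumptions, since the figures are not reproduced here; only the idea of a PID plus a starting offset and a length comes from the text.

```python
import xml.etree.ElementTree as ET


def division_description(ranges):
    """Serialize {pid: (start, length)} as a content division description."""
    root = ET.Element("ContentDivision")  # hypothetical element name
    for pid, (start, length) in sorted(ranges.items()):
        ET.SubElement(root, "Part", PID=str(pid), Start=str(start), Length=str(length))
    return ET.tostring(root, encoding="unicode")


# Two different divisions of the same 18-primitive content reference sequence.
coarse = division_description({8: (0, 6), 9: (6, 12)})
fine = division_description({1: (0, 6), 2: (6, 6), 3: (12, 6)})
print(coarse)
print(fine)
```

  • Neither description touches the content stream, so both can be stored side by side for the same layout file.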
  • 91, 101 and 111 in FIG. 9, FIG. 10 and FIG. 11 are logical unit description files written in XML; a logical unit can be associated with a content reference sub-sequence through the PID of that sub-sequence.
  • The logical unit description file in step 33 above includes the logical units of the layout file and the relationships between the logical units, as shown in FIG. 9, FIG. 10 and FIG. 11. A structured description language such as XML or SGML can be used to describe the logical units and their relationships, and the relationships between logical units can reflect the reading order of the layout file.
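  • One way the tree of logical units could carry reading order is by document order of the XML elements. The sketch below assumes a hypothetical schema (`Article`, `Title`, `Author`, `Body`, `Paragraph`, and a `Ref` element with a `PID` attribute); none of these names are taken from the figures.

```python
import xml.etree.ElementTree as ET

# Hypothetical logical unit description: the PIDs are deliberately not in
# numeric order, to show that reading order comes from the tree, not from
# the position of the content in the layout file.
LOGICAL_XML = """<Article>
  <Title><Ref PID="3"/></Title>
  <Author><Ref PID="1"/></Author>
  <Body>
    <Paragraph><Ref PID="2"/></Paragraph>
    <Paragraph><Ref PID="4"/></Paragraph>
  </Body>
</Article>"""


def reading_order(xml_text):
    """Return the referenced PIDs in depth-first document order."""
    root = ET.fromstring(xml_text)
    return [int(ref.get("PID")) for ref in root.iter("Ref")]


print(reading_order(LOGICAL_XML))
```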
  • The content division description file may be associated with the logical unit description file by the numbers assigned above to the content reference sub-sequences.
  • That is, a logical unit and its corresponding content reference sub-sequence can be associated by the number of the content reference sub-sequence.
  • For example, the element … 8"/> is associated with the "before the bed, moonlight," content reference sub-sequence (number 8).
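  • The association by number can be sketched as a lookup from each logical unit to the ranges it references. The XML element and attribute names, the sample text, and the helper names `load_parts` and `resolve` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Stand-in for the content reference sequence of a layout file.
CONTENT = "Sample TitleFirst paragraph."

# Hypothetical content division description: PID, starting offset, length.
DIVISION_XML = """<ContentDivision>
  <Part PID="8" Start="0" Length="12"/>
  <Part PID="9" Start="12" Length="16"/>
</ContentDivision>"""

# Hypothetical logical unit description referring to the parts by PID.
LOGICAL_XML = """<Article>
  <Title><Ref PID="8"/></Title>
  <Body><Paragraph><Ref PID="9"/></Paragraph></Body>
</Article>"""


def load_parts(xml_text):
    """Read {PID: (start, length)} from a content division description."""
    parts = {}
    for part in ET.fromstring(xml_text).iter("Part"):
        parts[part.get("PID")] = (int(part.get("Start")), int(part.get("Length")))
    return parts


def resolve(unit, parts, content):
    """Concatenate the content referenced anywhere below a logical unit."""
    pieces = []
    for ref in unit.iter("Ref"):
        start, length = parts[ref.get("PID")]
        pieces.append(content[start:start + length])
    return "".join(pieces)


parts = load_parts(DIVISION_XML)
article = ET.fromstring(LOGICAL_XML)
print(resolve(article.find("Title"), parts, CONTENT))
print(resolve(article.find("Body"), parts, CONTENT))
```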
  • The content division description file or the logical unit description file in the above embodiment may be a separate file on the storage device, so that the logical structure information and the layout file are separated from each other and the representation of the logical structure information is more flexible.
  • The content division description file or the logical unit description file in the above embodiment may also be a data block in the layout file.
  • An embodiment of the present invention further provides a system for representing the logical structure information of a layout file, including: a logical structure information acquiring system, configured to obtain the logical structure information of the layout file; a logical structure description generating module, configured to obtain a content reference sequence from the layout file parsing system, divide the obtained content reference sequence into multiple content reference sub-sequences according to the logical structure information, and generate a content division description file and a logical unit description file;
  • a logical structure description parsing module, configured to parse and associate the content division description file and the logical unit description file.
  • The logical structure description generating module in FIG. 12 above includes:
  • a content reference sequence generating module, configured to read the content of the layout file and generate a content reference sequence in a specified order; the specified order may be the order in which the primitives in the content of the layout file appear in the content data stream, or the traversal order of the document tree.
  • a content division description generating module, configured to divide the content reference sequence into a plurality of content reference sub-sequences according to the logical structure information and generate a content division description file; the division may be made according to the offset positions, in the content reference sequence, of the primitives in the content of the layout file, or according to the primitive symbols in the content reference sequence, and a number is assigned to each content reference sub-sequence; the number can be saved in the content division description file.
  • a logical unit description generating module, configured to generate a logical unit description file according to the logical structure information, where the logical unit description file includes a plurality of logical units and the relationships between them; a structured description language such as XML or SGML may be used to describe the logical units and their relationships, and the relationships between logical units can reflect the reading order of the layout file.
  • The foregoing logical structure description generating module may further include: a storage device, configured to store the content reference sequence generated by the content reference sequence generating module, or the plurality of content reference sub-sequences divided by the content division description generating module, or the logical unit description file generated by the logical unit description generating module.
  • The above content reference sequence and content reference sub-sequences may or may not be stored in the storage device.
  • The logical structure description parsing module in FIG. 12 above includes:
  • a logical unit description parsing module, configured to read and parse the data in the logical unit description file, and a mapping module, configured to associate the content division description file with the logical unit description file. Specifically, a logical unit can be associated with its corresponding content reference sub-sequence by the number of the content reference sub-sequence.
  • The logical structure description parsing module should also include the following modules:
  • a content reference sequence generating module, configured to read the content of the layout file and generate a content reference sequence;
  • a content division description parsing module, configured to divide the content reference sequence into a plurality of content reference sub-sequences according to the logical structure information and generate a content division description file.
  • In practice, regenerating the content reference sequence and the content division with the content reference sequence generating module and the content division description parsing module is faster and more efficient than reading a large amount of content reference sequence and content division description data from storage.
  • The logical structure description generating module works as follows:
  • The logical structure information acquiring system obtains the logical structure information of the layout file.
  • For an electronic document that already contains logical structure information, the document's own processing system can be used to extract it; for example, for a Microsoft Word document, Office automation objects can be used to obtain the logical structure information. Alternatively, the user can mark the logical units of the layout file through a computer application with a graphical interface, or the logical structure information can be obtained by a processing system based on document analysis and document understanding.
  • The content reference sequence generating module uses the layout file parsing system to arrange the content of the layout file into an ordered sequence in a certain order, obtaining the content reference sequence of the layout file.
  • The content division description generating module divides the content reference sequence according to the logical structure information obtained by the logical structure information acquiring system, and outputs a content division description file.
  • The logical unit description generating module outputs a logical unit description file according to the logical structure information obtained by the logical structure information acquiring system.
  • The content division description file and the logical unit description file can be embedded in the layout file or saved separately.
  • The logical structure description parsing module works as follows:
  • The content reference sequence generating module again uses the layout file parsing system to arrange the content data of the layout file into an ordered sequence in a certain order, obtaining the content reference sequence.
  • The content division description parsing module reads the content division description file produced by the logical structure description generating module shown in FIG. 13 above, and divides the content reference sequence accordingly.
  • The logical unit description parsing module reads the logical unit description file produced by the logical structure description generating module shown in FIG. 13 above and verifies its validity.
  • The mapping module associates each logical unit with its content reference sub-sequence according to the content reference sub-sequence numbers in the content division description file and the logical unit description file.
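  • The source does not say what the validity check covers. As one plausible reading, the sketch below checks that every range fits inside the regenerated content reference sequence and that every number used by a logical unit exists in the content division. The function name `validate` and the dictionary shapes are hypothetical.

```python
def validate(parts, unit_refs, sequence_length):
    """Return a list of problems; an empty list means the descriptions agree.

    parts:      {pid: (start, length)} from the content division description
    unit_refs:  {unit_name: [pid, ...]} from the logical unit description
    """
    problems = []
    for pid, (start, length) in parts.items():
        if start < 0 or length <= 0 or start + length > sequence_length:
            problems.append(f"part {pid}: range is outside the sequence")
    for unit, pids in unit_refs.items():
        for pid in pids:
            if pid not in parts:
                problems.append(f"unit {unit}: unknown part {pid}")
    return problems


parts = {8: (0, 12), 9: (12, 16)}
print(validate(parts, {"Title": [8], "Body": [9]}, sequence_length=28))
print(validate(parts, {"Title": [8], "Abstract": [10]}, sequence_length=20))
```

  • Running such a check before mapping keeps a stale description from silently pointing at the wrong content after the layout file has been replaced.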
  • External systems interacting with this system may include a layout file parsing system, a logical structure information acquiring system, and other document processing systems.
  • The other document processing systems may be format conversion systems, layout rearrangement systems, and the like. These systems use logical structure information to process layout files, for example to extract information, rearrange pages, or convert to other formats.
  • The content division description file and the logical unit description file described above may be saved inside the layout file or saved as files separate from the layout file. The same layout file can have multiple descriptions of logical structure information.
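  • A small sketch of several logical structure descriptions attached to one layout file, assuming each description pairs a content division with a set of logical units. The description names, unit names and the `extract` helper are hypothetical.

```python
# Stand-in for the content reference sequence of one layout file.
CONTENT = "Sample TitleFirst paragraph."

# Two independent descriptions of the same content. Each pairs a content
# division {PID: (start, length)} with logical units {name: [PID, ...]}.
descriptions = {
    "article": {
        "parts": {8: (0, 12), 9: (12, 16)},
        "units": {"Title": [8], "Body": [9]},
    },
    "words": {
        "parts": {1: (0, 6), 2: (7, 5)},
        "units": {"FirstWord": [1], "SecondWord": [2]},
    },
}


def extract(description, unit, content):
    """Return the content a logical unit refers to under one description."""
    parts = description["parts"]
    ranges = (parts[pid] for pid in description["units"][unit])
    return "".join(content[start:start + length] for start, length in ranges)


# Renaming a unit changes only that description; the content and the other
# description stay as they were.
article_units = descriptions["article"]["units"]
article_units["Heading"] = article_units.pop("Title")

print(extract(descriptions["article"], "Heading", CONTENT))
print(extract(descriptions["words"], "SecondWord", CONTENT))
```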
  • An embodiment of the present invention further provides a device for representing logical structure information of a layout file, where the device includes a logical structure information acquiring module, a logical structure description generating module, and a logical structure description parsing module, where:
  • the logical structure information acquiring module is configured to obtain the logical structure information of the layout file;
  • the logical structure description generating module is configured to acquire a content reference sequence, divide the content reference sequence into a plurality of content reference sub-sequences according to the logical structure information, generate a content division description file according to the plurality of content reference sub-sequences, and generate a logical unit description file according to the logical structure information;
  • the logical structure description parsing module is configured to parse and associate the content division description file and the logical unit description file.
  • The logical structure description generating module includes a content reference sequence generating module, a content division description generating module, and a logical unit description generating module, where:
  • the content reference sequence generating module is configured to read the content of the layout file and generate a content reference sequence in a specified order; the specified order may be the order in which the primitives in the content of the layout file appear in the content data stream, or the traversal order of the document tree.
  • the content division description generating module is configured to divide the content reference sequence into a plurality of content reference sub-sequences according to the logical structure information and generate a content division description file; the division may be made according to the offset positions, in the content reference sequence, of the primitives in the content of the layout file, or according to the primitive symbols in the content reference sequence, and a number is assigned to each content reference sub-sequence; the number can be saved in the content division description file.
  • the logical unit description generating module is configured to generate a logical unit description file according to the logical structure information, where the logical unit description file includes a plurality of logical units and the relationships between them; a structured description language such as XML or SGML may be used to describe the logical units and their relationships, and the relationships between logical units can reflect the reading order of the layout file.
  • The foregoing logical structure description generating module may further include: a storage device, configured to store the content reference sequence generated by the content reference sequence generating module, or the plurality of content reference sub-sequences divided by the content division description generating module, or the logical unit description file generated by the logical unit description generating module.
  • The above content reference sequence and content reference sub-sequences may or may not be stored in the storage device.
  • The logical structure description parsing module includes a logical unit description parsing module and a mapping module, where:
  • the logical unit description parsing module is configured to read and parse the data in the logical unit description file;
  • the mapping module is configured to associate the content division description file with the logical unit description file.
  • Specifically, a logical unit and its corresponding content reference sub-sequence may be associated by the number of the content reference sub-sequence.
  • The logical structure description parsing module should also include the following modules:
  • a content reference sequence generating module, configured to read the content of the layout file and generate a content reference sequence;
  • a content division description parsing module, configured to divide the content reference sequence into a plurality of content reference sub-sequences according to the logical structure information and generate a content division description file.
  • the method and system of the present invention divide a content reference sequence of a layout file into a plurality of content reference sub-sequences, generate a corresponding content division description file, generate a logical unit description file, and then associate the content division description file with the logical unit description file, so that the logical structure information and the layout file are separated from each other; any content in the layout file can be separately described and extracted, and can be described according to different document logical structure models.
  • the description range is more accurate, the representation of logical structure information is more flexible, and multiple logical structure information descriptions can be added to the same layout file; that is, the same layout file can have multiple content division description files and logical unit description files.
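
The division and association steps described in the items above can be illustrated with a minimal Python sketch. It is not the format defined by the patent: the names ContentRef, LogicalUnit, divide and resolve, the page/index fields, and the use of an in-memory dict in place of a content division description file are all assumptions made for illustration. The only ideas taken from the text are that the content reference sequence is split into numbered sub-sequences and that logical units refer to those sub-sequences by number.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ContentRef:
    """Reference to one content item of the layout file (hypothetical fields)."""
    page: int
    index: int  # position of the item within the content reference sequence


@dataclass
class LogicalUnit:
    """One node of a logical structure model, e.g. a title or a body."""
    name: str
    subsequence_ids: List[int]  # numbers of the associated sub-sequences
    children: List["LogicalUnit"] = field(default_factory=list)


def divide(sequence: List[ContentRef], boundaries: List[int]) -> Dict[int, List[ContentRef]]:
    """Split the content reference sequence at the given start offsets.

    The returned dict (sub-sequence number -> content references) stands in
    for the content division description file.
    """
    starts = sorted(set(boundaries) | {0})
    ends = starts[1:] + [len(sequence)]
    return {n: sequence[s:e] for n, (s, e) in enumerate(zip(starts, ends))}


def resolve(unit: LogicalUnit, division: Dict[int, List[ContentRef]]) -> List[ContentRef]:
    """Mapping step: collect the content references covered by a logical unit."""
    refs = [ref for n in unit.subsequence_ids for ref in division[n]]
    for child in unit.children:
        refs.extend(resolve(child, division))
    return refs


if __name__ == "__main__":
    sequence = [ContentRef(page=1, index=i) for i in range(6)]
    division = divide(sequence, boundaries=[1, 4])

    # Two independent logical descriptions over the same division;
    # the layout content itself is never modified.
    article = LogicalUnit("article", [], [LogicalUnit("title", [0]),
                                          LogicalUnit("body", [1, 2])])
    summary = LogicalUnit("summary", [0, 2])

    for unit in (article, summary):
        print(unit.name, [ref.index for ref in resolve(unit, division)])
```

Because the logical units hold only sub-sequence numbers, a second logical structure description (here the hypothetical "summary" unit) can be attached to the same division without touching the layout content, which is the separation the text emphasizes.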

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention concerns a representation method, system and device, relating to methods and systems for representing information in computer processing technology. Its object is to solve the prior-art problems that the layout file is inflexible and not easy to add to or modify. By obtaining the logical structure information and the content reference sequence of a layout file; dividing the content reference sequence into multiple content reference sub-sequences according to the logical structure information and creating a content division description file; creating a logical unit description file according to the logical structure information; and associating the content division description file with the logical unit description file, the representation of the logical structure information of the layout file becomes efficient and flexible, the original layout file does not need to be modified, and any content in the layout file can be separately described as logical structure information, extracted, and reused in different document logical structure models.
PCT/CN2008/000910 2007-06-22 2008-05-08 Method, system and device for representing logical structure information of a layout file WO2009000141A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200710123338.6A CN101271463B (zh) 2007-06-22 2007-06-22 Structure processing method and system for layout files
CN200710123338.6 2007-06-22

Publications (1)

Publication Number Publication Date
WO2009000141A1 (fr) 2008-12-31

Family

ID=40005437

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2008/000910 WO2009000141A1 (fr) 2007-06-22 2008-05-08 Method, system and device for representing logical structure information of a layout file

Country Status (2)

Country Link
CN (1) CN101271463B (fr)
WO (1) WO2009000141A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116916047A (zh) * 2023-09-12 2023-10-20 北京点聚信息技术有限公司 Intelligent storage method for layout file recognition data

Families Citing this family (17)

Publication number Priority date Publication date Assignee Title
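CN101887413B (zh) * 2009-05-14 2012-07-04 北大方正集团有限公司 Structure processing method and system for layout tables
CN102087692B (zh) * 2009-12-02 2013-11-06 北大方正集团有限公司 Method and system for preventing copying of layout file data
CN102122280B (zh) * 2009-12-17 2013-06-05 北大方正集团有限公司 Method and system for intelligently extracting content objects
CN102541888A (zh) * 2010-12-20 2012-07-04 鸿富锦精密工业(深圳)有限公司 System and method for parsing patent electronic files
CN102567291B (zh) * 2010-12-31 2014-09-10 北大方正集团有限公司 Method and device for deleting lace (decorative border) characters in a layout document
CN102411498A (zh) * 2011-07-26 2012-04-11 中兴通讯股份有限公司 Method for implementing a data model, and graphical designer
CN103186655A (zh) * 2011-12-31 2013-07-03 北大方正集团有限公司 Method and device for processing layout files
WO2014012565A1 (fr) * 2012-07-20 2014-01-23 Microsoft Corporation Color coding of layout structure elements in a flow format document
CN103970799B (zh) * 2013-02-04 2019-04-26 百度在线网络技术(北京)有限公司 Method, device and client for generating an electronic document
CN104090920A (zh) * 2014-06-17 2014-10-08 安徽教育网络出版有限公司 System for cross-terminal publishing of digital content
CN104199803B (zh) * 2014-07-21 2017-10-13 安徽华贞信息科技有限公司 Text information processing system and method based on combination theory
CN105760358B (zh) * 2014-12-19 2019-07-23 阿里巴巴集团控股有限公司 Method and device for e-book layout rearrangement and e-book display
CN105279254B (zh) * 2015-10-12 2018-10-23 江苏中威科技软件系统有限公司 Layout data stream file system, operating device thereof, and method for implementing the operating device
CN105701073A (zh) * 2015-12-31 2016-06-22 北京中科江南信息技术股份有限公司 Method and device for generating layout files
CN108287927B (zh) * 2018-03-05 2019-10-22 北京百度网讯科技有限公司 Method and device for acquiring information
CN109815243B (zh) * 2019-02-18 2020-03-03 北京仁和汇智信息技术有限公司 Structured storage method and device for interface-based document modification
CN112612750A (zh) * 2020-12-15 2021-04-06 北京天融信网络安全技术有限公司 File content processing method, device, electronic equipment and readable storage medium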

Citations (5)

Publication number Priority date Publication date Assignee Title
US6592628B1 (en) * 1999-02-23 2003-07-15 Sun Microsystems, Inc. Modular storage method and apparatus for use with software applications
CN1441929A (zh) * 2000-07-10 2003-09-10 佳能株式会社 Delivering multimedia descriptions
CN1604073A (zh) * 2004-11-22 2005-04-06 北京北大方正技术研究院有限公司 Method for logically associating titles with body text in a newspaper layout
US20050193327A1 (en) * 2004-02-27 2005-09-01 Hui Chao Method for determining logical components of a document
US20070092140A1 (en) * 2005-10-20 2007-04-26 Xerox Corporation Document analysis systems and methods

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
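CN100429643C (zh) * 2005-12-07 2008-10-29 段君雷 Implementation method for producing multimedia network electronic publications
CN100356372C (zh) * 2005-12-31 2007-12-19 无锡永中科技有限公司 Method for generating and method for opening a computer layout file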

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
US6592628B1 (en) * 1999-02-23 2003-07-15 Sun Microsystems, Inc. Modular storage method and apparatus for use with software applications
CN1441929A (zh) * 2000-07-10 2003-09-10 佳能株式会社 Delivering multimedia descriptions
US20050193327A1 (en) * 2004-02-27 2005-09-01 Hui Chao Method for determining logical components of a document
CN1604073A (zh) * 2004-11-22 2005-04-06 北京北大方正技术研究院有限公司 Method for logically associating titles with body text in a newspaper layout
US20070092140A1 (en) * 2005-10-20 2007-04-26 Xerox Corporation Document analysis systems and methods

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN116916047A (zh) * 2023-09-12 2023-10-20 北京点聚信息技术有限公司 Intelligent storage method for layout file recognition data
CN116916047B (zh) * 2023-09-12 2023-11-10 北京点聚信息技术有限公司 Intelligent storage method for layout file recognition data

Also Published As

Publication number Publication date
CN101271463B (zh) 2014-03-26
CN101271463A (zh) 2008-09-24

Similar Documents

Publication Publication Date Title
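WO2009000141A1 (fr) Method, system and device for representing logical structure information of a layout file
CN109408783B (zh) Online editing method and system for electronic documents
CN110083805B (zh) Method and system for converting a Word file into an EPUB file
CN1801149B (zh) System and method for converting formatted documents into web pages
CN101937427B (zh) Browser-based system and method for content editing and publishing
CN101308488B (zh) Method and device for processing document flow information based on layout files
CN101548273A (zh) Extensible markup language schema for determining fields, bibliographies and citations of a presentable file
CN102609400B (zh) File format conversion method and conversion tool
CN103049439A (zh) Method for processing markup language documents, browser, and network operating system
CN112527291A (zh) Web page generation method, device, electronic equipment and storage medium
CN104090920A (zh) System for cross-terminal publishing of digital content
CN102289497A (zh) System and method for generating document preview images
CN111881651A (zh) Method for converting a UOT flow document into an OFD layout document
CN112433995B (zh) File format conversion method, system, computer equipment and storage medium
US8930808B2 (en) Processing rich text data for storing as legacy data records in a data storage system
CN116050370A (zh) Template data processing method, system and related equipment
CN107066437B (zh) Method and device for annotating digital works
CN107423271B (zh) Document generation method and device
JPWO2006051974A1 (ja) Document processing device and document processing method
Mong et al. Using SVG as the rendering model for structured and graphically complex web material
CN113239670A (zh) Method, device, computer equipment and storage medium for uploading a business template
KR20070120965A (ko) Determination of an extensible markup language schema for fields, bibliographies and citations of a presentable file
EP1377917A2 (fr) Extensible stylesheet designs using meta-tag information
CN101151612A (zh) Method and system for random access to documents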
Hughes et al. Encoding and presenting interlinear text using XML technologies

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 08748468; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 EP: PCT application non-entry in European phase
Ref document number: 08748468; Country of ref document: EP; Kind code of ref document: A1