WO2016008347A1 - Layout document rearrangement method and system, and electronic reading terminal - Google Patents

Layout document rearrangement method and system, and electronic reading terminal

Info

Publication number
WO2016008347A1
Authority
WO
WIPO (PCT)
Prior art keywords
layout
document
layout document
tag data
streaming
Prior art date
Application number
PCT/CN2015/081626
Other languages
French (fr)
Chinese (zh)
Inventor
刘孙亮
Original Assignee
阿里巴巴集团控股有限公司
刘孙亮
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 刘孙亮
Publication of WO2016008347A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting

Definitions

  • the present application relates to digital reading technology, and in particular, to a rearrangement method and system for a layout document and an electronic reading terminal.
  • A streaming document is a character stream, that is, an ordered collection of characters whose length is the number of characters contained in the file.
  • A Word file is a streaming document that mainly records streaming information, although some fixed-layout objects (such as floating images) can also be added.
  • A layout document explicitly records the position and size of each document object in a custom coordinate system, so that the printed result is consistent with what is viewed on the computer and the display is consistent in any computer environment, thus ensuring faithful reproduction of the original document.
  • Documents such as pdf, xps, and ceb are typical layout documents. They are what-you-see-is-what-you-get (WYSIWYG), so they are very suitable for document distribution, dissemination, and archiving.
  • One existing rearrangement scheme for layout documents is as follows: to meet the need to read electronic documents on various electronic devices, the streaming display information of the layout is marked when the layout document is created, and the tag data is stored in the original document and released together with it.
  • This rearrangement scheme is based on a precisely positioned layout description in the layout document, in which sufficient streaming logic structure information is added to support streaming applications such as rearrangement and extraction of table structures.
  • HTML HyperText Markup Language
  • Founder Apabi defined, in the CEBX v1.1 specification released in 2010, a multi-layer nestable tree-like logical structure containing articles, chapters, paragraphs, fragments, and blocks, where the blocks directly reference layout blocks or primitives (v1.2) on the layout page to achieve data sharing. This can support real-time layout and screen-adaptive display on electronic reading devices such as mobile terminals.
  • For the specific standard manual and software, refer to the related introduction on the official website of Founder Apabi (http://www.apabi.cn/download/index.html).
  • Another existing rearrangement scheme for layout documents is as follows: when a layout document is opened, the layout information is parsed by preset algorithms and rules, and the parsing result is handed to the layout engine for real-time rearrangement, that is, screen-adaptive display through real-time typesetting.
  • the real-time rearrangement method of such a layout document is currently widely used in various electronic reading terminals.
  • In the first rearrangement scheme, the document content and the tag data are located in the same file, so data synchronization for layout documents that lack streaming display information may be difficult. If the original document is found to be marked incorrectly, the document must be modified again, and the original document may be damaged in the process. Especially when a large number of documents have already been archived, synchronizing documents in this way may lead to further adverse consequences.
  • The second rearrangement scheme parses the layout document in real time when the document is opened; the electronic reading terminal must analyze, mark, and rearrange the document through an algorithm every time, which consumes both time and power.
  • the rearrangement scheme relies on the reliability of an algorithm, and thus there may be a problem that the rearrangement effect is not good.
  • the purpose of the present application is to provide a rearrangement method, system and electronic reading terminal for a layout document, which can effectively improve the rearrangement effect and the rearrangement efficiency.
  • the present application provides a method for rearranging a layout document, the method comprising:
  • the layout document is rearranged based on the streamed tag data to find the corresponding document content in the layout document.
  • the streaming tag data includes logical information corresponding to the document content of the layout document, and does not include the substance of the layout document.
  • the streaming tag data includes summary content of the layout document.
  • the layout document is marked according to a preset streaming logic information structure to obtain streaming tag data and store it.
  • The layout document is parsed by algorithmic analysis, manual analysis, or a combination of the two, and the corresponding stream tag data is obtained after marking according to the preset streaming logic information structure.
  • the streaming tag data is externally stored on the server side or locally in the form of a file or database record.
  • the layout document is rearranged by using the locally selected streaming logic information structure to find the corresponding document content in the layout document according to the streaming tag data.
  • The locally selected streaming logical information structure is the streaming logical information structure corresponding to the local algorithm implementation, local pre-processing, user specification, or the latest tagging technology.
  • All or part of the stream tags are obtained from the stream tag data by using the correspondence between the stream tag data and the layout document determined by the locally selected streaming logical information structure; for each stream tag, the corresponding document content in the layout document is found and then re-typeset and displayed by the layout engine.
  • the present application also provides a rearrangement system for layout documents, the system comprising:
  • the stream tag extractor is configured to acquire and tag the stream tag data, and the stream tag data is associated with the layout document according to the preset logical information structure;
  • a memory configured to store streaming tag data, the stream tag data being stored separately from the layout document;
  • the typesetting engine is configured to rearrange the layout document based on the corresponding document content in the layout document according to the streaming tag data.
  • the present application further provides an electronic reading terminal, which can rearrange the layout document, and the electronic reading terminal is configured to:
  • the layout document is rearranged based on the streamed tag data to find the corresponding document content in the layout document.
  • The present application stores the streaming tag data externally, so the original document does not need to be modified or damaged.
  • Through real-time streaming logic marking and pre-processed marking of the layout document, the present application can produce a rearranged display adapted to the layout size, which achieves a better typesetting effect and substantially shortens the rearrangement time. At the same time, by analyzing the layout and externalizing the streaming logic information tags of the layout document, the rearrangement problem for the large number of existing layout documents that lack streaming tag data can be solved, without concern about damage caused by modifying the original document or about subsequent document inconsistency.
  • The layout document in this application needs to be marked only once and can then be shared by multiple users and multiple terminals. From the perspective of the entire system, this not only consumes less power but also facilitates technology upgrades.
  • FIG. 1 shows a flow chart of a rearrangement method of a layout document according to an embodiment of the present application
  • FIG. 2 is a more complete example of a rearrangement method according to the layout document shown in FIG. 1;
  • FIG. 3 is a block diagram showing the composition of a rearrangement system of a layout document according to an embodiment of the present application
  • FIG. 4 shows a block diagram of the composition of an electronic reading terminal according to an embodiment of the present application.
  • Logical information structure refers to the logical description of how document information is organized: structured information such as titles, paragraphs, formulas, tables, and comments is used to specify the logical structure relationships between these elements (for example, that a picture is centered, or what its caption is), and these logical structure relationships constitute an ordered arrangement.
  • The logical information structure in the embodiment of the present application specifically specifies the relationship between the externally stored document and the original document. For example, it can specify a paragraph in the externally stored document, how many spans the paragraph contains (a span being non-splittable text, such as a string that should not be broken when displayed), and what text is in each span.
  • each text corresponds to the coordinates of the original layout document or the binary stream offset position of the document.
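  • As a rough illustration of the logical information structure described above, the following Python sketch models a paragraph made of non-splittable spans, each of which only records where its text lives in the original layout document. The class and field names (`Span`, `Paragraph`, `page`, `bbox`, `byte_offset`) are hypothetical; the application does not prescribe a concrete schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Span:
    """A non-splittable run of text (hypothetical schema)."""
    length: int                      # number of characters in the span
    page: int                        # page of the original layout document
    # The text is located either by page coordinates (x, y, width, height)
    # or by a byte offset into the layout document's binary stream.
    bbox: Optional[Tuple[float, float, float, float]] = None
    byte_offset: Optional[int] = None


@dataclass
class Paragraph:
    """A paragraph recorded in the externally stored tag data."""
    spans: List[Span] = field(default_factory=list)

    def char_count(self) -> int:
        return sum(span.length for span in self.spans)


paragraph = Paragraph(spans=[
    Span(length=5, page=1, bbox=(72.0, 700.0, 30.0, 12.0)),
    Span(length=3, page=1, byte_offset=1024),
])
```

  • Note that the sketch stores only locations, not the text itself, in line with the idea that the tag data carries logical information rather than the document content.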
  • The logical information structure of the embodiment of the present application is different from the logical structure of pure layout document information.
  • The logical structure of a pure layout document describes only which characters, images, and graphics are displayed at which coordinate positions of the page.
  • This pure layout document logical structure applies to the entire document; because it emphasizes presentation on the layout rather than logical information, the relationships between parts of the document may be disordered, or partly disordered.
  • The above logical information structure describes the logical structure of both the document structure and the layout, and the corresponding streaming tag data can be obtained by identifying and marking the document according to the logical information structure.
  • the stream tag data includes the result of tagging the document structure information in the layout document and/or the document layout information in the layout document, wherein the document layout information is an adaptive presentation layout information.
  • the electronic reading terminal can reconstruct the layout of the entire document, and finally the result of the rearrangement display of the layout document matches the size of the screen of the electronic reading terminal.
  • The adaptive presentation layout information describes which part of the layout document is a title, which part is a paragraph (a paragraph may contain, say, 1000 characters), and so on.
  • The displayed content can be adjusted according to the screen size of different reading devices. For example, 900 characters may be displayed on one screen on a computer, so the aforementioned paragraph is adaptively displayed on a little more than one screen; on a mobile phone, only about 100 characters may be displayed on one screen, so the aforementioned paragraph is adaptively displayed across about 10 screens. However the display adapts, some content is never split across screens. For example, the word "and" can be a span, and it cannot be broken in any case.
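  • The screen-count arithmetic above can be sketched as a greedy pagination that never splits a span. This is a minimal illustration, assuming for the sake of the example a 1000-character paragraph, 900 characters per computer screen, and 100 characters per phone screen; the function name `paginate` is hypothetical.

```python
from typing import List


def paginate(span_lengths: List[int], screen_capacity: int) -> List[List[int]]:
    """Greedily pack span indices into screens without splitting any span."""
    screens: List[List[int]] = [[]]
    used = 0
    for index, length in enumerate(span_lengths):
        if length > screen_capacity:
            raise ValueError("span longer than one screen")
        if used + length > screen_capacity:
            # The span does not fit on the current screen: start a new one.
            screens.append([])
            used = 0
        screens[-1].append(index)
        used += length
    return screens


# A 1000-character paragraph modelled as 200 spans of 5 characters each.
spans = [5] * 200
desktop = paginate(spans, screen_capacity=900)   # a little more than one screen
phone = paginate(spans, screen_capacity=100)     # ten screens
```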
  • the layout document can be parsed to obtain the stream tag data, and the stream tag data can also be identified to reconstruct the layout document.
  • Based on the above, the concept of the embodiment of the present application is: acquiring streaming tag data stored separately from the layout document, the streaming tag data establishing a correspondence with the layout document according to a preset logical information structure; and searching for the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
  • Referring to FIG. 1, there is shown one embodiment of a rearrangement method for a layout document according to the present application.
  • The method focuses on externalizing the storage of the stream tag data, so that the document structure information and the document layout information in the layout document can be determined according to the correspondence between the stream tag data and the layout document, thereby achieving better rearrangement of layout documents.
  • By storing the streaming tag data externally, this embodiment can effectively improve the rearrangement efficiency and rearrangement effect of the layout document without modifying or damaging the original document, as described in further detail below.
  • the layout document of the embodiment of the present application may refer to the entire layout document, and may also refer to one or several pages in the layout document.
  • A layout document uses absolute description: it explicitly records the position and size of each document object in a custom coordinate system, so that the printed result is consistent with what is viewed on the computer, achieving a what-you-see-is-what-you-get effect.
  • the streaming tag data of the embodiment of the present application includes document structure information in the layout document and/or document layout information in the layout document.
  • the document structure information includes chapter information of the document, the internal content order of each chapter, and the sequence of each element in the content block.
  • The document layout information includes information used to determine the final rendering effect of primitives and other elements when the layout document is rearranged: the layout information of a primitive or content block itself, and the relationships between primitives in the same content block or between content blocks, for example, the text flow (wrapping) mode around a specified picture or the column information of a plurality of content blocks.
  • the layout rearrangement here refers to reorganizing the layout according to certain rules when the layout size or layout content changes.
  • The streaming tag data of the embodiment of the present application may further include reading clue information.
  • That is, the stream tag data may also provide additional reading order information according to specific needs.
  • The reading clue information is optional reading order information provided to the user.
  • the streaming tag data of the embodiment of the present application includes logical information corresponding to the document content of the layout document, and does not include the substance of the layout document.
  • Such streaming tag data may include digest information of the layout document, for example a digest of the layout document computed with the MD5 or SHA algorithm.
  • The layout document is marked according to a predetermined streaming logical information structure, so that the resulting streaming tag data is strongly associated with the layout document.
  • The embodiment of the present application analyzes the layout documents that need to be rearranged by using a logical labeling algorithm, and can effectively determine which characters in the layout document form words, which words form paragraphs, which characters are superscripts or subscripts, which object is a figure, which text is a figure title, and so on, thereby enabling a full and effective description of the layout document, which ultimately facilitates its rearrangement.
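  • A digest of this kind can be computed with Python's standard `hashlib` module. The sketch below is only an illustration: the helper name `document_digest` and the sample bytes are hypothetical, and the choice of SHA-256 as the default is an assumption (the text names only "MD5 or SHA").

```python
import hashlib


def document_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return a hex digest that identifies the layout document's bytes."""
    hasher = hashlib.new(algorithm)
    hasher.update(data)
    return hasher.hexdigest()


sample = b"%PDF-1.4 example bytes"      # stand-in for a real layout document
key_md5 = document_digest(sample, "md5")
key_sha = document_digest(sample)
```

  • Because the digest depends only on the document bytes, it can serve as a key that links externally stored tag data to the unmodified original document.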
  • the usual layout document only describes the position of each text, graphic or image in the page, and does not logically describe the relationship between these objects, which will affect the rearrangement efficiency and display effect of the layout document.
  • The specific logical information structure of the stream tag data in the embodiment of the present application may be determined by referring to existing technical standards, such as the relevant technical manual of Founder Apabi; alternatively, a logical information structure may be custom-defined, provided that good compatibility is ensured. Details are not repeated here.
  • FIG. 1 is an embodiment of a rearrangement method of a layout document of the present application.
  • The rearrangement effect and the rearrangement efficiency of the layout document can be effectively improved without modifying or damaging the original document.
  • the specific technical solutions of the rearrangement method of the layout document described in the embodiment of the present application are described in detail below.
  • In step S110, streaming tag data stored separately from the layout document is acquired.
  • The streaming tag data establishes a correspondence with the layout document according to a preset logical information structure, for example by means of a digest of the original document or other database key-value pairs.
  • The streaming tag data is the result of parsing and marking the layout document according to the preset streaming logic information structure; by using the streaming tag data, the structure information and layout information of the corresponding document content in the layout document can be found. Since the stream tag data records rich document structure information, document layout information, and the like for the layout document, a strong association can be established with the original layout document (abbreviated as the original document). In this way, according to the stream tag data, not only can the corresponding document content in the original document be looked up, but the structural information and the layout information of that content can also be determined, which makes it convenient to rearrange the entire layout document.
  • In step S120, the corresponding document content in the layout document is looked up according to the streaming tag data, so as to rearrange the layout document.
  • the rearrangement of the layout document can be completed, and finally, the rearranged layout document (referred to as rearrangement document) adapted to the display interface such as the electronic reading terminal can be obtained.
  • Because the streaming tag data in the embodiment of the present application describes the layout document more fully and effectively, it helps to improve the rearranged display of the layout document.
  • step S110 and step S120 are further described in detail as follows.
  • In step S110, streaming tag data associated with the layout document through a streaming logical information structure is obtained, the streaming tag data being stored separately from the layout document.
  • The embodiment of the present application obtains the streaming tag data either by pre-processing or by marking the layout document in real time; in both cases, the resulting stream tag data can be stored separately from the layout document.
  • the original document is then rearranged by looking up the layout document content corresponding to the streaming tag data.
  • the step S110 may include two specific steps: one is to find out whether there is pre-processed stream tag data associated with the layout document, and the other is to mark the layout document in real time without the pre-processed stream tag data.
  • The basic process of step S110 is: first checking whether pre-processed streaming tag data corresponding to the layout document exists; if it exists, acquiring the streaming tag data; if it does not exist, identifying and marking the layout document according to the preset streaming logic information structure to obtain the stream tag data and store it.
  • these streaming tag data can be externally stored in a file or database record on a server side (for example, a cloud server) or locally, so that it can be conveniently stored separately from the layout document.
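  • The lookup-or-mark flow of step S110, with the tag data kept as database records outside the document, might look roughly like the following sketch. It assumes a SQLite table keyed by a SHA-256 digest of the document and JSON-encoded tag data; the table layout, `get_tag_data`, and `toy_marker` are all hypothetical and stand in for whatever storage and marking algorithm an implementation actually uses.

```python
import hashlib
import json
import sqlite3
from typing import Callable


def get_tag_data(db: sqlite3.Connection, document: bytes,
                 mark: Callable[[bytes], dict]) -> dict:
    """Return stored tag data for the document, marking it if none exists."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS tags (digest TEXT PRIMARY KEY, data TEXT)"
    )
    digest = hashlib.sha256(document).hexdigest()
    row = db.execute(
        "SELECT data FROM tags WHERE digest = ?", (digest,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])          # pre-processed tag data exists
    tags = mark(document)                  # otherwise mark in real time
    db.execute("INSERT INTO tags VALUES (?, ?)", (digest, json.dumps(tags)))
    db.commit()
    return tags


calls = []


def toy_marker(document: bytes) -> dict:
    """Stand-in for a real marking algorithm (hypothetical)."""
    calls.append(len(document))
    return {"paragraphs": [{"offset": 0, "length": len(document)}]}


db = sqlite3.connect(":memory:")
first = get_tag_data(db, b"layout bytes", toy_marker)    # marks and stores
second = get_tag_data(db, b"layout bytes", toy_marker)   # served from storage
```

  • The document bytes are only read, never written, so the original file stays untouched while the tag data can be corrected or regenerated independently.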
  • In step S110, the basic process of acquiring the streaming tag data, whether by pre-processing or by real-time marking, is: parsing and marking the layout document according to the preset streaming logical information structure, and collecting all the tag information obtained as the stream tag data.
  • It can be understood that when the layout document is marked in this manner, it can be parsed by algorithmic analysis, manual analysis, or a combination of the two, and marked according to the preset streaming logic information structure to obtain the corresponding stream tag data. A correspondence between the stream tag data and the original document is thereby established according to the preset streaming logic information structure.
  • a pdf layout document and its streaming tag data are listed below to illustrate the technical solution of the method described in the present application as a specific example of streaming tag data externalization.
  • the layout document is parsed according to the preset streaming logic information structure, and the tag data set of the parsing result is used as the stream tag data of the layout document.
  • the content of each part of the document is marked with rich stream structure information and layout information, so that it can correspond to the original layout document better, and finally can be conveniently used for rearrangement display.
  • the stream tag data described in this embodiment may not be limited to the description manner of the above tag instance, and it may adopt a binary description, an xml description, and the like.
  • The embodiments of the present application do not focus on the specific description standard of any particular file format, and thus how the stream tag data is formed is not described in detail.
  • In step S120, the corresponding document content in the layout document is looked up according to the streaming tag data, and the structural information and the layout information of the document content are identified (for example, determining whether a piece of document content is text, a graphic, or a table, and determining the relationships between them); the corresponding typesetting scheme is determined accordingly, so as to rearrange the layout document.
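  • The lookup part of step S120 can be pictured as resolving each tag record against the unmodified document. The sketch below assumes tag records that carry a kind plus a byte offset and length into the document; the record fields and the `resolve` helper are hypothetical, and a real layout format would need a proper parser rather than plain byte slicing.

```python
from typing import Dict, List

# Stand-in for the bytes of a layout document.
document = b"TitleBody text of the first paragraph."

# Externally stored tag data: logical kind plus location, no content.
tag_data: List[Dict] = [
    {"kind": "title", "offset": 0, "length": 5},
    {"kind": "paragraph", "offset": 5, "length": len(document) - 5},
]


def resolve(document: bytes, tags: List[Dict]) -> List[Dict]:
    """Attach the referenced document content to each tag record."""
    resolved = []
    for tag in tags:
        start = tag["offset"]
        content = document[start:start + tag["length"]].decode("utf-8")
        resolved.append({"kind": tag["kind"], "text": content})
    return resolved


blocks = resolve(document, tag_data)
```

  • The resulting blocks carry both the structural role and the text, which is what a typesetting engine would need in order to choose a layout for each piece of content.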
  • the rearranged layout document can perform real-time layout and screen adaptive display on an electronic device such as a mobile terminal, thereby effectively improving the user's reading experience.
  • The screen-adaptive display here includes acquiring the screen size information of the device and adaptively typesetting the document content according to that information.
  • the rearrangement of the layout document here includes the process of reorganizing the primitives and other elements in the layout according to certain rules when the layout size or the layout content changes, forming a layout presentation result.
  • The embodiment of the present application imposes no specific requirements on the typesetting engine.
  • A mature typesetting engine on the market (such as WebKit) can be selected.
  • the user can also independently develop other suitable typesetting engines, and the description will not be repeated here.
  • the embodiment of the present application establishes a correspondence between the streaming tag data and the layout file by using a preset streaming logic information structure.
  • the layout document can be pre-marked or marked in real time, thereby obtaining corresponding stream tag data.
  • Pre-marking or real-time tagging of a layout file can be understood as a process of parsing a layout document.
  • The layout document may also be reconstructed according to the marked stream tag data: the document structure information and the layout information in the tag data are used to find the corresponding document content in the layout document, which is then typeset and displayed according to the current display requirements (such as font size requirements or adaptive display requirements based on screen size). Simply put, reconstructing a layout document can be understood as a process of parsing the streaming tag data.
  • During re-typesetting, the streaming logic information structure used for rearrangement should remain matched with the streaming logic information structure used at marking time. It can be understood that the preset streaming logic information structure used when marking the layout document and the one used when rearranging may in fact not match; therefore, when the typesetting engine selects a streaming logic information structure, the structure corresponding to the local algorithm implementation, local pre-processing, user specification, or the latest tagging technology should generally be given priority.
  • The layout document can then be parsed through the locally selected streaming logic information structure: the corresponding document content in the layout document is looked up according to the stream tag data, and the structural information and layout information of that content are further identified, ultimately to rearrange the layout document.
  • In this way, an effective correspondence can be established between the stream tag data and the layout document during rearrangement, consistent with the correspondence between the tag data and the layout document at marking time.
  • By using this correspondence, all or part of the streaming tags (records) can be obtained from the streaming tag data when the layout document is rearranged, so that for each streaming tag the corresponding document content in the layout document can be found and its structural information and layout information identified; the content can then be re-typeset and displayed by the layout engine.
  • parsing algorithms may have different schemes, but since the present application does not focus on how to parse a certain system algorithm in real time, the corresponding parsing algorithm is not specifically described.
  • Referring to FIG. 2, a more complete example of the rearrangement method of the layout document shown in FIG. 1 is given. This example mainly includes the following steps S210 to S250, which are briefly described below.
  • Step S210: receiving a layout document.
  • The layout document may be rearranged according to current display conditions (e.g., according to factors such as the size of the display screen).
  • Step S220: checking whether streaming tag data corresponding to the layout document exists, that is, whether there is pre-processed streaming tag data obtained by performing stream tag pre-processing on the layout document.
  • Such pre-processed streaming tag data can be stored separately from the layout document. If pre-processed stream tag data exists, the process proceeds to step S230; if not, the process proceeds to step S240.
  • Step S230: acquiring the pre-processed stream tag data as the parsing elements for rearranging the layout document.
  • Step S240: marking the layout document in real time to obtain the streaming tag data and storing it, so as to update the stream tag data of the layout document.
  • Step S250: looking up the corresponding document content in the layout document according to the acquired streaming tag data, and identifying the structural information and the layout information of the document content, so as to rearrange the layout document.
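  • Steps S210 to S250 can be strung together in a small end-to-end sketch. Everything here is a simplification under stated assumptions: the "layout document" is plain text with hard line breaks, the marker treats blank-line-separated blocks as paragraphs, the external store is an in-memory dictionary, and Python's `textwrap` stands in for a typesetting engine. The names `tag_store`, `mark_in_real_time`, and `rearrange` are hypothetical.

```python
import textwrap
from typing import Dict, List

tag_store: Dict[str, List[dict]] = {}    # external store, separate from documents


def mark_in_real_time(doc_id: str, text: str) -> List[dict]:
    """S240: toy marker; each blank-line-separated block is a paragraph."""
    tags, offset = [], 0
    for block in text.split("\n\n"):
        tags.append({"kind": "paragraph", "offset": offset, "length": len(block)})
        offset += len(block) + 2          # skip the blank-line separator
    tag_store[doc_id] = tags              # store for later reuse
    return tags


def rearrange(doc_id: str, text: str, columns: int) -> List[str]:
    """S210-S250: receive a document and re-typeset it for a given width."""
    tags = tag_store.get(doc_id)          # S220: look for pre-processed tags
    if tags is None:                      # S240: none found, mark now
        tags = mark_in_real_time(doc_id, text)
    lines: List[str] = []                 # S230/S250: use tags to find content
    for tag in tags:
        content = text[tag["offset"]:tag["offset"] + tag["length"]]
        # Drop the fixed line breaks, then re-wrap to the target width.
        lines.extend(textwrap.wrap(" ".join(content.split()), width=columns))
    return lines


page = "A fixed layout\nline of text.\n\nSecond paragraph."
narrow = rearrange("doc-1", page, columns=12)   # marks, then re-wraps
wide = rearrange("doc-1", page, columns=40)     # reuses the stored tags
```

  • The second call finds the tags produced by the first, which mirrors the idea that a document is marked once and the result is reused for different screen sizes.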
  • FIG. 2 is a complete example of the rearrangement method of the layout document shown in FIG. 1 and clearly shows the overall flow of the technical solution described in the present application.
  • For details not covered in the description of FIG. 2, please refer to the description of FIG. 1.
  • The present application addresses the shortcomings of existing layout document rearrangement technology by storing the streaming tag data externally, that is, by analyzing the layout and externalizing its streaming logic information tags. This can solve the rearrangement problem for the large number of layout documents that have no streaming tag data, without concern about damage to the original document or about subsequent document inconsistency.
  • the present application provides a more complete and effective description of the layout document by real-time streaming logic marking and pre-processing mark on the layout, thereby obtaining better typesetting effect and shortening the rearrangement time.
  • the present application adopts the method of streaming tag data external storage, and records the tag type, the electronic reading system version, the server identification system version, the manual identification version and the like in the stream tag data, so that the layout document only needs to be marked once. That is, it can be shared by multiple users and multiple terminals, which also helps to upgrade the electronic reading system.
  • The stream tag data in the present application can generally be produced by an algorithm, and the tagging result needs to be stored externally so that it is available for later use.
  • The marking can also be done manually or by combining algorithms with manual work.
  • The embodiment of the present application should obtain the stream tag data according to a specified standard.
  • The embodiments of the present application are not limited to a specific standard, and the stream tag data may use many different logical information description standards; it can be described either in XML or in binary.
  • The tagging results can also be stored directly in a database or on a cloud server.
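  • To make the "XML or binary" choice concrete, the sketch below writes the same tag records both ways using Python's standard library. The element and attribute names, the kind codes, and the 9-byte record layout are all invented for illustration; the application does not define a specific encoding.

```python
import struct
import xml.etree.ElementTree as ET

# (kind, offset, length) records; values are illustrative only.
records = [("title", 0, 5), ("paragraph", 5, 33)]

# XML description: one <tag> element per record.
root = ET.Element("streamTags")
for kind, offset, length in records:
    ET.SubElement(root, "tag", kind=kind, offset=str(offset), length=str(length))
xml_text = ET.tostring(root, encoding="unicode")

# Binary description: kind code (1 byte) + offset and length (4 bytes each),
# little-endian with no padding.
KIND_CODES = {"title": 1, "paragraph": 2}
binary = b"".join(
    struct.pack("<BII", KIND_CODES[kind], offset, length)
    for kind, offset, length in records
)

# The XML form decodes back to the same records.
from_xml = [
    (el.get("kind"), int(el.get("offset")), int(el.get("length")))
    for el in ET.fromstring(xml_text)
]
```

  • XML is easier to inspect and extend, while the packed form is more compact; either can be stored as a file or as a database field.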
  • the present application also correspondingly configures a rearrangement system (hereinafter referred to as a system) of the layout document, which will be described in detail below.
  • Referring to FIG. 3, a block diagram of a rearrangement system of a layout document according to an embodiment of the present application is shown.
  • The rearrangement system (referred to as the system) 300 of the layout document is composed of a stream tag extractor 310, a memory 320, a typesetting engine 330, a stream tag preprocessor 340, and the like. By storing the streaming tag data externally, the system can effectively improve the rearrangement effect and the rearrangement efficiency of the layout document without modifying or damaging the original document.
  • the structure and function of each part of the system 300 are further described below.
  • the 300 has a stream tag extractor 310, which can acquire and tag the stream tag data, and the stream tag data is associated with the layout document according to a preset logical information structure, that is, the stream.
  • the tag data is the result of the layout parsing of the layout document according to the preset stream logic information structure tag.
  • the stream tag extractor 310 includes a stream tag lookup module 311, a stream tag read module 312, a real-time tag engine module 313, and the like, wherein: the stream tag lookup module 311 is configured to pre-follow whether or not there is a layout The stream tag data corresponding to the document; the stream tag reading module 312 is configured to acquire the stream tag data when there is streaming tag data corresponding to the layout document; the real-time tag engine module 313 is configured to not exist and When the layout document corresponds to the stream tag data, the layout document is marked according to a preset streaming logic information structure to obtain the stream tag data and store it.
  • the real-time tag engine module 313 can be configured locally or on the server side, and can perform layout analysis on the layout document by algorithm analysis or manual analysis or combination of algorithm and manual, according to preset flow logic information. After the structure is marked, Obtain the corresponding stream tag data.
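The division of work among modules 311, 312 and 313 can be sketched as a lookup that falls back to marking. Everything here is an assumption for illustration: the tag tuple layout, the dictionary standing in for the external store, and the toy `mark_lines` marker, which treats each non-empty line as a paragraph and is not the marking algorithm of the application.

```python
from typing import Callable, Dict, List, Tuple

Tag = Tuple[str, int, int]  # (kind, byte offset, byte length), hypothetical layout


def mark_lines(document: bytes) -> List[Tag]:
    """Toy real-time marker: tag every non-empty line as a paragraph."""
    tags: List[Tag] = []
    offset = 0
    for line in document.split(b"\n"):
        if line.strip():
            tags.append(("paragraph", offset, len(line)))
        offset += len(line) + 1  # +1 for the newline removed by split
    return tags


def get_stream_tags(doc_key: str, document: bytes,
                    store: Dict[str, List[Tag]],
                    mark: Callable[[bytes], List[Tag]]) -> List[Tag]:
    """Look up stored tags (311/312); mark and store them if absent (313)."""
    tags = store.get(doc_key)
    if tags is None:
        tags = mark(document)
        store[doc_key] = tags  # kept outside the layout document
    return tags
```

On a second request for the same key the marker is not called again, which is the source of the time and power savings the text describes.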
  • The system 300 also has a memory 320, which may be cloud storage or local storage. It stores the streaming tag data, in the form of a file or a database record, separately from the layout document.
  • The streaming tag data is the result of parsing the layout document and marking it according to a preset streaming logical information structure. It records rich streaming document structure information together with the structural and layout information of the corresponding document content, so it corresponds closely to the original layout document and makes it convenient to rearrange the layout document.
  • The system 300 also has a typesetting engine 330, which locates the corresponding document content in the layout document according to the streaming tag data in order to rearrange the layout document.
  • The typesetting engine 330 can use the locally selected streaming logical information structure to locate the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
  • The basic process of rearrangement is as follows: using the correspondence between the streaming tag data and the layout document, as determined by the locally selected streaming logical information structure, the streaming tags (records) for the layout document are read from the streaming tag data; for each streaming tag the corresponding content is located in the layout document; and the content is then re-typeset by the typesetting engine.
  • The system 300 can also have a stream tag preprocessor 340, deployed locally or on the server side, which marks layout documents in advance and stores the resulting streaming tag data.
  • The preprocessing marks produced by the stream tag preprocessor 340 can be generated by an algorithm on the server side, by manual marking, or by combining manual work with an algorithm.
  • Certain software tools can be provided to the manufacturers that pre-install the reader.
  • The streaming tag data may be obtained by preprocessing or by real-time tagging. Either way, the resulting streaming tag data should be stored separately from the layout document.
  • The basic process of preprocessing a layout document is as follows: first, the layout of the document is parsed, where the parsing may use, without limitation, algorithmic analysis, manual analysis, and the like; then, the streaming result of the layout information is stored externally, where the storage may use, without limitation, cloud storage, a database, or a local external file.
  • In this way the streaming tags are kept separate from the original layout document.
  • Real-time tagging of a layout document follows a process similar to preprocessing; the two differ only in when the marking is performed and who performs it, so the process is not described again here.
  • For real-time tagging of layout documents to obtain streaming tag data, matters such as the description standard of a particular file format and the algorithms for parsing that standard in real time are covered in the prior-art literature and are not repeated here.
  • The e-reader in the system can select the streaming logical information structure that it considers optimal for display.
  • The selected streaming logical information structure can be one that is implemented by a local algorithm, locally preprocessed, specified by the user, or marked with the latest marking technology.
  • The process of rearranging a layout document is as follows: first, the streaming tag data (or tag document) corresponding to the original document is obtained through some correspondence, for example a digest of the original document (such as, but not limited to, MD5 or SHA) or another database key-value pair that specifies the relationship; next, the streaming tags are read from the streaming tag data structure, which records the correspondence between each streaming tag and the related content of the original document, expressed by, without limitation, a document position offset, an object number, and the like; finally, the corresponding content is located in the original document through the streaming tag data and submitted directly to the typesetting engine for layout and display.
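The three steps above can be sketched end to end under strong simplifying assumptions: the "layout document" is plain UTF-8 bytes, tags are `(kind, offset, length)` tuples, the external store is a dictionary keyed by an MD5 digest, and the standard-library `textwrap` module stands in for a real typesetting engine. None of these choices comes from the application; a real layout format such as PDF would need a proper parser.

```python
import hashlib
import textwrap
from typing import Dict, List, Tuple

Tag = Tuple[str, int, int]  # (kind, byte offset, byte length), hypothetical layout


def document_key(document: bytes) -> str:
    """Step 1: derive the lookup key from a digest of the original document."""
    return hashlib.md5(document).hexdigest()


def rearrange(document: bytes, tag_store: Dict[str, List[Tag]], width: int) -> List[str]:
    """Steps 2 and 3: read the tags, resolve each to content, and re-typeset it."""
    tags = tag_store[document_key(document)]  # raises KeyError if never marked
    lines: List[str] = []
    for _kind, offset, length in tags:
        text = document[offset:offset + length].decode("utf-8")
        lines.extend(textwrap.wrap(text, width=width))  # stand-in for the typesetting engine
    return lines


# Illustrative data only: one paragraph covering the whole document.
document = b"Reading on small screens is hard"
tag_store = {document_key(document): [("paragraph", 0, len(document))]}
narrow = rearrange(document, tag_store, width=16)
```

The original bytes are only read, never rewritten, and changing `width` re-flows the same content for a different screen without marking it again.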
  • By storing the streaming tag data externally, the rearrangement system of the present application can effectively improve the rearrangement effect and rearrangement efficiency of the layout document without modifying or damaging the original document.
  • The system may need to recognize layout documents marked with several different streaming logical information structures. If a streaming logical information structure is not recognized, it is not one of the local streaming logical information structures. If it is a new version, information such as the version number and whether it was preprocessed can be described in the streaming logical information structure. The system can also prompt an upgrade of the reader version so that the structure can eventually be recognized and understood.
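One possible reading of this recognition step is sketched below. The structure name, the version numbers, and the three outcome strings are hypothetical; the application only says that version information can be recorded and that the reader can be prompted to upgrade.

```python
from typing import Dict, Set

# Structure names and versions this reader understands (hypothetical values).
KNOWN_STRUCTURES: Dict[str, Set[int]] = {"paragraph-span": {1, 2}}


def classify_structure(name: str, version: int,
                       known: Dict[str, Set[int]] = KNOWN_STRUCTURES) -> str:
    """Decide how the reader should treat tag data with the given structure."""
    versions = known.get(name, set())
    if version in versions:
        return "use"
    if versions and version > max(versions):
        return "upgrade-reader"  # known structure, newer version than supported
    return "unrecognized"        # not one of the local structures
```

A reader could act on `"upgrade-reader"` by notifying the user and falling back to its own real-time marking in the meantime.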
  • A preferred correspondence is an MD5 digest, through which the tag data corresponds uniquely to one original document.
  • The correspondence to specific content can be described using a document position offset, an object number, and the like; for details, refer to the foregoing tag example.
  • The system places no specific requirements on the typesetting engine.
  • A mature typesetting engine on the market (such as WebKit) can be selected as appropriate.
  • The typesetting engine can also be developed in-house.
  • The typesetting engine is not the focus of the present application.
  • It can be treated as a conventional component.
  • The system places no special requirements on the real-time markup engine, as long as it processes quickly and its results are acceptable.
  • The real-time markup engine is generally implemented by an algorithm.
  • The advantage is that the algorithm can be upgraded continuously, steadily improving speed and quality.
  • The server side has more powerful cluster computing, historical data statistics, machine learning, artificial intelligence, and the like, so the real-time markup engine can also be placed on the server side. Computation speed is then not a problem, and big data and machine learning can be used to get better marking results; only the marking results need to be transferred over the network.
  • With the real-time markup engine on the server side: if the network is good, the reader terminal can use the server-side tag data; if the network is poor, the reader terminal can use the tag data produced by its own lightweight tagging system.
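The server-first, local-fallback behavior can be sketched as follows. The function names and the tag tuple layout are assumptions, and a failed network call is modeled simply as an `OSError` (which covers `ConnectionError` and `TimeoutError` in Python); how a real terminal judges network quality is not specified in the text.

```python
from typing import Callable, List, Tuple

Tag = Tuple[str, int, int]  # (kind, byte offset, byte length), hypothetical layout
Marker = Callable[[bytes], List[Tag]]


def choose_tags(document: bytes, fetch_from_server: Marker,
                mark_locally: Marker) -> Tuple[str, List[Tag]]:
    """Prefer server-side tag data; fall back to the local lightweight marker."""
    try:
        return "server", fetch_from_server(document)
    except OSError:  # network unavailable or timed out
        return "local", mark_locally(document)


def offline_server(document: bytes) -> List[Tag]:
    """Stand-in for a server call made while the network is down."""
    raise ConnectionError("network unavailable")


def whole_document_marker(document: bytes) -> List[Tag]:
    """Toy lightweight marker: treat the whole document as one paragraph."""
    return [("paragraph", 0, len(document))]
```

Returning the source label alongside the tags lets the caller decide later whether to replace locally produced tags with server results once the network recovers.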
  • The rearrangement system of the layout document can take different forms in practice: it can be a networked system or a single device (for example, a mobile intelligent terminal such as a mobile phone or a tablet computer).
  • An electronic reading terminal is described below as a product example.
  • The electronic reading terminal 400 can rearrange layout documents and has a streaming tag data acquiring unit 410 and a layout document rearranging unit 420, wherein: the streaming tag data acquiring unit 410 acquires the streaming tag data stored separately from the layout document, where a correspondence between the streaming tag data and the layout document is established according to a preset logical information structure, the streaming tag data being the result of parsing the layout of the layout document and marking it according to the preset streaming logical information structure; and the layout document rearranging unit 420 locates the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
  • The streaming tag data acquiring unit 410 may check in advance whether preprocessed streaming tag data corresponding to the layout document exists: when it exists, the streaming tag data is acquired; when it does not, the layout document is marked according to the preset streaming logical information structure to obtain the streaming tag data, which is then stored. Thus, regardless of whether the layout document has been pre-marked, the electronic reading terminal 400 can rearrange it effectively and then display it.
  • Existing layout document rearrangement techniques mainly adopt two schemes: one obtains the original layout document directly and analyzes, understands, marks and rearranges it in real time; the other applies streaming marks to the original document, saves them in the original document, and rearranges the document by reading the streaming marks back from it.
  • Both prior-art schemes have certain defects, for the reasons described above.
  • The rearrangement method, system and electronic reading terminal proposed by the present application have clear advantages: they overcome the defects of the two prior-art schemes in rearrangement effect and efficiency, and they address problems such as incomplete document coverage and the difficulty of synchronizing documents.
  • The rearrangement method, system and electronic reading terminal of the layout document have, without limitation, the following features:
  • Such correspondences include, but are not limited to, various digests or other means by which the specified relationship with the original document can be stored in external storage, which removes the need to bind the original document tightly to the streaming logical information structure by modifying the original document.
  • The streaming logical information structure describes only logical information and does not store the substantive document content. A correspondence with the original document content is established inside the structure, for example by specifying a document offset or an object number, so the amount of data is small.
  • Streaming logical structure information is the result of layout analysis of a layout document. The marking that produces it may use, without limitation, algorithmic analysis or manual analysis, and the specific forms and means of marking are varied.
  • a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
  • the memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory.
  • Memory is an example of a computer readable medium.
  • Computer readable media include permanent and non-permanent, removable and non-removable media, and can store information by any method or technology.
  • The information can be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
  • Computer readable media, as used here, do not include transitory computer readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present application can be provided as a method, a system, or a computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.

Abstract

A layout document rearrangement method, which comprises: acquiring stream-type flag data stored separately from a layout document, wherein a correlation is established between the stream-type flag data and the layout document according to a preset logic information structure (S110); and according to the stream-type flag data, searching for the corresponding document content in the layout document to conduct rearrangement on the layout document (S120). A layout document rearrangement system and an electronic reading terminal, wherein by means of the layout document rearrangement system, the flagged stream-type flag data is stored separately from the layout document, and the layout document is parsed according to the stream-type flag data during rearrangement, so that the stream-type flag data may not influence an original document, and therefore, the rearrangement effect and rearrangement efficiency of the layout document can be effectively improved in the case where the original document is not modified or damaged; and at the same time, the flagged stream-type flag data can be shared by multiple users and multiple terminals easily, thereby contributing to the technological upgrade of electronic devices.

Description

Rearrangement method, system and electronic reading terminal of layout document
Technical field
The present application relates to digital reading technology, and in particular to a rearrangement method and system for a layout document and an electronic reading terminal.
Background
With the rapid growth of the Internet and steady improvements in hardware, electronic documents are gradually replacing traditional books and paper documents. Reading habits are likewise no longer confined to traditional paper publications, and electronic reading (also called digital reading) accounts for a growing share. With the popularity of portable electronic devices such as mobile phones and e-book readers, people can read electronically in spare moments of the day, for example reading an e-book while riding a bus or subway. This large market demand places higher requirements on how information for electronic reading is provided and processed.
As is well known, electronic documents are divided into streaming documents and layout documents. The basic unit of a streaming document is the character: the document is an ordered collection of characters whose length is the number of characters the file contains. A Word file, for example, is a streaming document; it mainly records streaming information, although certain fixed-layout objects (such as floating images) can also be added. A layout document is an absolute description: it explicitly records the position and size of each element of the document in a custom coordinate system, so that the printed result matches what is viewed on the computer and the display is consistent in any computer environment, which guarantees that the original appearance of the document is faithfully reproduced. Files such as PDF, XPS and CEB are typical layout documents. They have a fixed layout and are "what you see is what you get" (WYSIWYG), which makes them well suited to publishing, distributing and archiving finalized documents.
Streaming documents pose no typesetting obstacle for electronic reading, and mature typesetting engines are already available. Layout documents, however, are often inconvenient to read on small-screen devices because their layout is fixed. If the content of a whole page of a layout document is displayed on the device screen, text and images become too small to read clearly, among other limitations; if the page has to be zoomed in or out, the user's reading experience inevitably suffers. The electronic reading terminal therefore needs to overcome the fixed display of layout documents so that the content can be re-typeset, ultimately ensuring a good reading experience.
The industry has introduced various solutions to the problem of rearranging layout documents. Existing solutions mainly fall into the following two kinds:
One existing rearrangement scheme is as follows: to meet the need to read electronic documents on various electronic devices, streaming display information for the layout is marked when the layout document is produced, and the mark data is stored in the original document and published together with it. This scheme takes the precisely positioned layout description in the layout document as its basis and attaches enough streaming logical structure information to support streaming applications such as rearrangement and table structure extraction. For example, Adobe introduced Logical Structure in the PDF 1.3 specification released in 1999, introduced tagged PDF in PDF 1.4 released in 2001 to improve the expression of streaming information, and later used XML to describe this information in a structured way in its MARS document format. Such an XML markup language can in theory describe any format; for example, Docx, the newer Word format, is described in XML. In addition, the CEBX v1.1 specification published by Founder Apabi in 2010 defines a multi-level nestable tree-shaped logical structure comprising articles, chapters, paragraphs, fragments and blocks, in which blocks share data by directly referencing layout blocks or primitives (v1.2) on the layout page. This supports real-time typesetting and screen-adaptive display on electronic reading devices such as mobile terminals. For the standard manual and software, see the introduction on Founder Apabi's official website (http://www.apabi.cn/download/index.html).
The other existing scheme is as follows: when a layout document is opened, the layout information is parsed using preset algorithms and rules, and the parsing result is handed to the typesetting engine for real-time rearrangement; that is, screen-adaptive display is achieved through real-time typesetting. This real-time rearrangement method is currently widely used on various electronic reading terminals.
Both schemes can rearrange and display layout documents, but each has certain problems, summarized as follows:
In the first scheme, the document content and the mark data are in the same file, so data synchronization can be difficult for layout documents that have not had streaming display information marked. If the marks in the original document are found to be wrong, the document must be modified again, and modifying it may damage the original document. Especially when a large number of documents have already been archived, synchronizing documents in this way may cause further adverse consequences.
The second scheme parses the layout document in real time when it is opened; the electronic reading terminal analyzes, marks and rearranges the document algorithmically on every read, which costs time and power. In addition, the scheme depends on the reliability of a particular algorithm, so the rearrangement result may be poor.
It can be seen that existing rearrangement techniques for layout documents still leave considerable room for improvement, so a technical solution that effectively improves both the rearrangement effect and the rearrangement efficiency is needed.
Summary of the invention
In view of the defects of the prior art, the purpose of the present application is to provide a rearrangement method, system and electronic reading terminal for layout documents that effectively improve the rearrangement effect and rearrangement efficiency.
To solve the above technical problem, the present application provides a rearrangement method for a layout document, the method comprising:
acquiring streaming tag data stored separately from the layout document, where a correspondence between the streaming tag data and the layout document is established according to a preset logical information structure; and
locating the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
Optionally, the streaming tag data includes logical information corresponding to the document content of the layout document and does not include the substantive content of the layout document.
Optionally, the streaming tag data includes a digest of the layout document.
Optionally, it is checked in advance whether preprocessed streaming tag data corresponding to the layout document exists;
if so, that streaming tag data is acquired;
if not, the layout document is marked according to the preset streaming logical information structure to obtain streaming tag data, which is then stored.
Optionally, the layout of the layout document is parsed by algorithmic analysis, manual analysis, or a combination of the two, and the corresponding streaming tag data is obtained after marking according to the preset streaming logical information structure.
Optionally, the streaming tag data is stored externally, on the server side or locally, in the form of a file or a database record.
Optionally, the locally selected streaming logical information structure is used to locate the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
Optionally, the locally selected streaming logical information structure is one that is implemented by a local algorithm, locally preprocessed, specified by the user, or marked with the latest marking technology.
Optionally, using the correspondence between the streaming tag data and the layout document, as determined by the locally selected streaming logical information structure, all or some of the streaming tags are read from the streaming tag data; for each streaming tag the corresponding document content is located in the layout document and handed to the typesetting engine for re-typesetting and display.
Correspondingly, the present application also provides a rearrangement system for layout documents, the system comprising:
a stream tag extractor, configured to acquire streaming tag data, where a correspondence between the streaming tag data and the layout document is established according to a preset logical information structure;
a memory, configured to store the streaming tag data, the streaming tag data being stored separately from the layout document; and
a typesetting engine, configured to locate the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
In addition, the present application provides an electronic reading terminal that can rearrange layout documents, the electronic reading terminal being configured to:
acquire streaming tag data stored separately from the layout document, where a correspondence between the streaming tag data and the layout document is established according to a preset logical information structure; and
locate the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
Compared with the prior art, the present application stores the streaming tag data externally and can thereby effectively improve the rearrangement effect and rearrangement efficiency of layout documents without modifying or damaging the original document. Specifically: through real-time streaming logical marking and preprocessing marking of layout documents, the content can be rearranged and displayed to fit the size of the display area, which yields a better typesetting result and shortens the rearrangement time; by analyzing the layout and externalizing the streaming logical information marks of the layout document, the rearrangement problem for the large number of existing layout documents that lack streaming tag data can be solved, without concern that modifications will damage the original document or lead to a proliferation of inconsistent copies; in addition, a layout document only needs to be marked once to be shared by multiple users and multiple terminals, which, for the system as a whole, costs less power and time and also facilitates technical upgrades.
Brief description of the drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the detailed description of the preferred embodiments below. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the present application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
FIG. 1 shows a flowchart of a rearrangement method of a layout document according to an embodiment of the present application;
FIG. 2 is a more complete example of the rearrangement method of the layout document shown in FIG. 1;
FIG. 3 shows a block diagram of a rearrangement system of a layout document according to an embodiment of the present application;
FIG. 4 shows a block diagram of an electronic reading terminal according to an embodiment of the present application.
具体实施方式detailed description
The following embodiments of the present application describe a rearrangement method and system for a layout document and an electronic reading terminal. By storing the streaming tag data externally, they can improve the rearrangement effect and rearrangement efficiency of the layout document without modifying or damaging the original document. A detailed description follows with reference to the drawings and specific embodiments.
To aid understanding of the technical solutions of the embodiments of the present application, the relevant terms are explained first.
1. Logical information structure
A logical information structure is a logical description of how a document organizes its information. For structured elements such as headings, paragraphs, formulas, tables, and notes, it specifies the logical relationships among them (for example, that a figure is centered and what its caption is), and these relationships form an ordered sequence.
The logical information structure in the embodiments of the present application specifically defines the relationship between the externally stored document and the original document. For example, it can specify a paragraph in the externally stored document, how many spans the paragraph contains (a span is unsplittable text, such as a string that should not be broken across lines when displayed), and what text each span contains. Spans may also be omitted, in which case the structure states directly what text the paragraph contains, with each piece of text corresponding to a coordinate in the original layout document or to an offset in the document's binary stream.
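By way of illustration only, the paragraph/span/text relationship described above could be modeled as follows. This is a minimal sketch, not the structure defined by the application: the class names `TextRef`, `Span`, `Paragraph`, and `MarkDocument`, their fields, and the helper `count_spans` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class TextRef:
    """Pointer to one piece of text in the original layout document (hypothetical)."""
    obj: int       # object number in the original file
    offset: int    # byte offset inside that object's stream
    length: int    # number of bytes referenced
    xy: Optional[Tuple[float, float]] = None  # page coordinates, if recorded


@dataclass
class Span:
    """Unsplittable run: must not be broken across lines when displayed."""
    refs: List[TextRef] = field(default_factory=list)


@dataclass
class Paragraph:
    # A paragraph holds spans (unsplittable) and/or bare text references.
    items: List[Union[Span, TextRef]] = field(default_factory=list)


@dataclass
class MarkDocument:
    """Externally stored tag data; it holds references, not the text itself."""
    src_digest: str
    blocks: List[Paragraph] = field(default_factory=list)


def count_spans(paragraph: Paragraph) -> int:
    """Return how many unsplittable spans the paragraph contains."""
    return sum(isinstance(item, Span) for item in paragraph.items)
```

The sketch keeps only references into the original file, in line with the statement that each piece of text corresponds to a coordinate or a binary stream offset in the original layout document.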
The logical information structure of the embodiments therefore differs from the logical information structure of a pure layout document. A pure layout document only describes how large each character, image, and graphic is and at which page coordinates it is displayed. Because such a structure emphasizes presentation on the page rather than logical information, the order of the parts of the document as a whole may be entirely or partially scrambled.
2. Adaptive presentation layout information
The logical information structure above describes the document structure and the logical information of the layout; identifying and tagging a document according to this structure yields the corresponding streaming tag data. In other words, the streaming tag data contains the result of tagging the document structure information and/or the document layout information of the layout document, where the document layout information is adaptive presentation layout information.
Based on this adaptive presentation layout information, an electronic reading terminal can reconstruct the layout of the whole document so that the rearranged display matches the size of the terminal's screen. For example, the adaptive presentation layout information states that one part of the layout document is a heading, another part is a paragraph (which may contain 1000 characters), and so on. With this description, different reading devices can adjust the displayed content to their screen size: a computer might show 900 characters per screen, so the paragraph occupies slightly more than one screen, whereas a mobile phone might show only 100 characters per screen, so the same paragraph occupies 10 screens. However the content is adapted, certain content is never split across screens; for example, the word "and" can be a span, which must not be broken no matter how the display adapts.
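The screen arithmetic and the "never split a span" rule in the example above can be sketched as follows. This is an illustrative simplification under stated assumptions: screen capacity is measured in characters only, each string in `spans` is treated as one unsplittable span, and the function names `screens_needed` and `paginate` are hypothetical.

```python
import math
from typing import List


def screens_needed(total_chars: int, chars_per_screen: int) -> int:
    """Number of screens a run of text occupies at a given capacity."""
    return math.ceil(total_chars / chars_per_screen)


def paginate(spans: List[str], chars_per_screen: int) -> List[List[str]]:
    """Distribute spans over screens without ever splitting a span.

    A span that does not fit in the remaining space moves to the next
    screen as a whole. A span longer than a full screen still gets a
    screen of its own (it overflows rather than being broken).
    """
    screens: List[List[str]] = [[]]
    used = 0
    for span in spans:
        if used + len(span) > chars_per_screen and screens[-1]:
            screens.append([])
            used = 0
        screens[-1].append(span)
        used += len(span)
    return screens
```

With the figures from the text, `screens_needed(1000, 900)` gives 2 (slightly more than one screen of content) and `screens_needed(1000, 100)` gives 10.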
According to the logical information structure above, a layout document can be parsed to obtain streaming tag data, and streaming tag data can be interpreted to reconstruct the layout document. This leads to the concept of the embodiments of the present application: obtain streaming tag data that is stored separately from the layout document and that corresponds to the layout document according to a preset logical information structure; then look up the corresponding document content in the layout document according to the streaming tag data in order to rearrange the layout document. This is elaborated below with reference to the drawings.
Many specific details are set forth in the following description to provide a thorough understanding of the present application. The application can, however, be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the application is therefore not limited by the specific embodiments disclosed below. Figure 1 shows one embodiment of the rearrangement method for a layout document according to the present application. The method centers on externalizing the storage of the streaming tag data, so that the document structure information and document layout information of the layout document can be determined from the correspondence between the streaming tag data and the layout document, and the layout document can then be rearranged for display. Because the streaming tag data is stored externally, this embodiment can improve the rearrangement efficiency and rearrangement effect of the layout document without modifying or damaging the original document, as described in further detail below.
A layout document in the embodiments may be an entire layout document or one or several pages of it. Such a document uses absolute description: it explicitly records the position, size, and so on of each element of the document in a user-defined coordinate system, so that the printed result is consistent with what is viewed on a computer, achieving a what-you-see-is-what-you-get effect.
As described above, the streaming tag data of the embodiments includes the document structure information and/or the document layout information of the layout document. The document structure information includes the chapter information of the document, the order of the content within each chapter, and the order of the primitives within each content block. The document layout information includes information that determines the final presentation of primitives and other elements when the corresponding page of the layout document is rearranged, the layout information of a primitive or content block itself, and the relationships between primitives in the same content block or between content blocks, such as how text is set around a given picture or the column arrangement of several content blocks. Layout rearrangement here means reorganizing the primitives and other elements of a page according to certain rules when the page size or page content changes, to produce the presented layout. The streaming tag data may further include reading cue information: in addition to the reading order given by the document structure information, the streaming tag data can supply additional reading order information as needed. The reading cue information is optional reading order information offered to the user.
Note that the streaming tag data of the embodiments contains logical information corresponding to the document content of the layout document, but not the substantive content itself. In particular, the streaming tag data may include digest information of the layout document, for example a digest computed with the MD5 or SHA algorithm. Tagging the layout document with a predetermined streaming logical information structure in this way yields streaming tag data that is strongly associated with the layout document.
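As a hedged illustration of the digest-based association, the following sketch computes a digest of the layout document's bytes and checks it against a recorded value using Python's standard `hashlib`. The function names `document_digest` and `digest_matches`, and the choice of SHA-256 as the default, are assumptions for the example; the text only says MD5 or SHA may be used.

```python
import hashlib


def document_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of the whole layout document, e.g. with "md5" or "sha256"."""
    return hashlib.new(algorithm, data).hexdigest()


def digest_matches(data: bytes, recorded: str, algorithm: str = "sha256") -> bool:
    """True if the tag data's recorded digest belongs to this document."""
    return document_digest(data, algorithm) == recorded
```

Because the digest changes whenever the document bytes change, tag data recorded for one file cannot silently be applied to a different or modified file.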
By analyzing the layout document to be rearranged with a logical tagging algorithm, the embodiments can effectively extract which characters form words, which words form paragraphs, which characters are superscripts or subscripts, which objects are figures, which text is a figure caption, and so on. The layout document can thus be described fully and effectively, which ultimately benefits its rearranged display. By contrast, an ordinary layout document describes only the position of each character, graphic, or image on the page and does not describe the logical relationships among these objects, which impairs rearrangement efficiency and display quality.
The specific logical information structure of the streaming tag data may be determined with reference to existing technical standards, such as the relevant technical manuals of Founder Apabi, or a new logical information structure may be defined, preferably one that ensures good compatibility; details are omitted here.
Figure 1 shows an embodiment of the rearrangement method for a layout document of the present application. By rearranging the layout document with externally stored streaming tag data, this embodiment can effectively improve the rearrangement effect and rearrangement efficiency without modifying or damaging the original document. The technical solution of the method is described in detail below.
In step S110, streaming tag data stored separately from the layout document is obtained. The streaming tag data corresponds to the layout document according to a preset logical information structure; the correspondence can be established, for example, through a digest of the original document or through other database key-value pairs. In other words, the streaming tag data is the result of parsing the layout of the layout document according to a preset streaming logical information structure, and it can be used to look up the structure information and layout information of the corresponding document content. Because the streaming tag data records rich document structure information, document layout information, and so on, it can be strongly associated with the original layout document (the original document for short). The streaming tag data therefore makes it possible both to locate the corresponding content in the original document and to determine the structure information and layout information of that content, which makes it convenient to rearrange the whole layout document for display.
In step S120, the corresponding document content in the layout document is looked up according to the streaming tag data in order to rearrange the layout document. This step completes the rearrangement and produces a rearranged layout document (the rearranged document for short) suited to the display interface of an electronic reading terminal or similar device. Because the streaming tag data describes the layout document more fully and effectively, it helps improve the rearranged display.
For a deeper understanding of the technical solution, steps S110 and S120 are described in more detail below.
In step S110, streaming tag data associated with the layout document through a streaming logical information structure is obtained; this data is stored separately from the layout document. The embodiments obtain the streaming tag data either by preprocessing or by tagging the layout document in real time. In both cases the streaming tag data can be stored separately from the layout document, and the original document is then rearranged by looking up the layout document content that corresponds to the streaming tag data.
Step S110 may comprise two specific steps: first, checking whether preprocessed streaming tag data associated with the layout document exists; second, tagging the layout document in real time when no preprocessed streaming tag data exists. Specifically, the basic flow of step S110 is: first check whether preprocessed streaming tag data corresponding to the layout document exists; if it does, obtain it; if it does not, identify and tag the layout document according to the preset streaming logical information structure to obtain the streaming tag data and store it. The streaming tag data can be stored externally as a file or database record, either on a server (for example, a cloud server) or locally, which makes it easy to keep it separate from the layout document.
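The lookup-then-tag flow of step S110 can be sketched as below. This is one possible reading, not the application's implementation: it assumes the tag data is stored locally as one JSON file per document, keyed by a SHA-256 digest of the document bytes, and that the caller supplies the real-time tagging routine as `mark`. The function name, file naming scheme, and JSON format are all hypothetical.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def get_streaming_tag_data(
    doc: bytes,
    store_dir: Path,
    mark: Callable[[bytes], dict],
) -> dict:
    """Return tag data for `doc`, tagging it only if none is stored yet."""
    key = hashlib.sha256(doc).hexdigest()
    path = store_dir / f"{key}.mark.json"

    # Preprocessed tag data exists: read it and leave the document untouched.
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))

    # No tag data yet: tag in real time, then store the result externally
    # so that later requests (other users, other terminals) can reuse it.
    tags = mark(doc)
    store_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(tags), encoding="utf-8")
    return tags
```

Calling the function twice with the same document bytes invokes `mark` only once; the second call is served from the stored file, which reflects the "tag once, share many times" property described in the text.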
In step S110, whether by preprocessing or by real-time tagging, the basic process of obtaining the streaming tag data is: parse the layout of the layout document according to the preset streaming logical information structure and tag it; the set of all resulting tag information constitutes the streaming tag data. When tagging in this way, the layout can be parsed by algorithmic analysis, by manual analysis, or by a combination of the two, and tagged according to the preset streaming logical information structure to obtain the streaming tag data. The streaming tag data is thereby strongly associated with the original document according to the preset streaming logical information structure.
A PDF layout document and its streaming tag data are listed below as a concrete example of externalized streaming tag data, to illustrate the technical solution of the method.
(1) The original layout document is as follows:
a.pdf
2 0 obj<</Type/Page
/Contents 3 0 R
...>>
endobj
3 0 obj<</Length...>>
stream...
...(Here is some text1)... // one word at coordinates x=100, y=100; content: "Hello,"
...(Here is some text2)... // one word at coordinates x=110, y=200; content: "Title"
...(Here is some text3)... // one word at coordinates x=130, y=100; content: "China."
...endstream
endobj
(2) The streaming tag data is as follows:
a.mark
<SrcDoc>"xxx"</SrcDoc> // xxx is the digest of the entire document a.pdf
<Head> // a heading
<obj=3,offset=xxx,length=xxx> // corresponding content: "Title"
</Head>
<P> // a paragraph
<obj=3,offset=xxx,length=xxx> // corresponding content: "Hello,"
<obj=3,offset=xxx,length=xxx> // corresponding content: "China."
</P>
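To make the role of the `obj`, `offset`, and `length` fields concrete, the following sketch resolves one tag record against the bytes of the referenced object. It is an illustration under stated assumptions: the stream is taken to be already decoded, streams are supplied as a plain mapping from object number to bytes, and the function names `parse_record` and `resolve` are hypothetical. The example above elides the numeric values as xxx, so any numbers used with this sketch are made up.

```python
import re
from typing import Dict, Tuple

# Matches records of the form <obj=3,offset=9,length=5>
RECORD = re.compile(r"<obj=(\d+),offset=(\d+),length=(\d+)>")


def parse_record(line: str) -> Tuple[int, int, int]:
    """Extract (obj, offset, length) from one tag record line."""
    match = RECORD.search(line)
    if match is None:
        raise ValueError(f"not a tag record: {line!r}")
    obj, offset, length = (int(group) for group in match.groups())
    return obj, offset, length


def resolve(line: str, streams: Dict[int, bytes]) -> bytes:
    """Return the bytes of the original document that a record points to."""
    obj, offset, length = parse_record(line)
    return streams[obj][offset:offset + length]
```

With a stand-in stream `{3: b"(Hello,)(Title)(China.)"}`, resolving the heading record first and the two paragraph records afterwards yields "Title", then "Hello," and "China.", that is, the logical reading order rather than the order in which the text happens to be stored in the stream.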
As this tagging example shows, the layout of the layout document is parsed according to the preset streaming logical information structure, and the set of tag data produced by the parsing serves as the streaming tag data of the layout document. Each part of the document content is tagged with rich streaming structure information and layout information, so the tag data corresponds well to the original layout document and can conveniently be used for rearranged display.
Note that the streaming tag data of this embodiment is not limited to the form shown in the example above; it may equally be expressed in a binary description, an XML description, and so on. The embodiments do not focus on the description standard of any particular file format, so how the streaming tag data is formed is not described in further detail.
In step S120, the corresponding document content in the layout document is looked up according to the streaming tag data, and the structure information and layout information of that content are identified (for example, determining whether given content is text, a graphic, or a table, determining the relationships among them, and choosing a typesetting scheme accordingly) in order to rearrange the layout document. The rearranged layout document can be typeset in real time and displayed adaptively on the screen of an electronic device such as a mobile terminal, which effectively improves the user's reading experience. Screen-adaptive display here includes obtaining the screen size information of the device and typesetting the document content adaptively according to that information.
Rearranging the layout document here means reorganizing the primitives and other elements of a page according to certain rules when the page size or page content changes, to produce the presented layout. The embodiments place no particular requirement on the typesetting engine: any mature typesetting engine on the market (such as WebKit) may be chosen, and users may also develop another suitable engine themselves; this is not elaborated here.
As described above, the embodiments establish the correspondence between the streaming tag data and the layout document through a preset streaming logical information structure. According to this structure, the layout document can be tagged in advance or in real time to obtain the corresponding streaming tag data; this tagging can be understood as parsing the layout document. According to the same structure, the layout document can also be reconstructed from the streaming tag data: specifically, the document structure information and layout information in the streaming tag data are used to look up the corresponding document content in the layout document, and that content is typeset and displayed according to the current display requirements (for example, font size requirements or screen-size-adaptive display requirements). Put simply, reconstructing the layout document can be understood as the inverse of parsing: interpreting the streaming tag data.
Because the embodiments establish the correspondence between the layout document and the streaming tag data through a streaming logical information structure, the structure used at rearrangement time must match the structure used at tagging time. In practice, the preset structure used when the layout document was tagged and the structure used at rearrangement may not match. When selecting a streaming logical information structure, the typesetting engine should therefore generally give priority to the structure corresponding to the local algorithm implementation, to local preprocessing, to the user's specification, or to the latest tagging technology.
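The selection and matching check described above might look like the following sketch. Several points are assumptions rather than statements from the application: the text lists the four preferred sources without ranking them, so the order in `PREFERENCE` simply follows the order of listing; structures are identified by opaque strings; and the names `select_structure` and `can_rearrange` are hypothetical.

```python
from typing import Dict, Optional

# Assumed preference order (the source lists these without ranking them).
PREFERENCE = ("local_algorithm", "local_preprocessed", "user_specified", "latest")


def select_structure(available: Dict[str, str]) -> Optional[str]:
    """Pick a structure identifier from the locally available candidates.

    `available` maps a source label from PREFERENCE to a structure identifier.
    Returns None when no candidate is available.
    """
    for source in PREFERENCE:
        if source in available:
            return available[source]
    return None


def can_rearrange(marked_with: str, available: Dict[str, str]) -> bool:
    """True if the locally selected structure matches the one used for tagging."""
    return select_structure(available) == marked_with
```

When `can_rearrange` returns false, the stored tag data cannot be interpreted reliably with the selected structure, which is the mismatch case the text warns about.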
In this embodiment, after a streaming logical information structure has been selected locally by the above rule, if it matches the logical information structure used when the layout document was tagged, the layout document can be parsed with the locally selected structure at rearrangement time. That is, the corresponding document content in the layout document is looked up according to the streaming tag data, the structure information and layout information of that content are identified, and the layout document is rearranged.
Thus, for preprocessed streaming tag data, if the locally selected streaming logical information structure matches the one used at tagging time, a valid correspondence between the streaming tag data and the layout document can be established at rearrangement time, consistent with the correspondence established at tagging time. When the layout document is rearranged, all or some of the streaming tags (records) can then be read from the streaming tag data; for each tag, the corresponding document content in the layout document is located and its structure information and layout information are identified, after which the content is handed to the typesetting engine to be typeset and displayed.
For a locally selected streaming logical information structure, a parsing algorithm generally has to be chosen to rearrange the layout document. Various parsing algorithms are possible, but since the present application does not focus on how a particular algorithm parses in real time, the parsing algorithms are not described in detail.
Figure 2 shows a more complete example of the rearrangement method of Figure 1. The example mainly comprises steps S210 to S250, described briefly below.
Step S210: receive a layout document. The layout document can be rearranged according to the current display conditions (for example, the size of the display screen).
Step S220: check whether streaming tag data corresponding to the layout document exists.
Checking whether streaming tag data corresponding to the layout document exists means determining whether preprocessed streaming tag data exists, that is, data obtained by preprocessing the layout document with streaming tagging. The resulting streaming tag data can be stored separately from the layout document. If preprocessed streaming tag data exists, proceed to step S230; if not, proceed to step S240.
Step S230: obtain the preprocessed streaming tag data to serve as the parsing input for rearranging the layout document.
Step S240: tag the layout document in real time to obtain the streaming tag data and store it, thereby updating the streaming tag data of the layout document.
Step S250: according to the obtained streaming tag data, look up the corresponding document content in the layout document and identify the structure information and layout information of that content, in order to rearrange the layout document.
As a complete example of the rearrangement method of Figure 1, Figure 2 clearly shows the overall flow of the technical solution; most of the details have already been explained in connection with Figure 1. Where the description of Figure 2 is not exhaustive, refer to the description of Figure 1.
As the description of Figures 1 and 2 shows, the present application addresses the shortcomings of existing layout document rearrangement techniques by storing the streaming tag data externally. By analyzing the layout and externalizing its streaming logical information tags, it solves the rearrangement problem for the large number of existing layout documents that lack streaming tag data, without concern that modification will damage the original document or lead to a proliferation of inconsistent derived documents. At the same time, through real-time streaming logical tagging and preprocessed tagging of the layout, the layout document is described more fully and effectively, which yields a better typesetting result and considerably shortens the rearrangement time. In addition, because the streaming tag data is stored externally and records items such as the tag type, the electronic reading system version, the server recognition system version, and the manual recognition version, a layout document needs to be tagged only once and can then be shared by multiple users and multiple terminals, which also facilitates technical upgrades of the electronic reading system.
Note that the streaming tag data in the present application is generally produced by an algorithm, after which the tagging result is stored externally for later use. The tagging can also be done manually or by a combination of algorithm and manual work. Whichever way the layout document is tagged, the embodiments should obtain the streaming tag data according to some specified standard. The embodiments are not tied to any particular standard, however: the streaming tag data may use many different logical information description standards, may be expressed in XML or in binary, and the tagging results may also be stored directly in a database or on a cloud server; this is not elaborated here.
The rearrangement method for layout documents has been described in detail above. On that basis, the present application also provides a corresponding rearrangement system for layout documents (hereinafter the system), which is described in detail below.
Incidentally, for any point not fully described for the system of this embodiment, refer to the description of the method above; likewise, where the method description involves the system, refer to the system description below.
FIG. 3 shows a block diagram of a rearrangement system for layout documents according to an embodiment of the present application. The rearrangement system 300 comprises a streaming tag extractor 310, a memory 320, a typesetting engine 330, a streaming tag preprocessor 340, and other parts. By storing the streaming tag data externally, it effectively improves both the quality and the efficiency of rearrangement without modifying or damaging the original document. The structure and function of each part of the system 300 are described below.
As shown in FIG. 3, the system 300 has a streaming tag extractor 310, which acquires the streaming tag data. The streaming tag data corresponds to the layout document according to a preset logical information structure; that is, it is the result of parsing the page layout of the layout document and tagging it according to the preset streaming logical information structure. Specifically, the streaming tag extractor 310 includes a streaming tag lookup module 311, a streaming tag reading module 312, a real-time tagging engine module 313, and so on, where: the streaming tag lookup module 311 is configured to check in advance whether streaming tag data corresponding to the layout document exists; the streaming tag reading module 312 is configured to acquire that streaming tag data when it exists; and the real-time tagging engine module 313 is configured, when no such streaming tag data exists, to tag the layout document according to the preset streaming logical information structure so as to obtain the streaming tag data and store it. The real-time tagging engine module 313 may be deployed locally or on the server side. It may parse the page layout of the layout document by algorithmic analysis, manual analysis, or a combination of the two, and it obtains the corresponding streaming tag data after tagging according to the preset streaming logical information structure.
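The division of labor among modules 311, 312, and 313 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the class name `StreamingTagExtractor`, the dictionary used as external storage, and the toy `blank_line_tagger` (which stands in for real layout analysis) are all hypothetical.

```python
from typing import Callable, Dict, List

Tag = Dict[str, object]


def blank_line_tagger(document: bytes) -> List[Tag]:
    """Toy stand-in for layout analysis: one 'paragraph' tag per block
    separated by a blank line. A real tagging engine would analyze the page."""
    tags: List[Tag] = []
    offset = 0
    for block in document.split(b"\n\n"):
        if block:
            tags.append({"type": "paragraph", "offset": offset, "length": len(block)})
        offset += len(block) + 2  # skip the block and the two-byte separator
    return tags


class StreamingTagExtractor:
    """Hypothetical counterpart of extractor 310 with an external tag store."""

    def __init__(self, store: Dict[str, List[Tag]], tagger: Callable[[bytes], List[Tag]]):
        self.store = store    # tag data kept apart from the document itself
        self.tagger = tagger  # real-time tagging engine (module 313)

    def exists(self, doc_key: str) -> bool:
        """Lookup module 311: is there tag data for this document?"""
        return doc_key in self.store

    def read(self, doc_key: str) -> List[Tag]:
        """Reading module 312: fetch the existing tag data."""
        return self.store[doc_key]

    def tag_now(self, doc_key: str, document: bytes) -> List[Tag]:
        """Module 313: tag the document in real time and store the result."""
        tags = self.tagger(document)
        self.store[doc_key] = tags
        return tags

    def acquire(self, doc_key: str, document: bytes) -> List[Tag]:
        if self.exists(doc_key):
            return self.read(doc_key)
        return self.tag_now(doc_key, document)
```

Because `tag_now` writes its result back to the store, a second call to `acquire` for the same document key is served from storage and the tagger is not run again.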
As shown in FIG. 3, the system 300 also has a memory 320, which may be cloud storage or local storage and may store the streaming tag data as files or as database records, separately from the layout document. In this embodiment, the streaming tag data is the tagging result obtained by parsing the page layout of the layout document according to the preset streaming logical information structure. It records rich streaming document structure information, together with the structural and layout information of the corresponding document content, so it maps well onto the original layout document and makes it easy to reconstruct, that is, to rearrange, the layout document.
The system 300 also has a typesetting engine 330, which looks up the corresponding document content in the layout document according to the streaming tag data and rearranges the layout document. Specifically, the typesetting engine 330 may use the locally selected streaming logical information structure to look up the corresponding document content in the layout document according to the streaming tag data. The basic rearrangement process is as follows: using the correspondence between the streaming tag data and the layout document determined by the locally selected streaming logical information structure, the streaming tags (records) corresponding to the layout document are obtained from the streaming tag data; for each streaming tag, the corresponding content in the layout document is located; and that content is then handed to the typesetting engine to be typeset again.
Further, the system 300 may also have a streaming tag preprocessor 340, deployed locally or on the server side, which tags the layout document in advance and stores the resulting streaming tag data. The preprocessed tags produced by the streaming tag preprocessor 340 are typically generated by running an algorithm over the document on the server side, but the document may also be tagged manually or by a combination of algorithmic and manual work. For preprocessed tagging, software tools can typically be supplied to vendors for pre-installation.
In this embodiment, the streaming tag data may be obtained either by preprocessed tagging or by real-time tagging. Either way, the resulting streaming tag data should be stored separately from the layout document.
In this embodiment, the basic process of preprocessed streaming tagging of a layout document is as follows. First, the page layout of the layout document is parsed; the layout analysis may be algorithmic, manual, or otherwise. Then the result of streaming-tagging the layout information is stored externally; the storage may be cloud storage, a database, a local external file, or otherwise. Preprocessed tagging thus keeps the streaming tags separate from the original layout document.
In this embodiment, real-time streaming tagging of a layout document proceeds similarly; the differences lie mainly in when the tagging is performed and by what entity, and are not described further here. Incidentally, where tagging a layout document in real time involves the description standard of a particular file format and the algorithms for parsing that standard in real time, refer to the relevant literature in the prior art; the details are not repeated here.
Referring to FIG. 3 together with FIG. 1 and FIG. 2, the basic operation of the layout document rearrangement system 300 is as follows:
(1) The page layout of the layout document is parsed, and the result of streaming-tagging the layout information is stored externally. The layout analysis is not limited to algorithmic or manual analysis, and the storage is not limited to cloud storage or local external file storage in the form of files or database records.
(2) When displaying a document, the electronic reader in the system may select the streaming logical information structure it considers best. The selected structure may be one produced by a local algorithm, one taken from a locally preprocessed document, one specified by the user, or one produced by the latest tagging technology.
(3) The layout document is rearranged as follows. First, the streaming tag data or document corresponding to the original document is obtained through some correspondence, for example one specified by a digest of the original document (not limited to MD5, SHA, or other digest algorithms) or by another database key-value pair. Next, the individual streaming tags are read from the streaming tag data structure; each records a streaming tag and its correspondence to the relevant content of the original document, which may be expressed as a document position offset, an object number, and so on. Finally, the streaming tag data is used to locate the corresponding content in the original document, which is passed directly to the typesetting engine for typesetting and display.
(4) If no external streaming logical tag data can be found for the original document, the document is analyzed and tagged by the real-time layout parsing system and then passed to the typesetting engine for typesetting, and the tagging result is stored externally.
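Step (3) above can be sketched as follows, assuming an MD5 digest of the document bytes as the lookup key and byte offsets as the content correspondence. The function names, the tag fields, and the use of `textwrap` as a stand-in for a typesetting engine are illustrative assumptions only; the missing-data case of step (4) is reduced to raising an error.

```python
import hashlib
import textwrap


def document_key(document: bytes) -> str:
    """Digest of the original document, used as the key into external tag storage."""
    return hashlib.md5(document).hexdigest()


def resolve(document: bytes, tags):
    """Follow each tag's offset and length back into the untouched original document."""
    return [
        (tag["type"], document[tag["offset"]:tag["offset"] + tag["length"]].decode("utf-8"))
        for tag in tags
    ]


def rearrange(document: bytes, tag_store: dict, width: int) -> str:
    """Re-flow the tagged content to a new line width (stand-in for a typesetting engine)."""
    tags = tag_store.get(document_key(document))
    if tags is None:
        # Step (4) would run real-time tagging here and store the result.
        raise LookupError("no external streaming tag data for this document")
    lines = []
    for _kind, text in resolve(document, tags):
        flowed = " ".join(text.split())  # drop the fixed layout's hard line breaks
        lines.extend(textwrap.wrap(flowed, width=width))
        lines.append("")                 # blank line between logical blocks
    return "\n".join(lines).rstrip("\n")
```

Note that the tag store holds only offsets and lengths, so the original document is read but never modified.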
In this way, by storing the streaming tag data externally, the layout document rearrangement system of the present application effectively improves both the quality and the efficiency of rearrangement without modifying or damaging the original document.
Several further points about the above layout document rearrangement system need to be added here:
First, the system may need to recognize layout documents tagged with several different streaming logical information structures. If a streaming logical information structure is not recognized, it is treated as not being the structure wanted locally. If the structure is a newer version, information such as the version number and whether the data was preprocessed can be described inside the streaming logical information structure. The system may also issue a corresponding notification to upgrade the reader version, so that the structure can eventually be recognized and interpreted.
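One possible reading of this recognition logic is sketched below. The structure name `flowtag`, the header fields, and the three outcome strings are hypothetical; the application only says that unrecognized structures are not used locally and that a newer version may trigger a reader upgrade notice.

```python
# Hypothetical table: structure name -> highest version this reader understands.
SUPPORTED_STRUCTURES = {"flowtag": 2}


def check_structure(header: dict) -> str:
    """Decide what to do with tag data based on its self-described header."""
    name = header.get("structure")
    version = header.get("version", 0)
    if name not in SUPPORTED_STRUCTURES:
        return "ignore"          # not the structure wanted locally
    if version > SUPPORTED_STRUCTURES[name]:
        return "upgrade_reader"  # newer than the reader; suggest an upgrade
    return "use"
```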
Second, when the system selects a streaming logical information structure, a good choice for the correspondence is MD5, through which the tag data uniquely corresponds to one original document. The correspondence to specific content may be described using document position offsets, object numbers, and the like; for details, see the tagging example described above.
Third, the system places no specific requirements on the typesetting engine. Any mature typesetting engine on the market (such as WebKit) may be chosen as appropriate, or a typesetting engine may be developed in-house. In short, the typesetting engine is not the focus of the present application, and when implementing the technical solution it can simply be treated as a conventional component.
Fourth, the system places no special requirements on the real-time tagging engine either, as long as it is reasonably fast and its results are acceptable. In practice the real-time tagging engine is generally implemented as an algorithm, which has the advantage that the algorithm can be upgraded continually to improve speed and quality. Because the server side offers more powerful cluster computing, historical data statistics, machine learning, artificial intelligence, and similar capabilities, the real-time tagging engine may also be placed on the server side. Computation speed is then not an issue, and big data and machine learning can be used to obtain better tagging results; the only cost is that the tagging results must be transmitted over the network. When a real-time tagging engine is also provided on the server side: if the network connection is good, the reader terminal can use the server-side tag data; if it is poor, the reader terminal can use the tag data produced by its own lightweight tagging system.
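The server-first choice described above might be sketched as follows. Both callables are placeholders supplied by the caller (no real network API is assumed), and treating any `OSError` as "the network is poor" is a simplifying assumption of this sketch.

```python
def get_tags(document: bytes, fetch_from_server, tag_locally):
    """Prefer server-side tag data; fall back to the terminal's own
    lightweight tagger when the server cannot be reached.

    fetch_from_server and tag_locally are hypothetical callables taking the
    document bytes and returning a list of tags.
    """
    try:
        return fetch_from_server(document), "server"
    except OSError:
        # ConnectionError and TimeoutError are both subclasses of OSError.
        return tag_locally(document), "local"
```

Returning the source alongside the tags lets the caller decide later whether to replace locally produced tags with server-side ones once the connection recovers.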
It will be understood that this layout document rearrangement system may be embodied in different ways: as a networked system or as a stand-alone device (for example, a mobile smart terminal such as a mobile phone or tablet computer). An electronic reading terminal is described below as a concrete product example.
For convenience, the present application uses terms such as module, device, and unit to denote similar functional structures in the layout document rearrangement system and the electronic reading terminal, which are briefly described below.
FIG. 4 shows a block diagram of an embodiment of the electronic reading terminal of the present application. The electronic reading terminal 400 can rearrange layout documents and has a streaming tag data acquisition unit 410 and a layout document rearrangement unit 420. The streaming tag data acquisition unit 410 acquires streaming tag data stored separately from the layout document; the streaming tag data corresponds to the layout document according to a preset logical information structure. In other words, the streaming tag data is the result of parsing the page layout of the layout document and tagging it according to the preset streaming logical information structure. The layout document rearrangement unit 420 looks up the corresponding document content in the layout document according to the streaming tag data and rearranges the layout document.
The streaming tag data acquisition unit 410 may check in advance whether preprocessed streaming tag data corresponding to the layout document exists. If it exists, the unit acquires it; if not, the unit tags the layout document according to the preset streaming logical information structure to obtain the streaming tag data and stores it. In this way, whether or not the layout document has been tagged in advance, the electronic reading terminal 400 can rearrange it effectively and then display it.
The embodiments of the present application have been described in detail above. The layout document rearrangement solution has clear advantages over the prior art, summarized briefly as follows.
As mentioned above, existing layout document rearrangement techniques mainly take one of two approaches. One obtains the original layout document directly and analyzes its layout, interprets, tags, and rearranges it in real time. The other writes streaming tags into the original document and, at display time, reads them back from the original document to perform the rearrangement. Both approaches have certain defects, for the reasons given earlier.
By contrast, the rearrangement method, system, and electronic reading terminal proposed in the present application have clear advantages: they overcome the shortcomings of the two prior-art approaches in rearrangement quality and efficiency, and resolve problems such as incomplete document coverage and difficult document synchronization. They have, among others, the following features:
(1) The streaming tag data is stored externally, by means not limited to cloud storage, a database, or a local external file. The original document is therefore not damaged, and professional preprocessing, technical upgrades, and data updates are made easier.
(2) The streaming tag data is obtained through some correspondence and used to parse the layout document. The correspondence includes, but is not limited to, various digests; the specified relationship to the original document can be stored in the external storage, so the original document does not have to be modified in order to bind it tightly to the streaming logical information structure.
(3) The streaming logical information structure describes only logical information and stores no substantive document content. It is related to the content of the original document through a correspondence within the structure, such as a specified document offset or object number, so the amount of data is small.
(4) The streaming logical structure information is the result of layout analysis of the layout document. The tagging is not limited to algorithmic or manual analysis, and a variety of tagging forms and means may be used.
Although the present application has been disclosed above by way of preferred embodiments, they are not intended to limit it. Any person skilled in the art may make possible changes and modifications without departing from the spirit and scope of the present application, so the scope of protection of the present application shall be as defined by its claims.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
1. Computer-readable media include permanent and non-permanent, removable and non-removable media, which may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic cassettes, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible to a computing device. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
2. Those skilled in the art will appreciate that embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) that contain computer-usable program code.

Claims (20)

  1. A method for rearranging a layout document, characterized by comprising:
    acquiring streaming tag data stored separately from the layout document, the streaming tag data corresponding to the layout document according to a preset logical information structure; and
    looking up the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
  2. The method for rearranging a layout document according to claim 1, characterized in that the streaming tag data includes logical information corresponding to the document content of the layout document and does not include the substantive content of the layout document.
  3. The method for rearranging a layout document according to claim 2, characterized in that the streaming tag data includes a digest of the layout document.
  4. The method for rearranging a layout document according to claim 1, characterized by checking in advance whether preprocessed streaming tag data corresponding to the layout document exists;
    if so, acquiring the streaming tag data;
    if not, tagging the layout document according to a preset streaming logical information structure to obtain the streaming tag data and store it.
  5. The method for rearranging a layout document according to claim 4, characterized in that the page layout of the layout document is parsed by algorithmic analysis, manual analysis, or a combination of algorithmic and manual analysis, and the corresponding streaming tag data is obtained after tagging according to the preset streaming logical information structure.
  6. The method for rearranging a layout document according to claim 4, characterized in that the streaming tag data is stored externally, on the server side or locally, in the form of files or database records.
  7. The method for rearranging a layout document according to claim 1, characterized in that the corresponding document content in the layout document is looked up according to the streaming tag data by means of a locally selected streaming logical information structure, so as to rearrange the layout document.
  8. The method for rearranging a layout document according to claim 7, characterized in that the locally selected streaming logical information structure is a streaming logical information structure produced by a local algorithm, produced by local preprocessing, specified by the user, or produced by the latest tagging technology.
  9. The method for rearranging a layout document according to claim 7, characterized in that, using the correspondence between the streaming tag data and the layout document determined by the locally selected streaming logical information structure, all or some of the streaming tags are obtained from the streaming tag data, the corresponding document content in the layout document is located for each streaming tag, and that content is handed to a typesetting engine to be typeset again and displayed.
  10. A system for rearranging a layout document, characterized by comprising:
    a streaming tag extractor configured to acquire streaming tag data, the streaming tag data corresponding to the layout document according to a preset logical information structure;
    a memory configured to store the streaming tag data, the streaming tag data being stored separately from the layout document; and
    a typesetting engine configured to look up the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
  11. The system for rearranging a layout document according to claim 10, characterized in that the streaming tag extractor includes a streaming tag lookup module, a streaming tag reading module, and a real-time tagging engine module, wherein:
    the streaming tag lookup module is configured to check in advance whether streaming tag data corresponding to the layout document exists;
    the streaming tag reading module is configured to acquire the streaming tag data when streaming tag data corresponding to the layout document exists; and
    the real-time tagging engine module is configured, when no streaming tag data corresponding to the layout document exists, to tag the layout document according to a preset streaming logical information structure so as to obtain the streaming tag data and store it.
  12. The system for rearranging a layout document according to claim 11, characterized in that the real-time tagging engine module is configured to parse the page layout of the layout document by algorithmic analysis, manual analysis, or a combination of algorithmic and manual analysis, and to obtain the corresponding streaming tag data after tagging according to the preset streaming logical information structure.
  13. The system for rearranging a layout document according to claim 11, characterized in that the real-time tagging engine module is deployed locally or on the server side.
  14. The system for rearranging a layout document according to claim 10, characterized in that the memory is cloud storage or local storage and stores the streaming tag data externally in the form of files or database records.
  15. The system for rearranging a layout document according to claim 10, characterized in that the typesetting engine is configured to look up the corresponding document content in the layout document according to the streaming tag data by means of a locally selected streaming logical information structure, so as to rearrange the layout document.
  16. The system for rearranging a layout document according to claim 15, characterized in that the typesetting engine is configured such that, using the correspondence between the streaming tag data and the layout document determined by the locally selected streaming logical information structure, all or some of the streaming tags are obtained from the streaming tag data, the corresponding document content in the layout document is located for each streaming tag, and that content is typeset again and displayed by the typesetting engine.
  17. The system for rearranging a layout document according to claim 10, characterized by comprising a streaming tag preprocessor configured to tag the layout document in advance and to store the streaming tag data once it is obtained.
  18. The system for rearranging a layout document according to claim 17, characterized in that the streaming tag preprocessor is deployed locally or on the server side.
  19. An electronic reading terminal capable of rearranging a layout document, characterized in that the electronic reading terminal is configured to:
    acquire streaming tag data stored separately from the layout document, the streaming tag data corresponding to the layout document according to a preset logical information structure; and
    look up the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
  20. The electronic reading terminal according to claim 19, characterized in that the electronic reading terminal is configured to:
    check in advance whether preprocessed streaming tag data corresponding to the layout document exists;
    acquire the streaming tag data when streaming tag data corresponding to the layout document exists; and
    when no streaming tag data corresponding to the layout document exists, tag the layout document according to a preset streaming logical information structure so as to obtain the streaming tag data and store it.
PCT/CN2015/081626 2014-07-17 2015-06-17 Layout document rearrangement method and system, and electronic reading terminal WO2016008347A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410341665.9A CN105446946B (en) 2014-07-17 2014-07-17 Rearrangement method, system and the electronic reading terminal of format document
CN201410341665.9 2014-07-17

Publications (1)

Publication Number Publication Date
WO2016008347A1 (en) 2016-01-21

Family

ID=55077898

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/081626 WO2016008347A1 (en) 2014-07-17 2015-06-17 Layout document rearrangement method and system, and electronic reading terminal

Country Status (3)

Country Link
CN (1) CN105446946B (en)
HK (1) HK1221296A1 (en)
WO (1) WO2016008347A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670160B (en) * 2017-10-13 2021-04-09 北大方正集团有限公司 Typesetting processing method and device for files
CN108492172A (en) * 2018-03-13 2018-09-04 四川享宇金信金融服务外包有限公司 loan material packaging method and device
CN109408778A (en) * 2018-10-19 2019-03-01 成都信息工程大学 A kind of document structure tree control system and method based on visual configuration
CN109582934B (en) * 2018-12-04 2023-02-10 万兴科技股份有限公司 Format document conversion method and device
CN111611776B (en) * 2020-05-22 2023-07-25 北京信息科技大学 Method and device for compatible edition flow document content and supporting synchronous reading
CN112883249B (en) * 2021-03-26 2022-10-14 瀚高基础软件股份有限公司 Layout document processing method and device and application method of device
CN112988668B (en) * 2021-03-26 2022-10-14 瀚高基础软件股份有限公司 PostgreSQL-based streaming document processing method and device and application method of device
CN113239661A (en) * 2021-04-30 2021-08-10 北京方正阿帕比技术有限公司 Edition-stream combination based multi-terminal electronic document editing method and device
CN113221507B (en) * 2021-05-28 2022-02-11 掌阅科技股份有限公司 Document editing operation synchronization method, computing device and storage medium
CN113569532B (en) * 2021-09-22 2022-01-25 北京仁和汇智信息技术有限公司 HTML editing method and device, electronic equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101308488A (en) * 2008-06-05 2008-11-19 北大方正集团有限公司 Document stream type information processing method based on format document and device therefor
CN101887413A (en) * 2009-05-14 2010-11-17 北大方正集团有限公司 Structure processing method and system of plate type table
CN101923723A (en) * 2009-06-16 2010-12-22 汉王科技股份有限公司 Method for realizing display of electronic document
CN102591849A (en) * 2011-01-07 2012-07-18 北大方正集团有限公司 Document format conversion method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8196029B1 (en) * 2000-06-21 2012-06-05 Microsoft Corporation System and method for enabling simultaneous multi-user electronic document editing

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625643A (en) * 2019-02-28 2020-09-04 阿里巴巴集团控股有限公司 Data processing method and device and reading object processing method
CN111625643B (en) * 2019-02-28 2023-06-20 阿里巴巴集团控股有限公司 Data processing method and device and reading object processing method
CN113408251A (en) * 2021-06-30 2021-09-17 北京百度网讯科技有限公司 Layout document processing method and device, electronic equipment and readable storage medium
CN113408251B (en) * 2021-06-30 2023-08-18 北京百度网讯科技有限公司 Layout document processing method and device, electronic equipment and readable storage medium
WO2023284588A1 (en) * 2021-07-13 2023-01-19 北京字节跳动网络技术有限公司 Electronic text generation method and apparatus, device, and medium

Also Published As

Publication number Publication date
CN105446946A (en) 2016-03-30
CN105446946B (en) 2019-08-02
HK1221296A1 (en) 2017-05-26

Similar Documents

Publication Publication Date Title
WO2016008347A1 (en) Layout document rearrangement method and system, and electronic reading terminal
WO2019200783A1 (en) Method for data crawling in page containing dynamic image or table, device, terminal, and storage medium
US8788529B2 (en) Information sharing between images
US9098505B2 (en) Framework for media presentation playback
CN104735468B (en) A kind of method and system that image is synthesized to new video based on semantic analysis
US8577882B2 (en) Method and system for searching multilingual documents
US10515142B2 (en) Method and apparatus for extracting webpage information
US20140101527A1 (en) Electronic Media Reader with a Conceptual Information Tagging and Retrieval System
EP3109775A1 (en) Multimedia content providing method and device
CN109492177B (en) web page blocking method based on web page semantic structure
CN108334508B (en) Webpage information extraction method and device
CN109582945A (en) Article generation method, device and storage medium
KR20150015062A (en) Apparatus for recommending image and method thereof
CN107748780B (en) Recovery method and device for file of recycle bin
US20160103915A1 (en) Linking thumbnail of image to web page
US10867119B1 (en) Thumbnail image generation
US20220309249A1 (en) Data Processing Method, Apparatus, Electronic Device, and Computer Storage Medium
WO2015154680A1 (en) File processing method, device, and network system
US20180330156A1 (en) Detection of caption elements in documents
US20230297618A1 (en) Information display method and electronic apparatus
US20120192046A1 (en) Generation of a source complex document to facilitate content access in complex document creation
CN104090878A (en) Multimedia checking method, terminal, server and system
US20160283444A1 (en) Human input to relate separate scanned objects
US20220301285A1 (en) Processing picture-text data
Jones et al. MementoEmbed and Raintale for web archive storytelling

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 15822279; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE
122 EP: PCT application non-entry in European phase. Ref document number: 15822279; Country of ref document: EP; Kind code of ref document: A1