WO2016008347A1

WO2016008347A1 - Layout document rearrangement method and system, and electronic reading terminal

Info

Publication number: WO2016008347A1
Application number: PCT/CN2015/081626
Authority: WO
Inventors: 刘孙亮
Original assignee: 阿里巴巴集团控股有限公司; 刘孙亮
Priority date: 2014-07-17
Filing date: 2015-06-17
Publication date: 2016-01-21
Also published as: CN105446946A; CN105446946B; HK1221296A1

Abstract

A layout document rearrangement method, which comprises: acquiring stream-type flag data stored separately from a layout document, wherein a correlation is established between the stream-type flag data and the layout document according to a preset logic information structure (S110); and according to the stream-type flag data, searching for the corresponding document content in the layout document to conduct rearrangement on the layout document (S120). A layout document rearrangement system and an electronic reading terminal, wherein by means of the layout document rearrangement system, the flagged stream-type flag data is stored separately from the layout document, and the layout document is parsed according to the stream-type flag data during rearrangement, so that the stream-type flag data may not influence an original document, and therefore, the rearrangement effect and rearrangement efficiency of the layout document can be effectively improved in the case where the original document is not modified or damaged; and at the same time, the flagged stream-type flag data can be shared by multiple users and multiple terminals easily, thereby contributing to the technological upgrade of electronic devices.

Description

Rearrangement method, system and electronic reading terminal of layout document

Technical field

The present application relates to digital reading technology, and in particular, to a rearrangement method and system for a layout document and an electronic reading terminal.

Background art

With the rapid growth of the Internet and continuing improvements in hardware, electronic documents are gradually replacing traditional books and paper documents. At the same time, people's reading habits are no longer limited to traditional paper publications, and the share of electronic reading (or digital reading) is steadily increasing. Because portable electronic devices such as mobile phones and e-book readers are now widespread, people can read e-books in fragments of spare time, for example while riding the bus or subway. This large market demand places higher requirements on how information is provided and processed for electronic reading.

Electronic documents are generally divided into streaming documents and layout documents. The basic unit of a streaming document is the character: the document is an ordered collection of characters, and its length is the number of characters it contains. For example, a Word file is a streaming document that mainly records streaming information, although some fixed-layout objects (such as floating images) can also be added. A layout document, by contrast, uses absolute description: it explicitly records the position and size of each object in a custom coordinate system, so that the printed result is consistent with what is viewed on the computer and the display is consistent in any computer environment, which ensures faithful reproduction of the original document. PDF, XPS and CEB files are typical layout documents. Because they are what-you-see-is-what-you-get (WYSIWYG), they are well suited to document distribution, dissemination and archiving.

Streaming documents pose no particular barrier to electronic reading, and mature typesetting engines are available for them. Layout documents, however, are often inconvenient to read on small-screen devices because their layout is fixed. If a whole page of a layout document is displayed on the device screen, text and images may be too small to be seen clearly, among other limitations; if the user has to zoom in and out of the page, the reading experience suffers. The electronic reading terminal therefore needs to break through the fixed-layout limitation, so that the content of the layout document can be re-typeset and the user ultimately has a better reading experience.

For the rearrangement of layout documents, the industry has introduced various solutions. There are two main solutions for implementing layout document reflow:

One existing rearrangement scheme is as follows: to meet the need to read an electronic document on various electronic devices, the streaming display information of the layout is marked when the layout document is created, and the marked data is stored in the original document and released together with it. This scheme builds on the precisely positioned layout description in the layout document and adds enough streaming logical structure information to support streaming applications such as rearrangement and extraction of table structures. For example, Adobe introduced Logical Structure in the PDF 1.3 specification released in 1999 and Tagged PDF in PDF 1.4, released in 2001, to improve the expression of streaming information; later, in its published MARS document format, it used XML to give a structured description of this information. XML can in theory describe any format; for example, the newer Word format, DOCX, is described in XML. In addition, Founder Apabi defined, in the CEBX v1.1 specification released in 2010, a multi-layer nestable tree-like logical structure containing articles, chapters, paragraphs, fragments and blocks, in which the blocks directly reference layout blocks or primitives (v1.2) on the layout page to share data. This supports real-time typesetting and screen-adaptive display on electronic reading devices such as mobile terminals. For the standard manual and software, see the related introduction on the official Founder Apabi website (http://www.apabi.cn/download/index.html).

Another existing rearrangement scheme is as follows: when a layout document is opened, its layout information is parsed using preset algorithms and rules, and the parsing result is handed to the typesetting engine for real-time rearrangement, that is, screen-adaptive display through real-time typesetting. This real-time rearrangement method is currently widely used in electronic reading terminals.

Both of the above schemes can rearrange layout documents, but each has certain problems, as follows:

In the first rearrangement scheme, the document content and the tag data are located in the same file, so data synchronization may be difficult for layout documents that lack streaming display information. If the marking of an original document is found to be wrong, the document has to be modified again, and the original document may be damaged in the process. Especially when a large number of documents have already been archived, synchronizing documents in this way may have further adverse consequences.

The second rearrangement scheme parses the layout document in real time when the document is opened: every time, the electronic reading terminal has to analyze, mark and rearrange the document algorithmically, which consumes both time and power. In addition, this scheme depends on the reliability of the algorithm, so the rearrangement effect may be poor.

It can be seen that there is still room for improvement in the rearrangement technology of the existing layout documents, and it is necessary to propose a technical solution for rearranging the layout documents which effectively improves the rearrangement effect and the rearrangement efficiency.

Summary of the invention

In view of the deficiencies of the prior art, the purpose of the present application is to provide a rearrangement method, system and electronic reading terminal for a layout document, which can effectively improve the rearrangement effect and the rearrangement efficiency.

To solve the above technical problem, the present application provides a method for rearranging a layout document, the method comprising:

Obtaining streaming tag data stored separately from the layout document, wherein the streaming tag data is associated with the layout document according to a preset logical information structure;

Searching, according to the streaming tag data, for the corresponding document content in the layout document, so as to rearrange the layout document.

Optionally, the streaming tag data includes logical information corresponding to the document content of the layout document, and does not include the substantive content of the layout document.

Optionally, the streaming tag data includes a digest of the layout document.

Optionally, the method first checks whether pre-processed streaming tag data corresponding to the layout document exists;

If so, the streaming tag data is obtained;

If not, the layout document is marked according to a preset streaming logical information structure to obtain streaming tag data, which is then stored.

Optionally, the layout document is parsed by algorithmic analysis, manual analysis, or a combination of the two, and the corresponding streaming tag data is obtained after marking according to the preset streaming logical information structure.

Optionally, the streaming tag data is externally stored on the server side or locally in the form of a file or database record.

Optionally, using a locally selected streaming logical information structure, the corresponding document content in the layout document is found according to the streaming tag data, so as to rearrange the layout document.

Optionally, the locally selected streaming logical information structure is one that corresponds to a local algorithm implementation, to local pre-processing, to a user specification, or to marking with the latest tagging technology.

Optionally, using the correspondence between the streaming tag data and the layout document determined by the locally selected streaming logical information structure, all or some of the streaming tags are obtained from the streaming tag data, the corresponding document content in the layout document is found for each streaming tag, and the content is re-typeset and displayed by the typesetting engine.

Correspondingly, the present application also provides a rearrangement system for layout documents, the system comprising:

a streaming tag extractor configured to acquire and mark the streaming tag data, wherein the streaming tag data is associated with the layout document according to the preset logical information structure;

a memory configured to store the streaming tag data, the streaming tag data being stored separately from the layout document;

a typesetting engine configured to find, according to the streaming tag data, the corresponding document content in the layout document, so as to rearrange the layout document.

In addition, the present application further provides an electronic reading terminal, which can rearrange the layout document, and the electronic reading terminal is configured to:

Compared with the prior art, the present application stores the streaming tag data externally and can therefore effectively improve the rearrangement effect and rearrangement efficiency of the layout document without modifying or damaging the original document. Specifically, by marking the layout document with streaming logic in real time or by pre-processing, the present application can rearrange the display to fit the layout size, which achieves a better typesetting effect and considerably shortens the reflow time. At the same time, by performing layout analysis and storing the streaming logical information marks of the layout document externally, it solves the rearrangement problem for the large number of existing layout documents that lack streaming tag data, without concern about damage caused by modifying the original documents or about subsequent document inconsistency. In addition, a layout document only needs to be marked once for the result to be shared by multiple users and multiple terminals; from the perspective of the entire system, this not only consumes less power but also facilitates technical upgrades.

Description of the drawings

Various other advantages and benefits will become apparent to those skilled in the art from a reading of the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not intended to be limiting. Throughout the drawings, the same reference numerals are used to refer to the same parts. In the following figures:

FIG. 1 shows a flow chart of a rearrangement method of a layout document according to an embodiment of the present application;

FIG. 2 shows a more complete example of the rearrangement method of a layout document shown in FIG. 1;

FIG. 3 shows a block diagram of the composition of a rearrangement system of a layout document according to an embodiment of the present application;

FIG. 4 shows a block diagram of the composition of an electronic reading terminal according to an embodiment of the present application.

Detailed description

The following embodiments of the present application describe a rearrangement method and system for a layout document and an electronic reading terminal which, by storing the streaming tag data externally, can improve the rearrangement effect and the rearrangement efficiency of the layout document without modifying or damaging the original document. They are described in detail below with reference to the accompanying drawings and specific embodiments.

In order to better understand the technical solutions of the embodiments of the present application, related terms are first explained.

1. Logical information structure

A logical information structure is the logical description of how a document's information is organized. It covers structured elements such as titles, paragraphs, formulas, tables and comments, and specifies the logical relationships between them (for example, that a picture is centered, or which caption belongs to it). Together these logical relationships form an ordered arrangement.

The logical information structure in the embodiments of the present application specifies the relationship between the externally stored document and the original document. For example, it can specify a paragraph in the externally stored document, how many spans the paragraph contains (a span is non-splittable text, such as a string that should not be broken when displayed), and what text is in each span. A paragraph may also have no spans, in which case the structure states directly what text the paragraph contains. Each piece of text corresponds to coordinates in the original layout document or to an offset in the document's binary stream.

The logical information structure of the embodiments of the present application therefore differs from the logical structure of pure layout document information. The latter only describes how many characters, images and graphics are displayed and at which coordinate positions on the page. It applies to the entire document and emphasizes presentation on the page rather than logical information, so the logical relationships between parts of the document may be disordered, wholly or in part.
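As an illustrative, non-limiting sketch, the paragraph, span and text relationship described above could be modelled as follows. The class and field names (`Paragraph`, `Span`, `TextRef`, `coords`, `offset`) are assumptions introduced for illustration only; the present application does not prescribe them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class TextRef:
    """One piece of text located in the original layout document."""
    text: str
    # Either page coordinates or a binary-stream offset may locate the text.
    coords: Optional[Tuple[float, float]] = None
    offset: Optional[int] = None


@dataclass
class Span:
    """A non-splittable run of text."""
    texts: List[TextRef] = field(default_factory=list)


@dataclass
class Paragraph:
    """A paragraph holding spans and/or text placed directly in it."""
    spans: List[Span] = field(default_factory=list)
    texts: List[TextRef] = field(default_factory=list)

    def plain_text(self) -> str:
        # Simplification: span text is emitted before direct paragraph text.
        parts = [t.text for s in self.spans for t in s.texts]
        parts += [t.text for t in self.texts]
        return " ".join(parts)


para = Paragraph(
    spans=[Span([TextRef("Hello,", coords=(100, 100))])],
    texts=[TextRef("China.", coords=(130, 100))],
)
print(para.plain_text())
```

Keeping only references (coordinates or offsets) alongside the logical grouping reflects the idea that the external structure points into the original document rather than replacing it.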

2. Adaptive presentation layout information

The logical information structure above describes the logical structure of the document's organization and layout. By identifying and marking the document according to this structure, the corresponding streaming tag data is obtained. In other words, the streaming tag data includes the result of marking the document structure information and/or the document layout information in the layout document, where the document layout information is adaptive presentation layout information.

Using the adaptive presentation layout information, the electronic reading terminal can reconstruct the layout of the entire document, so that the rearranged display of the layout document matches the screen size of the terminal. For example, the adaptive presentation layout information describes which part of the layout document is the title, which part is a paragraph (a paragraph may contain 1000 characters), and so on. Based on this description, the displayed content can be adjusted to the screen size of each reading device. On a computer, 900 characters might fit on one screen, so the paragraph occupies a little more than one screen; on a mobile phone, only 100 characters might fit on one screen, so the same paragraph is adaptively displayed over 10 screens. However the display adapts, some content is never split across screens. For example, the word "and" can be a span, and it cannot be broken under any circumstances.

According to the above logical information structure, the layout document can be parsed to obtain the streaming tag data, and the streaming tag data can in turn be interpreted to reconstruct the layout document. This leads to the concept of the embodiments of the present application: acquire streaming tag data stored separately from the layout document, the streaming tag data having a correspondence with the layout document established according to a preset logical information structure; then, according to the streaming tag data, find the corresponding document content in the layout document in order to rearrange the layout document. This is elaborated below with reference to the drawings.

Numerous specific details are set forth in the description below in order to provide a thorough understanding of the application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the scope of the present application; the present application is therefore not limited by the specific embodiments disclosed below. Referring to FIG. 1, one embodiment of the rearrangement method of a layout document of the present application is shown. The method centers on storing the streaming tag data externally, so that the document structure information and document layout information in the layout document are determined from the correspondence between the streaming tag data and the layout document, which allows the layout document to be rearranged more effectively. Because the streaming tag data is stored externally, this embodiment can effectively improve the rearrangement efficiency and rearrangement effect of the layout document without modifying or damaging the original document, as described in further detail below.
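The screen counts in this example follow from a simple ceiling division, sketched below. The function name `screens_needed` is hypothetical, and the figure of about 100 characters per phone screen is an assumption: it is the capacity implied by showing a 1000-character paragraph over 10 screens.

```python
import math


def screens_needed(paragraph_chars: int, chars_per_screen: int) -> int:
    """Number of screens a paragraph occupies at a given screen capacity."""
    return math.ceil(paragraph_chars / chars_per_screen)


# Computer: 900 characters per screen, so 1000 characters spill onto a second screen.
print(screens_needed(1000, 900))
# Phone (assumed 100 characters per screen): the same paragraph needs ten screens.
print(screens_needed(1000, 100))
```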

The layout document of the embodiments of the present application may be an entire layout document or one or several pages of a layout document. Such a document uses absolute description: it explicitly records the position and size of each object in a custom coordinate system, so that the printed result is consistent with what is viewed on the computer, achieving a what-you-see-is-what-you-get effect.

As described above, the streaming tag data of the embodiments of the present application includes document structure information and/or document layout information of the layout document. The document structure information includes the chapter information of the document, the order of the content within each chapter, and the order of the elements within each content block. The document layout information includes the information that determines the final rendering of primitives and other elements when the layout document is rearranged: the layout information of a primitive or content block itself, and the relationships between primitives in the same content block or between content blocks, for example the text-wrapping mode of a specified picture or the column information of several content blocks. Layout rearrangement here means reorganizing the primitives and other elements of the layout according to certain rules when the layout size or layout content changes, to form the presentation result. In addition, the streaming tag data may further include reading clue information: besides the reading order given by the document structure information, the streaming tag data may provide additional reading order information according to specific needs. Reading clue information is optional reading order information offered to the user.

It should be noted that the streaming tag data of the embodiments of the present application includes logical information corresponding to the document content of the layout document, but not the substantive content of the layout document itself. In particular, the streaming tag data may include digest information of the layout document, for example a digest computed with the MD5 or SHA algorithm. Marking the layout document according to a predetermined streaming logical information structure in this way gives the resulting streaming tag data a strong association with the layout document.

The embodiments of the present application analyze the layout document to be rearranged with a logical marking algorithm, which can effectively determine which characters in the layout document form words, which words form paragraphs, which characters are superscripts or subscripts, which object is a figure, which text is a figure caption, and so on. This gives a full and effective description of the layout document and ultimately facilitates its rearrangement. By contrast, an ordinary layout document only describes the position of each piece of text, graphic or image on the page and does not describe the logical relationships between these objects, which impairs the rearrangement efficiency and display effect.

The specific logical information structure of the streaming tag data in the embodiments of the present application may be determined by reference to existing standards, such as the relevant technical manuals of Founder Apabi, or a custom logical information structure may be defined, provided good compatibility is ensured. The details are not repeated here.

Please refer to FIG. 1, which shows an embodiment of the rearrangement method of a layout document of the present application. In this embodiment, the layout document is rearranged using externally stored streaming tag data, so the rearrangement effect and rearrangement efficiency can be effectively improved without modifying or damaging the original document. The specific technical solution of the rearrangement method is described in detail below.
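By way of a minimal sketch, a digest of the layout document could be computed and checked as follows, using Python's standard `hashlib` module. The function names `document_digest` and `tag_data_matches` and the choice of SHA-256 as the default are assumptions made for illustration; the present application only states that an MD5 or SHA digest may be used.

```python
import hashlib


def document_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of the raw bytes of a layout document."""
    return hashlib.new(algorithm, data).hexdigest()


def tag_data_matches(stored_digest: str, data: bytes,
                     algorithm: str = "sha256") -> bool:
    """True if externally stored tag data was produced for exactly these bytes."""
    return document_digest(data, algorithm) == stored_digest


original = b"%PDF-1.4 demo bytes"          # stand-in for the contents of a.pdf
stored = document_digest(original)          # value kept inside the tag data
print(tag_data_matches(stored, original))
print(tag_data_matches(stored, original + b" modified"))
```

Because the digest changes whenever the document bytes change, tag data carrying it can be tied to one specific document without containing any of that document's content.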

At step S110, streaming tag data stored separately from the layout document is acquired. The streaming tag data has a correspondence with the layout document established according to a preset logical information structure, for example by means of a digest of the original document or other database key-value pairs. In other words, the streaming tag data is the result of parsing the layout of the layout document according to the preset streaming logical information structure; using the streaming tag data, the structure information and layout information of the corresponding document content in the layout document can be found. Because the streaming tag data records rich document structure information, document layout information and the like for the layout document, a strong association with the original layout document (the original document for short) can be established. In this way, the streaming tag data can be used not only to find the corresponding document content in the original document but also to determine the structure information and layout information of that content, which makes it convenient to rearrange the entire layout document.

At step S120, the corresponding document content in the layout document is found according to the streaming tag data, and the layout document is rearranged. This step completes the rearrangement and yields a rearranged layout document (the rearranged document for short) adapted to a display interface such as that of an electronic reading terminal. Because the streaming tag data in the embodiments of the present application describes the layout document fully and effectively, it helps to improve the rearranged display of the layout document.

In order to further understand the technical solution of the rearrangement method of the layout document of the embodiment of the present application, step S110 and step S120 are further described in detail as follows.

At step S110, streaming tag data associated with the layout document through a streaming logical information structure is obtained, the streaming tag data being stored separately from the layout document. The embodiments of the present application obtain the streaming tag data either by pre-processing or by marking the layout document in real time; in both cases the resulting streaming tag data can be stored separately from the layout document. The original document is then rearranged by looking up the layout document content corresponding to the streaming tag data.

Step S110 may include two specific steps: first, checking whether pre-processed streaming tag data associated with the layout document exists; second, marking the layout document in real time if no pre-processed streaming tag data exists. Specifically, the basic process of step S110 is: first check whether pre-processed streaming tag data corresponding to the layout document exists; if it does, acquire the streaming tag data; if it does not, identify and mark the layout document according to the preset streaming logical information structure to obtain the streaming tag data, and store it. The streaming tag data can be stored externally as a file or database record on the server side (for example, a cloud server) or locally, so that it is conveniently kept separate from the layout document.

In step S110, whether by pre-processing or by real-time marking, the basic process of obtaining the streaming tag data is: parse and mark the layout document according to the preset streaming logical information structure, and collect all the resulting mark information as the streaming tag data. When the layout document is marked in this way, it can be parsed by algorithmic analysis, manual analysis, or a combination of the two, and marked according to the preset streaming logical information structure to obtain the corresponding streaming tag data. The correspondence between the streaming tag data and the original document is established according to the preset streaming logical information structure.

A PDF layout document and its streaming tag data are listed below as a specific example of externalized streaming tag data, to illustrate the technical solution of the method described in the present application.
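The check-then-mark flow of step S110 could be sketched as below, with the streaming tag data kept locally as a file named after the document digest. The function `get_tag_data`, the `.mark.json` file naming, the JSON encoding and the `demo_mark` callback are all assumptions for illustration; a server-side store or database record would follow the same pattern.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def get_tag_data(doc_bytes: bytes, store_dir: Path,
                 mark: Callable[[bytes], dict]) -> dict:
    """Return stored tag data for a document, marking it only if none exists."""
    key = hashlib.sha256(doc_bytes).hexdigest()
    path = store_dir / f"{key}.mark.json"      # kept apart from the document
    if path.exists():                          # pre-processed data found
        return json.loads(path.read_text(encoding="utf-8"))
    tags = mark(doc_bytes)                     # real-time marking
    store_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(tags), encoding="utf-8")
    return tags


calls = []


def demo_mark(doc_bytes: bytes) -> dict:
    """Stand-in for a real marking algorithm; records that it was invoked."""
    calls.append(len(doc_bytes))
    return {"paragraphs": 1}


with tempfile.TemporaryDirectory() as tmp:
    first = get_tag_data(b"%PDF-demo", Path(tmp), demo_mark)   # marks and stores
    second = get_tag_data(b"%PDF-demo", Path(tmp), demo_mark)  # reads stored data

print(first == second, len(calls))
```

In this sketch the marking callback runs only on the first request, which corresponds to the point that a document needs to be marked once and the result can then be reused.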

(1) The original layout document is as follows:

a.pdf

2 0 obj<</Type/Page

/Contents 3 0 R

...>>

endobj

3 0 obj<</Length...>>

stream...

...(Here is some text1)...// corresponds to a word, coordinates x=100, y=100, the content is: Hello,

...(Here is some text2)...// corresponds to a word, coordinates x=110, y=200, the content is: title

...(Here is some text3)...// corresponds to a word, coordinates x=130, y=100, the content is: China.

...endstream

endobj

(2) The streaming tag data is as follows:

a.mark

<SrcDoc>"xxx"</SrcDoc>// xxx corresponds to the digest of the entire document a.pdf

<Head>// a title

<obj=3, offset=xxx, length=xxx>// the corresponding content is: title

</Head>

<P>// one paragraph

<obj=3, offset=xxx, length=xxx>// the corresponding content is: Hello,

<obj=3, offset=xxx, length=xxx>// the corresponding content is: China.

</P>

As this marking example shows, the layout document is parsed according to the preset streaming logical information structure, and the collected mark data of the parsing result serves as the streaming tag data of the layout document. Each part of the document content is marked with rich streaming structure information and layout information, so that it corresponds closely to the original layout document and can conveniently be used for rearranged display.

It should be noted that the streaming tag data described in this embodiment is not limited to the description format of the above example; a binary description, an XML description or the like may also be used. The embodiments of the present application do not focus on the specific description standard of any particular file format, so how the streaming tag data is formed is not described in further detail.
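To make the lookup concrete, the following sketch resolves tag records of the form (obj, offset, length) against an already decoded content stream and returns the content in logical order. The `TagRecord` class, the sample stream bytes and the numeric offsets are hypothetical values chosen for illustration; the example above leaves the real offsets unspecified.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class TagRecord:
    role: str      # logical role, e.g. "Head" or "P"
    obj: int       # object number of the content stream
    offset: int    # byte offset inside the decoded stream
    length: int    # number of bytes covered


# Hypothetical decoded content stream of object 3, in page (not logical) order.
streams: Dict[int, bytes] = {3: b"Hello,titleChina."}

# Hypothetical tag records, listed in logical reading order.
records: List[TagRecord] = [
    TagRecord("Head", 3, 6, 5),
    TagRecord("P", 3, 0, 6),
    TagRecord("P", 3, 11, 6),
]


def resolve(record: TagRecord, streams: Dict[int, bytes]) -> str:
    """Fetch the document content a tag record points at."""
    data = streams[record.obj]
    return data[record.offset:record.offset + record.length].decode("utf-8")


def logical_order(records: List[TagRecord],
                  streams: Dict[int, bytes]) -> List[Tuple[str, str]]:
    """Pair each record's role with the content found in the original document."""
    return [(r.role, resolve(r, streams)) for r in records]


print(logical_order(records, streams))
```

The tag data itself holds no text; the title and paragraph content are recovered from the original stream only at lookup time.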

In step S120, the corresponding document content in the layout document is searched according to the streaming tag data, and the structural information and the layout information of the document content are identified (for example, determining that some document content is a text, a graphic or a table, and determining between them The relationship, according to which the corresponding typesetting scheme is determined, to rearrange the layout documents. The rearranged layout document can perform real-time layout and screen adaptive display on an electronic device such as a mobile terminal, thereby effectively improving the user's reading experience. The screen here The adaptive display includes acquiring screen size information of the device, and adaptively formatting the document content according to the screen size information.
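A very small sketch of screen-adaptive typesetting is given below: non-splittable spans are packed greedily into lines whose width is taken from the screen. The function `layout_lines` and the use of a character count as the screen width are simplifying assumptions; an actual typesetting engine would work with font metrics and richer layout information.

```python
from typing import List


def layout_lines(spans: List[str], chars_per_line: int) -> List[str]:
    """Greedily pack spans into lines without ever breaking a span."""
    lines: List[str] = []
    current = ""
    for span in spans:
        candidate = span if not current else current + " " + span
        if not current or len(candidate) <= chars_per_line:
            # An over-long span on an empty line overflows rather than splits.
            current = candidate
        else:
            lines.append(current)
            current = span
    if current:
        lines.append(current)
    return lines


spans = ["Hello,", "China.", "and", "more"]
print(layout_lines(spans, 13))   # wider screen
print(layout_lines(spans, 6))    # narrow screen
```

The same spans yield two lines on the wider screen and four on the narrow one, while a span such as "and" is never divided.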

The rearrangement of the layout document here includes the process of reorganizing the primitives and other elements in the layout according to certain rules when the layout size or the layout content changes, forming a layout presentation result. The embodiment of the present application does not require specific requirements for the typesetting engine. The mature typesetting engine (such as webkit) on the market can be selected as a selection object. Of course, the user can also independently develop other suitable typesetting engines, and the description will not be repeated here.

As described above, the embodiment of the present application establishes a correspondence between the streaming tag data and the layout file by using a preset streaming logic information structure. According to the streaming logic information structure, the layout document can be pre-marked or marked in real time, thereby obtaining corresponding stream tag data. Pre-marking or real-time tagging of a layout file can be understood as a process of parsing a layout document. According to the streaming logic information structure, the layout document may also be reconstructed according to the marked stream tag data, and the document structure information and the layout information in the specific tag data are used to find the corresponding document content in the layout document, and according to the current The display requirements (such as font size requirements, adaptive display requirements according to screen size) are displayed in a typographical manner. Simply put, refactoring a layout document can be understood as a process of decomposing streaming tag data.

Since the embodiment of the present application establishes the correspondence between the layout document and the streaming tag data through a particular streaming logic information structure, the streaming logic information structure used at rearrangement time must match the one used at marking time. In practice, the preset streaming logic information structure used when marking the layout document and the one used when rearranging may not match. Therefore, when the typesetting engine selects a streaming logic information structure, it should generally give priority to structures corresponding to the local algorithm implementation, to locally pre-processed documents, to a user specification, or to tags produced by the latest marking technology.
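The selection rule above could be sketched as follows in Python. The application lists the preferred sources without ranking them, so the order in `PRIORITY`, the source labels and the dictionary layout of a candidate are all assumptions made only for illustration.

```python
# Assumed ranking: the application names these sources but does not order them.
PRIORITY = [
    "user_specified",
    "locally_preprocessed",
    "local_algorithm",
    "latest_marking_technology",
]


def select_structure(candidates):
    """Return the candidate whose source ranks first in PRIORITY, or None.

    Each candidate is a dict with hypothetical keys "name" and "source".
    Candidates from unknown sources are ignored.
    """
    ranked = [c for c in candidates if c["source"] in PRIORITY]
    if not ranked:
        return None
    return min(ranked, key=lambda c: PRIORITY.index(c["source"]))


candidates = [
    {"name": "structure-a", "source": "local_algorithm"},
    {"name": "structure-b", "source": "user_specified"},
    {"name": "structure-c", "source": "unknown_vendor"},
]
chosen = select_structure(candidates)
```

A different ranking only requires reordering the list; the selection logic stays the same.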

In this embodiment, after a streaming logic information structure has been selected locally by the above rule, if it matches the logical information structure with which the layout document was marked, the layout document can be parsed through the locally selected structure when it is rearranged. That is, the corresponding document content in the layout document is searched for according to the streaming tag data, the structural information and layout information of that content are identified, and the layout document is finally rearranged.

It can be seen that, for pre-processed streaming tag data, if the locally selected streaming logic information structure matches the one used at marking time, a valid correspondence can be established between the streaming tag data and the layout document at rearrangement time, consistent with the correspondence that existed at marking time. In this way, all or part of the streaming tags (records) can be obtained from the streaming tag data when the layout document is rearranged, the corresponding document content in the layout document can be found for each streaming tag, its structural information and layout information can be identified, and the content can then be re-typeset and displayed by the typesetting engine.
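A minimal sketch of the matching check described above, assuming that a streaming logic information structure can be identified by a name and a version number. The class `StructureId`, its fields and the tag-set layout are hypothetical; the application does not specify how a structure is identified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureId:
    """Identifies a streaming logic information structure (hypothetical fields)."""
    name: str
    version: int


def usable_tag_sets(local, tag_sets):
    """Keep only the tag sets that were marked with the locally selected structure."""
    return [t for t in tag_sets if t["structure"] == local]


local_structure = StructureId("example-structure", 2)
stored = [
    {"structure": StructureId("example-structure", 1), "tags": ["old"]},
    {"structure": StructureId("example-structure", 2), "tags": ["current"]},
]
matching = usable_tag_sets(local_structure, stored)
```

Only tag data whose structure matches is passed on to rearrangement; anything else would have to be re-marked or read with a different structure.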

It can be understood that, for a locally selected streaming logic information structure, a parsing algorithm generally has to be determined in order to rearrange the layout document. Such parsing algorithms may follow different schemes, but since the present application does not focus on how a particular algorithm parses in real time, they are not described in detail.

Referring to FIG. 2, a more complete example of the rearrangement method of the layout document shown in FIG. 1 is given. This example mainly includes the following steps S210 to S250, which are briefly described below.

Step S210, receiving a layout document. The layout document may need to be rearranged according to current display conditions (e.g., factors such as the size of the display screen).

Step S220, determining whether there is streaming tag data corresponding to the layout document.

Determining whether there is streaming tag data corresponding to the layout document means determining whether there is pre-processed streaming tag data, that is, data obtained by performing stream tag pre-processing on the layout document. The resulting streaming tag data can be stored separately from the layout document. If pre-processed streaming tag data exists, the process proceeds to step S230; if it does not, the process proceeds to step S240.

Step S230, acquiring the pre-processed streaming tag data to serve as the parsing elements for rearranging the layout document, so as to achieve rearrangement of the layout document.

Step S240: Mark the layout document in real time to obtain the streaming tag data and store it, so as to update the stream tag data of the layout document.

Step S250: Searching for the corresponding document content in the layout document according to the acquired streaming tag data, and identifying the structural information and the layout information of the document content, so as to implement rearrangement of the layout document.
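Steps S210 to S250 can be sketched as a single lookup-or-mark routine. The following Python example is only a sketch under stated assumptions: the in-memory `tag_store`, the use of a SHA-256 digest as the lookup key, and the placeholder marker `mark_in_real_time` (which simply tags each non-empty line as a paragraph) are all hypothetical stand-ins for the external storage and marking engine described in the text.

```python
import hashlib

# Hypothetical external store, kept separate from the documents themselves.
tag_store = {}


def mark_in_real_time(document):
    """Placeholder marker: tag each non-empty line as a paragraph by byte offset."""
    tags, offset = [], 0
    for line in document.split(b"\n"):
        if line.strip():
            tags.append({"type": "paragraph", "offset": offset, "length": len(line)})
        offset += len(line) + 1  # +1 for the newline removed by split()
    return tags


def rearrange(document):
    """Return the tagged content pieces of a received document (S210)."""
    key = hashlib.sha256(document).hexdigest()
    tags = tag_store.get(key)               # S220: is there stored tag data?
    if tags is None:
        tags = mark_in_real_time(document)  # S240: mark now and store the result
        tag_store[key] = tags
    # S230/S250: use the tags to find the corresponding content in the document.
    return [
        document[t["offset"]:t["offset"] + t["length"]].decode("utf-8")
        for t in tags
    ]


doc = b"Title\n\nBody text\n"
pieces = rearrange(doc)
```

The document bytes are never modified; a second call for the same document takes the S230 branch and reuses the stored tags.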

FIG. 2 is a complete example of the rearrangement method of the layout document shown in FIG. 1 and clearly shows the basic outline of the technical solution described in the present application; most of the details have already been explained in connection with FIG. 1. For anything not covered in the description of FIG. 2, please refer to the description of FIG. 1.

As can be seen from the description of FIG. 1 and FIG. 2, the present application addresses the shortcomings of existing layout document rearrangement technology by storing the streaming tag data externally, that is, by analyzing the layout and keeping the streaming logic information marks outside the document. This solves the rearrangement problem for the large number of documents that carry no streaming tag data, without any concern about damaging the original document or about inconsistencies among copies of the document that subsequently circulate. At the same time, by combining real-time streaming logic marking with pre-processed marking of the layout, the present application provides a more complete and effective description of the layout document, thereby obtaining a better typesetting effect and shortening the rearrangement time. In addition, because the streaming tag data is stored externally and records the tag type, the electronic reading system version, the server identification system version, the manual identification version and the like, a layout document only needs to be marked once in order to be shared by multiple users and multiple terminals, which also facilitates upgrading the electronic reading system.

It should be noted that the streaming tag data in the present application can generally be produced by an algorithm, and the tag result needs to be stored externally so that it is available for the next use. Of course, the marking can also be done manually or by combining an algorithm with manual work. Whichever way the layout document is marked, the embodiment of the present application should obtain the streaming tag data according to a specified standard. However, the embodiments of the present application are not limited to one specific standard; the streaming tag data may follow many different logical information description standards. The tags can be described either in xml or in binary, and the tag results can also be stored directly in a database or on a cloud server.
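To make the xml option concrete, the following Python sketch serializes a small set of tags to XML and reads them back. The element name `streamTags`, the attribute names and the tag fields are hypothetical; they illustrate one possible description standard and are not the standard used by the application.

```python
import xml.etree.ElementTree as ET


def tags_to_xml(doc_digest, tags):
    """Serialize tags into a hypothetical XML description."""
    root = ET.Element("streamTags", {"docDigest": doc_digest, "structureVersion": "1"})
    for tag in tags:
        ET.SubElement(root, "tag", {key: str(value) for key, value in tag.items()})
    return ET.tostring(root, encoding="unicode")


def tags_from_xml(xml_text):
    """Parse the XML produced by tags_to_xml back into a list of dicts."""
    root = ET.fromstring(xml_text)
    return [
        {
            "type": element.get("type"),
            "offset": int(element.get("offset")),
            "length": int(element.get("length")),
        }
        for element in root.iter("tag")
    ]


example_tags = [
    {"type": "title", "offset": 0, "length": 5},
    {"type": "paragraph", "offset": 7, "length": 9},
]
xml_text = tags_to_xml("abc123", example_tags)
restored = tags_from_xml(xml_text)
```

The XML holds only logical information (types and positions), so it can be written to a file, a database or a cloud server without touching the layout document.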

The above describes the rearrangement method of the layout document in detail. On this basis, the present application also correspondingly provides a rearrangement system for the layout document (hereinafter referred to as the system), which is described in detail below.

Incidentally, for anything not described in detail in the system of the present embodiment, please refer to the description of the method above. Similarly, where the foregoing method involves the system, the following description of the system also applies.

Referring to FIG. 3, a block diagram of a rearrangement system of a layout document according to an embodiment of the present application is shown. The rearrangement system (hereinafter, the system) 300 comprises a stream tag extractor 310, a memory 320, a typesetting engine 330, a stream tag preprocessor 340, and the like. By storing the streaming tag data externally, it can effectively improve the rearrangement effect and rearrangement efficiency of the layout document without modifying the original document. The structure and function of each part of the system 300 are further described below.

As shown in FIG. 3, the system 300 has a stream tag extractor 310, which can acquire the streaming tag data or obtain it by marking. The streaming tag data is associated with the layout document according to a preset logical information structure; that is, the streaming tag data is the tag result of performing layout parsing on the layout document according to the preset streaming logic information structure. Specifically, the stream tag extractor 310 includes a stream tag lookup module 311, a stream tag reading module 312, a real-time tag engine module 313, and the like, wherein: the stream tag lookup module 311 is configured to determine in advance whether there is streaming tag data corresponding to the layout document; the stream tag reading module 312 is configured to acquire the streaming tag data when such data exists; and the real-time tag engine module 313 is configured to, when no streaming tag data corresponding to the layout document exists, mark the layout document according to the preset streaming logic information structure to obtain the streaming tag data and store it. The real-time tag engine module 313 can be configured locally or on the server side, and can perform layout parsing on the layout document by algorithm analysis, manual analysis, or a combination of the two, obtaining the corresponding streaming tag data after marking according to the preset streaming logic information structure.

As shown in FIG. 3, the system 300 also has a memory 320, which may be a cloud memory or a local memory. It stores the streaming tag data in the form of a file or database record, separately from the layout document. In this embodiment, the streaming tag data is the tag result of parsing the layout document according to a preset streaming logic information structure, and records rich streaming document structure information together with the structural information and layout information of the corresponding document content. It can therefore correspond closely to the original layout document, which makes it convenient to reconstruct, that is, to rearrange, the layout document.
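One way to picture storage "in the form of a database record" is a single table keyed by a document identifier. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table name, column names, JSON encoding and the idea of keying by a digest string are assumptions for illustration only.

```python
import json
import sqlite3

# In-memory database standing in for the local or cloud memory 320.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE stream_tags (doc_digest TEXT PRIMARY KEY, tags_json TEXT NOT NULL)"
)


def save_tags(doc_digest, tags):
    """Store (or overwrite) the tag data for one document."""
    conn.execute(
        "INSERT OR REPLACE INTO stream_tags (doc_digest, tags_json) VALUES (?, ?)",
        (doc_digest, json.dumps(tags)),
    )
    conn.commit()


def load_tags(doc_digest):
    """Return the stored tags for a document, or None if it was never marked."""
    row = conn.execute(
        "SELECT tags_json FROM stream_tags WHERE doc_digest = ?", (doc_digest,)
    ).fetchone()
    return json.loads(row[0]) if row else None


save_tags("digest-1", [{"type": "paragraph", "offset": 0, "length": 12}])
found = load_tags("digest-1")
missing = load_tags("digest-2")
```

Because the record lives outside the layout document, it can be updated or replaced without changing the document itself.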

The system 300 also has a typesetting engine 330, which searches for the corresponding document content in the layout document according to the streaming tag data so as to rearrange the layout document. Specifically, the typesetting engine 330 can search for the corresponding document content through the locally selected streaming logic information structure. The basic process of rearrangement is as follows: using the correspondence between the streaming tag data and the layout document determined by the locally selected streaming logic information structure, obtain the streaming tags (records) corresponding to the layout document from the streaming tag data, find the corresponding content in the layout document for each streaming tag, and then have the typesetting engine re-typeset it.

Further, the system 300 can also have a stream tag preprocessor 340, configured locally or on the server side, which pre-marks layout documents and stores the resulting streaming tag data. Generally, the pre-processed tags obtained by the stream tag preprocessor 340 can be produced by processing the document with an algorithm on the server side, by marking the document manually, or by combining manual work with an algorithm. Usually, for pre-processed marking, certain software tools can be provided to pre-installation manufacturers.

In this embodiment, the streaming tag data may be obtained by tag pre-processing or by real-time tag processing. Either way, the resulting streaming tag data should be stored separately from the layout document.

In this embodiment, the basic process of pre-processing the layout document is as follows: first, layout parsing is performed on the layout document, by means not limited to algorithm analysis, manual analysis, and the like; then, the result of marking the layout information as streaming content is stored externally, by means not limited to cloud storage, a database, or a local external file. Through this pre-processing, the stream tags are kept separate from the original layout document.

In this embodiment, the process of performing real-time stream tag processing on the layout document is similar to that of pre-processing; the differences lie only in the timing and the subject of the marking, and are not described further here. Incidentally, for matters that arise when marking layout documents in real time, such as the description standard of a particular file format and the corresponding algorithms for parsing that standard in real time, please refer to the relevant prior-art literature; they are not repeated here.

Referring to FIG. 3, in conjunction with FIG. 1 and FIG. 2, the basic working process of the rearrangement system 300 of the above-mentioned layout document is as follows:

(1) Perform layout analysis on the layout document and externally store the result of marking the layout information. The layout analysis is not limited to algorithm analysis, manual analysis, etc., and the storage method is not limited to cloud storage or local external file storage in the form of a file or database record.

(2) When displaying, the e-reader in the system can select the streaming logic information structure that it considers optimal. The selected structure can be one implemented by a local algorithm, one from a locally pre-processed document, one specified by the user, or one from tags of the latest marking technology.

(3) The process of rearranging the layout document is: first, the streaming tag data/document corresponding to the original document is obtained through a certain correspondence, such as a digest of the original document (not limited to md5, sha, etc.) or other database key-value pairs that specify the correspondence; next, each streaming tag is obtained from the streaming tag data structure, which records the correspondence between the streaming tag and the related content of the original document (not limited to document position offset, object number, etc.); finally, the corresponding content in the original document is found through the streaming tag data and submitted directly to the typesetting engine for typesetting and display.

(4) If no corresponding external streaming logic tag data can be found for the original document, the document is analyzed and marked by the real-time layout analysis system and then submitted to the typesetting engine for typesetting, and the marking result is stored externally.
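Step (3) above names the document position offset and the object number as two possible ways for a streaming tag to point at content in the original document. The following Python sketch shows both locators side by side. The class `StreamTag`, its field names, and the modelling of a document as raw bytes plus a dictionary of numbered objects are hypothetical simplifications, not the data structure of the application.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StreamTag:
    """A tag that locates content by byte offset or by object number."""
    kind: str
    offset: Optional[int] = None
    length: Optional[int] = None
    object_number: Optional[int] = None


def resolve(tag, raw, objects):
    """Return the content a tag refers to, without copying it into the tag."""
    if tag.object_number is not None:
        return objects[tag.object_number]
    return raw[tag.offset:tag.offset + tag.length]


# Hypothetical original document: a byte stream plus numbered embedded objects.
raw_bytes = b"Chapter 1: Introduction"
embedded_objects = {7: b"<image data>"}

stream_tags = [
    StreamTag(kind="title", offset=0, length=9),
    StreamTag(kind="graphic", object_number=7),
]
resolved = [(t.kind, resolve(t, raw_bytes, embedded_objects)) for t in stream_tags]
```

The resolved pieces, together with their kinds, are what would be handed to the typesetting engine for layout and display.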

In this way, by storing the streaming tag data externally, the rearrangement system of the layout document of the present application can effectively improve the rearrangement effect and the rearrangement efficiency of the layout document without modifying the original document.

For the rearrangement system of the layout document described above, a few additional points should be noted:

First, the system may need to recognize layout documents marked with several different streaming logic information structures. If a streaming logic information structure cannot be recognized, it is not one of the local streaming logic information structures. If the streaming logic information structure is a new version, information such as the version number and whether it was pre-processed can be described in the structure. In addition, the system can prompt an upgrade of the reader version so that the structure can eventually be recognized and understood.

Second, when the system selects the streaming logic information structure, the preferred correspondence is an md5 digest, by which the tag data uniquely corresponds to one original document. The correspondence to specific content can be described using the document position offset, the object number, and the like; for details, refer to the foregoing tag example.

Third, the system has no specific requirements for the typesetting engine. A mature typesetting engine on the market (such as webkit) can be selected according to the situation; of course, a typesetting engine can also be developed independently. In short, the typesetting engine is not the focus of the present application, and when implementing the technical solution of the present application it can be treated as a given.

Fourth, the system has no special requirements for the real-time markup engine, as long as it processes quickly and its results are acceptable. In a specific implementation, the real-time markup engine is generally implemented by an algorithm, which has the advantage that the algorithm can be upgraded continuously so that speed and quality keep improving. Considering that the server has more powerful capabilities such as cluster computing, historical data statistics, machine learning and artificial intelligence, the real-time markup engine can also be placed on the server side. Calculation speed is then not a problem, big data and machine learning can be used to obtain better markup results, and only the markup results need to be transferred over the network. When the real-time markup engine is set on the server side: if the network is good, the reader terminal can use the server-side tag data; if the network is poor, the reader terminal can use the tag data from its own lightweight marking system.
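The server-first behaviour described in the fourth point can be sketched as a simple fallback. In this Python example the callables `fetch_from_server` and `mark_locally` are hypothetical placeholders for the server-side markup engine and the terminal's own lightweight marker; treating any `OSError` (which includes `ConnectionError` and `TimeoutError`) as "the network is not good" is likewise an assumption.

```python
def get_tags(document, fetch_from_server, mark_locally):
    """Prefer server-side tag data; fall back to the local lightweight marker."""
    try:
        return fetch_from_server(document), "server"
    except OSError:  # ConnectionError and TimeoutError are subclasses of OSError
        return mark_locally(document), "local"


def offline_server(document):
    """Simulates a server that cannot be reached."""
    raise ConnectionError("network unavailable")


def online_server(document):
    """Simulates a server returning richer markup results."""
    return [{"type": "title", "offset": 0, "length": len(document)}]


def lightweight_marker(document):
    """Simulates the terminal's own, simpler marking."""
    return [{"type": "paragraph", "offset": 0, "length": len(document)}]


local_tags, local_source = get_tags(b"hello", offline_server, lightweight_marker)
server_tags, server_source = get_tags(b"hello", online_server, lightweight_marker)
```

Returning the source alongside the tags lets the caller decide, for example, whether to refresh the stored tag data later when the network improves.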

It can be understood that the rearrangement system of the layout document can have different application examples; it can be a network system or a single device (for example, a mobile intelligent terminal such as a mobile phone or a tablet computer). An electronic reading terminal is described below as a specific product example.

For the sake of convenience, in the present application, for the structure of the rearrangement system and the electronic reading terminal of the layout document, similar functional structures are denoted by words such as modules, devices or units, which are briefly described below.

Referring to FIG. 4, a block diagram of the components of an embodiment of an electronic reading terminal of the present application is shown. The electronic reading terminal 400 can rearrange the layout document, and has a streaming tag data acquiring unit 410 and a layout document rearranging unit 420, wherein: the streaming tag data acquiring unit 410 can acquire the streaming tag data stored separately from the layout document, the streaming tag data being associated with the layout document according to a preset logical information structure (in other words, the streaming tag data is the tag result of performing layout parsing on the layout document according to the preset streaming logic information structure); and the layout document rearranging unit 420 searches for the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.

The above-described streaming tag data acquiring unit 410 may determine in advance whether there is pre-processed streaming tag data corresponding to the layout document: when such streaming tag data exists, it is acquired; when no streaming tag data corresponds to the layout document, the layout document is marked according to a preset streaming logic information structure to obtain the streaming tag data, which is then stored. Thus, regardless of whether the layout document has been pre-marked, the electronic reading terminal 400 can effectively rearrange it and then display it.

The related embodiments of the present application have been described in detail above, and the layout document rearrangement technical solution has obvious advantages compared with the prior art solutions, and is briefly summarized as follows.

As mentioned earlier, existing layout document rearrangement technology mainly adopts two schemes: one is to obtain the original layout document directly and analyze, understand, mark and rearrange it in real time; the other is to write streaming tags into the original document, save the original document, and rearrange it by reading the streaming tags from the original document. Both prior-art solutions have certain defects, for the reasons described above.

In contrast, the rearrangement method, system and electronic reading terminal of the layout document proposed by the present application have obvious advantages: they overcome the defects of the above two prior-art solutions in rearrangement effect and efficiency, and solve problems such as incomplete document coverage and difficulty in synchronizing documents. The rearrangement method, system and electronic reading terminal of the layout document have, but are not limited to, the following features:

(1) The streaming tag data is stored externally, in forms including but not limited to cloud storage, a database, or a local external file. This does not damage the original document and facilitates professional pre-processing, technology upgrades and data updates.

(2) Obtain the streaming tag data through a certain correspondence and parse the layout document. Such correspondences include, but are not limited to, various digests or other means that can store specified relationships with the original document in an external memory, thereby eliminating the need to strongly associate the original document with the streaming logical information structure by modifying the original document.

(3) The flow logic information structure only describes the logical information, and does not store the substantive document information in the logical information structure. Through a certain correspondence in the flow logic information structure, such as specifying a document offset, an object number, and the like, a correspondence relationship with the original document content is generated, and the data amount is small.

(4) The streaming logical structure information is the result of layout analysis of the layout document. The marking of layout documents is not limited to algorithm analysis or manual analysis, and the specific forms and means of marking are varied.

The present application is disclosed in the above preferred embodiments, but it is not intended to limit the application, and any person skilled in the art can make possible changes and modifications without departing from the spirit and scope of the present application. The scope of protection of this application shall be determined by the scope defined by the claims of the present application.

In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer readable medium, such as read only memory (ROM) or flash memory. Memory is an example of a computer readable medium.

1. Computer readable media include permanent and non-permanent, removable and non-removable media, and can implement information storage by any method or technology. The information can be computer readable instructions, data structures, modules of programs, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disk read only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic tape cartridges, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media (transitory media), such as modulated data signals and carrier waves.

2. Those skilled in the art will appreciate that embodiments of the present application can be provided as a method, a system, or a computer program product. Thus, the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.

Claims

A method for rearranging layout documents, comprising:

Obtaining streaming tag data stored separately from the layout document, wherein the streaming tag data is associated with the layout document according to a preset logical information structure;

Searching for the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
The rearrangement method of a layout document according to claim 1, wherein the stream tag data includes logical information corresponding to the document content of the layout document, and does not include the substance of the layout document.
The method of rearranging a layout document according to claim 2, wherein the stream tag data comprises a digest (summary) of the layout document.
The method for rearranging a layout document according to claim 1, further comprising determining in advance whether there is pre-processed streaming tag data corresponding to the layout document;

If yes, obtaining the streaming tag data;

If not, marking the layout document according to a preset streaming logic information structure to obtain streaming tag data, and storing it.
The method for rearranging a layout document according to claim 4, wherein layout parsing is performed on the layout document by algorithm analysis, manual analysis, or algorithm analysis combined with manual analysis, and the corresponding streaming tag data is obtained after marking according to the preset streaming logic information structure.
The method for rearranging a layout document according to claim 4, wherein the stream tag data is externally stored in the form of a file or a database record on the server side or locally.
The method for rearranging a layout document according to claim 1, wherein the corresponding document content in the layout document is searched for according to the streaming tag data through a locally selected streaming logic information structure, so as to rearrange the layout document.
The method for rearranging a layout document according to claim 7, wherein the locally selected streaming logic information structure is a streaming logic information structure corresponding to a local algorithm implementation, to local pre-processing, to a user specification, or to tags of the latest marking technology.
The method for rearranging a layout document according to claim 7, wherein, through the correspondence between the streaming tag data and the layout document determined by the locally selected streaming logic information structure, all or part of the streaming tags are obtained from the streaming tag data, the corresponding document content in the layout document is found for each streaming tag, and the content is then re-typeset and displayed by the typesetting engine.
A rearrangement system for a layout document, characterized in that it comprises:

A stream tag extractor configured to acquire the streaming tag data or obtain it by marking, wherein the streaming tag data is associated with the layout document according to a preset logical information structure;

a memory configured to store streaming tag data, the stream tag data being stored separately from the layout document;

A typesetting engine configured to search for the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
The rearrangement system of a layout document according to claim 10, wherein the streaming tag extractor comprises a streaming tag lookup module, a streaming tag reading module, and a real-time tag engine module, wherein:

The streaming tag lookup module is configured to pre-find whether there is streaming tag data corresponding to the layout document;

The streaming tag reading module is configured to acquire the streaming tag data when there is streaming tag data corresponding to the layout document;

The real-time tag engine module is configured to, when no streaming tag data corresponding to the layout document exists, mark the layout document according to a preset streaming logic information structure to obtain the streaming tag data and store it.
The rearrangement system of a layout document according to claim 11, wherein the real-time tag engine module is configured to perform layout parsing on the layout document by algorithm analysis, manual analysis, or a combination of algorithm analysis and manual analysis, and to obtain the corresponding streaming tag data after marking according to the preset streaming logic information structure.
The rearrangement system of a layout document according to claim 11, wherein the real-time markup engine module is configured at a local or server end.
The rearrangement system of a layout document according to claim 10, wherein the memory is a cloud storage or a local storage, and the streaming tag data is externally stored in the form of a file or a database record.
The rearrangement system of a layout document according to claim 10, wherein the typesetting engine is configured to search for the corresponding document content in the layout document according to the streaming tag data through a locally selected streaming logic information structure, so as to rearrange the layout document.
The rearrangement system of a layout document according to claim 15, wherein the typesetting engine is configured to obtain all or part of the streaming tags from the streaming tag data through the correspondence between the streaming tag data and the layout document determined by the locally selected streaming logic information structure, to find the corresponding document content in the layout document for each streaming tag, and to re-typeset and display that content.
A rearrangement system for a layout document according to claim 10, comprising a stream tag preprocessor configured to pre-mark the layout document and store the resulting streaming tag data.
A rearrangement system for a layout document according to claim 17, wherein said streaming tag preprocessor is configured locally or on the server side.
An electronic reading terminal capable of rearranging a layout document, wherein the electronic reading terminal is configured to:

Obtaining streaming tag data stored separately from the layout document, wherein the streaming tag data is associated with the layout document according to a preset logical information structure;

Searching for the corresponding document content in the layout document according to the streaming tag data, so as to rearrange the layout document.
The electronic reading terminal according to claim 19, wherein said electronic reading terminal is configured to:

Determining in advance whether there is pre-processed streaming tag data corresponding to the layout document;

Obtaining the streaming tag data when there is streaming tag data corresponding to the layout document;

When there is no streaming tag data corresponding to the layout document, the layout document is marked according to a preset streaming logic information structure to obtain the stream tag data and store it.