DOCUMENT PAGINATION SYSTEM AND PROCESS
The present invention relates to a continuous pagination system and process for generalized markup language documents according to the claims 1 and 8.
Performing pagination of a document falls into two basic categories. One can distinguish continuous paging and "best fit" paging. Continuous paging maintains the normal document flow, possibly also using "best fit" practices within the constraints of the normal document flow. In contrast to continuous paging, so-called "best fit" paging does not attempt to maintain document flow and treats the document as discrete pieces of information, which can be re-arranged. The present invention performs continuous paging on a document and, therefore, maintains the normal document flow.
Pagination utilizing traditional proprietary formats is in fairly common use. However, corresponding systems and processes must rely only on their proprietary formats and thus are not suitable to generate paginated documents from standard formats. The present invention on the other hand is directed to a system and a process that may be used in connection with documents composed of standardized generalized markup language (GML).
The class of said former software that supports proprietary formats need not concern itself with issues relating to standard document formats, particularly the issues surrounding GML embedded element structures. In those cases, pagination is achieved with the assistance of the document definition language or template that is employed, which is considered proprietary and bears no, or only coincidental, resemblance to generalized markup languages, in particular XML.
The output medium for markup languages such as HTML (HyperText Markup Language) has been primarily the display monitor. The display monitor allows programs, such as browsers, to render the markup as one continuous page. Generalized markup (such as HTML or XML) exists within a document as hierarchical element structures.
Most browsers only print markup without regard to physical page boundaries, page areas (i.e., header, footer, left, right, etc.), or any other type of page specifications. Also, most browsers have either no or limited facilities for rendering output in different formats that support pagination, such as PDF and PostScript.
It is an object of this invention to create a system and a process that can automatically or semi-automatically generate a continuously paginated document composed of a generalized markup language (GML), particularly composed of XML hierarchical element structures, that can, if desired, span any number of physical pages.
This object is solved by the invention as defined in the claims.
The aim of combining the concept of continuous pagination with the hierarchical structure of generalized markup languages generates a new and unique problem in itself, which can be solved neither by existing continuous pagination algorithms, which do not handle embedded elements as part of the normal flow of a document, nor by applying known rendering mechanisms. The hierarchical structure of GML documents is by its concept opposed to a linear process for document pagination. Accordingly, today's browsers render the markup as one continuous page.
The invention starts from the idea of using running document threads related to the document structure while employing a checkpoint mechanism. This checkpoint mechanism is directed to create and maintain said running document threads of information that are restartable from page to page of the document.
The pagination system applies to documents composed of generalized markup language (GML), preferably XML documents. The document is structured in a columnlike manner in hierarchic levels, whereby the positions of the desired page breaks for each page are then defined by saving continuous checkpoint snapshots. Each of these checkpoints is linked to corresponding GML elements and styles.
The inventive system and process accordingly use these element and style stack structures to allow a current snapshot of a document to be captured repeatedly. Thus, positional information is maintained in the document checkpoints in order to establish the location of active document threads within the normal document flow. Furthermore, the invention automatically handles margins, page areas, and page specifications, while allowing element-level control of significant page events such as "inner element" page breaks, etc.
Sample embodiments of the invention are hereinafter described with reference to the drawings, in which
fig.1 shows a principal example document
fig.2 shows a sample structure of element and style stacks
fig.3 shows a sample checkpoint list structure
fig.4 shows corresponding input and output checkpoint lists
fig.5 shows schematically an overview of the pagination process
fig.6 shows an embodiment of the inventive process
By means of a sample document shown in figure 1, a principle of the inventive pagination system will be described. The sample document has three pages P1-P3 with corresponding page breaks S1, S2 between each pair of subsequent pages P1-P3. Each of these page breaks S1, S2 is shown by a dashed line. No other special page areas or special page layout conditions are used for the purpose of this example, which is schematically shown in figure 1.
The outermost box that is shown in figure 1 is the document itself, which spans said three pages P1-P3 and is labeled L0. Within the first page P1 and continuing onto page P2 is a two-column element that is labeled L1 and represents a level 1 element. Within the second column of the two-column element L1 is a three-column element labeled L2 representing a level 2 element.
On page P2, the elements L1 and L2 terminate and a second two-column, level 1 element L1 begins that continues onto page P3. Within the first column of this second element L1 is another two-column, second level element L2. It should be noted here that this columnlike structure is layout related but not defined or given by the layout. It is rather part of the inventive solution to structure the document into columnlike related hierarchic levels L0-L2 representing the running document threads.
Since said elements may spread over one or more pages the invention foresees checkpoints that are marked in figure 1 as CP1 to CP4. These checkpoints are specifically used to allow a document content to be continued over physical page boundaries, while saving the positional data on a continuous basis.
When the content of the document in figure 1 is rendered, the invention uses stack mechanisms that are continuously updated for both the embedded elements and their corresponding formatting styles. An example thereof is shown in figure 2. For example, when the level 2 element L2 is established while formatting page P1, an element stack structure E and a style stack structure S might appear as shown in figure 2. Also in this example, level 2 element L2 establishes a second style L2-B for an in-line change, i.e. font name, font size, etc. At each page break position such stack information for element stack structures E and style stack structures S is saved and maintained while the processing of a subsequent page begins.
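The paired element and style stacks described above might be sketched as follows. This is only an illustration under stated assumptions: the type names `Element`, `Style` and `FormatState`, their fields, the font values and the helper `stateAtL2` are hypothetical and do not appear in the figures.

```cpp
#include <string>
#include <vector>

// Hypothetical formatting style; the fields are assumptions for illustration.
struct Style {
    std::string fontName;
    int fontSize;
};

// Hypothetical embedded element at one hierarchic level.
struct Element {
    std::string name;  // e.g. "L0", "L1", "L2"
    int columns;       // number of columns of the element
};

// Element stack E and style stack S, updated together while formatting.
struct FormatState {
    std::vector<Element> elements;  // E: the innermost element is back()
    std::vector<Style> styles;      // S: the active style is back()

    // Entering an embedded element pushes the element and its base style.
    void enter(const Element& e, const Style& s) {
        elements.push_back(e);
        styles.push_back(s);
    }

    // An in-line change (font name, font size, ...) pushes a style only.
    void pushInlineStyle(const Style& s) { styles.push_back(s); }
};

// Builds a state resembling the moment L2 is established on page P1.
// The concrete fonts are made up; only the stack depths mirror the text.
FormatState stateAtL2() {
    FormatState st;
    st.enter({"L0", 1}, {"Times", 12});
    st.enter({"L1", 2}, {"Times", 12});
    st.enter({"L2", 3}, {"Times", 10});   // base style of L2
    st.pushInlineStyle({"Courier", 10});  // second style L2-B (in-line change)
    return st;
}
```

In this sketch the style stack can grow deeper than the element stack, because an in-line change adds a style without opening a new element.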
In figure 3 an example of a checkpoint linked list structure is shown. When a column of an element encounters a page boundary S1, S2 (cf. fig.1), its currently active state is saved in a checkpoint linked list structure. The currently active state is defined as the relevant contents of the element stack E and style stack S as explained before, in addition to positional information within the document where the document flow may be resumed. The checkpoints CP1-CP4 are captured at the page break S1 (cf. figure 1) and result in the checkpoint linked list structure together with their respective element and style stacks as shown in figure 3.
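A checkpoint of this kind, holding a copy of both stacks plus a resume position, might be sketched as below. The names `Checkpoint`, `CheckpointList`, `putTail`, `takeHead` and `snapshot` are hypothetical, and elements and styles are reduced to plain string identifiers for brevity.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <vector>

// One saved snapshot: copies of both stacks plus the resume position.
struct Checkpoint {
    std::vector<std::string> elementStack;  // snapshot of E
    std::vector<std::string> styleStack;    // snapshot of S
    long filePos;                           // where document flow may resume
};

// Minimal linked list of checkpoints, kept in document order.
class CheckpointList {
public:
    void putTail(const Checkpoint& cp) { items_.push_back(cp); }
    bool empty() const { return items_.empty(); }
    std::size_t size() const { return items_.size(); }

    // Removes and returns the head checkpoint. Precondition: !empty().
    Checkpoint takeHead() {
        Checkpoint cp = items_.front();
        items_.pop_front();
        return cp;
    }

private:
    std::list<Checkpoint> items_;
};

// Captures the currently active state when a column meets a page boundary.
Checkpoint snapshot(const std::vector<std::string>& e,
                    const std::vector<std::string>& s,
                    long filePos) {
    return Checkpoint{e, s, filePos};
}
```

Because each checkpoint stores copies of the stacks, later changes to the live stacks do not disturb a snapshot that has already been saved.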
After an element's state has been saved as a checkpoint in the checkpoint list or an element is completed or the next column of an element is reached, the document flow is resumed. If the most recent element of the current element stack is contained within the next checkpoint, the next checkpoint is used instead (otherwise, the normal document flow resumes). Normal document flow consists of further exploring child, sibling and parent element relationships. When all possibilities have been exhausted for the current page, a physical page break is then performed. This allows a document to be paginated within a defined columnlike structure or in areas of normal document flow, i.e. in the level 0 element.
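The choice between continuing with the next checkpoint and resuming the normal flow could be expressed as a small predicate. The sketch below rests on two assumptions that the text does not spell out: every element carries a unique string identifier, and "contained within the next checkpoint" means that this identifier occurs in the element stack saved with that checkpoint. The function name `useNextCheckpoint` is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns true when the innermost element of the current element stack also
// appears in the element stack saved with the next checkpoint. In that case
// the next checkpoint is processed; otherwise normal document flow resumes.
bool useNextCheckpoint(const std::vector<std::string>& currentStack,
                       const std::vector<std::string>& nextCheckpointStack) {
    if (currentStack.empty()) {
        return false;  // nothing is active, so there is nothing to match
    }
    const std::string& innermost = currentStack.back();
    return std::find(nextCheckpointStack.begin(), nextCheckpointStack.end(),
                     innermost) != nextCheckpointStack.end();
}
```

With identifiers such as "L1a" for the first level 1 element and "L1b" for the second, a current stack ending in "L1a" would match a checkpoint saved inside that element, whereas a stack ending in "L1b" would not.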
While processing pages, an "input" checkpoint list structure IL is utilized. During the course of processing each page, an "output" checkpoint list structure OL is created. After processing a particular page, the input and output checkpoint list structures are then switched and document processing resumes in the same manner for each subsequent page, as can be seen in figure 4.
All document input is controlled by the checkpoint mechanism previously described. Once the first checkpoint has been primed, all subsequent document input is driven by this mechanism. For example, the first input checkpoint for page 1, referred to as IL in figure 4, creates the subsequent output checkpoints, referred to as OL. The newly created output checkpoints are then used as the input checkpoints for page 2 and the process is repeated until the end of the document is reached.
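The way the swapped lists drive processing from page to page can be illustrated with a deliberately simplified model. The assumptions here are not taken from the text: each content thread is reduced to a line count, every thread may place at most `linesPerPage` lines on a page, and one checkpoint is primed per thread rather than a single checkpoint at the start of the document. `ThreadCheckpoint` and `paginate` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Simplified checkpoint: which content thread to resume, and where.
struct ThreadCheckpoint {
    std::size_t thread;  // index of the content thread (e.g. a column)
    long position;       // line at which the thread resumes
};

// Counts the pages produced when the output list of each page becomes the
// input list of the next page.
int paginate(const std::vector<long>& threadLengths, long linesPerPage) {
    assert(linesPerPage > 0);  // otherwise no page would make progress

    std::vector<ThreadCheckpoint> input, output;
    for (std::size_t t = 0; t < threadLengths.size(); ++t) {
        if (threadLengths[t] > 0) {
            input.push_back({t, 0});  // prime the initial checkpoints
        }
    }

    int pages = 0;
    while (!input.empty()) {                       // one iteration per page
        for (const ThreadCheckpoint& cp : input) { // process input checkpoints
            long next = cp.position + linesPerPage;
            if (next < threadLengths[cp.thread]) {
                output.push_back({cp.thread, next});  // thread continues
            }
        }
        ++pages;                   // physical page break
        input.clear();
        std::swap(input, output);  // output list becomes the next input list
    }
    return pages;
}
```

For two threads of 25 and 10 lines and 10 lines per page, the second thread finishes on the first page while the first thread produces checkpoints for two further pages, so the function returns 3.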
Figure 5 now shows the core pagination process in more detail. An initial input checkpoint is created that points to the start of the document and has default outermost level box and style elements. An outer loop continues processing as long as one or more checkpoints are present in the input checkpoint list. During processing, new checkpoints are created in the output checkpoint list.
At the end of the outer loop, the input and output checkpoint lists are swapped (i.e., the input list becomes the output list and vice versa). In essence, the output checkpoints created by the previous page's processing become the input checkpoints for the next page's processing, thus driving the outer loop.
The inner loop processes all checkpoints in the input checkpoint list until the list is exhausted (i.e., empty). For each input checkpoint, the box and style elements are restored into the element and style stacks and the document is repositioned accordingly.
Formatting proceeds until a logical break in processing occurs. A logical break in processing is defined as the point where the most recent element of the element stack exists within the next checkpoint and either an output checkpoint is created due to a logical page break occurring (i.e., for a given content thread) or an element is completed or the next column of an element is reached. The most recent element of the current element stack is contained within the next input checkpoint.
A logical break in processing can also occur when normal document flow has been exhausted for the current page. In addition, an output checkpoint can also be created due to a logical page break without resulting in a logical break in processing.
After the inner loop processes all checkpoints in the input checkpoint list and has created the output checkpoint list for the current page, a physical page break occurs. At this point the outer loop swaps the input and output checkpoint lists and performs physical page break processing prior to continuing the overall document processing.
This process may be understood in more detail by reference to a sample main input loop for the document as it might appear in C++:
// Prime the first checkpoint.
CPinp->putCPtail(0,0,0);

for (intStylestackinx = 0; intStylestackinx < MAXSTYLESTACK; ++intStylestackinx)
    stylestackp[intStylestackinx] = stylepoolp->newstyle();
intStylestackinx = 0;

stylep = stylepoolp->newstyle();
setdefaultStyle(stylep,lngPageWidth,lngPageHeight,
    intOrientation,lngMarginTop,lngMarginRight,lngMarginBottom,lngMarginLeft,lngGutter);
CPinp->pushStyle(stylep);

for (intBoxstackinx = 0; intBoxstackinx < MAXBOXSTACK; ++intBoxstackinx)
    boxstackp[intBoxstackinx] = boxpoolp->newbox();
intBoxstackinx = 0;

boxp = boxpoolp->newbox();
CPinp->pushBox(boxp);

// Outer loop for processing the entire document.
while (! CPinp->isCPEmpty())
{
    // Process all checkpoints in the input checkpoint list.
    while (! CPinp->getCPhead(lngFilepos,lngFileposout,intRslt))
    {
        // Restore the style stack.
        intInx = 0;
        while (! CPinp->popStyle(&stylep))
        {
            stylepoolp->delstyle(stylestackp[intInx]);
            stylestackp[intInx] = stylep;
            ++intInx;
        }
        for (; intInx < intStylestackinx; ++intInx)
            clearStyle(stylestackp[intInx]);
        intStylestackinx = intInx;

        // Restore the box stack.
        intInx = 0;
        while (! CPinp->popBox(&boxp))
        {
            boxpoolp->delbox(boxstackp[intInx]);
            boxstackp[intInx] = boxp;
            ++intInx;
        }
        for (; intInx < intBoxstackinx; ++intInx)
            clearBox(boxstackp[intInx]);
        intBoxstackinx = intInx;

        // Reposition the input buffer.
        fseek(fp,lngFilepos,SEEK_SET);

        // Process the current checkpoint exiting when we need the next input checkpoint.
        // Normal document processing occurs within the "ProcessLoop" function.
        if ((intRslt = ProcessLoop(panonev,lngFilepos,intPageBreak)) != 0)
            heapreturn(intRslt);

        // Sync-up subsequent checkpoints in the input checkpoint list.
        CPinp->updateCPSiblings(intBoxstackinx-1);

        // Fetch the next checkpoint in the input checkpoint list.
        CPinp->nextCP();
    }

    // Swap the now empty input checkpoint list with the recently populated
    // output checkpoint list, thus giving us our new input checkpoint list.
    bolCPToggle = (! bolCPToggle);
    CPinp = (bolCPToggle ? CP2p : CP1p);
    CPoutp = (bolCPToggle ? CP1p : CP2p);

    if (! CPinp->isCPEmpty())
    {
        // Perform the physical page break.
    }
}
Until the paginated output document is generated, the pagination process performs several steps, some of them being recursive. A document L0 composed of a generalized markup language GML is structured by defining columnlike related hierarchic levels L0-L2 (cf. fig.1). An initial checkpoint snapshot CP1-CP4 is saved for the uppermost level L0, whereby a corresponding element stack E and style stack S are associated with this initial checkpoint.
For a page Pn a page break S1, S2 is positioned according to a defined pagination rule for the corresponding page Pn. The output checkpoints CP1-CP4 for elements that encounter a page break S1, S2 are saved, whereby the active element stack E, style stack S and file position are associated with each checkpoint CP1-CP4, respectively. After an output checkpoint has been saved or an element is completed or the next column of an element is reached, the formatting of the page Pn is resumed by processing the next input checkpoint when the most recent element of the element stack exists within the next checkpoint. Otherwise, the normal document flow is resumed until all possibilities have been exhausted for the current page. The physical page break is processed and the input and output checkpoints are swapped. The pagination process is continued for any input checkpoints until the end of the document has been reached. The output document is then generated in electronic or printed form.
An example of the inventive process is schematically shown in figure 6. Initially the document structure is defined by using valid XML entities. Elements and styles are defined according to XML and CSS standards, respectively, and according to the user's needs. In a preferred embodiment of the invention the automated pagination process is applied to XML documents that contain application data collected from a database source. This application data is collected, if desired, on an as-needed basis, i.e. the process collects the data at the time it is required. Furthermore, the pagination rules are defined by the user or by presets. Based on these pagination rules, the invention makes it possible to guarantee predetermined page breaks at defined positions, allowing homogenized documents. On the other hand, it is possible to use a concept with conditional rules that generate fully automated document pagination. The document structure is pre-processed by an SGML parser. Each page Pn is then processed as described above in connection with figure 5. After each page Pn has been processed, the page break positions are determined and, hence, the document is rendered by generating the required output, which may be a print or screen output or an output of another desired format, such as PDF, PostScript or the like.