CN103544262A

CN103544262A - XML-based stream page release method and system

Info

Publication number: CN103544262A
Application number: CN201310484727.7A
Authority: CN
Inventors: 王冬雪; 麻锐; 孟利民; 王辉; 张标标
Original assignee: Enjoyor Co Ltd
Current assignee: Yinjiang Technology Co.,Ltd.
Priority date: 2013-10-16
Filing date: 2013-10-16
Publication date: 2014-01-29
Anticipated expiration: 2033-10-16
Also published as: CN103544262B

Abstract

The invention discloses an XML-based streaming pagination publishing method and system. The method comprises the following steps: (1) streaming, in which an XML input document that meets a predetermined segmentation condition is segmented and reconstructed, and the remainder is optionally streamed again; (2) fast paging, in which an XML document that meets a predetermined splitting condition undergoes multiple rounds of binary-tree splitting and reconstruction; (3) converting the input document into documents of other standard formats according to the conversion style sheet provided by the terminal device; and (4) sending the documents of different standard formats to the corresponding terminal devices. The system comprises a streaming processor, a fast pager, an XSLT converter and a publishing server, wherein the streaming processor comprises a segmenter and a reconstructor, and the fast pager comprises a splitter and a reconstructor. The method and system are applicable to very large XML documents, improve the reliability and fault tolerance of conversion, and are flexible and widely applicable.

Description

An XML-based streaming pagination publishing method and system

Technical field

The invention relates to an XML-based paging publishing method and system.

Background

With the rapid development of information technology, more and more enterprises and institutions need to handle massive amounts of data, such as the medical data of hospitals, the traffic data of traffic bureaus, the power data of electric power bureaus, the planning data of planning bureaus, the hydrological and water conservancy data of water conservancy bureaus, and the meteorological data of meteorological bureaus. These data are often stored on servers as XML, and users only need to access the documents on the server to obtain the data. However, when users access documents on the server through different terminal devices such as PCs, handheld devices and smartphones, the documents must be format-converted before the data can be received and displayed correctly, because terminals differ in display format and software systems differ in storage and reading formats. At present, the main tools for XML document format conversion are DOM, SAX and XSLT. XSLT, one of the most popular XML format conversion technologies today, is very powerful yet works on a relatively simple principle, as shown in Figure 1.

During conversion, the XML source document must first be parsed into a DOM tree held in memory, so an overly large document will inevitably cause memory overflow. As a result, when users read large data sets on terminal devices such as PCs, handheld devices and smartphones, the data often cannot be received and displayed correctly because memory is insufficient or the display is too small.

Moreover, the traditional paging process only implements the function of a segmentation processor, that is, it iteratively segments the input document. All of the resulting small XML documents are therefore not well-formed, so the subsequent conversion operations are not independent of one another, and reliability and fault tolerance are poor. The iterative approach also greatly reduces the speed of segmentation.

Summary of the invention

To overcome the shortcomings of existing XML-based paging publishing methods and systems, which are unsuitable when the XML document is very large or when high conversion reliability, fault tolerance, flexibility and applicability are required, the present invention provides an XML-based streaming pagination publishing method and system suited to such situations.

The technical solution adopted by the present invention to solve this technical problem is:

An XML-based streaming pagination publishing method, comprising the following steps:

(1) Streaming process:

For each large XML input document, the streaming processor first checks its size. If the document size does not exceed the preset segment-reading threshold, that is, T_s ≤ T_m, the document proceeds to step (2). Otherwise, if T_s > T_m, the streaming processor segments and reconstructs the document, producing two well-formed XML documents, one of size T_m and the other of size T_s - T_m. The former is passed to step (2), while the latter is returned to the streaming processor to be checked, segmented and reconstructed again;

(2) Fast paging process:

If the size of the XML document F_s0,1 far exceeds the memory T required by the terminal device, that is, T_s0,1 >> T, a first round of splitting and reconstruction is performed on F_s0,1, producing two new well-formed XML documents F_s1,1 and F_s1,2. These two documents are then checked and, if needed, put through a second round: if both still satisfy the splitting condition, T_s1,1 >> T and T_s1,2 >> T, both are split and reconstructed at the same time, producing four well-formed documents F_s2,1, F_s2,2, F_s2,3 and F_s2,4. Checking, splitting and reconstruction repeat in this way until every XML document produced in some round is no larger than the memory required by the terminal device, at which point the process ends;

(3) XSLT conversion process: the input document is converted, according to the conversion style sheet provided by the terminal device, into output documents in other standard formats;

(4) Publishing process: the documents in the different standard formats are sent to the corresponding terminal devices.
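The repeated check-and-cut loop of step (1) can be illustrated with a minimal Python sketch that tracks document sizes only. The function name `stream_sizes` and the use of integer sizes are assumptions made for illustration, and the sketch ignores the small size changes that reconstruction introduces.

```python
def stream_sizes(t_s: int, t_m: int) -> list[int]:
    """Sizes of the documents that the streaming step hands to step (2).

    t_s: size of the input document; t_m: segment-reading threshold.
    Sizes only: real documents would also be reconstructed after each cut.
    """
    if t_m <= 0:
        raise ValueError("t_m must be positive")
    out = []
    remaining = t_s
    while remaining > t_m:      # T_s > T_m: cut off a piece of size T_m
        out.append(t_m)         # this piece goes on to step (2)
        remaining -= t_m        # the rest (T_s - T_m) is checked again
    out.append(remaining)       # T_s <= T_m: passed on unchanged
    return out
```

For example, under these assumptions a document of size 10 with a threshold of 4 is passed on as pieces of sizes 4, 4 and 2.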

Further, in step (1), the streaming process comprises a segmentation process and a reconstruction process. The segmentation process is as follows:

Assume an XML document F_s of size T_s, and let T_m be the maximum memory available to the streaming processor. If the XML document is very large, far larger than that maximum, that is, T_s >> T_m, or equivalently T_s ≈ p·T_m with p >> 1, the segmenter in the streaming processor segments it in the following three steps:

First, read the XML document F_s;

Second, set the segment-reading threshold T_d = T_m;

Third, perform the segmentation, producing two XML documents that are not well-formed:

① F_s1, whose size is denoted T_s1, with T_s1 = T_d = T_m;

② F_s2, whose size is denoted T_s2, with T_s2 = T_s - T_d = T_s - T_m.
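A minimal sketch of the three segmentation steps, assuming the document is available as a byte string and that sizes are measured in bytes; `segment` is a hypothetical helper name. As the text states, the two pieces are generally not well-formed, because the cut can fall in the middle of an element.

```python
def segment(f_s: bytes, t_m: int) -> tuple[bytes, bytes]:
    """Cut f_s at the segment-reading threshold T_d = T_m."""
    t_d = t_m              # second step: set the threshold
    f_s1 = f_s[:t_d]       # size T_d = T_m
    f_s2 = f_s[t_d:]       # size T_s - T_m
    return f_s1, f_s2


# Illustrative input: the cut at byte 20 lands inside an end tag.
doc = b"<root><item>alpha</item><item>beta</item></root>"
f_s1, f_s2 = segment(doc, 20)
```

Here `f_s1` ends in the truncated end tag `</i`, which is exactly the kind of incomplete data the reconstruction process has to repair.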

Furthermore, the reconstruction process comprises two steps, initial reconstruction and further reconstruction, as follows:

1.1) Read the XML document F_s1 generated in the third step;

1.2) Move the pointer to the end of the document;

1.3) Search from the pointer toward the beginning of the document for the opening marker "</" of an end tag, and record its position as L₁;

1.4) Starting from L₁, search toward the end of the document for the corresponding closing marker ">" and record its position as L₂. There are two possibilities:

If the closing marker ">" is found, L₂ is the position of that marker;

Otherwise, if the closing marker ">" is not found, move the pointer to L₁ and perform step 1.3) again to obtain a new L₁, then perform step 1.4) again to obtain a new L₂; in this case the new L₂ is the true position of the closing marker;

1.5) Move the incomplete data left by the cut from the tail of F_s1 to the head of F_s2. This yields an XML document F_s1 from which the incomplete data has been removed and an XML document F_s2 to which it has been added;

1.6) Obtain the tag names of all ancestor nodes that are missing because of the cut:

1.6.1) Set the read flag flag = True; when the length of the value read is less than or equal to 0, set flag = False;

1.6.2) Read the XML document F_s1 generated in step 1.5) (with the incomplete data removed) and add the tag name of every node, except empty tags, to a list;

1.6.3) Count the distinct elements in the list and the number of occurrences of each. In a well-formed XML document every start tag has a matching end tag and every empty tag is closed, so the elements that occur an odd number of times, other than the first element, are the tag names of the ancestor nodes missing because of the cut. Put these tag names into another list, preserving their original order;

1.7) Make the two XML documents F_s1 and F_s2 generated in step 1.5) well-formed:

1.7.1) Append the elements of the list obtained in step 1.6.3), as end tags and in reverse order, to the tail of F_s1 (the document with the incomplete data removed);

1.7.2) Prepend the elements of the list obtained in step 1.6.3), except the first element, as start tags and in their original order, to the head of F_s2 (the document with the incomplete data added);

1.7.3) Prepend the first element of the list obtained in step 1.6.3), that is, the declaration tag name, as a start tag to the head of the XML document F_s2 obtained in step 1.7.2).
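Steps 1.2) to 1.7.3) can be sketched in Python as follows. This is a simplified illustration rather than the patented implementation. The names `reconstruct`, `TAG` and `DECL` are hypothetical. The XML declaration is handled separately from the tag-name list, comments and CDATA sections are ignored, and ancestors are reopened without their attributes. The odd-count rule of step 1.6.3) is used as described, so the sketch shares its limitation: an element nested inside another element of the same name can be miscounted.

```python
import re
from collections import Counter

# Start, end and empty-element tags. The XML declaration does not match,
# because a tag name here must begin with a letter or an underscore.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^<>]*?(/?)>")
DECL = re.compile(r"^\s*<\?xml[^>]*\?>")


def reconstruct(f_s1: str, f_s2: str) -> tuple[str, str]:
    # Steps 1.2-1.4: from the tail, locate the last complete end tag.
    l1 = f_s1.rfind("</")
    l2 = -1
    while l1 != -1:
        l2 = f_s1.find(">", l1)
        if l2 != -1:
            break
        l1 = f_s1.rfind("</", 0, l1)  # end tag was cut; try the previous one
    if l1 == -1:
        raise ValueError("no complete end tag found in the first piece")

    # Step 1.5: move the incomplete tail of F_s1 to the head of F_s2.
    f_s1, f_s2 = f_s1[: l2 + 1], f_s1[l2 + 1:] + f_s2

    # Steps 1.6.2-1.6.3: count tag names, skipping empty tags. A name seen
    # an odd number of times is treated as a still-open ancestor.
    counts = Counter(m.group(2) for m in TAG.finditer(f_s1) if not m.group(3))
    open_names = [name for name, n in counts.items() if n % 2 == 1]

    # Step 1.7: close the ancestors in F_s1, reopen them in F_s2, and copy
    # the XML declaration (if present) to the head of F_s2.
    decl = DECL.match(f_s1)
    f_s1 += "".join(f"</{name}>" for name in reversed(open_names))
    head = (decl.group(0) if decl else "") + "".join(f"<{n}>" for n in open_names)
    return f_s1, head + f_s2
```

A cut that falls inside a `<b>` element leaves `root` and `a` open in the first piece; the sketch closes them there and reopens them at the head of the second piece, so both pieces parse on their own.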

Still further, in step (2), assume an XML document F_s0,1 of size T_s0,1, and let T be the memory required by the terminal device. If the XML document is very large, far larger than the terminal device's required memory, that is, T_s0,1 >> T, or equivalently T_s0,1 ≈ q·T with q >> 1, the first round of splitting and reconstruction is performed on it as follows:

2.1) Read the XML document F_s0,1;

2.2) Set a splitting threshold T_f0,1;

2.3) Perform the first round of splitting, which consists of a single split and yields two XML documents that are not well-formed:

① F_s1,1, whose size is denoted T_s1,1, with T_s1,1 = T_f0,1;

② F_s1,2, whose size is denoted T_s1,2, with T_s1,2 = T_f0,1;

2.4) Reconstruct the two XML documents F_s1,1 and F_s1,2 generated in step 2.3), yielding two new well-formed XML documents:

① F_s1,1, whose size is denoted T_s1,1, with T_s1,1 ≈ T_f0,1;

② F_s1,2, whose size is denoted T_s1,2, with T_s1,2 ≈ T_f0,1;

At this point the first round of splitting and reconstruction is complete. Next, two splitting thresholds T_f1,1 and T_f1,2 are set, and the two XML documents F_s1,1 and F_s1,2 generated in step 2.4) undergo a second round of splitting and reconstruction, yielding four well-formed documents F_s2,1, F_s2,2, F_s2,3 and F_s2,4, whose sizes are equal to T_f1,1, T_f1,1, T_f1,2 and T_f1,2 respectively. Continuing in the same way, 2^(n-1) splitting thresholds are set and the 2^(n-1) XML documents F_s(n-1),k, k = 1, …, 2^(n-1), undergo the n-th round of splitting and reconstruction, yielding 2^n well-formed XML documents F_sn,k, k = 1, …, 2^n. At this point no document is larger than the memory required by the terminal device, that is, T_sn,k ≤ T for k = 1, …, 2^n; the splitting condition is no longer satisfied, and the splitting and reconstruction process ends.
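The round structure described above can be sketched in Python on document sizes alone. Two assumptions are made for illustration and are not taken from the source: each splitting threshold is half the size of the document being split, and a round is triggered whenever any document is larger than T (the text uses the looser condition "far larger"). The function name `fast_page_sizes` is hypothetical.

```python
def fast_page_sizes(t_s0: int, t: int) -> list[int]:
    """Document sizes after round-by-round binary splitting.

    t_s0: size of F_s0,1; t: memory required by the terminal device.
    Sizes only: reconstruction after each split is not modelled.
    """
    if t <= 0:
        raise ValueError("t must be positive")
    docs = [t_s0]                     # round 0: the single document F_s0,1
    while max(docs) > t:              # some document is still too large
        next_round = []
        for size in docs:             # every document of the round is split
            half = size // 2          # assumed threshold: half the size
            next_round += [half, size - half]
        docs = next_round             # 2^n documents after round n
    return docs
```

With a document of size 1000 and T = 130, three rounds produce 2^3 = 8 documents of size 125 each.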

An XML-based streaming pagination publishing system, comprising:

Streaming processor: for each large XML input document, the streaming processor first checks its size. If the document size does not exceed the preset segment-reading threshold, that is, T_s ≤ T_m, the document is handed to the fast pager. Otherwise, if T_s > T_m, the streaming processor segments and reconstructs the document, producing two well-formed XML documents, one of size T_m and the other of size T_s - T_m. The former is sent to the fast pager, while the latter is returned to the streaming processor to be checked, segmented and reconstructed again;

Fast pager: if the size of the XML document F_s0,1 far exceeds the memory T required by the terminal device, that is, T_s0,1 >> T, a first round of splitting and reconstruction is performed on F_s0,1, producing two new well-formed XML documents F_s1,1 and F_s1,2. These two documents are then checked and, if needed, put through a second round: if both still satisfy the splitting condition, T_s1,1 >> T and T_s1,2 >> T, both are split and reconstructed at the same time, producing four well-formed documents F_s2,1, F_s2,2, F_s2,3 and F_s2,4. Checking, splitting and reconstruction repeat in this way until every XML document produced in some round is no larger than the memory required by the terminal device, at which point the process ends;

XSLT converter: converts the input document, according to the conversion style sheet provided by the terminal device, into output documents in other standard formats;

Publishing server: sends the documents in the different standard formats to the corresponding terminal devices.

Further, the streaming processor comprises a segmenter and a reconstructor, wherein:

In the segmenter, assume an XML document F_s of size T_s, and let T_m be the maximum memory available to the streaming processor. If the XML document is very large, far larger than that maximum, that is, T_s >> T_m, or equivalently T_s ≈ p·T_m with p >> 1, the segmenter in the streaming processor segments it in the following three steps:

First, read the XML document F_s;

Second, set the segment-reading threshold T_d = T_m;

① F_s1, whose size is denoted T_s1, with T_s1 = T_d = T_m;

Furthermore, in the reconstructor, the process comprises two steps, initial reconstruction and further reconstruction, as follows:

1.2) Move the pointer to the end of the document;

Still further, the fast pager comprises a splitter and a reconstructor, wherein:

In the splitter, assume an XML document F_s0,1 of size T_s0,1, and let T be the memory required by the terminal device. If the XML document is very large, far larger than the terminal device's required memory, that is, T_s0,1 >> T, or equivalently T_s0,1 ≈ q·T with q >> 1, the first round of splitting and reconstruction is performed on it as follows:

2.1) Read the XML document F_s0,1;

2.2) Set a splitting threshold T_f0,1;

① F_s1,1, whose size is denoted T_s1,1, with T_s1,1 = T_f0,1;

2.4) The two XML documents F_s1,1 and F_s1,2 generated in step 2.3) are reconstructed in the reconstructor, yielding two new well-formed XML documents:

① F_s1,1, whose size is denoted T_s1,1, with T_s1,1 ≈ T_f0,1;

At this point the first round of splitting and reconstruction is complete. Next, two splitting thresholds T_f1,1 and T_f1,2 are set, and the two XML documents F_s1,1 and F_s1,2 generated in step 2.4) undergo a second round of splitting and reconstruction, yielding four well-formed documents F_s2,1, F_s2,2, F_s2,3 and F_s2,4, whose sizes are equal to T_f1,1, T_f1,1, T_f1,2 and T_f1,2 respectively. Continuing in the same way, 2^(n-1) splitting thresholds are set and the 2^(n-1) XML documents F_s(n-1),k, k = 1, …, 2^(n-1), undergo the n-th round of splitting and reconstruction, yielding 2^n well-formed XML documents F_sn,k, k = 1, …, 2^n. At this point no document is larger than the memory required by the terminal device, that is, T_sn,k ≤ T for k = 1, …, 2^n; the splitting condition is no longer satisfied, and the splitting and reconstruction process ends.

The technical idea of the present invention is to use the streaming processor and the fast pager to divide a large XML document into multiple small XML documents that do not exceed the memory limit of the terminal device, and then to complete the conversion with the XSLT converter. On the basis of this analysis, the process of reading big data page by page can be drawn as shown in Figure 2.

As Figure 2 shows, the process is carried out jointly by the XML file server, the streaming paging server and the publishing server. The XML file server stores and sends heterogeneous XML documents; the streaming paging server splits, reconstructs and converts the XML documents; and the publishing server sends small XML documents in the appropriate standard format to terminal devices such as PCs, handheld devices and smartphones, provided that these devices send a data exchange request to the XSLT converter and supply their own conversion style sheets.

The beneficial effects of the present invention are mainly as follows: (1) The system uses the streaming paging server to paginate the data so that it can be sent to terminal devices in the form of "pages", which solves the problem that end-user devices cannot correctly obtain or display data because of limits on memory, display size and specifications;

(2) The system can serve multiple users at once, that is, multiple different kinds of terminals or multiple terminals of the same kind with different format standards, all requesting access to XML documents on the file server at the same time; it is highly flexible and widely applicable.

(3) The system can use the streaming paging server to paginate and convert XML documents of any size into a format and size that terminal devices can receive and display; that is, the system applies to XML documents of any size and is universally applicable.

(4) Compared with a traditional paging publishing system that only implements the function of a segmentation processor, this system adds a reconstructor, so that all output XML documents are well-formed, which improves the reliability and fault tolerance of parsing and conversion; it also adds a fast pager, which makes paging faster and more efficient.

On this basis, the system adds a reconstructor whose purpose is to convert all of the small XML documents that are not well-formed into well-formed small XML documents, making the subsequent parsing and conversion possible;

The system also adds a fast pager, which uses a binary-tree-based splitting algorithm to split and reconstruct all of the above well-formed small XML documents quickly. This speeds up paging overall, that is, the desired paging result is reached more quickly.

(5) All XML documents output by any stage of the system, including the streaming processor, the fast pager and the XSLT converter, are well-formed and can be parsed and format-converted independently. A processing or transmission error in any one document does not affect the correct parsing and format conversion of the others; the error is confined to the faulty document and does not spread to the correct ones. Once the faulty document has been corrected, it can be combined with the other correct documents to form a complete document, which effectively improves the fault tolerance of parsing and format conversion.

(6) Because all XML documents received by the end-user device are also well-formed, the device can split, parse and assemble them quickly.
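The speed claim in benefit (4) can be made concrete with a small Python sketch that counts sequential steps. It rests on an assumption that is not stated in the source: all splits within one round of the binary-tree scheme can be carried out concurrently, whereas the iterative scheme cuts off one page at a time. The function names are illustrative only.

```python
def iterative_steps(pages: int) -> int:
    """Sequential cuts needed to peel off one page at a time."""
    return max(pages - 1, 0)


def binary_rounds(pages: int) -> int:
    """Rounds of halving needed to reach at least `pages` documents."""
    # (pages - 1).bit_length() equals ceil(log2(pages)) for pages >= 1.
    return (pages - 1).bit_length() if pages > 1 else 0
```

Under this assumption, producing 1024 pages takes 1023 sequential cuts in the iterative scheme but only 10 rounds in the binary-tree scheme. The total number of individual splits is the same in both cases (1023), so any gain comes from performing the splits of a round together.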

Description of the drawings

Figure 1 is a schematic diagram of the XSLT conversion principle.

Figure 2 is a diagram of the paged reading process for big data.

Figure 3 is a block diagram of the streaming pagination publishing system.

Figure 4 is a block diagram of the streaming processor.

Figure 5 is a block diagram of the fast pager.

Figure 6 is a schematic diagram of the streaming process, in which (a) shows the case where the document size does not exceed the preset segment-reading threshold and (b) shows the case where it does.

Figure 7 is a flowchart of the reconstruction of F_s1 and F_s2.

Figure 8 is a schematic diagram of the binary-tree-based fast paging algorithm, in which each elliptical node represents one XML document used in the corresponding round of processing.

Figure 9 is a flowchart of the binary-tree-based fast paging process.

Figure 10 is a flowchart of the fast paging process.

Figure 11 is a flowchart of the first round of splitting and reconstruction.

Detailed description

The present invention is further described below with reference to the accompanying drawings.

Example 1

Referring to Figures 1 to 11, an XML-based streaming pagination publishing method comprises the following steps:

(1) Streaming process:

(2) Fast paging process:

If the size of the XML document F_s0,1 far exceeds the memory T required by the terminal device, that is, T_s0,1 >> T, a first round of splitting and reconstruction is performed on F_s0,1, producing two new well-formed XML documents F_s1,1 and F_s1,2. These two documents are then checked and, if needed, put through a second round: if both still satisfy the splitting condition, T_s1,1 >> T and T_s1,2 >> T, both are split and reconstructed at the same time, producing four well-formed documents F_s2,1, F_s2,2, F_s2,3 and F_s2,4. Checking, splitting and reconstruction repeat in this way until every XML document produced in some round is no larger than the memory required by the terminal device, at which point the process ends;

First, read the XML document F_s;

1.2) Move the pointer to the end of the document;

2.1) Read the XML document F_s0,1;

2.2) Set a splitting threshold T_f0,1;

实施例2Example 2

参照图1～图11，一种基于XML的流式分页发布系统，所述发布系统包括：Referring to Figures 1 to 11, an XML-based streaming paging publishing system, the publishing system includes:

发布服务器，用于将具有不同标准格式的文档发送给相应的终端设备。Publisher for sending documents with different standard formats to corresponding terminal devices.

本实施例中，流式分页发布系统由流式分页服务器，包括流化处理器、快速分页器、XSLT转换器，和发布服务器两个部分组成，如图3所示，其工作原理是首先通过流式分页服务器对大型XML文档进行分割、重构和转换处理，生成多个“形式良好”的小XML目标文档，这些文档的大小和数据格式分别取决于终端设备提供的可用内存和转换样式表，然后再通过发布服务器将这些小XML文档以“页”的形式发送到相应的终端设备。In this embodiment, the streaming paging publishing system is composed of a streaming paging server, including a streaming processor, a quick pager, an XSLT converter, and a publishing server, as shown in Figure 3. The streaming paging server splits, reconstructs and transforms large XML documents, and generates multiple "well-formed" small XML target documents. The size and data format of these documents depend on the available memory provided by the terminal device and the conversion style sheet, respectively. , and then send these small XML documents to the corresponding terminal devices in the form of "pages" through the publishing server.

流化处理器由分段器和重构器组成，如图4所示，工作的基本思想是先对输入的XML文档的大小进行判断，如果文档不满足预先设定的分段条件，就不做任何处理地将其送往快速分页器；反之，如果文档满足预先设定的分段条件，就对该文档进行分段读取，确保读入数据的大小不超过预先设定的分段读取阈值，然后再使用重构器对存储着上述数据的新XML文档进行重构处理，使得流化处理器输出的每一个XML文档都是形式良好的。The streaming processor is composed of a segmenter and a reconstructor, as shown in Figure 4. The basic idea of the work is to first judge the size of the input XML document. If the document does not meet the preset segmentation conditions, it will not Do any processing and send it to the fast pager; on the contrary, if the document meets the preset segmentation conditions, the document is read in segments to ensure that the size of the read data does not exceed the preset segment read Take the threshold value, and then use the reconstructor to reconstruct the new XML document storing the above data, so that each XML document output by the streaming processor is in good form.

快速分页器由分割器和重构器组成，如图5所示，工作的基本思想是先对输入的XML文档的大小进行判断，如果文档满足预先设定的分割条件，就对该文档进行二叉树式分割处理，然后再对此次分割处理生成的所有新XML文档进行重构处理。需要强调的是，此处的重构处理过程与流化处理器中的相同。The fast pager is composed of a splitter and a reconstructor, as shown in Figure 5. The basic idea of the work is to first judge the size of the input XML document, and if the document meets the preset splitting conditions, it will perform a binary tree analysis on the document. Then, all the new XML documents generated by this splitting process are reconstructed. It is important to emphasize that the refactoring process here is the same as in the streaming processor.

The XSLT converter converts the input document into documents in other standard formats, according to the conversion style sheet provided by the terminal device. The publishing server sends the documents in their different standard formats to the corresponding terminal devices.
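How the four units chain together can be sketched in a few lines. This is only an illustration, not the patented implementation: `run_pipeline`, `demo_stream`, and `demo_paginate` are hypothetical names, the stand-in units work on plain strings rather than XML, and `str.upper` merely takes the place of an XSLT conversion.

```python
def run_pipeline(doc, stream, paginate, transform, send):
    """Chain the units: streaming -> fast paging -> conversion -> publishing."""
    count = 0
    for segment in stream(doc):            # streaming processor output
        for page in paginate(segment):     # fast pager output
            send(transform(page))          # converter, then publishing server
            count += 1
    return count


# Stand-in units (hypothetical): fixed-size cuts instead of XML-aware processing.
def demo_stream(doc, t_m=4):
    return (doc[i:i + t_m] for i in range(0, len(doc), t_m))


def demo_paginate(segment, t=2):
    return (segment[i:i + t] for i in range(0, len(segment), t))


sent = []
pages = run_pipeline("abcdefghij", demo_stream, demo_paginate, str.upper, sent.append)
```

Passing the units in as callables keeps the sketch independent of any particular XML or XSLT library.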

Streaming is performed by the streaming processor. For each large XML input document, the streaming processor first checks its size. If the document size T_s does not exceed the preset segment-reading threshold T_m, that is, T_s ≤ T_m, the streaming processor outputs the document to the fast pager unprocessed, as shown in Figure 6a. Otherwise, if T_s > T_m, the streaming processor segments and reconstructs the document, producing two well-formed XML documents, one of size approximately T_m and the other of size approximately T_s - T_m. The former is sent to the fast pager, and the latter is fed back into the streaming processor to be checked, segmented, and reconstructed again, as shown in Figure 6b.
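The check-and-segment loop can be sketched as follows. This is a minimal illustration under stated assumptions: `stream_segments` is a hypothetical name, the document is held as an in-memory `bytes` value, and the reconstruction step is omitted, so the pieces are raw cuts rather than well-formed XML.

```python
def stream_segments(doc: bytes, t_m: int):
    """Yield raw segments of at most t_m bytes, following the check/segment loop."""
    if t_m <= 0:
        raise ValueError("t_m must be positive")
    remaining = doc
    while len(remaining) > t_m:        # T_s > T_m: segment the document
        yield remaining[:t_m]          # piece of size T_m goes on to the fast pager
        remaining = remaining[t_m:]    # piece of size T_s - T_m is checked again
    yield remaining                    # T_s <= T_m: passed on without processing


segments = list(stream_segments(b"x" * 10, t_m=4))
sizes = [len(s) for s in segments]
```

With a 10-byte input and T_m = 4, the loop runs twice and the final 2-byte remainder is passed through unchanged.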

Segmentation process: suppose there is an XML document F_s of size T_s, and the maximum memory available to the streaming processor is T_m. If the document is far larger than that memory, that is, T_s >> T_m, or equivalently T_s ≈ pT_m with p >> 1, the segmenter in the streaming processor segments it in the following three steps:

First, read the XML document F_s.

Second, set the segment-reading threshold T_d = T_m.

Third, segment the document, producing two XML documents that are not yet well-formed: F_s1, of size T_s1 = T_d = T_m, and F_s2, of size T_s2 = T_s - T_d = T_s - T_m.

Reconstruction is more involved. It consists of two stages, preliminary reconstruction and re-reconstruction. The flow is shown in Figure 7, and the steps are as follows:

1.1) Read the XML document F_s1 generated in the third step.

1.2) Position the pointer at the end of the document.

1.3) Search from the pointer toward the beginning of the document for "</", the opening marker of an end tag, and record its position as L₁.

1.4) Starting from L₁, search toward the end of the document for the corresponding closing marker ">", and record its position as L₂. There are two possible outcomes:

If ">" is found, L₂ is the position of that marker.

If ">" is not found, position the pointer at L₁ and repeat step 1.3) to obtain a new L₁, then repeat step 1.4) to obtain a new L₂. This new L₂ is the true position of the closing marker in this case.

1.5) Move the incomplete data caused by segmentation from the end of F_s1 to the beginning of F_s2. This yields F_s1 with the incomplete data removed and F_s2 with the incomplete data added.

1.6.3) Count the distinct elements in the list and the number of times each occurs. Because a well-formed XML document must pair every start tag with an end tag and close every empty tag, any element that occurs an odd number of times, other than the first element, is the tag name of an ancestor node left unclosed by the split. Put these tag names into a second list, preserving their original order in the list.
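The reconstruction steps can be sketched as below. This is an illustration only: the function names are hypothetical, tags are located with a simple regular expression (an assumption that ignores comments, CDATA sections, and attribute values containing ">"), the document is assumed to begin with an XML declaration, and the closing and reopening of ancestors follows step 1.7 as stated in claim 3. Reopened ancestors carry only their names, so any attributes they had are lost, and the odd-count test can misfire when an unclosed ancestor shares its name with other elements.

```python
import re
from collections import Counter

# Matches start tags, end tags and the XML declaration; group 2 is the tag name.
TAG = re.compile(r"<(\?|/)?([A-Za-z_][\w.:-]*)([^<>]*)>")


def move_incomplete_tail(fs1: str, fs2: str):
    """Steps 1.2-1.5: cut fs1 after its last complete end tag, prepend the rest to fs2."""
    pos = len(fs1)                        # 1.2) start at the end of the document
    while True:
        l1 = fs1.rfind("</", 0, pos)      # 1.3) nearest "</" toward the beginning
        if l1 == -1:
            raise ValueError("no end tag found in fs1")
        l2 = fs1.find(">", l1)            # 1.4) matching ">" toward the end
        if l2 != -1:
            break
        pos = l1                          # end tag was cut in half: repeat from L1
    return fs1[:l2 + 1], fs1[l2 + 1:] + fs2   # 1.5) move the incomplete data


def missing_ancestors(fs1: str):
    """Step 1.6: tag names that occur an odd number of times, in first-seen order."""
    names = [m.group(2) for m in TAG.finditer(fs1)
             if not m.group(3).endswith("/")]  # skip empty (self-closing) tags
    names = names[1:]                     # the first entry is the declaration
    counts = Counter(names)
    return [n for n in dict.fromkeys(names) if counts[n] % 2 == 1]


def rebuild(fs1: str, fs2: str, missing):
    """Step 1.7: close the ancestors in fs1 and reopen them at the head of fs2."""
    declaration = TAG.match(fs1).group(0)  # assumes fs1 starts with <?xml ...?>
    closed = fs1 + "".join(f"</{n}>" for n in reversed(missing))
    reopened = declaration + "".join(f"<{n}>" for n in missing) + fs2
    return closed, reopened


fs1 = '<?xml version="1.0"?><root><item>a</item><item>b</it'
fs2 = 'em><item>c</item></root>'
head, tail = move_incomplete_tail(fs1, fs2)
missing = missing_ancestors(head)
page1, page2 = rebuild(head, tail, missing)
```

In this example the cut falls inside an end tag, so the first search for ">" fails and the loop steps back to the previous "</", which is the retry case described in step 1.4).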

Fast paging is performed by the fast pager. Its core is the binary-tree splitting algorithm, a parallel splitting algorithm that is faster and more efficient than a traditional iterative splitting algorithm.

Binary-tree-based fast paging algorithm: the algorithm first splits and reconstructs one XML document into two new documents, then splits and reconstructs each of those into two more, giving four XML documents in total, and so on. Splitting ends once none of the resulting documents meets the splitting condition. Figure 8 illustrates the principle.

To explain the algorithm further, Figure 8 can be redrawn as the document-evolution diagram in Figure 9, with each document named F_sn,k. The subscript n is the round of splitting and reconstruction; n = 1, for example, means the document was produced in the first round. The subscript k is the sequence number of the document within its round, and its range depends on n: k = 1, …, 2^n. For example, when n = 1, k takes the values 1 and 2: F_s1,1 is the first document produced by the first round, and F_s1,2 is the second.

From Figure 9 we can draw the overall flow of fast paging, shown in Figure 10. Because the size of the XML document F_s0,1 far exceeds the memory T required by the terminal device, that is, T_s0,1 >> T, a first round of splitting and reconstruction is applied to it, producing two new well-formed XML documents F_s1,1 and F_s1,2. These two documents are then checked: if they still meet the splitting condition, T_s1,1 >> T and T_s1,2 >> T, both are split and reconstructed at the same time in a second round, producing four well-formed documents F_s2,1, F_s2,2, F_s2,3, and F_s2,4. Checking, splitting, and reconstruction repeat in this way until no XML document produced by some round exceeds the memory required by the terminal device, at which point the process ends.
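The round structure can be illustrated by tracking document sizes only. This is a sketch under assumptions: `split_rounds` is a hypothetical name, the per-document threshold is taken to be half of the document's size, and the small size changes introduced by reconstruction are ignored.

```python
def split_rounds(t_s0: int, t: int) -> list:
    """Return the document sizes after each round of binary-tree splitting."""
    if t < 1:
        raise ValueError("t must be at least 1")
    rounds = [[t_s0]]                          # round 0 holds only F_s0,1
    while any(size > t for size in rounds[-1]):
        next_round = []
        for size in rounds[-1]:                # every document of the round is split
            threshold = size // 2              # assumed threshold: half the size
            next_round += [threshold, size - threshold]
        rounds.append(next_round)
    return rounds


rounds = split_rounds(1000, 300)
# Label each size with the F_sn,k naming used for Figure 9.
labels = {f"F_s{n},{k}": size
          for n, docs in enumerate(rounds)
          for k, size in enumerate(docs, start=1)}
```

Here a document of size 1000 with T = 300 needs two rounds and ends as four documents of size 250. In a real implementation the splits within one round are independent of each other and could run in parallel.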

Binary-tree splitting and reconstruction process: suppose there is an XML document F_s0,1 of size T_s0,1, and the memory required by the terminal device is T. If the document is far larger than that memory, that is, T_s0,1 >> T, or equivalently T_s0,1 ≈ qT with q >> 1, the fast pager (splitter and reconstructor) applies a first round of splitting and reconstruction to it. The flow is shown in Figure 11 and comprises the following four steps:

2.1) Read the XML document F_s0,1.

2.2) Set a splitting threshold T_f0,1.

2.3) Perform the first round of splitting, which comprises one split and yields two XML documents that are not yet well-formed: F_s1,1, of size T_s1,1 = T_f0,1, and F_s1,2, of size T_s1,2 = T_f0,1.

2.4) Reconstruct the two XML documents F_s1,1 and F_s1,2 generated in step 2.3), yielding two new well-formed XML documents with T_s1,1 ≈ T_f0,1 and T_s1,2 ≈ T_f0,1.

This completes the first round of splitting and reconstruction. Next, two splitting thresholds, T_f1,1 and T_f1,2, are set, and a second round of splitting and reconstruction is applied to the two XML documents F_s1,1 and F_s1,2 generated in step 2.4). This yields four well-formed documents F_s2,1, F_s2,2, F_s2,3, and F_s2,4, whose sizes are approximately T_f1,1, T_f1,1, T_f1,2, and T_f1,2 respectively. Continuing in the same way, 2^(n-1) splitting thresholds are set and the n-th round of splitting and reconstruction is applied to the 2^(n-1) XML documents F_s(n-1),k, k = 1, …, 2^(n-1), yielding 2^n well-formed XML documents F_sn,k, k = 1, …, 2^n. At this point no document exceeds the memory required by the terminal device, that is, T_sn,k ≤ T for k = 1, …, 2^n, so the splitting condition is no longer met and splitting and reconstruction end.
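As a rough worked estimate of how many rounds are needed, assume that every split halves its document exactly and that reconstruction leaves sizes essentially unchanged (both are simplifying assumptions, since the threshold formulas are not reproduced in this text). Then each document after n rounds has size about T_s0,1 / 2^n, and the stopping condition gives:

```latex
T_{sn,k} \approx \frac{T_{s0,1}}{2^{n}} \le T
\quad\Longleftrightarrow\quad
n \ge \log_2 \frac{T_{s0,1}}{T} = \log_2 q,
\qquad\text{so}\qquad
n = \lceil \log_2 q \rceil .
```

For example, with T_s0,1 = 1000 and T = 300 (in any common unit), q ≈ 3.33, so n = 2 rounds produce 2^2 = 4 pages of about 250 each.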

Claims

1. An XML-based streaming paging publishing method, characterized in that the method comprises the following steps:

(1) Streaming process:

For each large XML input document, the streaming processor first checks its size; if the document size does not exceed the preset segment-reading threshold, that is, T_s ≤ T_m, proceed to step (2); otherwise, if the document size exceeds the preset segment-reading threshold, that is, T_s > T_m, the streaming processor segments and reconstructs the document, generating two well-formed XML documents, one of size T_m and the other of size T_s - T_m; the former is sent to step (2) for processing, and the latter is sent back to the streaming processor to be checked, segmented, and reconstructed again;

(2) Fast paging process:

If the size of the XML document F_s0,1 far exceeds the required memory T of the terminal device, that is, T_s0,1 >> T, a first round of splitting and reconstruction is performed on F_s0,1, generating two new well-formed XML documents F_s1,1 and F_s1,2; the two newly generated documents are then checked and subjected to a second round of splitting and reconstruction, that is, if F_s1,1 and F_s1,2 still meet the splitting condition, T_s1,1 >> T and T_s1,2 >> T, both documents are split and reconstructed at the same time, generating four well-formed documents F_s2,1, F_s2,2, F_s2,3, and F_s2,4; checking, splitting, and reconstruction are repeated in this way until no XML document generated by some round of splitting exceeds the required memory of the terminal device, whereupon the splitting and reconstruction process ends;

(3) XSLT conversion process: according to the conversion style sheet provided by the terminal device, convert the input document into documents in other standard formats for output;

(4) Publishing process: Send documents with different standard formats to corresponding terminal devices.

2. The XML-based streaming paging publishing method according to claim 1, characterized in that: in step (1), the streaming process comprises segmentation processing and reconstruction processing, the segmentation processing being as follows:

Suppose there is an XML document F_s of size T_s, and the maximum memory available to the streaming processor is T_m; if the XML document is far larger than that memory, that is, T_s >> T_m, or equivalently T_s ≈ pT_m with p >> 1, the segmenter in the streaming processor segments it in the following three steps:

First, read the XML document F_s;

Second, set the segment-reading threshold T_d = T_m;

Third, segment the document, generating two XML documents that are not well-formed:

① F_s1, whose size is denoted T_s1, with T_s1 = T_d = T_m;

② F_s2, whose size is denoted T_s2, with T_s2 = T_s - T_d = T_s - T_m.

3. The XML-based streaming paging publishing method according to claim 2, characterized in that: the reconstruction processing comprises two stages, preliminary reconstruction and re-reconstruction, and proceeds as follows:

1.1) Read the XML document F_s1 generated in the third step;

1.2) Position the pointer at the end of the document;

1.3) Search from the pointer toward the beginning of the document for "</", the opening marker of an end tag, and record its position as L₁;

1.4) Starting from L₁, search toward the end of the document for the corresponding closing marker ">", and record its position as L₂; there are two possible outcomes:

If ">" is found, L₂ is the position of that marker;

If ">" is not found, position the pointer at L₁ and perform step 1.3) again to obtain a new L₁, then perform step 1.4) again to obtain a new L₂; this new L₂ is the true position of the closing marker in this case;

1.5) Move the incomplete data caused by segmentation from the end of F_s1 to the beginning of F_s2, yielding the XML document F_s1 with the incomplete data removed and the XML document F_s2 with the incomplete data added;

1.6) Obtain the tag names of all ancestor nodes missing because of the segmentation:

1.6.1) Set the read flag flag = True; when the length of the value read equals 0, set flag = False;

1.6.2) Read the XML document F_s1 generated in step 1.5), from which the incomplete data has been removed, and add the tag name of every node, except empty tags, to a list;

1.6.3) Count the distinct elements in the list and the number of times each occurs; because a well-formed XML document must pair every start tag with an end tag and close every empty tag, any element that occurs an odd number of times, other than the first element, is the tag name of an ancestor node missing because of the segmentation; put these tag names into a second list, preserving their original order in the list;

1.7) Build the two XML documents F_s1 and F_s2 generated in step 1.5) into well-formed XML documents:

1.7.1) Append the elements of the list obtained in step 1.6.3), as end tags and in reverse order, to the end of the XML document F_s1 from which the incomplete data has been removed;

1.7.2) Prepend the elements of the list obtained in step 1.6.3), except the first element, as start tags to the beginning of the XML document F_s2 to which the incomplete data has been added;

1.7.3) Prepend the first element of the list obtained in step 1.6.3), that is, the declaration tag name, as a start tag to the beginning of the XML document F_s2 obtained in step 1.7.2).

4. The XML-based streaming paging publishing method according to any one of claims 1 to 3, characterized in that: in step (2), suppose there is an XML document F_s0,1 of size T_s0,1, and the required memory of the terminal device is T; if the XML document is far larger than the required memory of the terminal device, that is, T_s0,1 >> T, or equivalently T_s0,1 ≈ qT with q >> 1, the first round of splitting and reconstruction is performed on this XML document as follows:

2.1) Read the XML document F_s0,1;

2.2) Set a splitting threshold T_f0,1;

2.3) Perform the first round of splitting, which comprises one split and yields two XML documents that are not well-formed:

① F_s1,1, whose size is denoted T_s1,1, with T_s1,1 = T_f0,1;

② F_s1,2, whose size is denoted T_s1,2, with T_s1,2 = T_f0,1;

2.4) Reconstruct the two XML documents F_s1,1 and F_s1,2 generated in step 2.3), yielding two new well-formed XML documents:

① F_s1,1, whose size is denoted T_s1,1, with T_s1,1 ≈ T_f0,1;

② F_s1,2, whose size is denoted T_s1,2, with T_s1,2 ≈ T_f0,1;

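At this point, the first round of splitting and reconstruction is complete; next, two splitting thresholds T_f1,1 and T_f1,2 are set, and a second round of splitting and reconstruction is performed on the two XML documents F_s1,1 and F_s1,2 generated in step 2.4), yielding four well-formed documents F_s2,1, F_s2,2, F_s2,3, and F_s2,4, of sizes T_f1,1, T_f1,1, T_f1,2, and T_f1,2 respectively; continuing in the same way, 2^(n-1) splitting thresholds are set and the n-th round of splitting and reconstruction is performed on the 2^(n-1) XML documents F_s(n-1),k, k = 1, …, 2^(n-1), yielding 2^n well-formed XML documents F_sn,k, k = 1, …, 2^n; at this point no document exceeds the required memory of the terminal device, that is, T_sn,k ≤ T for k = 1, …, 2^n, so the splitting condition is no longer met and the splitting and reconstruction process ends.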

5. An XML-based streaming paging publishing system, characterized in that the system comprises:

Streaming processor: for each large XML input document, the streaming processor first checks its size; if the document size does not exceed the preset segment-reading threshold, that is, T_s ≤ T_m, the document is passed to the fast pager for processing; otherwise, if the document size exceeds the preset segment-reading threshold, that is, T_s > T_m, the streaming processor segments and reconstructs the document, generating two well-formed XML documents, one of size T_m and the other of size T_s - T_m; the former is sent to the fast pager for processing, and the latter is sent back to the streaming processor to be checked, segmented, and reconstructed again;

Fast pager: if the size of the XML document F_s0,1 far exceeds the required memory T of the terminal device, that is, T_s0,1 >> T, a first round of splitting and reconstruction is performed on F_s0,1, generating two new well-formed XML documents F_s1,1 and F_s1,2; the two newly generated documents are then checked and subjected to a second round of splitting and reconstruction, that is, if F_s1,1 and F_s1,2 still meet the splitting condition, T_s1,1 >> T and T_s1,2 >> T, both documents are split and reconstructed at the same time, generating four well-formed documents F_s2,1, F_s2,2, F_s2,3, and F_s2,4; checking, splitting, and reconstruction are repeated in this way until no XML document generated by some round of splitting exceeds the required memory of the terminal device, whereupon the splitting and reconstruction process ends;

XSLT converter: converts the input document into documents in other standard formats for output, according to the conversion style sheet provided by the terminal device;

Publishing server: sends the documents in different standard formats to the corresponding terminal devices.

6. The XML-based streaming paging publishing system according to claim 5, characterized in that: the streaming processor comprises a segmenter and a reconstructor, wherein,

In the segmenter, suppose there is an XML document F_s of size T_s, and the maximum memory available to the streaming processor is T_m; if the XML document is far larger than that memory, that is, T_s >> T_m, or equivalently T_s ≈ pT_m with p >> 1, the segmenter in the streaming processor segments it in the following three steps:

First, read the XML document F_s;

Second, set the segment-reading threshold T_d = T_m;

Third, segment the document, generating two XML documents that are not well-formed:

① F_s1, whose size is denoted T_s1, with T_s1 = T_d = T_m;

② F_s2, whose size is denoted T_s2, with T_s2 = T_s - T_d = T_s - T_m.

7. The XML-based streaming paging publishing system according to claim 6, characterized in that: in the reconstructor, processing comprises two stages, preliminary reconstruction and re-reconstruction, and proceeds as follows:

1.1) Read the XML document F_s1 generated in the third step;

1.2) Position the pointer at the end of the document;

1.6) Obtain the tag names of all ancestor nodes missing because of the segmentation:

8. The XML-based streaming paging publishing system according to any one of claims 5 to 7, characterized in that: the fast pager comprises a splitter and a reconstructor, wherein,

In the splitter, suppose there is an XML document F_s0,1 of size T_s0,1, and the required memory of the terminal device is T; if the XML document is far larger than the required memory of the terminal device, that is, T_s0,1 >> T, or equivalently T_s0,1 ≈ qT with q >> 1, the first round of splitting and reconstruction is performed on this XML document as follows:

2.1) Read the XML document F_s0,1;

2.2) Set a splitting threshold T_f0,1;

2.3) Perform the first round of splitting, which comprises one split and yields two XML documents that are not well-formed:

① F_s1,1, whose size is denoted T_s1,1, with T_s1,1 = T_f0,1;

② F_s1,2, whose size is denoted T_s1,2, with T_s1,2 = T_f0,1;

2.4) Reconstruct, in the reconstructor, the two XML documents F_s1,1 and F_s1,2 generated in step 2.3), yielding two new well-formed XML documents:

① F_s1,1, whose size is denoted T_s1,1, with T_s1,1 ≈ T_f0,1;

② F_s1,2, whose size is denoted T_s1,2, with T_s1,2 ≈ T_f0,1;

At this point, the first round of splitting and reconstruction is complete; next, two splitting thresholds T_f1,1 and T_f1,2 are set, and a second round of splitting and reconstruction is performed on the two XML documents F_s1,1 and F_s1,2 generated in step 2.4), yielding four well-formed documents F_s2,1, F_s2,2, F_s2,3, and F_s2,4, of sizes T_f1,1, T_f1,1, T_f1,2, and T_f1,2 respectively; continuing in the same way, 2^(n-1) splitting thresholds are set and the n-th round of splitting and reconstruction is performed on the 2^(n-1) XML documents F_s(n-1),k, k = 1, …, 2^(n-1), yielding 2^n well-formed XML documents F_sn,k, k = 1, …, 2^n; at this point no document exceeds the required memory of the terminal device, that is, T_sn,k ≤ T for k = 1, …, 2^n, so the splitting condition is no longer met and the splitting and reconstruction process ends.