CN102184239B - Access probability based document fragmenting method in XML (Extensible Markup Language) radio data broadcast mode - Google Patents
- Publication number
- CN102184239B (application CN 201110125815)
- Authority
- CN
- China
- Prior art keywords
- node
- document
- xml
- fragment
- document fragment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention belongs to the technical field of XML (Extensible Markup Language) wireless data broadcast, and in particular relates to an access-probability-based document fragmenting method in XML wireless data broadcast mode. The method comprises the following steps: the server calculates the weight of each element node according to the access probability of each element node in an XML document set; the element node with the maximum weight is selected, the document fragment rooted at that element node is split off to form a new XML document, and the new document is added to the XML document set; these steps are repeated until the number of documents in the XML document set reaches a specified threshold; finally, an index structure is constructed, the XML document set is scheduled, and the fragmented XML document set is broadcast.
Description
Technical field
The invention belongs to the technical field of XML wireless data broadcasting, and specifically relates to a document fragmenting method under the XML wireless data broadcast mode.
Background art
Wireless broadcasting is one of the basic data access modes in a wireless environment. It saves energy and improves bandwidth utilization, and the broadcast data can satisfy the demands of a large number of users simultaneously. In data broadcasting, the server sends information over a common channel; mobile users listen to this channel and download the information they are interested in as it arrives. From the user's point of view, the wireless channel can be regarded as a medium that stores data but, unlike a disk, data in the wireless channel can only be accessed sequentially. Traditionally, two metrics are used to measure the performance of a wireless data broadcast system:
(1) Access time (AT): the time from when the user issues a request until the user has downloaded all required results from the channel. AT determines the user's waiting time.
(2) Tuning time (TT): the total time the user spends listening to the channel while completing an access request. When not listening to the channel, the user can enter a dormant state to save energy. TT therefore determines the user's energy consumption. XML data broadcasting likewise adopts AT and TT as the criteria for broadcast performance.
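As a rough illustration of the two metrics, the sketch below computes AT and TT for a single request. The `Slot` type, the slot times, and the assumption that the client knows exactly which slots to listen to are illustrative choices and do not come from the source.

```python
from dataclasses import dataclass

@dataclass
class Slot:
    start: float      # time at which the item starts on the channel
    duration: float   # time needed to download the item
    item: str         # identifier of the broadcast item (index or data)

def access_and_tuning_time(request_time, listened):
    """AT: from issuing the request until the last needed item is downloaded.
    TT: total time the client actually spends listening to the channel."""
    end_of_last = max(s.start + s.duration for s in listened)
    access_time = end_of_last - request_time
    tuning_time = sum(s.duration for s in listened)
    return access_time, tuning_time

# Hypothetical schedule: the client reads an index, dozes, then reads one fragment.
listened = [Slot(start=2.0, duration=1.0, item="index"),
            Slot(start=9.0, duration=3.0, item="fragment-7")]
at, tt = access_and_tuning_time(request_time=0.0, listened=listened)
print(at, tt)
```

In this made-up schedule the client waits 12 time units in total but listens for only 4 of them, which is the gap between AT and TT that dozing exploits.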
Air indexing and scheduling are two important research topics in traditional data broadcasting. Air indexing adds auxiliary index information to the broadcast data to tell mobile users when the data they are interested in will arrive, thereby reducing the access time AT. Scheduling arranges the content of the broadcast cycle, i.e. the frequency and time at which each data item appears. According to the scheduling mode, wireless data broadcasting can be divided into the following two modes:
(1) Periodic broadcast mode (Broadcast Mode / Push-based Mode):
The server cyclically broadcasts the stored data on the broadcast channel according to a fixed schedule. The user only needs to listen on the broadcast channel and download the data of interest once it is found.
(2) On-demand broadcast mode (On-demand Mode / Pull-based Mode):
The user explicitly sends requests to the server over an uplink channel, and the server determines the order of the broadcast data items according to the users' requests.
Traditional wireless data broadcasting focuses on "flat data", which carries no structural information, and user queries are based on field values; it cannot support XML data that carry structural information, or path queries. Meanwhile, XML, as a standard for data exchange, is increasingly widely used in wireless environments. XML data broadcasting has therefore become a new research focus.
In the periodic broadcast mode of traditional data broadcasting, data items have different access probabilities, and the access probability represents the popularity of a data item. In XML scheduling, the minimum unit of scheduling is the XML document fragment. Document splitting methods in existing scheduling techniques are mainly used in the on-demand environment. Their main idea is: given the users' concrete queries, the branches of each XML document are divided into several sets, and the document branches in the same set form a new XML document fragment. Because each resulting document keeps the complete path from the root node to the split node, the XML documents produced this way repeatedly contain identical paths, and this redundant information degrades the performance of the broadcast system. Moreover, because these splitting methods rely on user requests being known, as they are in on-demand mode, they are difficult to apply under the periodic broadcast mode.
Summary of the invention
The object of the invention is to propose, based on the access probability of XML data, an XML document fragmenting method for the periodic broadcast mode of wireless data broadcasting.
The main idea of the proposed XML document fragmenting method is as follows: under the periodic broadcast mode, the weight value of each element node is calculated from the access probabilities of the element nodes in the XML data; the element node with the largest weight value is then repeatedly selected as the root node of a new XML document fragment, this fragment is separated from the source XML document, and the weight values of the element nodes in the source XML document are updated, until the number of fragments in the XML document set reaches the specified threshold. The concrete steps are as follows:
(1) Initialize the XML data:
For every element node in the XML data, two auxiliary quantities are calculated from the node's access probability, and the weight value of the node is then determined according to formula (1).
[Formula (1) and its symbols are not reproduced in this text.]
The terms of formula (1) are: the root node of document i; the access probability of the node; a quantity defined over the ancestor nodes of the node; a quantity defined over the descendant nodes of the node; the size of the XML document fragment rooted at the node; and the size of the dummy node corresponding to the node. The weight value represents the degree to which the performance of the wireless data system improves after the document fragment rooted at this node is split off from the source XML document as an independent fragment.
To optimize the subsequent fragmenting computation, each XML document fragment i is assigned a corresponding array D_i. Pointers to the element nodes of document fragment i are stored in D_i, sorted by the weight value of the element nodes in descending order.
In addition, a max-heap H is constructed. Pointers to all document fragments i are stored in the heap, and the heap is adjusted according to the weight value of the element node pointed to by D_i[0] of each document i, so that the top element of the heap is the pointer to the document containing the element node with the largest weight value in the whole XML data.
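The per-document arrays and the heap H described above might be set up as in the following minimal sketch. Only the two top weights (2.64 for node c and 0.67 for node j) appear in the source; the remaining node names and weights, and the use of Python's `heapq` with negated keys, are assumptions made for illustration.

```python
import heapq

# Hypothetical weights per document: {document id: {element node name: weight}}.
weights = {
    "d1": {"a": 0.10, "b": 0.30, "c": 2.64},
    "d2": {"i": 0.20, "j": 0.67},
}

# D[i]: element nodes of document i sorted by weight, largest first.
D = {doc: sorted(nodes, key=nodes.get, reverse=True)
     for doc, nodes in weights.items()}

# H: heapq is a min-heap, so the top weight D[i][0] is stored negated
# to keep the document with the largest element-node weight on top.
H = [(-weights[doc][D[doc][0]], doc) for doc in D]
heapq.heapify(H)

top_weight, top_doc = -H[0][0], H[0][1]
print(top_doc, D[top_doc][0], top_weight)
```

Keeping only one heap entry per document means the heap stays small, while each D_i gives direct access to that document's best candidate node.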
(2) Split off a new XML document fragment: select the element node with the largest weight value, split off the document fragment rooted at it to form a new XML document, and add this document to the XML document set. Specifically, the operations are as follows:
(a) Obtain the document i containing the element node with the largest weight value from the top element of the max-heap H. The element node pointed to by D_i[0] is the root node of the new fragment; the document fragment rooted at this node is separated from the source document fragment to form a new document fragment i'. A dummy node is placed at the position of this node in the source document fragment. The tag name of the dummy node is the tag name of the root node of the new fragment, and an attribute identifying the ID of the new fragment is added to the dummy node;
(b) Update the weight values of the element nodes in the new document fragment i' and the source document i (including all element nodes in the new fragment), and sort the arrays D_i and D_i' by weight value in descending order;
(c) Replace the weight of the heap's top element with the weight value of the element node pointed to by D_i[0] after sorting, and adjust H;
(d) Insert a pointer to document fragment i' into H.
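Operation (a) above, detaching a subtree and leaving a dummy node in its place, can be sketched with Python's standard `xml.etree.ElementTree`. The helper name `split_fragment`, the attribute name `fragment-id`, and the shape of the sample document are hypothetical; the source only states that the dummy node keeps the tag name and carries an attribute identifying the new fragment.

```python
import xml.etree.ElementTree as ET

def split_fragment(source_root, node, fragment_id):
    """Detach the subtree rooted at `node` from the source document and
    leave a dummy node with the same tag and a fragment-ID attribute."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parents = {child: parent
               for parent in source_root.iter() for child in parent}
    parent = parents[node]            # KeyError if `node` is the document root
    position = list(parent).index(node)
    dummy = ET.Element(node.tag, {"fragment-id": fragment_id})
    dummy.tail = node.tail
    parent.remove(node)
    parent.insert(position, dummy)
    node.tail = None
    return node                       # root of the new document fragment

# Hypothetical source document; only the tag names a, c, d, h occur in the text.
d1 = ET.fromstring("<a><b/><c><g/><h/></c><d/></a>")
d3 = split_fragment(d1, d1.find("c"), "d3")
print(ET.tostring(d1, encoding="unicode"))
print(ET.tostring(d3, encoding="unicode"))
```

Because the dummy node is inserted at the same child position, sibling order in the source document is preserved, which is what later allows the client to restore the original document.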
(3) Repeat step (2) until the number of documents in the document set reaches the specified threshold T. The range of T can be determined from experimental results: AT and TT change as T changes, so a reasonable threshold is chosen according to experimental results.
(4) Use existing indexing and scheduling techniques to organize the broadcast content, and broadcast this content periodically.
(5) Client protocol: the mobile terminal downloads the index and, from the index information, obtains the IDs of the document fragments matching the request and their broadcast times. After the user has downloaded the required information, the XML document fragments are restored using the information in the dummy nodes. The concrete operations are as follows:
(a) Listen to the channel and download the index; obtain from the index the IDs of the document fragments matching the request and their broadcast times;
(b) Enter sleep mode until the required data arrive, then listen to the channel and download the data;
(c) For each downloaded XML document fragment, if the fragment contains a dummy node, read the tag name of the dummy node and the fragment ID described by its attribute, and replace the dummy node with the root node of the document fragment identified by this ID, thereby obtaining the original XML document fragment.
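The restoration in operation (c) can be sketched as follows, again with `xml.etree.ElementTree`. The function name `restore`, the attribute name `fragment-id`, and the sample documents are assumptions for illustration; the source does not specify how the fragment ID is encoded.

```python
import xml.etree.ElementTree as ET

def restore(root, fragments, id_attr="fragment-id"):
    """Replace every dummy node under `root` with the root of the fragment
    it names, recursing so that nested fragments are restored as well."""
    for position, child in enumerate(list(root)):
        fragment_id = child.get(id_attr)
        if fragment_id is not None and fragment_id in fragments:
            replacement = fragments[fragment_id]
            replacement.tail = child.tail
            root.remove(child)
            root.insert(position, replacement)
            child = replacement
        restore(child, fragments, id_attr)
    return root

# Hypothetical downloaded fragments: d1 holds a dummy node that names d3.
d1 = ET.fromstring('<a><b/><c fragment-id="d3"/><d/></a>')
downloaded = {"d3": ET.fromstring("<c><g/><h/></c>")}
original = restore(d1, downloaded)
print(ET.tostring(original, encoding="unicode"))
```

A dummy node whose fragment was not downloaded is simply left in place, which matches the case where the request does not need that part of the document.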
The fragmenting method of the invention considerably improves system performance, and performs better when access probabilities are skewed. It not only significantly improves the user's TT, but also improves AT.
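Steps (1) to (3) can be pictured end to end with the simplified loop below. It is only a sketch: the `weight` function is a placeholder (access probability times subtree size) and is not formula (1), the attribute names `p` and `fragment-id` are invented, and the loop rescans all nodes on each round instead of maintaining the sorted arrays and heap.

```python
import xml.etree.ElementTree as ET

def weight(node):
    # Placeholder only: access probability times subtree size.
    # Formula (1) of the method is NOT reproduced here.
    return float(node.get("p", 0)) * sum(1 for _ in node.iter())

def fragment_until(docs, threshold):
    """docs: {doc id: root Element}. Split off the heaviest non-root,
    non-dummy element node until len(docs) reaches `threshold`."""
    while len(docs) < threshold:
        best = None                              # (weight, parent, node)
        for root in docs.values():
            for parent in root.iter():
                for node in parent:
                    if node.get("fragment-id") is None:
                        w = weight(node)
                        if best is None or w > best[0]:
                            best = (w, parent, node)
        if best is None:
            break                                # nothing left to split
        _, parent, node = best
        new_id = f"d{len(docs) + 1}"
        dummy = ET.Element(node.tag, {"fragment-id": new_id})
        position = list(parent).index(node)
        parent.remove(node)
        parent.insert(position, dummy)
        docs[new_id] = node
    return docs

docs = {
    "d1": ET.fromstring('<a p="0.1"><b p="0.1"/><c p="0.6"><h p="0.5"/></c></a>'),
    "d2": ET.fromstring('<i p="0.2"><j p="0.3"/></i>'),
}
fragment_until(docs, threshold=3)
print(sorted(docs), docs["d3"].tag)
```

With these made-up probabilities the node c has the largest placeholder weight, so it becomes the root of the new document d3 and the loop stops once the set holds three documents.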
Description of drawings
Fig. 1: XML document set at the server and weight value information of the element nodes.
Fig. 2: Data structures and storage after initialization.
Fig. 4: Data structures and storage after fragmenting.
Fig. 5: Effect of the fragmenting method of the invention on system AT (a) and TT (b) as the minimum query depth increases.
Fig. 6: Effect of the fragmenting method of the invention on system AT (a) and TT (b) as the probability of // and * increases.
Fig. 7: Effect of the fragmenting method of the invention on system AT (a) and TT (b) as the number of documents increases.
Embodiment
The specific implementation of the method is described below with an example.
1. Initialize the XML data:
(a) As shown in Fig. 1, the XML document set at the server is {d1, d2}. P denotes the access probability of an element node, S denotes the size of the XML document fragment rooted at the element node, and W denotes the weight value calculated according to formula (1). In this example, the size of a dummy node is 0;
(b) The data storage is shown in Fig. 2. The array corresponding to each document is sorted by the weight values of the element nodes. The element node with the largest weight value in document d1 is c, with a weight value of 2.64; the element node with the largest weight value in document d2 is j, with a weight value of 0.67. The max-heap H therefore contains 2 elements, and its top element is the pointer to document d1.
2. XML document fragmenting process:
(a) Obtain the pointer to document d1 from the top element of the heap. The first element of the array corresponding to d1 is the pointer to node c, so the XML document fragment rooted at node c is separated to form a new document d3. The fragmenting result is shown in Fig. 3;
(b) Place a dummy node at the position of element node c in the source document d1; the dashed node in document d1 in Fig. 3 is the dummy node;
(c) Recalculate the weight values of all element nodes in documents d1 and d3, as shown in Fig. 3. Allocate an array for document d3, and re-sort the arrays corresponding to documents d1 and d3 by element node weight value; the result is shown in Fig. 4. The element node with the largest weight value in document d1 is now d, with a weight value of 0.43; in document d2 it is j, whose weight value is unchanged; in document d3 it is h, with a weight value of 0.67. Because the largest weight value in document d1 has been updated, H is readjusted and its top element becomes the pointer to document d2;
(d) Insert the pointer to document d3 into H. The final data structures and storage after fragmenting are shown in Fig. 4.
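The heap updates in this example can be replayed with the weights quoted in the text. The sketch uses Python's `heapq` with negated weights as a stand-in for the heap H; the tie between d2 and d3 (both 0.67) is broken here by document name, which is an assumption, since the text does not say how ties are resolved.

```python
import heapq

# Largest element-node weight per document, as given in the example
# (stored negated because heapq is a min-heap).
H = [(-2.64, "d1"), (-0.67, "d2")]
heapq.heapify(H)
assert H[0][1] == "d1"                 # step 1(b): top of heap points to d1

# Step 2(c): after c is split off, d1's largest weight becomes 0.43 (node d).
heapq.heapreplace(H, (-0.43, "d1"))
top_after_update = H[0][1]

# Step 2(d): insert the new document d3, whose largest weight is 0.67 (node h).
heapq.heappush(H, (-0.67, "d3"))
print(top_after_update, sorted(H))
```

After the update the top of the heap is d2, in line with step 2(c), and the heap holds three entries once d3 is inserted.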
3. Repeat step 2 until the number of documents in the document set reaches the specified threshold T. The threshold T is generally chosen according to actual conditions; in this example T=1 may be chosen, so the fragmenting process ends.
4. Use existing indexing and scheduling techniques to organize the broadcast content, and broadcast the fragmented document collection shown in Fig. 3.
5. Client protocol:
(a) Listen to the channel and download the index; obtain from the index the IDs of the document fragments matching the request and their broadcast times;
(b) For each downloaded XML document fragment, if the fragment contains a dummy node, read the tag name of the dummy node and the fragment ID described by its attribute, and replace the dummy node with the root node of the document fragment identified by this ID, thereby obtaining the original XML document fragment. If the user's request is /a/c, the user only needs to download document d3. If the user's request is /a, documents d1 and d3 need to be downloaded. Document d1 contains the dummy node c; from the attribute information in the dummy node, the user knows that the original node at the dummy node's position is the root node of document d3, and therefore restores documents d1 and d3 to the pre-fragmenting document d1 shown in Fig. 1.
The improvement in broadcast system performance brought by the fragmenting algorithm described in the invention is verified below by simulation experiments.
Experimental environment: the experiments use the Java 1.6.0 compilation environment and run on a Windows XP platform. The experimental data are XML documents defined by the News Industry Text Format (NITF) DTD, generated by IBM's XMLGenerator. User queries are generated with the XPath generator in YFilter. The access probabilities of the XML data elements are obtained by collecting statistics over the results of 10000 XPath queries; as the type of XPath query changes, the access probability distribution of the element nodes changes accordingly.
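One plausible way to derive access probabilities from query results is sketched below. It assumes that the access probability of a node is the fraction of queries whose result contains that node; the source only says the probabilities come from statistics over query results, so this definition, the function name, and the sample data are all assumptions.

```python
from collections import Counter

def access_probabilities(query_results):
    """query_results: one set of matched element-node IDs per query.
    Returns node ID -> fraction of queries whose result contains the node."""
    hits = Counter()
    for matched in query_results:
        hits.update(set(matched))
    total = len(query_results)
    return {node: count / total for node, count in hits.items()}

# Hypothetical results of four queries over nodes a, c and h.
results = [{"a", "c", "h"}, {"c", "h"}, {"c"}, {"a"}]
probs = access_probabilities(results)
print(probs["c"], probs["a"], probs["h"])
```

Nodes that never appear in any result get no entry, which corresponds to an access probability of 0.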
Some of the parameters used in the simulation experiments are listed in Table (1):
Name | Description | Default value |
---|---|---|
D_q | Minimum query depth of XPath | 4 |
Prob | Probability that * and // occur in an XPath query | 0.1 |
N_d | Number of XML documents | 1000 |
Table (1)
Fig. 5 shows the degree to which the fragmenting algorithm improves system performance as the minimum query depth D_q changes. As can be seen from the figure, TT is greatly improved, because fragmenting the XML documents lets some mobile terminals download only the data fragments of interest rather than whole XML documents. As D_q increases, the branch selectivity of the XPath queries decreases, i.e. the skew of the element nodes' access probabilities increases, so that a smaller number of element nodes have higher access probabilities. TT therefore decreases as D_q increases, and the ratio of TT after fragmenting to TT before fragmenting also falls from 10% to 4%.
AT after fragmenting in Fig. 5 is also reduced. On the one hand, fragmenting separates out fragments whose access probability is 0, and these data are not scheduled onto the broadcast channel. On the other hand, as in the analysis of multi-data-item scheduling in traditional data broadcasting, the significant reduction in TT also reduces AT.
Fig. 6 shows the improvement in system performance as Prob changes. As Prob increases, the branch selectivity of XPath increases and the access probabilities of the element nodes become flatter, so TT increases. The ratio of TT after fragmenting to TT before fragmenting rises from 3% to 12%. Together with the results in Fig. 5, this shows that the fragmenting algorithm described in this patent performs well when the element nodes' access probabilities are skewed. AT after fragmenting in Fig. 6 is also reduced.
Fig. 7 shows the effect of the fragmenting method on TT and AT as the number of documents changes. TT increases with the number of documents, because more document branches match the users' queries as the number of documents grows. The ratio of TT after fragmenting to TT before fragmenting stays between 9.6% and 9.8% throughout, and AT after fragmenting is also better than before fragmenting. The fragmenting method described in the invention therefore adapts well.
The above experimental analysis shows that the fragmenting method described in the invention considerably improves system performance, and performs better when access probabilities are skewed. It not only significantly improves the user's TT, but also improves AT.
Claims (2)
- 1. A document fragmenting method based on access probability under the XML wireless data broadcast mode, characterized in that the concrete steps are as follows: (1) initialize the XML data: for every element node in the XML data, two auxiliary quantities are calculated from the node's access probability, and the weight value of the node is then determined according to formula (1) [formula (1) and its symbols are not reproduced in this text]; the terms of formula (1) are: the root node of document i; the access probability of the node; a quantity defined over the ancestor nodes of the node; a quantity defined over the descendant nodes of the node; the size of the XML document fragment rooted at the node; and the size of the dummy node corresponding to the node; (2) split off a new XML document fragment: select the element node with the largest weight value, split off the document fragment rooted at it to form a new XML document, and add this document to the XML document set; (3) repeat step (2) until the number of documents in the document set reaches the specified threshold T; (4) use existing indexing and scheduling techniques to organize the broadcast content, and broadcast this content periodically; (5) client protocol: the mobile terminal downloads the index and, from the index information, obtains the IDs of the document fragments matching the request and their broadcast times; after the user has downloaded the required information, the XML document fragments are restored using the information in the dummy nodes; wherein, in said step (1), each XML document fragment i is assigned a corresponding array D_i, pointers to the element nodes of document fragment i are stored in D_i, sorted by the weight value of the element nodes in descending order; a max-heap H is constructed, pointers to all document fragments i are stored in the heap, and the heap is adjusted according to the weight value of the element node pointed to by D_i[0] of each document i, so that the top element of the heap is the pointer to the document containing the element node with the largest weight value in the whole XML data; the concrete operations of splitting off a new XML document fragment in said step (2) are as follows: (a) obtain the document i containing the element node with the largest weight value from the top element of the max-heap H; the element node pointed to by D_i[0] is the root node of the new fragment, and the document fragment rooted at this node is separated from the source document fragment to form a new document fragment i'; a dummy node is placed at the position of this node in the source document fragment, the tag name of the dummy node is the tag name of the root node of the new fragment, and an attribute identifying the ID of the new fragment is added to the dummy node; (b) update the weight values of the element nodes in the new document fragment i' and the source document i, and sort the arrays D_i and D_i' by weight value in descending order; (c) replace the weight of the heap's top element with the weight value of the element node pointed to by D_i[0] after sorting, and adjust H; (d) insert a pointer to document fragment i' into H.
- 2. The document fragmenting method according to claim 1, characterized in that the concrete operations of step (5) are as follows: (a) listen to the channel and download the index, and obtain from the index the IDs of the document fragments matching the request and their broadcast times; (b) enter sleep mode until the required data arrive, then listen to the channel and download the data; (c) for each downloaded XML document fragment, if the fragment contains a dummy node, read the tag name of the dummy node and the fragment ID described by its attribute, and replace the dummy node with the root node of the document fragment identified by this ID, thereby obtaining the original XML document fragment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110125815 CN102184239B (en) | 2011-05-16 | 2011-05-16 | Access probability based document fragmenting method in XML (Extensible Markup Language) radio data broadcast mode |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201110125815 CN102184239B (en) | 2011-05-16 | 2011-05-16 | Access probability based document fragmenting method in XML (Extensible Markup Language) radio data broadcast mode |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102184239A (en) | 2011-09-14 |
CN102184239B (en) | 2013-02-27 |
Family
ID=44570416
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201110125815 Expired - Fee Related CN102184239B (en) | 2011-05-16 | 2011-05-16 | Access probability based document fragmenting method in XML (Extensible Markup Language) radio data broadcast mode |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102184239B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106326295B (en) * | 2015-07-01 | 2021-12-14 | 中兴通讯股份有限公司 | Semantic data storage method and device |
CN105898714A (en) * | 2016-06-01 | 2016-08-24 | 武汉大学 | Real-time on-demand data broadcast scheduling system and method based on XML |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100432996C (en) * | 2004-12-07 | 2008-11-12 | 国际商业机器公司 | System, method and program for extracting web page core content based on web page layout |
JP4236055B2 (en) * | 2005-12-27 | 2009-03-11 | インターナショナル・ビジネス・マシーンズ・コーポレーション | Structured document processing apparatus, method, and program |
US20080104502A1 (en) * | 2006-10-26 | 2008-05-01 | Yahoo! Inc. | System and method for providing a change profile of a web page |
Also Published As
Publication number | Publication date |
---|---|
CN102184239A (en) | 2011-09-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1682224B (en) | A data network, user terminal and method for providing recommendations | |
CN104539676B (en) | There is provided, obtain the methods, devices and systems of application installation kit | |
CN104618506A (en) | Crowd-sourced content delivery network system, method and device | |
CN102819584A (en) | Interface file display method and system | |
Zhong et al. | Energy-efficient tree-based indexing schemes for information retrieval in wireless data broadcast | |
Shi et al. | Efficient parallel data retrieval protocols with mimo antennae for data broadcast in 4g wireless communications | |
CN102184239B (en) | Access probability based document fragmenting method in XML (Extensible Markup Language) radio data broadcast mode | |
CN103841135A (en) | File accelerative download method and apparatus | |
CN112996058A (en) | User QoE (quality of experience) optimization method based on multi-unmanned aerial vehicle network, unmanned aerial vehicle and system | |
CN102118400A (en) | Data acquisition method and system | |
KR100791639B1 (en) | Method for energy efficient spatial index using hilbert curve, recording medium thereof and system for energy efficient spatial index using hilbert curve | |
CN110727738A (en) | Global routing system based on data fragmentation, electronic equipment and storage medium | |
CN108810139B (en) | Monte Carlo tree search-assisted wireless caching method | |
Zhong et al. | Evaluation and comparison of various indexing schemes in single-channel broadcast communication environment | |
CN102075584A (en) | Distributed file system and access method thereof | |
Gao et al. | Global optimization for multi-channel wireless data broadcast with AH-tree indexing scheme | |
Li et al. | Searching continuous nearest neighbors in road networks on the air | |
CN102867058B (en) | A kind of space keyword search method under wireless data broadcasting environment | |
CN102143123A (en) | Method for simultaneously realizing downloading and playing, client, server and system | |
CN101183964B (en) | Multi-data item wireless data broadcast scheduling method under demand mode | |
CN102857875A (en) | Multi-data item broadcast scheduling method oriented to hybrid broadcast in mobile computing environment | |
Qin et al. | Cluster-based scheduling algorithm for periodic XML data broadcast in wireless environments | |
Waluyo et al. | Mobile broadcast services with MIMO antennae in 4G wireless networks | |
He et al. | Efficient data retrieval algorithms for multiple requests in MIMO wireless networks | |
CN101179577B (en) | Method of partitioning XML document in wireless data broadcasting demand mode |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20130227; Termination date: 20150516 |
EXPY | Termination of patent right or utility model | ||