CN102184239B - Access probability based document fragmenting method in XML (Extensive Makeup Language) radio data broadcast mode - Google Patents


Info

Publication number
CN102184239B
Authority
CN
China
Prior art keywords
node
document
xml
fragment
document fragment
Prior art date
Legal status
Expired - Fee Related
Application number
CN 201110125815
Other languages
Chinese (zh)
Other versions
CN102184239A (en)
Inventor
孙未未
吴晶晶
刘鹏
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
2011-05-16
Filing date
2011-05-16
Publication date
2013-02-27
Application filed by Fudan University
Priority to CN 201110125815
Publication of CN102184239A
Application granted
Publication of CN102184239B
Expired - Fee Related (current legal status)
Anticipated expiration


Abstract

The invention belongs to the technical field of XML (Extensible Markup Language) wireless data broadcasting, and in particular relates to an access-probability-based document fragmentation method in XML wireless data broadcast mode. The method comprises the following steps: the server calculates the weight of each element node according to the access probability of each element node in an XML document set; the element node with the largest weight is selected, and the document fragment rooted at that node is split off to form a new XML document, which is added to the XML document set; these steps are repeated until the number of documents in the XML document set reaches a specified threshold; finally, an index structure is constructed, the XML document set is scheduled, and the fragmented XML document set is broadcast.

Description

Access-probability-based document fragmentation method in XML wireless data broadcast mode
Technical field
The invention belongs to the technical field of XML wireless data broadcasting, and specifically relates to a document fragmentation method in XML wireless data broadcast mode.
Background
Wireless broadcasting is one of the principal data access modes in wireless environments. It saves energy and improves bandwidth utilization, and a single broadcast can satisfy the demands of a large number of users at the same time. In data broadcasting, the server sends information over a shared channel, while mobile users listen to this channel and download the information they are interested in as it arrives. From the user's point of view, the wireless channel can be regarded as a storage medium for the data; unlike a disk, however, data on the wireless channel can only be accessed sequentially. Traditionally, there are two main metrics for the performance of a wireless data broadcast system:
(1) Access time (AT): the time from when the user issues a request until the user has downloaded all required results from the channel. AT determines the user's waiting time.
(2) Tuning time (TT): the total time the user spends actively listening to the channel while completing one access request. When not monitoring the channel, the user's device can enter a doze (sleep) mode to conserve energy, so TT determines the user's energy consumption. XML data broadcasting likewise adopts AT and TT as the criteria for broadcast performance.
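The two metrics can be made concrete with a small model. The sketch below is an illustration under stated assumptions, not part of the patent: a client's activity for one request is given as a list of listening intervals (the `Interval` type and `access_and_tuning_time` function are hypothetical names), the client is assumed to doze between intervals, and AT and TT are computed exactly as defined above.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Interval:
    """A span [start, end) during which the client actively listens to the channel."""
    start: float
    end: float


def access_and_tuning_time(request_time: float, listening: list[Interval]) -> tuple[float, float]:
    """Return (AT, TT) for one request.

    AT: from issuing the request until the last required data has been downloaded,
        i.e. the end of the last listening interval.
    TT: total time spent actively listening; the client is assumed to doze in between.
    """
    if not listening:
        return 0.0, 0.0
    at = max(iv.end for iv in listening) - request_time
    tt = sum(iv.end - iv.start for iv in listening)
    return at, tt
```

For example, a client that issues a request at t = 0, probes the channel during [0, 1), reads the index during [3, 4) and downloads its data during [10, 12) has AT = 12 but TT = 4, because it dozes for most of the wait.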
Air indexing and scheduling are two important research topics in traditional data broadcasting. Air indexing adds auxiliary index information to the broadcast data to tell mobile users when the data they are interested in will arrive, thereby reducing access time AT. Scheduling, by contrast, arranges the contents of the broadcast cycle, i.e., how often and when each data item appears. Depending on the scheduling method, wireless data broadcasting falls into two modes:
(1) Periodic broadcast mode (Broadcast Mode / Push-based Mode):
The server repeatedly broadcasts the stored data on the broadcast channel according to a fixed schedule; users simply listen to the broadcast channel and download data of interest to local storage as soon as it appears.
(2) On-demand broadcast mode (On-demand Mode / Pull-based Mode):
Users explicitly send their requests to the server over an uplink channel, and the server decides the broadcast order of data items according to the requests it receives.
Traditional wireless data broadcasting focuses on "flat data" that carries no structural information, and user queries are based on field values; it cannot support XML data, which carries structural information, or path queries over such data. Meanwhile, XML, as the standard for data exchange, is used ever more widely in wireless environments. XML data broadcasting has therefore become a new research focus.
In the periodic broadcast mode of traditional data broadcasting, data items have different access probabilities, which reflect how popular each item is. In XML scheduling, the minimum unit of scheduling is the XML document fragment. The document-splitting methods used in existing scheduling techniques mainly target the on-demand environment. Their main idea is: given the users' specific queries, partition the branches of each XML document into several sets, with the branches in the same set forming a new XML document fragment. Because each resulting document retains the complete path from the root node to its split node, the XML documents produced in this way repeatedly contain the same paths, and this redundancy degrades broadcast performance. Moreover, these methods rely on user requests being known, as they are in on-demand mode, so they are difficult to apply in periodic broadcast mode.
Summary of the invention
The object of the invention is to provide an XML document fragmentation method for the periodic broadcast mode of wireless data broadcasting, based on the access probabilities of the XML data.
The main idea of the proposed method is as follows. In periodic broadcast mode, a weight is calculated for each element node from the access probabilities of the element nodes in the XML data. The method then repeatedly selects the element node with the largest weight as the root node of a new XML document fragment, separates that fragment from its source XML document, and updates the weights of the element nodes in the source document, until the number of fragments in the XML document set reaches a specified threshold. The specific steps are as follows:
(1) Initialize the XML data:
For every element node n in the XML data, compute P_A(n) and P_D(n) from the node access probabilities P(n), and then compute the weight W(n) of n according to formula (1):
[Formula (1): W(n) in terms of r_i, P(n), P_A(n), P_D(n), S(n) and S_v(n); image not recoverable]   …(1)
In formula (1), r_i is the root node of document i; P(n) is the access probability of node n; P_A(n) is computed over the ancestor nodes a of n; P_D(n) is computed over the descendant nodes d of n (the defining expressions of P_A(n) and P_D(n) are likewise images in the original); S(n) is the size of the XML document fragment rooted at n; and S_v(n) is the size of the dummy node corresponding to n. The weight W(n) represents how much the performance of the wireless data broadcast system improves after the document fragment rooted at n is split off from its source XML document as an independent document.
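S(n) is only described as the size of the fragment rooted at n, and the patent does not state its unit. As an assumption for illustration, the sketch below measures it as the UTF-8 byte length of the serialized subtree using Python's `xml.etree.ElementTree`; `fragment_sizes` is a hypothetical helper, and the weights of formula (1) are not computed here.

```python
from __future__ import annotations
import copy
import xml.etree.ElementTree as ET


def fragment_sizes(root: ET.Element) -> dict[ET.Element, int]:
    """Map every element n under `root` (inclusive) to an assumed S(n): the UTF-8
    byte length of the fragment rooted at n, excluding n's tail text."""
    sizes = {}
    for elem in root.iter():
        detached = copy.copy(elem)  # shallow copy: shares children, so elem is untouched
        detached.tail = None        # tail text belongs to the parent, not to this fragment
        sizes[elem] = len(ET.tostring(detached, encoding="utf-8"))
    return sizes
```

Each element is serialized separately, so this costs O(N x depth); a single post-order pass that sums child sizes would be the obvious optimization for large documents.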
To speed up subsequent fragmentation, each XML document fragment i is assigned a corresponding array D_i, which stores pointers to the element nodes of fragment i sorted by weight W(n) in descending order.
In addition, a max-heap H is constructed that stores pointers to all document fragments i. The heap is ordered by the weight of the element node pointed to by D_i[0] of each document i, so that the top of the heap points to the document containing the element node with the largest weight in the entire XML data set.
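A minimal sketch of these initialization structures, assuming the weights W(n) have already been computed and are passed in (formula (1) is not reproduced). Document and node IDs are plain strings here, `build_structures` is a hypothetical name, and Python's `heapq`, which is a min-heap, emulates the max-heap H by storing negated weights.

```python
from __future__ import annotations
import heapq


def build_structures(weights: dict[str, dict[str, float]]):
    """weights[doc_id][node_id] = W(node).

    Returns (D, H):
      D[doc_id] -- node IDs of that document sorted by weight, largest first (array D_i)
      H         -- list of (-max_weight, doc_id) arranged as a heap; its top entry names
                   the document holding the globally heaviest node (the max-heap H).
    """
    D = {doc: sorted(nodes, key=nodes.get, reverse=True) for doc, nodes in weights.items()}
    H = [(-weights[doc][D[doc][0]], doc) for doc in D if D[doc]]
    heapq.heapify(H)
    return D, H
```

With the embodiment's values (node c of d1 weighs 2.64, node j of d2 weighs 0.67) the heap top is d1; the extra weight 0.1 used in the check below for node a is a made-up value.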
(2) Split off a new XML document fragment: select the element node with the largest weight, split off the document fragment rooted at it to form a new XML document, and add this document to the XML document set. Specifically:
(a) The top element of max-heap H identifies the document i containing the element node with the largest weight. The element node pointed to by D_i[0] becomes the root node of the new fragment, and the fragment rooted at it is separated from the source fragment to form a new document fragment i'. A dummy node is placed at that node's former position in the source fragment; the tag name of the dummy node is the tag name of the new fragment's root node, and an attribute identifying the ID of the new fragment is added to the dummy node;
(b) Update the weights of the element nodes in document fragment i' and in source document i (including all element nodes of the new fragment), and sort arrays D_i and D_i' by weight in descending order;
(c) Replace the weight of the heap-top element with the weight of the element node pointed to by the re-sorted D_i[0], and re-heapify H;
(d) Insert a pointer to document fragment i' into H.
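Step (a)'s split can be sketched with ElementTree as follows. The attribute name `frag-id` and the function name `split_fragment` are hypothetical (the patent only says that an attribute carries the new fragment's ID); weight updates and heap maintenance, steps (b) to (d), are left out of this sketch.

```python
import xml.etree.ElementTree as ET

FRAG_ATTR = "frag-id"  # hypothetical attribute name; the patent does not name it


def split_fragment(source_root: ET.Element, node: ET.Element, new_id: str) -> ET.Element:
    """Detach the subtree rooted at `node` from the source fragment and leave a dummy
    node with the same tag name and a FRAG_ATTR attribute holding the new fragment's
    ID in its place. Returns the root of the new fragment."""
    # ElementTree keeps no parent pointers, so build a child -> parent map first.
    parent_of = {child: parent for parent in source_root.iter() for child in parent}
    parent = parent_of.get(node)
    if parent is None:
        raise ValueError("node must be a non-root element of source_root")
    index = list(parent).index(node)
    dummy = ET.Element(node.tag, {FRAG_ATTR: new_id})
    dummy.tail, node.tail = node.tail, None  # mixed-content tail text stays in the source
    parent[index] = dummy                    # the dummy takes the detached node's place
    return node
```

Because the dummy node keeps the original tag name, a client can later match it to the root of the fragment named in its attribute.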
(3) Repeat step (2) until the number of documents in the document set reaches the specified threshold T. The value of T can be determined experimentally: AT and TT change as T varies, so a reasonable threshold is chosen from the experimental results.
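The greedy loop of steps (2) and (3) can be sketched independently of the document representation by passing two callbacks: `weigh`, which returns the weights of a document's splittable nodes, and `split`, which performs step (a). Both callbacks, the function name and the `d<k>` ID scheme are hypothetical. The sketch stops when the document count reaches T, following the wording of step (3); note that the worked example in the Embodiment uses T = 1 yet still performs one split, so T may instead be intended to count splits.

```python
from __future__ import annotations
import heapq
import itertools
from typing import Any, Callable


def fragment_until(
    docs: dict[str, Any],
    T: int,
    weigh: Callable[[Any], dict[Any, float]],
    split: Callable[[Any, Any, str], Any],
) -> dict[str, Any]:
    """Greedy fragmentation of steps (2)-(3), sketched.

    docs  -- {doc_id: document}; modified in place and returned.
    weigh -- returns {node: W(node)} for the splittable nodes of a document.
    split -- detaches `node`, leaves a dummy node behind, returns the new fragment.
    """
    def ranked(doc: Any) -> list[tuple[Any, float]]:
        # Array D_i: (node, weight) pairs, largest weight first.
        return sorted(weigh(doc).items(), key=lambda kv: kv[1], reverse=True)

    D = {doc_id: ranked(doc) for doc_id, doc in docs.items()}
    # heapq is a min-heap, so weights are negated to emulate the max-heap H.
    H = [(-D[d][0][1], d) for d in D if D[d]]
    heapq.heapify(H)
    new_ids = (f"d{k}" for k in itertools.count(len(docs) + 1))  # hypothetical IDs

    while len(docs) < T and H:
        _, src = H[0]                     # document holding the heaviest node
        node, _ = D[src][0]
        new_id = next(new_ids)
        docs[new_id] = split(docs[src], node, new_id)                # step (2)(a)
        D[src], D[new_id] = ranked(docs[src]), ranked(docs[new_id])  # step (2)(b)
        if D[src]:
            heapq.heapreplace(H, (-D[src][0][1], src))               # step (2)(c)
        else:
            heapq.heappop(H)              # nothing left to split in src
        if D[new_id]:
            heapq.heappush(H, (-D[new_id][0][1], new_id))            # step (2)(d)
    return docs
```

`heapreplace` is safe here because the entry for `src` is always the current heap top when it is updated.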
(4) Using existing indexing and scheduling techniques, organize the broadcast content and broadcast it periodically.
(5) Client protocol: the mobile terminal downloads the index and, from the index information, obtains the IDs of the document fragments matching its request and their broadcast times. After downloading the required information, the user restores the XML document fragments using the information in the dummy nodes. The specific operations are as follows:
(a) Listen to the channel and download the index; from the index, obtain the IDs and broadcast times of the document fragments matching the request;
(b) Enter sleep mode until the required data arrive, then listen to the channel and download the data;
(c) For each downloaded XML document fragment that contains a dummy node, read the dummy node's tag name and the fragment ID given by its attribute, and replace the dummy node with the root node of the document fragment identified by that ID, thereby recovering the original XML document fragment.
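The restoration in step (c) can be sketched with ElementTree. This is an illustration rather than the patent's client: the attribute name `frag-id` and the function name `restore` are hypothetical, downloaded fragments are assumed to arrive as a dict from fragment ID to root element, and fragment references are assumed to be acyclic.

```python
from __future__ import annotations
import copy
import xml.etree.ElementTree as ET

FRAG_ATTR = "frag-id"  # hypothetical attribute name for the fragment ID in a dummy node


def restore(fragment: ET.Element, downloaded: dict[str, ET.Element]) -> ET.Element:
    """Return a copy of `fragment` in which every dummy node whose fragment has been
    downloaded is replaced by that fragment's root; replacements are expanded too."""
    result = copy.deepcopy(fragment)
    stack = [result]
    while stack:
        parent = stack.pop()
        for i, child in enumerate(list(parent)):
            frag = downloaded.get(child.get(FRAG_ATTR))
            # A dummy node is an empty element carrying FRAG_ATTR whose tag name
            # matches the root tag of the fragment it stands for.
            if frag is not None and len(child) == 0 and frag.tag == child.tag:
                replacement = copy.deepcopy(frag)
                replacement.tail = child.tail
                parent[i] = replacement
                child = replacement
            stack.append(child)
    return result
```

Dummy nodes whose fragments were not downloaded are left in place, which matches a client that only requested part of the document.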
The fragmentation method of the invention considerably improves system performance, especially when access probabilities are skewed: it significantly reduces the user's TT and also reduces AT.
Description of drawings
Fig. 1: The XML document set on the server side and the weights of its element nodes.
Fig. 2: Initial data structures and storage.
Fig. 3: Fragmentation result with threshold T = 1 and dummy-node size 0; document d2 is unchanged.
Fig. 4: Data structures and storage after fragmentation.
Fig. 5: Effect of increasing the minimum query depth on system AT (a) and TT (b) under the fragmentation method of the invention.
Fig. 6: Effect of increasing the probability of * and // on system AT (a) and TT (b) under the fragmentation method of the invention.
Fig. 7: Effect of increasing the number of documents on system AT (a) and TT (b) under the fragmentation method of the invention.
Embodiment
The implementation of the method is described below with an example.
1. Initialize the XML data:
(a) As shown in Fig. 1, the server-side XML document set is {d1, d2}. P denotes the access probability of an element node, S the size of the XML document fragment rooted at that node, and W the weight calculated by formula (1). In this example, the size of a dummy node is 0;
(b) Data storage is shown in Fig. 2. The array of each document is sorted by element-node weight. In document d1 the element node with the largest weight is c, with weight 2.64; in document d2 it is j, with weight 0.67. The max-heap H therefore contains 2 elements, and its top element points to document d1.
2. Fragment the XML documents:
(a) The heap-top element yields a pointer to document d1. The first element of d1's array points to node c, so the XML document fragment rooted at c is separated to form a new document d3; the result is shown in Fig. 3;
(b) A dummy node is placed at the position of element node c in source document d1; in Fig. 3, the dashed node in d1 is the dummy node;
(c) Recompute the weights of all element nodes in documents d1 and d3, as shown in Fig. 3. Allocate an array for d3 and re-sort the arrays of d1 and d3 by element-node weight; the result is shown in Fig. 4. In d1 the element node with the largest weight is now d, with weight 0.43; in d2 it is still j, whose weight is unchanged; in d3 it is h, with weight 0.67. Because the maximum weight of d1 has changed, H is re-heapified, and its top element now points to document d2;
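The heap updates in this example can be replayed with Python's `heapq`, using only the weights quoted above. Encoding heap entries as (-weight, document ID) and breaking the tie between d2 and d3 by document ID are assumptions of this sketch; the patent does not specify a tie-break.

```python
import heapq

# Entries are (-max_weight, doc_id): heapq is a min-heap, so negating the weight
# makes the top entry the document holding the heaviest node (the max-heap H).
H = [(-2.64, "d1"), (-0.67, "d2")]
heapq.heapify(H)
top_before = H[0][1]                  # "d1": its node c (W = 2.64) is split off as d3

heapq.heapreplace(H, (-0.43, "d1"))   # step (c): d1's new maximum is node d (0.43)
top_after_update = H[0][1]            # "d2": node j (0.67) is now the heaviest

heapq.heappush(H, (-0.67, "d3"))      # step (d): d3's maximum is node h (0.67)
final_order = [doc for _, doc in sorted(H)]   # d2 and d3 tie at 0.67, then d1
```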
(d) Insert a pointer to document d3 into H. The final data structures and storage after fragmentation are shown in Fig. 4.
3. Repeat step 2 until the number of documents in the document set reaches the specified threshold T. T is generally chosen according to actual conditions; in this example T = 1 is chosen, so the fragmentation process ends here.
4. Using existing indexing and scheduling techniques, organize the broadcast content and broadcast the fragmented document set shown in Fig. 3.
5. Client protocol:
(a) Listen to the channel and download the index; from the index, obtain the IDs and broadcast times of the document fragments matching the request;
(b) For each downloaded XML document fragment that contains a dummy node, read the dummy node's tag name and the fragment ID given by its attribute, and replace the dummy node with the root node of the document fragment identified by that ID, thereby recovering the original XML document fragment. If the user's request is /a/c, the user only needs to download document d3. If the request is /a, the user must download documents d1 and d3. Document d1 contains dummy node c; from the attribute information in the dummy node, the user learns that the node at this position is the root node of d3, and therefore restores d1 and d3 into the original document d1 shown in Fig. 1, as it was before fragmentation.
The following simulation experiments verify the improvement in broadcast system performance achieved by the fragmentation algorithm of the invention.
Experimental setup: the experiments were implemented in Java 1.6.0 on Windows XP. The test data were XML documents conforming to the News Industry Text Format (NITF) DTD, generated with IBM's XMLGenerator. User queries were generated with the XPath generator in YFilter. The access probabilities of the XML data elements were obtained by counting the results of 10,000 XPath queries; as the XPath query type changes, the distribution of element access probabilities changes accordingly.
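The experiment's estimate of access probabilities can be illustrated as follows, assuming P(n) is the fraction of queries whose result set contains n. This is a sketch, not the authors' code: `access_probabilities` is a hypothetical name, it uses ElementTree's limited XPath subset with paths relative to the document root instead of YFilter-generated queries, and it counts only the nodes a query returns directly, not their descendants.

```python
from __future__ import annotations
from collections import Counter
import xml.etree.ElementTree as ET


def access_probabilities(root: ET.Element, queries: list[str]) -> dict[ET.Element, float]:
    """Estimate P(n) for every element n as (#queries whose result contains n) / #queries.

    Queries are ElementTree paths relative to `root`, e.g. ".", "./c", ".//h", "./*".
    """
    if not queries:
        return {}
    hits = Counter()
    for q in queries:
        for elem in set(root.findall(q)):  # count each node at most once per query
            hits[elem] += 1
    return {elem: hits[elem] / len(queries) for elem in root.iter()}
```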
Some parameters of the simulation are listed in Table (1):
Parameter | Description | Default value
D_q | Minimum query depth of the XPath queries | 4
Prob | Probability that * and // appear in an XPath query | 0.1
N_d | Number of XML documents | 1000
Table (1)
Fig. 5 shows how much the fragmentation algorithm improves system performance as the minimum query depth D_q varies. TT improves substantially, because fragmentation lets some mobile terminals download only the fragments they are interested in rather than whole XML documents. As D_q increases, the branch selectivity of the XPath queries decreases, i.e., the skew of element access probabilities increases, so that a smaller number of element nodes have high access probabilities. Consequently, TT decreases as D_q increases, and the ratio of TT after fragmentation to TT before fragmentation falls from 10% to 4%.
AT after fragmentation also decreases in Fig. 5. First, fragmentation separates out fragments whose access probability is 0, and these are not scheduled onto the broadcast channel. Second, as analyses of multi-item data broadcasting in traditional broadcast systems show, a large reduction in TT also reduces AT.
Fig. 6 shows the performance improvement as Prob varies. As Prob increases, the branch selectivity of the XPath queries increases and element access probabilities become flatter, so TT increases; the ratio of TT after fragmentation to TT before fragmentation rises from 3% to 12%. Together with the results in Fig. 5, this shows that the algorithm performs well when element access probabilities are skewed. AT after fragmentation also decreases in Fig. 6.
Fig. 7 shows the effect of the fragmentation method on TT and AT as the number of documents varies. TT increases with the number of documents, because more document branches match user queries; the ratio of TT after fragmentation to TT before fragmentation stays between 9.6% and 9.8%, and AT after fragmentation is also better than before. The method therefore adapts well to the size of the document set.
The experimental analysis shows that the fragmentation method considerably improves system performance, especially when access probabilities are skewed: it significantly reduces the user's TT and also reduces AT.

Claims (2)

  1. An access-probability-based document fragmentation method in XML wireless data broadcast mode, characterized in that the specific steps are as follows:
    (1) initializing the XML data:
    for every element node n in the XML data, computing P_A(n) and P_D(n) from the node access probabilities P(n), and then computing the weight W(n) of n according to formula (1):
    [Formula (1): W(n) in terms of r_i, P(n), P_A(n), P_D(n), S(n) and S_v(n); image not recoverable]   …(1)
    where r_i is the root node of document i; P(n) is the access probability of node n; P_A(n) is computed over the ancestor nodes a of n; P_D(n) is computed over the descendant nodes d of n; S(n) is the size of the XML document fragment rooted at n; and S_v(n) is the size of the dummy node corresponding to n;
    (2) splitting off a new XML document fragment: selecting the element node with the largest weight, splitting off the document fragment rooted at it to form a new XML document, and adding this document to the XML document set;
    (3) repeating step (2) until the number of documents in the document set reaches a specified threshold T;
    (4) organizing the broadcast content with existing indexing and scheduling techniques and broadcasting this content periodically;
    (5) client protocol: the mobile terminal downloads the index and obtains, from the index information, the IDs of the document fragments matching its request and their broadcast times; after the user has downloaded the required information, the XML document fragments are restored using the information in the dummy nodes; wherein,
    in step (1), each XML document fragment i is assigned a corresponding array D_i, which stores pointers to the element nodes of fragment i sorted by weight W(n) in descending order;
    a max-heap H is constructed that stores pointers to all document fragments i and is ordered by the weight of the element node pointed to by D_i[0] of each document i, so that the top of the heap points to the document containing the element node with the largest weight in the entire XML data set;
    the specific operations of splitting off a new XML document fragment in step (2) are as follows:
    (a) obtaining, from the top element of max-heap H, the document i containing the element node with the largest weight, the element node pointed to by D_i[0] being the root node of the new fragment; separating the fragment rooted at that node from the source fragment to form a new document fragment i'; placing a dummy node at that node's former position in the source fragment, the tag name of the dummy node being the tag name of the new fragment's root node, and adding to the dummy node an attribute identifying the ID of the new fragment;
    (b) updating the weights of the element nodes in document fragment i' and in source document i, and sorting arrays D_i and D_i' by weight in descending order;
    (c) replacing the weight of the heap-top element with the weight of the element node pointed to by the re-sorted D_i[0], and re-heapifying H;
    (d) inserting a pointer to document fragment i' into H.
  2. The document fragmentation method according to claim 1, characterized in that the specific operations of step (5) are as follows:
    (a) listening to the channel and downloading the index, and obtaining from the index the IDs and broadcast times of the document fragments matching the request;
    (b) entering sleep mode until the required data arrive, then listening to the channel and downloading the data;
    (c) for each downloaded XML document fragment that contains a dummy node, reading the dummy node's tag name and the fragment ID given by its attribute, and replacing the dummy node with the root node of the document fragment identified by that ID, thereby recovering the original XML document fragment.
CN 201110125815 2011-05-16 2011-05-16 Access probability based document fragmenting method in XML (Extensive Makeup Language) radio data broadcast mode Expired - Fee Related CN102184239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110125815 CN102184239B (en) 2011-05-16 2011-05-16 Access probability based document fragmenting method in XML (Extensive Makeup Language) radio data broadcast mode

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110125815 CN102184239B (en) 2011-05-16 2011-05-16 Access probability based document fragmenting method in XML (Extensive Makeup Language) radio data broadcast mode

Publications (2)

Publication Number Publication Date
CN102184239A (en) 2011-09-14
CN102184239B (en) 2013-02-27

Family

ID=44570416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110125815 Expired - Fee Related CN102184239B (en) 2011-05-16 2011-05-16 Access probability based document fragmenting method in XML (Extensive Makeup Language) radio data broadcast mode

Country Status (1)

Country Link
CN (1) CN102184239B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106326295B (en) * 2015-07-01 2021-12-14 中兴通讯股份有限公司 Semantic data storage method and device
CN105898714A (en) * 2016-06-01 2016-08-24 武汉大学 Real-time on-demand data broadcast scheduling system and method based on XML

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100432996C (en) * 2004-12-07 2008-11-12 国际商业机器公司 System, method and program for extracting web page core content based on web page layout
JP4236055B2 (en) * 2005-12-27 2009-03-11 インターナショナル・ビジネス・マシーンズ・コーポレーション Structured document processing apparatus, method, and program
US20080104502A1 (en) * 2006-10-26 2008-05-01 Yahoo! Inc. System and method for providing a change profile of a web page

Also Published As

Publication number Publication date
CN102184239A (en) 2011-09-14

Similar Documents

Publication Publication Date Title
CN1682224B (en) A data network, user terminal and method for providing recommendations
CN104539676B (en) There is provided, obtain the methods, devices and systems of application installation kit
CN104618506A (en) Crowd-sourced content delivery network system, method and device
CN102819584A (en) Interface file display method and system
Zhong et al. Energy-efficient tree-based indexing schemes for information retrieval in wireless data broadcast
Shi et al. Efficient parallel data retrieval protocols with mimo antennae for data broadcast in 4g wireless communications
CN102184239B (en) Access probability based document fragmenting method in XML (Extensive Makeup Language) radio data broadcast mode
CN103841135A (en) File accelerative download method and apparatus
CN112996058A (en) User QoE (quality of experience) optimization method based on multi-unmanned aerial vehicle network, unmanned aerial vehicle and system
CN102118400A (en) Data acquisition method and system
KR100791639B1 (en) Method for energy efficient spatial index using hilbert curve, recording medium thereof and system for energy efficient spatial index using hilbert curve
CN110727738A (en) Global routing system based on data fragmentation, electronic equipment and storage medium
CN108810139B (en) Monte Carlo tree search-assisted wireless caching method
Zhong et al. Evaluation and comparison of various indexing schemes in single-channel broadcast communication environment
CN102075584A (en) Distributed file system and access method thereof
Gao et al. Global optimization for multi-channel wireless data broadcast with AH-tree indexing scheme
Li et al. Searching continuous nearest neighbors in road networks on the air
CN102867058B (en) A kind of space keyword search method under wireless data broadcasting environment
CN102143123A (en) Method for simultaneously realizing downloading and playing, client, server and system
CN101183964B (en) Multi-data item wireless data broadcast scheduling method under demand mode
CN102857875A (en) Multi-data item broadcast scheduling method oriented to hybrid broadcast in mobile computing environment
Qin et al. Cluster-based scheduling algorithm for periodic XML data broadcast in wireless environments
Waluyo et al. Mobile broadcast services with MIMO antennae in 4G wireless networks
He et al. Efficient data retrieval algorithms for multiple requests in MIMO wireless networks
CN101179577B (en) Method of partitioning XML document in wireless data broadcasting demand mode

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130227

Termination date: 20150516

EXPY Termination of patent right or utility model