CN101763353A - XML air indexing method in air broadcast - Google Patents


Info

Publication number
CN101763353A
Authority
CN
China
Prior art keywords
index
xml
dataguide
aerial
xml document
Prior art date
Legal status
Pending
Application number
CN200810207689A
Other languages
Chinese (zh)
Inventor
孙未未
覃泳睿
余平
张卓瑶
Current Assignee
Fudan University
Original Assignee
Fudan University
Priority date
2008-12-24
Filing date
2008-12-24
Publication date
2010-06-30
Application filed by Fudan University
Priority to CN200810207689A
Publication of CN101763353A
Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an XML air indexing method for wireless broadcasting, consisting mainly of a basic index structure based on DataGuide, an index merging technique and an index pruning technique. The overall process is as follows: first, a DataGuide index is built for each XML document in the database; next, these DataGuides are merged into a complete index covering the whole database; finally, the complete index is pruned by deleting the data nodes that are not accessed by any request in the request queue. Through merging and pruning, the invention can reduce the index to about 0.1% to 0.5% of the size of the data documents, thereby greatly reducing users' tuning time and improving the performance of the broadcast system.

Description

XML air indexing method in wireless broadcasting
Technical field
The invention belongs to the intersection of wireless data broadcasting and XML document indexing, and specifically relates to an XML air indexing method for wireless networks.
Background
With the spread of wireless devices and the development of wireless networks, wireless mobile computing has become a very active research field. Owing to its inherent scalability and its efficient use of bandwidth, wireless data broadcasting has become a widely adopted data delivery method in current mobile wireless networks.
Fast access and energy saving are the two main issues studied in wireless data broadcasting. Accordingly, broadcast performance is evaluated with two main metrics: access time and tuning time.
Access time is the time elapsed from when a user issues a request until the request is satisfied.
Tuning time is the time during which the user must remain in listening (active) mode between issuing a request and having it satisfied.
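To make the two metrics concrete, the following is a minimal sketch under a deliberately simplified model that is our assumption, not the patent's: time is divided into equal broadcast slots (buckets), and a client that uses an index needs one probe bucket to learn when the index arrives. The names `Cost`, `without_index` and `with_index` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cost:
    access_time: int  # slots from the request until the last needed bucket arrives
    tuning_time: int  # slots the client spends in active (listening) mode


def _check(request_slot: int, slots: List[int]) -> None:
    # Every bucket the client needs must be broadcast after it issues the request.
    if not slots or min(slots) < request_slot:
        raise ValueError("needed buckets must be broadcast after the request")


def without_index(request_slot: int, data_slots: List[int]) -> Cost:
    # Without an index the client cannot predict when data arrive, so it
    # listens continuously until the last needed bucket has been received.
    _check(request_slot, data_slots)
    span = max(data_slots) - request_slot + 1
    return Cost(access_time=span, tuning_time=span)


def with_index(request_slot: int, index_slots: List[int], data_slots: List[int]) -> Cost:
    # Assumption: one probe bucket tells the client when the index arrives;
    # it then reads the index, dozes, and wakes only for the needed buckets.
    _check(request_slot, index_slots + data_slots)
    span = max(data_slots) - request_slot + 1
    return Cost(access_time=span, tuning_time=1 + len(index_slots) + len(data_slots))


print(without_index(0, [10, 25]))     # Cost(access_time=26, tuning_time=26)
print(with_index(0, [3], [10, 25]))   # Cost(access_time=26, tuning_time=4)
```

Both clients finish at the same slot, but the indexed client stays awake for 4 slots instead of 26. The toy model does not charge the index for the broadcast slots it occupies; on a real channel a larger index lengthens the broadcast and therefore the access time.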
Air indexing is one of the hot topics in wireless data broadcasting research. By adding index information that points to the data in the broadcast channel, a user can compute how long it will be before the system broadcasts the required data. During this waiting period the user can switch to power-saving mode, and switch back to active mode to download the data when they arrive. This effectively reduces the user's tuning time and, correspondingly, the user's power consumption, so air indexing is widely used in wireless data broadcasting systems. Current research on air indexing focuses mainly on index structures and index distribution.
On the other hand, XML is gradually becoming the most widely used format for data storage and exchange in information systems. Almost all large IT companies (Microsoft, Oracle, IBM) have added XML support to their products, and the large number of XML documents (HTML, etc.) on the network is becoming part of people's daily lives. XML data broadcasting is an effective way to deliver XML documents in wireless environments, and the present invention concerns an XML air indexing technique for wireless data broadcasting under the on-demand mode.
Wireless data broadcasting has two main modes: the periodic broadcast mode and the on-demand broadcast mode.
Periodic broadcast mode (Broadcast Mode/Push-based Mode): the server cyclically broadcasts the stored data on the broadcast channel according to a fixed schedule. The client only needs to listen on the broadcast channel and, once it finds the data it needs, downloads them locally.
On-demand broadcast mode (On-demand Mode/Pull-based Mode): users send their requests to the server over an uplink channel; the server puts the requests into a request queue and schedules broadcasts according to the requests in the queue. Once the requested data have been broadcast, the request is deleted from the queue. Meanwhile, users listen on the broadcast channel and, once they find the data they need, download them locally. Under the on-demand mode, the server stores the collected user requests in a local request queue and can schedule broadcasts according to how the queued requests access the data (for example, the access frequencies of different requests).
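The on-demand request-queue behaviour described above can be sketched as follows. This is a hypothetical illustration: the class name, the representation of a request as the set of documents it matches (reflecting that one XML query may match several documents), and the Most Requests First scheduling policy are all assumptions; the text only says that scheduling follows the requests in the queue, for example their access frequencies.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


class OnDemandServer:
    """Toy pull-based broadcast server; a hypothetical illustration, not the patent's scheduler."""

    def __init__(self) -> None:
        # request id -> document ids that must still be broadcast for that request
        self.queue: Dict[str, Set[str]] = {}

    def submit(self, request_id: str, doc_ids: Set[str]) -> None:
        # A request arrives over the uplink channel and joins the request queue.
        self.queue[request_id] = set(doc_ids)

    def broadcast_one(self) -> Tuple[str, List[str]]:
        # Assumed policy: Most Requests First, ties broken by smallest document id.
        counts = Counter(d for docs in self.queue.values() for d in docs)
        if not counts:
            raise LookupError("request queue is empty")
        doc = min(counts, key=lambda d: (-counts[d], d))
        satisfied = []
        for rid, docs in list(self.queue.items()):
            docs.discard(doc)
            if not docs:  # every result of this request has now been broadcast
                del self.queue[rid]
                satisfied.append(rid)
        return doc, satisfied


server = OnDemandServer()
server.submit("q1", {"d1", "d2"})
server.submit("q2", {"d2"})
server.submit("q3", {"d3"})
print(server.broadcast_one())  # ('d2', ['q2'])
```

Request q1 stays in the queue after d2 is broadcast because it still needs d1, which mirrors the point that an XML query is satisfied only when all of its matching results have been received.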
Most air index structures in existing wireless broadcast systems target only traditional structured data (such as records in a database). Queries on such data are usually based on the data's key, and a query usually corresponds to a single result. An XML document, by contrast, contains not only the data the user needs but also the structural information in which those data are stored. Queries on XML documents, such as XPath queries, usually must match this structural information, and a single query usually corresponds to multiple results; the query is considered satisfied only when all matching results have been obtained. Because of these differences, traditional index structures cannot be applied to XML documents.
The main idea of existing indexing methods for XML documents is to build an index for each XML document (usually by extracting the document's structural information) and then insert these indexes into the broadcast channel in some way so that they are broadcast together with the data, as in the method proposed by Yon Dohn Chung et al. (Information Sciences, Vol. 177, No. 9, pp. 1931-1953, 2007). However, this approach has several significant drawbacks. First, the indexes it produces do not give the user a complete view of all XML documents in the database. In broadcast mode, the user can determine whether all needed results have been downloaded only after waiting a full broadcast cycle, which greatly increases access time and tuning time. In an on-demand environment, where the content of each broadcast cycle differs, the user may even have to listen indefinitely. Second, because an index is built for every XML document, the total index size is very large, and in wireless broadcasting a larger index necessarily increases access time. Finally, each index block carries information about only one document, and these index blocks are interleaved with data blocks throughout the broadcast cycle, so the user must frequently switch between data blocks and index blocks while listening; the overhead of this frequent switching cannot be ignored.
Based on the above analysis, prior to the present invention there was no effective XML air indexing method.
Summary of the invention
The object of the present invention is to provide an XML air indexing method under the on-demand mode that not only produces a complete index covering all XML documents in the database, but also reduces the size of this index to a very small fraction.
The object of the present invention is achieved by the following method and steps:
First, a DataGuide-based index is built for each XML document. DataGuide is a structural index for XML based on the path information of the nodes in the XML document tree. The XML document tree is reduced so that the reduced tree maintains only distinct paths: nodes sharing the same path are kept only once, and no other node information is retained. For example, if the document root node /a has two child nodes that are both /b, the document's DataGuide keeps only one /b child node. DataGuide borrows the idea of converting an NFA into a DFA in automata theory, yielding the most concise structural index of the original XML document that supports simple path queries without losing document structure information. Because the DataGuide retains every path that occurs in the original XML document, XML queries can be matched correctly against the original XML documents using these DataGuide indexes alone.
Second, the DataGuides of all documents are merged to form a complete index built over all documents in the database. This complete index has two major advantages. First, after downloading the complete index, a user can query it over all documents in the database and find every matching structured document. The user thus knows exactly which set of documents it needs and can judge whether its request has been satisfied, which solves the problem that existing indexes may cause users to wait indefinitely under the on-demand broadcast mode. Second, merging further eliminates the redundancy caused by identical path prefixes shared by different DataGuide indexes, further reducing the index size. For example, if two XML documents both contain the path /a/b, the merged index keeps only one such path, reducing the redundancy of the index.
In general, the total number of documents in the database is very large, but not every document is requested by users. When the kinds of user requests are limited and can be known by the server, there is no point in broadcasting data that none of these requests will access. For example, under the on-demand broadcast mode, the server collects all user requests and stores them in a local request queue, so the request queue contains the set of all current user requests, and only the data accessed by these requests need to be broadcast. We therefore further adopt an index pruning technique that deletes from the complete index every branch that no user will access. Because the number of user requests submitted within a period of time is limited, the data items they need probably account for only a small fraction of all documents in the database, so pruning can greatly reduce the index size.
Each step of building the index is described further below:
Step 1: Build a DataGuide for each XML document in the database. Specifically, starting from the root node, scan all nodes level by level; if a node has more than one child with the same label, keep only one of them.
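A minimal Python sketch of Step 1, assuming documents are parsed with the standard `xml.etree.ElementTree` module and that the DataGuide is held as a tree of hypothetical `GuideNode` objects. The scan proceeds level by level as described. One interpretation we add explicitly: the children of the discarded duplicates are scanned under the retained node, so that every path occurring in the document is preserved, as the DataGuide description above requires.

```python
import xml.etree.ElementTree as ET
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GuideNode:
    label: str
    children: Dict[str, "GuideNode"] = field(default_factory=dict)


def build_dataguide(root: ET.Element) -> GuideNode:
    guide = GuideNode(root.tag)
    # Each queue entry pairs one DataGuide node with all document elements
    # that are reached by the same label path (breadth-first = level by level).
    queue = deque([(guide, [root])])
    while queue:
        node, elements = queue.popleft()
        grouped: Dict[str, List[ET.Element]] = {}
        for elem in elements:
            for child in elem:
                grouped.setdefault(child.tag, []).append(child)
        for tag, same_label in grouped.items():
            # Keep a single child per distinct label; the subtrees of all
            # same-labelled siblings are scanned together under it.
            kid = GuideNode(tag)
            node.children[tag] = kid
            queue.append((kid, same_label))
    return guide


def paths(node: GuideNode, prefix: str = "") -> List[str]:
    # List every root-to-node path kept in the DataGuide.
    here = f"{prefix}/{node.label}"
    out = [here]
    for kid in node.children.values():
        out.extend(paths(kid, here))
    return out


doc = ET.fromstring("<a><b><c/></b><b><d/></b><e/></a>")
print(paths(build_dataguide(doc)))  # ['/a', '/a/b', '/a/b/c', '/a/b/d', '/a/e']
```

The two `b` children of the root collapse into one `/a/b` node, as in the /a/b example above, while both `/a/b/c` and `/a/b/d` survive.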
Step 2: Merge the DataGuides obtained in Step 1 to build a complete index over all documents in the database. Specifically, starting from the root node of each DataGuide, traverse the nodes of the DataGuide level by level and add them to the complete index incrementally. For the currently scanned node n_i, if every node in the complete index that is at the same level as n_i and has the same parent differs from n_i, add n_i and all of its descendant nodes under the corresponding parent node in the complete index; otherwise, continue with the other nodes of the DataGuide. Repeat these operations until all DataGuides have been traversed; the resulting index is the merged complete index.
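Step 2 can be sketched as below. Assumptions: each DataGuide is written as nested dictionaries keyed by label, a virtual root sits above all document roots, and each index node records the identifiers of the documents whose DataGuide contains its path (as claim 3 describes). `IndexNode`, `graft`, `merge` and `listing` are hypothetical names.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Dict, Set

# A DataGuide written as nested dicts: {label: {child_label: {...}}}.
Guide = Dict[str, "Guide"]


@dataclass
class IndexNode:
    label: str
    docs: Set[str] = field(default_factory=set)  # documents containing this path
    children: Dict[str, "IndexNode"] = field(default_factory=dict)


def graft(label: str, subtree: Guide, doc_id: str) -> IndexNode:
    # Copy n_i and all of its descendants into the index, tagged with doc_id.
    node = IndexNode(label, {doc_id})
    for child_label, grand in subtree.items():
        node.children[child_label] = graft(child_label, grand, doc_id)
    return node


def merge(guides: Dict[str, Guide]) -> IndexNode:
    root = IndexNode("")  # assumed virtual root above every document root
    for doc_id, guide in guides.items():
        queue = deque([(root, guide)])  # level-by-level traversal of one DataGuide
        while queue:
            parent, level = queue.popleft()
            for label, subtree in level.items():
                match = parent.children.get(label)
                if match is None:
                    # No sibling with the same label: add the whole subtree.
                    parent.children[label] = graft(label, subtree, doc_id)
                else:
                    # Path already indexed: record the document, go one level deeper.
                    match.docs.add(doc_id)
                    queue.append((match, subtree))
    return root


def listing(node: IndexNode, prefix: str = "") -> Dict[str, Set[str]]:
    # Flatten the index into {path: document ids}.
    out: Dict[str, Set[str]] = {}
    for child in node.children.values():
        path = f"{prefix}/{child.label}"
        out[path] = set(child.docs)
        out.update(listing(child, path))
    return out


guides = {"d1": {"a": {"b": {}}}, "d2": {"a": {"b": {}, "c": {}}}, "d3": {"x": {}}}
print(listing(merge(guides)))
```

The path /a/b, shared by d1 and d2, is stored once with both document identifiers attached, which is the redundancy removal the step describes.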
Step 3: Prune the complete index. Specifically, match all requests in the known request set against the complete index and mark every successfully matched node in the index as "requested". Then scan the whole complete index starting from the root node; if a node and all of its descendants are unmarked, delete that node together with all of its descendants.
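A sketch of Step 3, assuming requests are absolute child-axis paths such as `/a/b` in which `*` matches any label; the patent does not state which XPath subset is supported, and `Node`, `build`, `mark`, `prune` and `paths` are hypothetical names. Matched nodes are flagged as "requested", and a post-order pass deletes every subtree that contains no flagged node.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    label: str
    children: Dict[str, "Node"] = field(default_factory=dict)
    requested: bool = False


def build(label: str, tree: dict) -> Node:
    # Build an index tree from nested dicts {label: {child_label: {...}}}.
    return Node(label, {k: build(k, v) for k, v in tree.items()})


def mark(root: Node, query: str) -> None:
    # Follow the query one step per level; "*" matches any label (assumption).
    frontier = [root]
    for step in query.strip("/").split("/"):
        frontier = [c for n in frontier for lbl, c in n.children.items() if step in ("*", lbl)]
    for n in frontier:
        n.requested = True


def prune(node: Node) -> bool:
    # Post-order: drop child subtrees with no "requested" node; report survival.
    node.children = {lbl: c for lbl, c in node.children.items() if prune(c)}
    return node.requested or bool(node.children)


def paths(node: Node, prefix: str = "") -> List[str]:
    out: List[str] = []
    for c in node.children.values():
        p = f"{prefix}/{c.label}"
        out.append(p)
        out.extend(paths(c, p))
    return out


index = build("", {"a": {"b": {"c": {}}, "d": {"e": {}}}, "x": {"y": {}}})
for q in ["/a/b", "/x/*"]:
    mark(index, q)
prune(index)
print(paths(index))  # ['/a', '/a/b', '/x', '/x/y']
```

Ancestors of requested nodes survive automatically because one of their descendants is marked, while the unmarked branches /a/d and /a/b/c are removed.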
The XML air indexing method under the on-demand mode proposed by the present invention is characterized by the use of index merging and index pruning techniques. It not only produces a complete index covering all XML documents in the database but also reduces the index to a very small size; in addition, a two-tier indexing technique that separates the index structure from the offset information of the broadcast data can further reduce the index size.
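The two-tier idea is only named here, so the following is our reading of it rather than the patent's layout: tier 1 is the path index whose nodes hold document identifiers, and tier 2 is a separate table mapping each document identifier to its broadcast offset. A client answers its query from tier 1 and then uses tier 2 to decide when to wake up. `PathNode`, `lookup` and `wake_up_slots` are hypothetical names, and queries are assumed to be absolute child-axis paths.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class PathNode:
    label: str
    docs: Set[str] = field(default_factory=set)  # tier 1 stores document ids only
    children: Dict[str, "PathNode"] = field(default_factory=dict)


def lookup(root: PathNode, query: str) -> Set[str]:
    # Tier 1: walk an absolute child-axis path and return the matching document ids.
    node = root
    for step in query.strip("/").split("/"):
        node = node.children.get(step)
        if node is None:
            return set()
    return set(node.docs)


def wake_up_slots(root: PathNode, offsets: Dict[str, int], query: str) -> List[int]:
    # Tier 2: translate document ids into broadcast offsets, sorted so the
    # client can doze between consecutive arrivals.
    return sorted(offsets[d] for d in lookup(root, query))


root = PathNode("", children={"a": PathNode("a", {"d1", "d2"}, {"b": PathNode("b", {"d1"})})})
offsets = {"d1": 40, "d2": 12}
print(wake_up_slots(root, offsets, "/a"))  # [12, 40]
```

Under this assumed layout each offset is stored once per document instead of at every index node whose path occurs in that document, which is one plausible way the separation shrinks the index.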
Description of drawings
Fig. 1 shows all XML documents in the example database and the DataGuide corresponding to each XML document.
Fig. 2 is an example of the effect of index merging.
Fig. 3 is an example of the effect of index pruning.
Fig. 4 shows the effect of different numbers of documents on merging performance.
Fig. 5 shows the effect of different numbers of documents on pruning performance.
Specific embodiments
The present invention is further elaborated below with reference to specific embodiments. The embodiments serve only to illustrate the present invention and do not limit it.
Embodiment 1
The database contains 5 XML documents, d1, d2, d3, d4 and d5; the structure of each document is shown in Fig. 1(a).
Step 1: Build a DataGuide for each XML document in the database; the results are shown in Fig. 1(b). Taking document d1 as an example, the root node a of d1 has two identical child nodes b, but in the DataGuide corresponding to d1 the root node a keeps only one child node b.
Step 2: Merge the five DataGuides in Fig. 1(b); the result is shown in Fig. 2. The identifier in brackets next to each node is the node's number in the merged index, and the numbers of the documents matching each node are shown below the node. Documents d1, d2, d3 and d5 all contain the path /a/b, but the merged index keeps only one /a/b path, eliminating the redundancy caused by identical paths shared across different documents.
Step 3: Prune the complete index. Suppose the current complete index is as shown in Fig. 3(a), where the grey nodes are those marked "requested" (obtained by matching against the known request set). The result of pruning the complete index is shown in Fig. 3(b). Note that the white nodes in Fig. 3(a) have been deleted, leaving only the nodes that may be accessed by requests.
Embodiment 2
This embodiment uses the Java 1.4.2 compilation environment and runs simulation tests on a Linux 2.6 platform. The test data are 500 XML documents conforming to the News Industry Text Format (NITF) DTD, generated by IBM's XMLGenerator, and 100 to 1000 queries (expressed in XPath) generated by the XPath generator in YFilter[30]; the default maximum query path length is set to 10. The results are shown in Fig. 4 and Fig. 5.
Fig. 4 shows the effect of the number of queries on merging. We evaluate the merging effect with the merge ratio, defined as follows:
[Formula image: definition of the merge ratio]
As Fig. 4 shows, the merge ratio decreases slowly as the number of queries increases, because more and more path information must be retained to ensure that the queries correctly match the corresponding XML documents to be broadcast. With 100 queries the merge ratio is 94%; with 1000 queries it is 91%. The average merge ratio in the figure is about 93%. Merging therefore eliminates redundant path structure information well and greatly reduces the size of the air index.
Fig. 5 shows the effect of the number of queries on pruning. We evaluate the pruning effect with the pruning ratio, defined as follows:
[Formula image: definition of the pruning ratio]
The definition shows that the higher the pruning ratio, the more of the index is pruned away and the better the pruning effect; conversely, the lower the pruning ratio, the worse the pruning effect. As Fig. 5 shows, the pruning ratio decreases as the number of queries increases, because more and more path information must be retained to ensure that the queries of mobile users (MUs) correctly match the corresponding XML documents to be broadcast. With 100 MU queries the pruning ratio is 63%; with 1000 MU queries it is 24%. The average pruning ratio in the figure is about 40%. Pruning therefore eliminates redundant path structure information fairly well and reduces the size of the air index.
Through merging and pruning, the method provided by the present invention can reduce the index to 0.1% to 0.5% of the size of the data file, whereas in existing XML air indexing techniques the index is usually about as large as the whole data file. The method can thus effectively reduce the size of the air index and greatly improve the performance of the broadcast system.

Claims (4)

1. An XML air indexing method in wireless broadcasting, characterized in that it consists of three parts: a basic index structure based on DataGuide, an index merging technique, and an index pruning technique.
2. The XML air indexing method in wireless broadcasting according to claim 1, characterized in that the DataGuide basic index structure builds an index by extracting the simple structural information of each XML document in the database: in the tree structure of each XML document, only one of the identical labels located at the same level is kept, and all other redundant nodes and their corresponding paths are deleted.
3. The XML air indexing method in wireless broadcasting according to claim 1, characterized in that the index merging technique merges the DataGuide indexes corresponding to the XML documents to finally form a complete index built over all documents in the database; all redundant branch paths are deleted during merging, and each node of the index stores the identifiers of the data corresponding to the path on which the node lies.
4. The XML air indexing method in wireless broadcasting according to claim 1, characterized in that the index pruning technique, based on a known set of user requests, deletes from the complete index the information of all branch nodes that no request can access.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810207689A CN101763353A (en) 2008-12-24 2008-12-24 XML air indexing method in air broadcast

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810207689A CN101763353A (en) 2008-12-24 2008-12-24 XML air indexing method in air broadcast

Publications (1)

Publication Number Publication Date
CN101763353A true CN101763353A (en) 2010-06-30

Family

ID=42494517

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810207689A Pending CN101763353A (en) 2008-12-24 2008-12-24 XML air indexing method in air broadcast

Country Status (1)

Country Link
CN (1) CN101763353A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150346A (en) * 2013-02-07 2013-06-12 南京邮电大学 Wireless sensor network data compression method based on extensible markup language
CN105898714A (en) * 2016-06-01 2016-08-24 武汉大学 Real-time on-demand data broadcast scheduling system and method based on XML
CN108228171A (en) * 2017-12-29 2018-06-29 武汉益模科技股份有限公司 A kind of project tree query and display methods based on tree structure

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150346A (en) * 2013-02-07 2013-06-12 南京邮电大学 Wireless sensor network data compression method based on extensible markup language
CN103150346B (en) * 2013-02-07 2016-08-24 南京邮电大学 A kind of wireless sensor network data compression method based on extensible markup language
CN105898714A (en) * 2016-06-01 2016-08-24 武汉大学 Real-time on-demand data broadcast scheduling system and method based on XML
CN108228171A (en) * 2017-12-29 2018-06-29 武汉益模科技股份有限公司 A kind of project tree query and display methods based on tree structure

Similar Documents

Publication Publication Date Title
Barbara Mobile computing and databases-a survey
CN109558450A (en) A kind of automobile remote monitoring method and apparatus based on distributed structure/architecture
Xu et al. An error-resilient and tunable distributed indexing scheme for wireless data broadcast
Zhu et al. Distributed skyline retrieval with low bandwidth consumption
CN106294695A (en) A kind of implementation method towards the biggest data search engine
CN101291304A (en) Transplantable network information sharing method
CN107682416B (en) Broadcast-storage network-based fog computing architecture content collaborative distribution method and application system
CN101753534A (en) Zoning adaptive network system based on cluster server and building method
CN102279880A (en) Method and system for updating cache in real time
CN101819584A (en) Light weight intelligent webpage content analysis method
CN101763353A (en) XML air indexing method in air broadcast
CN101149734A (en) Mobile terminal network browser and network browsing method
Chung et al. An indexing method for wireless broadcast XML data
CN101895550B (en) Cache accelerating method for compatibility of dynamic and static contents of internet website
CN103699556A (en) Digital local chronicle information system for compiling local chronicle and geographical information
WO2005038614A2 (en) System and method for facilitating asynchronous disconnected operations for data access over a network
Sun et al. Two-tier air indexing for on-demand XML data broadcast
Guo et al. Design and implementation of real-time management system architecture based on GraphQL
CN102867058B (en) A kind of space keyword search method under wireless data broadcasting environment
Waluyo et al. Global index for multi channel data dissemination in mobile databases
Mahdizadeh et al. Ranking of components and characteristics of a smart city in the 22nd metropolitan area of Tehran
CN101685470B (en) Query statistic-based guidance searching method for P2P system
Li et al. Location management in cellular mobile computing systems with dynamic hierarchical location databases
Waluyo et al. Indexing schemes for multichannel data broadcasting in mobile databases
Tripathi et al. Mobile Computing and Databases

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20100630