CN103488639B - A method for querying XML data - Google Patents

A method for querying XML data

Info

Publication number: CN103488639B
Authority: CN (China)
Prior art keywords: xml, node, layer, xpath, sequence
Legal status: Expired - Fee Related (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN201210192018.7A
Other languages: Chinese (zh)
Other versions: CN103488639A (en)
Inventors: 郭少松, 包小源, 陈薇, 王腾蛟, 杨冬青
Current Assignee: Peking University (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Peking University
Priority date: 2012-06-11 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2012-06-11
Publication date: 2016-12-07
Application filed by Peking University
Priority to CN201210192018.7A
Publication of CN103488639A
Application granted
Publication of CN103488639B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80: Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83: Querying
    • G06F16/835: Query processing
    • G06F16/81: Indexing, e.g. XML tags; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present invention provides a method for querying XML data, comprising the following steps: 1) storing the XML data in native XML form, where the storage structure includes an internal node layer, which stores the nodes of the XML tree with XML elements encoded using DDE encoding; a leaf node layer, which stores the text data of the leaf nodes of the XML tree; and an inverted-list layer, which stores the inverted index of the internal node layer; 2) according to the input XPath query, retrieving from the inverted-list layer the element sequences corresponding to the nodes of the XPath, and merging them with a loser tree; 3) pushing and popping the merged XML elements in order, and obtaining the query result from a buffer. The invention can process XPath expressions containing the keyword "OR" and the wildcard "*", and does so with high efficiency.

Description

A method for querying XML data
Technical field
The invention belongs to the field of database technology and relates to the storage and querying of semi-structured XML data, and in particular to an XML data query method that effectively supports the XML query language XPath.
Background
As more and more application systems use XML as a standard format for publishing and exchanging data, the volume of XML data has grown rapidly. A recent report issued by IDC (Internet Data Center) shows that 29% of the IT departments of the 500 enterprises surveyed already make heavy use of XML documents and XML databases. Managing XML data effectively has therefore become an urgent problem.
Quickly and accurately finding all elements in an XML database that match an XPath is the core operation of XML query processing. For example, consider the XPath expression book[title='XML']//author[fn='Jane' AND ln='Doe']. An author node matched by this expression must satisfy the following: 1) it has a child node fn whose content is 'Jane'; 2) it has a child node ln whose content is 'Doe'; 3) it is a descendant of a book node, and that book node has a title child node whose content is 'XML'.
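To make the three matching conditions concrete, the following is a minimal sketch that evaluates an equivalent path with Python's standard xml.etree.ElementTree module. The sample document and the variable names are assumptions made for illustration and do not come from the patent. ElementTree does not accept an "and" keyword inside a predicate, so the two conditions on author are written as two chained predicates, which select the same nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document (not from the patent). Only the second author
# of the first book satisfies all three conditions.
DOC = """
<lib>
  <book>
    <title>XML</title>
    <chapter>
      <author><fn>John</fn><ln>Doe</ln></author>
      <author><fn>Jane</fn><ln>Doe</ln></author>
    </chapter>
  </book>
  <book>
    <title>SQL</title>
    <author><fn>Jane</fn><ln>Doe</ln></author>
  </book>
</lib>
"""

root = ET.fromstring(DOC)

# book must have a title child with text 'XML'; author must be a descendant
# of that book and have fn='Jane' and ln='Doe' children.
matches = root.findall(".//book[title='XML']//author[fn='Jane'][ln='Doe']")
names = [(a.findtext("fn"), a.findtext("ln")) for a in matches]
print(names)
```

The author under the 'SQL' book is rejected by condition 3, and John Doe is rejected by condition 1.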
Among XML pattern matching methods, the most representative are the TurboXPath algorithm, developed for DB2 to process XML data streams, and the TwigStack algorithm, proposed by the academic community in 2002.
In the TwigStack algorithm, each node q of the XPath is associated with Tq and Sq. Tq is an element sequence: q is a tag name in the XPath, and Tq contains all elements of the XML document whose name matches q, arranged in document order. Sq is an element stack that stores elements whose name matches q; when the element being processed lies beyond the closing tag of an element in the stack, that stack element is popped. The algorithm operates only on the elements in Tq and skips unrelated XML elements, so its I/O efficiency is high. However, TwigStack cannot handle two cases. First, it cannot handle an XPath containing the wildcard "*", such as //a/*[b]/c: because TwigStack uses interval encoding, even when element a is known to be two levels above elements b and c, it still cannot determine whether b and c have the same parent. Second, TwigStack can only process an XPath whose twig branches are related by 'AND', such as //a[b AND c]/d, and cannot process the keyword 'OR', as in //a[b OR c]/d.
TurboXPath is the query matching algorithm that DB2 uses for XML streams. It uses neither indexes nor encodings; the XML elements in the stream arrive in document order, and XPath expressions with the keyword 'OR' are handled easily. TurboXPath is functionally more complete, but for XML data stored in a database it scans the XML document from beginning to end, so the I/O cost is very high, especially for large XML documents.
Summary of the invention
The object of the invention is to address the problems of the prior art by providing a new method for querying XML data that can process XPath expressions containing the keyword "OR" and the wildcard "*" with high efficiency.
To achieve this object, the invention adopts the following technical solution:
A method for querying XML data, comprising the following steps:
1) storing the XML data in native XML form, where the storage structure includes: an internal node layer, which stores the nodes of the XML tree arranged in document order, with XML elements encoded using DDE encoding; a leaf node layer, which stores the text data of the leaf nodes of the XML tree; and an inverted-list layer, which stores the inverted index of the internal node layer, each index entry being the sequence of elements with the same tag name arranged in document order;
2) according to the input XPath query, retrieving from the inverted-list layer the element sequences corresponding to the nodes of the XPath, and merging them with a loser tree;
3) pushing and popping the merged XML elements in order, and obtaining the query result from a buffer.
Further, each record in the internal node layer includes: the integer identifier to which the node name is mapped, the DDE code, and the node type.
Further, the information for each element in the inverted-list layer includes: the element type, the address of the element in the internal node layer, and the DDE code.
Further, the internal node layer points to the leaf node layer through pointers.
Further, merging with the loser tree means comparing the DDE codes of two elements to determine which comes first in document order; the earlier element is the winner and the later element is the loser.
Further, each node q in the XPath has two data structures: an element sequence Tq and a stack Sq. Tq contains all elements of the XML document whose name matches q, arranged in document order; Sq stores elements whose name matches q and supports push and pop operations.
Further, on a push operation, only the ancestors of the new element are retained in the stacks, so that all elements in the stacks are in ancestor-descendant relationships.
Further, if element e is to be pushed onto stack SE, and the parent of node E in the XPath is A, then the conditions for pushing e are:
a) SA contains an element that has not been delinked, where delinking means deleting a record that is not an ancestor of e from the linked list that connects all elements in the stacks;
b) e is a child of the element in SA that has not been delinked and is nearest the top of the stack;
c) the type of e is the same as the type of E in the XPath.
Further, when the wildcard "*" appears in the XPath, three new axes are derived: the grandparent-child axis, the absolute ancestor-descendant axis, and the special ancestor-descendant axis; these three axes are used to rewrite an XPath containing the wildcard "*" into an equivalent form.
The XML data query method of the invention solves the problem that TwigStack cannot support XPath expressions with the keyword "OR" and the wildcard "*". For querying XML data in a database, it has the same I/O efficiency as TwigStack and is more efficient than TurboXPath. At present, more and more application systems use XML as a standard format for publishing and exchanging data, and the volume of XML data is growing rapidly. Fields such as finance, healthcare, e-government, and news already use their own XML standards to exchange data between departments and enterprises. The method can be widely applied in these fields to query and manage XML data efficiently.
Brief description of the drawings
Fig. 1 is a flow chart of the steps of the XML data query method of an embodiment of the invention.
Fig. 2 is a schematic diagram of the native XML storage scheme of the embodiment.
Fig. 3 is a schematic diagram of the internal node layer in Fig. 2.
Fig. 4 is a flow chart of the query //a[//c]/b in the embodiment.
Fig. 5 is a schematic diagram of the push and pop operations for the query //a[//c]/b in the embodiment.
Fig. 6 is a schematic diagram of the push and pop operations for the query //a/*[c]/b in the embodiment.
Detailed description
The invention is described in detail below through specific embodiments, with reference to the accompanying drawings.
Fig. 1 is a flow chart of the XML data query method of the invention. The specific steps are:
1) Store the XML data in the database in native XML form.
The XML data query method of the invention is a holistic twig join method. Compared with earlier structural joins, holistic twig joins avoid producing many useless intermediate results. The method is built on native XML storage, with XML elements encoded using DDE. The native storage mechanism preserves the document order of XML elements, so the subdocument rooted at an element can be retrieved from the physical address of that element's opening tag. DDE codes are used to determine the common structural relationships between XML elements (ancestor-descendant, parent-child, sibling, and so on).
The storage design of the invention has three layers: the internal node layer, the leaf node layer, and the inverted-list layer, as shown in Fig. 2.
a) Internal node layer
The nodes of the XML tree are arranged in document order and stored in the internal node layer. Each record in this layer is one XML tree node, and the record includes the integer identifier tagID to which the node name is mapped (which is compact to store and cheap to compare), the DDE code, the node type (element, attribute, or text), and so on. Fig. 3 shows a simple example of an internal node layer, where (a) is an XML tree and (b) is the corresponding sequential storage, in which records beginning with "/" are closing-tag records. The two leaf nodes "Database" and "25.00" are stored here as pointers; their actual content is stored in the leaf node layer.
An XML encoding is used to determine the structural relationships between XML elements. TwigStack cannot process an XPath with the wildcard "*" because the interval encoding it uses cannot determine the sibling axis. The invention uses DDE encoding, which has the following advantages over interval encoding:
Every axis that interval encoding can determine, DDE can also determine; in addition, DDE can determine the sibling axis, which interval encoding cannot;
DDE encoding supports updates to XML documents well: when the document changes, existing codes do not need to change, which interval encoding cannot achieve.
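The structural tests that a prefix-style code makes possible can be sketched as follows. This is a simplified illustration, not the patent's DDE implementation: it uses plain Dewey-style labels (tuples of integers, where the root is (1,) and its second child is (1, 2)) as a stand-in for DDE codes, and it leaves out the rules DDE adds for handling document updates. All function names are hypothetical.

```python
# Dewey-style labels used as a stand-in for DDE codes (assumption).
# A label lists the child positions on the path from the root to the node.

def level(x):
    """Depth of the node; the root has level 1."""
    return len(x)

def is_ancestor(a, d):
    """a is a proper ancestor of d when a's label is a proper prefix of d's."""
    return len(a) < len(d) and d[:len(a)] == a

def is_parent(p, c):
    """p is the parent of c when c's label is p's label plus one component."""
    return len(c) == len(p) + 1 and c[:-1] == p

def is_sibling(x, y):
    """Two distinct nodes are siblings when they share the same parent prefix."""
    return x != y and len(x) == len(y) and x[:-1] == y[:-1]

def precedes(x, y):
    """Document order: tuple comparison puts an ancestor before its descendants."""
    return x < y

a, b, c = (1,), (1, 1, 1), (1, 1, 2)
print(is_ancestor(a, b), is_sibling(b, c), level(b) - level(a))
```

The sibling test is the one that interval codes cannot answer from the two labels alone, which is why the wildcard case //a/*[b]/c needs a code of this kind.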
b) Leaf node layer
Each record stores the text data of one leaf node of the XML tree. Pointers in the internal node layer point here, and the physical page holding the text data is located through these pointers.
c) Inverted-list layer
The inverted-list layer is similar to the inverted index of an IR system. In this layer, all elements with the same tag name form one sequence, arranged in document order. The information kept for each element in a sequence includes its address in the internal node layer, the element type, the DDE code, and so on. The information stored in the inverted-list layer is sufficient to complete the XPath query matching; the addresses of the matched elements are then used to fetch, from the internal node layer, the subdocument between each element's opening and closing tags. In the inverted-list layer shown in Fig. 2, E1, E2, and E3 denote XML element information.
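The three layers described above can be sketched in memory as follows. This is an illustrative model only: the class and field names are hypothetical, Dewey-style tuples stand in for DDE codes, list positions stand in for physical addresses, and closing-tag records, attributes, and paging are omitted. The sample document reuses the "Database" and "25.00" leaf values mentioned for Fig. 3, but its element names are assumed.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

@dataclass
class NodeRecord:                    # one record of the internal node layer
    tag_id: int                      # integer identifier mapped from the tag name
    dde: tuple                       # Dewey-style stand-in for the DDE code
    node_type: str                   # only "element" is modelled here
    leaf_ptr: Optional[int] = None   # index into the leaf node layer, if any

@dataclass
class Posting:                       # one element entry in an inverted-list sequence
    node_type: str
    address: int                     # position of the record in the internal node layer
    dde: tuple

def build_layers(xml_text):
    tag_ids, inner, leaves, inverted = {}, [], [], {}

    def visit(elem, label):
        # Pre-order traversal visits nodes in document order.
        tag_id = tag_ids.setdefault(elem.tag, len(tag_ids))
        text = (elem.text or "").strip()
        ptr = None
        if text:                     # text content goes to the leaf node layer
            ptr = len(leaves)
            leaves.append(text)
        address = len(inner)
        inner.append(NodeRecord(tag_id, label, "element", ptr))
        inverted.setdefault(elem.tag, []).append(Posting("element", address, label))
        for i, child in enumerate(elem, start=1):
            visit(child, label + (i,))

    visit(ET.fromstring(xml_text), (1,))
    return tag_ids, inner, leaves, inverted

SAMPLE = "<book><title>Database</title><price>25.00</price></book>"
tag_ids, inner, leaves, inverted = build_layers(SAMPLE)
price = inverted["price"][0]
print(price.dde, leaves[inner[price.address].leaf_ptr])
```

Query matching would read only the sequences in `inverted`; the `address` field is what allows the matched subdocument to be fetched from the internal node layer afterwards.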
2) According to the input XPath query, retrieve from the inverted-list layer the element sequences corresponding to the nodes of the XPath, and merge them with a loser tree.
Each node q of the XPath has two data structures: an element sequence Tq and a stack Sq. Tq contains all elements of the XML document whose name matches q, arranged in document order. Sq stores, while the algorithm runs, elements whose name matches q; when a new element is pushed, elements that are not its ancestors are popped.
The method of the invention may be called the "TurboStack" method. Suppose the XPath has n nodes q1, q2, ..., qn. The sequence Tqi (1 ≤ i ≤ n) for each node is obtained from the inverted-list layer, and the XML elements in each Tqi are ordered. The input to TurboStack is a stream of XML elements in document order, so the n element sequences Tq1, Tq2, ..., Tqn must be merged into a single sequence in document order, which serves as the input to the algorithm.
The merge can rely on DDE codes and use a loser tree: for the codes dde1 and dde2 of two elements, comparing the two codes determines which element comes first in document order; the earlier element is the winner and the later element is the loser.
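A k-way merge with a loser tree can be sketched as below. This follows the textbook loser-tree construction rather than any code given in the patent; the function and variable names are hypothetical, and Dewey-style tuples again stand in for DDE codes, so ordinary tuple comparison plays the role of the document-order comparison. Each internal tree node remembers the loser of the match played there, and the overall winner is kept at index 0.

```python
_MIN, _MAX = object(), object()   # virtual smallest key / exhausted-run marker

def _greater(a, b):
    """True if head a sorts strictly after head b (a loses the match)."""
    if a is _MIN or b is _MAX:
        return False
    if a is _MAX or b is _MIN:
        return True
    return a > b

def loser_tree_merge(runs):
    """Merge k sorted runs into one sorted list using a loser tree."""
    k = len(runs)
    if k == 0:
        return []
    iters = [iter(r) for r in runs]
    # heads[i] is the current element of run i; heads[k] is a virtual minimum
    # used only while the tree is being built.
    heads = [next(it, _MAX) for it in iters] + [_MIN]
    tree = [k] * k                # tree[1..k-1]: losers, tree[0]: overall winner

    def adjust(s):
        # Replay the matches on the path from leaf s up to the root.
        t = (s + k) // 2
        while t > 0:
            if _greater(heads[s], heads[tree[t]]):
                s, tree[t] = tree[t], s   # s lost: leave it here, carry the winner up
            t //= 2
        tree[0] = s

    for i in range(k - 1, -1, -1):
        adjust(i)

    merged = []
    while heads[tree[0]] is not _MAX:
        w = tree[0]
        merged.append(heads[w])
        heads[w] = next(iters[w], _MAX)   # advance the winning run
        adjust(w)                         # only one root-to-leaf path is replayed
    return merged

# Sequences for three query nodes a, c, b, each already in document order.
Ta, Tc, Tb = [(1,), (1, 1)], [(1, 1, 1)], [(1, 1, 2), (1, 2)]
print(loser_tree_merge([Ta, Tc, Tb]))
```

After each output only the path from the winning leaf to the root is replayed, so each step costs about log k comparisons regardless of how many sequences are merged.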
3) Push and pop the merged elements in order, and obtain the query result from the buffer.
The execution flow of the overall method is denoted Algorithm 1. Its functions are as follows: ConstructStack(q) creates the stack for node q; GetStream(q) obtains the element sequence of node q from the underlying storage; MultiMergeSort(XPath) merges the sequences Tq of all nodes of the XPath; getPopElement(e) selects from the stacks the elements that are not ancestors of e; match(e) determines whether e can be pushed.
Pushing and popping are described in detail below.
3.1) Push
After all Tq have been merged so that the elements are arranged in document order, the elements are pushed one by one. Pushing element e corresponds to encountering the opening tag of e in the XML document. The ancestors of e that are in the stacks remain there because, by the XML tree structure, the closing tags of e's ancestors have not yet been scanned. Elements that were pushed before e but are not ancestors of e have already had their closing tags passed and should be popped.
Therefore, after a push only the ancestors of the new element remain, and all elements in the stacks are in ancestor-descendant relationships. All elements in the stacks are connected by a linked list called the Last Push List, abbreviated LPL. After a new element e is pushed, it is placed at the head of the LPL. Before a new element e is pushed, each record Ei is examined in turn starting from the head of the LPL, and Ei is compared with e using their DDE codes. If Ei is not an ancestor of e, Ei is deleted from the LPL, which is called delinking; otherwise the comparison stops. Delinking marks the element as popped, but the element is not actually removed from its stack.
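The LPL maintenance described above can be sketched as follows. This is a simplified illustration with hypothetical names: the LPL is modelled as a Python list whose first item is the head, Dewey-style tuples stand in for DDE codes, and every element is pushed unconditionally (the match(e) test and the per-node stacks are left out).

```python
def is_ancestor(a, d):
    """Prefix test on Dewey-style labels (stand-in for the DDE comparison)."""
    return len(a) < len(d) and d[:len(a)] == a

def push_onto_lpl(lpl, e):
    """Delink records that are not ancestors of e, then put e at the head.

    Returns the delinked labels in the order they were removed.
    """
    delinked = []
    # Scan from the head; stop at the first record that is an ancestor of e,
    # because everything behind it is an ancestor as well.
    while lpl and not is_ancestor(lpl[0], e):
        delinked.append(lpl.pop(0))
    lpl.insert(0, e)
    return delinked

# Elements in document order: first a, second a, c, first b, second b,
# with the nesting a1 > (a2 > (c, b1), b2).
a1, a2, c, b1, b2 = (1,), (1, 1), (1, 1, 1), (1, 1, 2), (1, 2)
lpl = []
log = [push_onto_lpl(lpl, e) for e in (a1, a2, c, b1, b2)]
print(log)
print(lpl)
```

In this run, pushing the first b delinks c, and pushing the second b delinks the first b and the second a, leaving only the second b and the first a on the list.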
If e is to be pushed onto SE, and the node above E in the XPath is A, the conditions tested by match(e) in Algorithm 1 are:
1. SA still contains an element that has not been delinked;
2. If the axis between A and E is the parent-child axis, and a1 is the element in SA that has not been delinked and is nearest the top of the stack, then e must be a child of a1;
3. The type of e is the same as the type of E in the XPath (E may be an element node or an attribute node).
If E is the root node of the XPath, e can be pushed as long as the third condition is met.
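The three push conditions can be sketched as a small predicate. The patent names the function match(e) but does not give its code, so the signature, the Entry class, and the `linked` flag below are assumptions made for illustration; Dewey-style tuples stand in for DDE codes, and a stack is a Python list whose last item is the top.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    dde: tuple                   # Dewey-style stand-in for the DDE code
    node_type: str = "element"
    linked: bool = True          # False once the record has been delinked

def top_linked(stack):
    """Record nearest the stack top that has not been delinked, or None."""
    for entry in reversed(stack):
        if entry.linked:
            return entry
    return None

def match(e, query_type, parent_stack=None, axis="AD"):
    """Decide whether e may be pushed (hypothetical signature).

    parent_stack is the stack of the query node above E, or None when E is
    the root of the XPath; axis is "PC" or "AD".
    """
    if e.node_type != query_type:        # condition 3
        return False
    if parent_stack is None:             # root node: only condition 3 applies
        return True
    a1 = top_linked(parent_stack)
    if a1 is None:                       # condition 1
        return False
    if axis == "PC":                     # condition 2
        return len(e.dde) == len(a1.dde) + 1 and e.dde[:-1] == a1.dde
    return True

s_a = [Entry((1,)), Entry((1, 1))]
print(match(Entry((1, 1, 2)), "element", s_a, "PC"))
```

For the ancestor-descendant axis no extra label test is made here, on the assumption that every record still linked is already an ancestor of e after the delinking step.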
A newly pushed record contains four pieces of information:
1. Element information (tagID, DDE, type);
2. Pointer PLPL, which points to the next record in the LPL;
3. Pointer Pstack, which points to the record in SA that is still on the LPL and nearest the top of the stack;
4. Matching status bit status, initially false. If the record satisfies the structural requirements that the XPath places on it, it is set to true.
If E is a leaf node of the XPath, status is initialized to true.
If E is the output node of the XPath, e is placed in the result buffer outputBuffer.
The execution flow of the push operation is denoted Algorithm 2. Its functions are as follows:
Push(e, SE) puts element e on top of SE; Lappend_head(e, LPL) places element e at the head of the linked list LPL; Lappend(e, outputBuffer) puts element e into the output buffer outputBuffer.
3.2) Pop
When a record in a stack is delinked, its PLPL is set to NULL, but the record is not popped.
In the XPath, every node except the leaf nodes has child nodes, and the stack of a child node is called a child stack of the parent node. When an element in the parent node's stack is delinked, elements of the child stacks are popped.
Suppose node A has two child nodes B and C, where A and B are connected by a parent-child axis and A and C by an ancestor-descendant axis. Suppose stack SA currently holds two records a1 and a2, with a2 on top, and a2 is now to be delinked. Record a2 is not popped; what is popped are records in SB and SC.
First, it must be determined whether record a2 meets the structural requirements of the XPath for query node A, which is what the function matchStructure(e) of Algorithm 3 below does:
a) The Pstack pointers of records b1, b2, ..., bn in SB point to a2, and the Pstack pointers of records c1, c2, ..., cm in SC also point to a2. These records have already been delinked, and their status bits were determined when they were delinked.
b) If B and C are in an AND relationship, then
a2->status = (b1->status || ... || bn->status) && (c1->status || ... || cm->status);
If B and C are in an OR relationship, then
a2->status = (b1->status || ... || bn->status) || (c1->status || ... || cm->status).
If n = 0 or m = 0, that is, SB or SC contains no record pointing to a2, the status for SB or SC is treated as false.
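The status rule above reduces to a short boolean combination. The sketch below is an illustration with a hypothetical function name: each inner list holds the status bits of the records in one child stack whose Pstack pointer points at the element being delinked.

```python
def combine_status(child_groups, relation="AND"):
    """Compute the status of a delinked element from its child stacks.

    child_groups: one list of status bits per child query node.
    relation: "AND" or "OR" between the child query nodes.
    """
    # A child node is satisfied if any of its records matched; an empty
    # group (n = 0 or m = 0) counts as false.
    per_child = [any(group) for group in child_groups]
    if relation == "AND":
        return all(per_child)
    return any(per_child)

b_status = [False, True]   # b1, b2 pointing at a2
c_status = []              # no c record points at a2
print(combine_status([b_status, c_status], "AND"))
print(combine_status([b_status, c_status], "OR"))
```

With an empty group for C, the AND form gives false and the OR form gives true, which is the behaviour the n = 0 or m = 0 rule describes.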
All of b1, b2, ..., bn in SB are popped, because they cannot be children of a1. In SC, those of c1, c2, ..., cm whose status is false are popped, and the Pstack pointers of the remaining records are redirected to a1, because they are also descendants of a1.
If a2->status = false at this point, the records in the output buffer that belong to descendants of a2 are deleted from the buffer.
When Troot of the XPath root node root is empty and all records in Sroot have been delinked, the algorithm terminates, and the elements in outputBuffer are the query result.
The execution flow of the pop operation is denoted Algorithm 3. Its functions are as follows: IsEmpty(LPL, SE) determines whether SE still has an element that has not been delinked, returning true if not; Delete_Stacks(SE, e) deletes the descendants of e from the child stacks of SE; Delete_InterResult(outputBuffer, e) deletes the descendant elements of e from the output buffer; stack_top(SE) returns the element in the stack that has not been delinked and is nearest the top; childStack(SE) returns all child stacks of SE; descendants(SC, e) returns the elements of stack SC that are descendants of e; PC(E, C) determines whether E and C are in a parent-child relationship; AD(E, C) determines whether E and C are in an ancestor-descendant relationship.
3.3) XPath with the wildcard "*"
The common XPath axes are the ancestor-descendant (AD) axis, the parent-child (PC) axis, and so on. If the wildcard "*" appears in the XPath, three new axes are derived:
a) The grandparent-child (GPC) axis. For example, in a/*/c, a and c are in a grandparent-child relationship two levels apart; in a/*/*/c, a and c are in a grandparent-child relationship three levels apart. The invention writes the GPC axis as /n, where n is an integer giving the number of levels between the two nodes.
b) The absolute ancestor-descendant (AAD) axis. For example, in a/*//c or a//*//c, a and c are in an absolute ancestor-descendant relationship at least two levels apart. The AAD axis is written //n, where n is an integer giving the minimum number of levels between the two nodes.
c) The special ancestor-descendant (SAD) axis. For example, in a//*/c, a and c are in a special ancestor-descendant relationship at least two levels apart. The SAD axis is written ///n, where n is an integer giving the minimum number of levels between the two nodes.
Why distinguish AAD from SAD? For a//*[//d]//c, the axes between a and d and between a and c are AAD, and there is no relationship between d and c. For a//*[d]/c, the axes between a and d and between a and c are SAD, and d and c are siblings. For a//*[d]//c, the axis between a and d is SAD, the axis between a and c is AAD, and there is no relationship between d and c.
GPC, AAD, and SAD are special cases of AD. They are easy to test with DDE codes, because a DDE code carries level information.
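Because the level of a node can be read off its code, each axis becomes an ancestor test plus a comparison of level differences. The sketch below illustrates this with Dewey-style tuples standing in for DDE codes; the function name and the string axis names are assumptions. A pairwise test cannot express the difference between AAD and SAD, since that difference lies in whether the branches under the wildcard must also be siblings, so the two are treated alike here.

```python
def is_ancestor(a, d):
    """Prefix test on Dewey-style labels (stand-in for the DDE comparison)."""
    return len(a) < len(d) and d[:len(a)] == a

def satisfies_axis(a, d, axis, n=1):
    """Check whether ancestor label a and descendant label d satisfy an axis.

    axis is one of "PC", "AD", "GPC", "AAD", "SAD"; n is the level count
    attached to the three wildcard axes.
    """
    if not is_ancestor(a, d):
        return False
    gap = len(d) - len(a)          # number of levels between the two nodes
    if axis == "PC":
        return gap == 1
    if axis == "AD":
        return True
    if axis == "GPC":              # exactly n levels apart, as in a/*/c with n = 2
        return gap == n
    if axis in ("AAD", "SAD"):     # at least n levels apart
        return gap >= n
    raise ValueError(f"unknown axis: {axis}")

a, c = (1,), (1, 3, 2)
print(satisfies_axis(a, c, "GPC", 2), satisfies_axis(a, c, "PC"))
```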
An XPath with the wildcard "*" is rewritten into an equivalent form using the three axes GPC, SAD, and AAD.
When "*" appears in the XPath as a branching node, for example a/*[d]/c//e, the XPath is rewritten as a[/2d]/2c//e. Because d and c must also be siblings, this case requires special attention when processing the three new axes.
Suppose a new element is to be pushed onto stack Sb, the parent of b in the XPath is a, and the axis between a and b is one of GPC, SAD, and AAD. The conditions for pushing are:
a) Sa must contain an element that has not been delinked;
b) Sa contains an element whose level relationship with the new element satisfies the axis;
c) The type of the new element meets the requirement of b.
If the axes from a to b and from a to c are both /n or both ///n with equal n, then b and c must be siblings. Suppose element a1 is now delinked and b1, b2, ..., bn and c1, c2, ..., cm are descendants of a1. To compute whether a1 matches, the siblings are first grouped, for example (b1, c1, c2), (b2, b3, c3, c4), ..., where the elements in each group are all siblings. Then:
If b and c are in an AND relationship, then a1->status = [b1->status && (c1->status || c2->status)] || [(b2->status || b3->status) && (c3->status || c4->status)] || ...;
If b and c are in an OR relationship, then a1->status = (b1->status || c1->status || c2->status) || (b2->status || b3->status || c3->status || c4->status) || ....
If no sibling group can be formed, then a1->status = false.
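The sibling grouping and the two status formulas can be sketched as follows. The function name and record layout are hypothetical, and Dewey-style tuples stand in for DDE codes, so two records are siblings when their labels share the same parent prefix. For the OR case the sketch accepts a group that contains records of only one kind, which is an interpretation rather than something the text states explicitly.

```python
from collections import defaultdict

def paired_status(b_records, c_records, relation="AND"):
    """Status of a delinked ancestor whose branches b and c must be siblings.

    b_records, c_records: lists of (label, status) pairs for the descendants
    of the ancestor in the two child stacks.
    """
    # Group records by parent label; each group holds ([b statuses], [c statuses]).
    groups = defaultdict(lambda: ([], []))
    for label, status in b_records:
        groups[label[:-1]][0].append(status)
    for label, status in c_records:
        groups[label[:-1]][1].append(status)

    if relation == "AND":
        # Some sibling group must contain a matching b and a matching c.
        return any(any(bs) and any(cs) for bs, cs in groups.values())
    # OR: some sibling group must contain a matching b or a matching c.
    return any(any(bs) or any(cs) for bs, cs in groups.values())

# b and c under the same parent (1, 1): the AND form is satisfied.
print(paired_status([((1, 1, 2), True)], [((1, 1, 1), True)], "AND"))
# b and c under different parents: no sibling pair, so the AND form fails.
print(paired_status([((1, 1, 2), True)], [((1, 2, 1), True)], "AND"))
```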
Whether the records in stacks Sb and Sc stay in the stack or are deleted is decided as for the PC axis in the case of the GPC axis, and as for the AD axis in the case of the SAD and AAD axes.
Fig. 4 is the query flow chart for the example //a[//c]/b, in which (a) all element sequences are merged; (b) each element is processed in order; and (c) the matching results are put into the buffer. Fig. 5 is a schematic diagram of the push and pop operations for the example of Fig. 4. Each record in a stack has three parts: the left side is the element information; the upper right is the matching status bit status, where F denotes false and T denotes true; the lower right is the pointer Pstack. The bottom of the figure shows how the LPL changes. The steps of the query //a[//c]/b are as follows:
Step 1: retrieve the element sequences Ta, Tc, and Tb of nodes a, c, and b from the inverted-list layer.
Step 2: merge Ta, Tc, and Tb with a loser tree to obtain one element sequence: the first a, the second a, c, the first b, the second b.
Step 3: push and pop these five elements in order:
1) The first three elements are all in ancestor-descendant relationships and are pushed in order. Because c is a leaf node, the status of c is true.
2) The first b is pushed. The LPL is checked: the closing tag of c has been passed, so c is delinked. Because b is a leaf node, the status of b is true. Because b is the output node, the first b is put into the output buffer.
3) The second b is pushed. The LPL is checked: the closing tags of the first b and the second a have been passed, so they are delinked. When the second a is delinked, its status becomes true, because the statuses of its child nodes b and c are both true. Delinking the second a causes the first b to be popped, because it is not a child of the first a; c is not popped, because it is a descendant of the first a, and the Pstack of c is redirected to the first a. Because b is the output node, the second b is also put into the output buffer.
4) Finally, all elements have been processed and the elements remaining in the LPL are delinked. When the first a is delinked, its status becomes true, because the statuses of its child nodes b and c are both true. The stacks Sb and Sc are then emptied, because Sa no longer contains any element.
Step 4: the output buffer is checked and contains two results.
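The expected result of this walkthrough can be cross-checked with a brute-force evaluation that does not use the stack algorithm at all. The document below is an assumption inferred from the steps above (the first a contains the second a and the second b; the second a contains c and the first b); the id attributes and the function name are added only for illustration. The predicate [//c] is read, as in the walkthrough, as "has a descendant c".

```python
import xml.etree.ElementTree as ET

# Assumed document shape, inferred from the walkthrough.
DOC = "<a id='a1'><a id='a2'><c/><b id='b1'/></a><b id='b2'/></a>"
root = ET.fromstring(DOC)

def b_children_of_a_with_descendant_c(root):
    """Brute-force reference for //a[//c]/b as used in the walkthrough."""
    results = []
    for a in root.iter("a"):                       # every a element
        if any(True for _ in a.iter("c")):         # a has a descendant c
            results.extend(b.get("id") for b in a.findall("b"))  # b children
    return results

found = b_children_of_a_with_descendant_c(root)
print(sorted(found))
```

Both a elements have a descendant c, and each has one b child, so the reference evaluation also yields two results.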
Fig. 6 shows the query example //a/*[c]/b, which requires the wildcard "*" to have two child nodes c and b, so that c and b are siblings. The query steps are as follows:
Step 1: rewrite the XPath into the equivalent form //a[/2c]/2b, meaning that a has two grandchild nodes b and c, and b and c must be siblings.
Step 2: retrieve the element sequences Ta, Tc, and Tb of nodes a, c, and b from the inverted-list layer.
Step 3: merge Ta, Tc, and Tb with a loser tree to obtain one element sequence: the first a, the second a, c, the first b, the second b.
Step 4: push and pop these five elements in order:
1) The first three elements are all in ancestor-descendant relationships and are pushed in order. Because c is a grandchild of the first a, the Pstack of c points to the first a. Because c is a leaf node, the status of c is true.
2) The first b is pushed. The LPL is checked: the closing tag of c has been passed, so c is delinked. The first b is a grandchild of the first a, so the Pstack of b points to the first a. Because b is a leaf node, the status of b is true. Because b is the output node, the first b is put into the output buffer.
3) The second b is to be pushed. The LPL is checked: the closing tags of the first b and the second a have been passed, so they are delinked. When the second a is delinked, its status remains false, because it has no grandchild elements c and b. Stack Sa now holds only the first a as a valid element, but the second b is not a grandchild of it, so the second b does not satisfy the push condition.
4) Finally, all elements have been processed and the elements remaining in the LPL are delinked. When the first a is delinked, its status becomes true, because the statuses of its grandchild elements b and c are both true and b and c are siblings. The stacks Sb and Sc are then emptied, because Sa no longer contains any element.
Step 5: the output buffer is checked and contains one result.
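The wildcard walkthrough can be cross-checked in the same brute-force way. The document is again an assumption inferred from the steps (the first a contains the second a and the second b; the second a contains c and the first b), and the id attributes and function name are illustrative only.

```python
import xml.etree.ElementTree as ET

# Assumed document shape, inferred from the walkthrough.
DOC = "<a id='a1'><a id='a2'><c/><b id='b1'/></a><b id='b2'/></a>"
root = ET.fromstring(DOC)

def b_under_wildcard_with_c(root):
    """Brute-force reference for //a/*[c]/b."""
    results = []
    for a in root.iter("a"):                  # every a element
        for mid in a:                         # the wildcard step: any child of a
            if mid.find("c") is not None:     # that child has a child c
                results.extend(b.get("id") for b in mid.findall("b"))
    return results

found = b_under_wildcard_with_c(root)
print(found)
```

Only the first b qualifies: it and c are both children of the second a, which is a child of the first a. The second b has no c sibling one level below an a, so it is excluded, which agrees with the single result reported above.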
The above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. Those of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention shall be as defined in the claims.

Claims (7)

1. a querying method for XML data, its step includes:
1) using Native XML mode to store XML data, its storage organization includes: interior nodes layer, and storage is according to document The node of the XML tree of sequence arrangement, wherein XML element uses DDE coded system to encode;Leaf node layer, stores XML The text data of leaf nodes;Arranging layer, the inverted index of storage interior nodes layer, each index entry is the unit that tag names is identical The sequence that element is arranged according to document sequence;
2) according to the XPath query statement of input, from the described row's of falling layer, the element sequence corresponding with the node of described XPath is taken out Row, and use the vanquished tree to carry out merger sequence;Described employing the vanquished tree carries out merger sequence, is the coding of the DDE to two elements Comparing, obtain relation before and after said two element, and set preceding element as victor, posterior element is the vanquished; When XPath occurs asterisk wildcard " * ", amplify out three kinds of new axles: after the sub-axle of grandfather, absolute ancestors' offspring's axle, special ancestors For axle, use described three kinds of new axles that the XPath containing asterisk wildcard " * " carries out equivalent rewriting;
3) XML element after sorting merger carries out stacked and Pop operations in order, and obtains Query Result from relief area.
2. the method for claim 1, it is characterised in that in described interior nodes layer, the information of every record includes: by node name Integer identifiers, DDE coding and the node type that word is mapped to.
3. the method for claim 1, it is characterised in that in the described row of falling layer, the information of each element includes: element type, This element encodes in the address of interior nodes layer and DDE.
4. the method for claim 1, it is characterised in that described interior nodes layer points to described leaf node layer by pointer.
5. the method for claim 1, it is characterised in that: in described XPath, each node q has two data structures: element Sequence TqWith stack Sq;TqIt is all elements with q name matching in XML document, and arranges according to document sequence;SqFor depositing Storage and the element of q name matching, and carry out stacked and Pop operations.
6. the method for claim 1, it is characterised in that when stack-incoming operation, only retains the ancestors of new element, in stack in stack All elements be all ancestors' descendent relationship.
7. The method of claim 6, characterized in that, if an element e is to be pushed onto stack S_E, and the parent of node E in the XPath is A, then the conditions for pushing e are:
a) S_A contains an element that has not been unlinked; said unlinking means deleting the records that are not ancestors of e from the linked list connecting all elements in the stack;
b) e is a child of the element in S_A that has not been unlinked and is nearest the stack top;
c) the type of e is the same as the type of E in the XPath.
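
The merge step of claim 1, step 2) can be illustrated with a minimal Python sketch. It is not taken from the patent text. It assumes that DDE codes are tuples of integers with a positive first component, and that document order is decided by comparing each component as a ratio to the first component, which is one reading of the published DDE labeling scheme; the claims only state that two DDE codes are compared to determine which element precedes. The names dde_compare and loser_tree_merge, and the sample codes, are hypothetical.

```python
from typing import Iterable, Iterator, List, Optional, Sequence, Tuple

Label = Tuple[int, ...]  # assumed shape of a DDE code, e.g. (1, 2, 1)


def dde_compare(a: Label, b: Label) -> int:
    """Return -1 if a precedes b in document order, 1 if it follows, 0 if equivalent.

    Assumption: components are compared as ratios to the first component
    (done here by cross-multiplication); the first component is positive.
    """
    for x, y in zip(a, b):
        left, right = x * b[0], y * a[0]
        if left != right:
            return -1 if left < right else 1
    # Proportional common prefix: the shorter code is the ancestor and comes first.
    return (len(a) > len(b)) - (len(a) < len(b))


def loser_tree_merge(runs: Sequence[Iterable[Label]]) -> Iterator[Label]:
    """Merge k document-ordered sequences into one, using a loser tree."""
    k = len(runs)
    if k == 0:
        return
    iters = [iter(r) for r in runs]
    # Current head of each run; None marks an exhausted run.
    heads: List[Optional[Label]] = [next(it, None) for it in iters]

    def loses(i: int, j: int) -> bool:
        """True if the head of run i must come out after the head of run j."""
        if j == k:  # index k is a build-time sentinel that always wins
            return True
        if i == k:
            return False
        if heads[i] is None or heads[j] is None:
            return heads[i] is None and heads[j] is not None
        c = dde_compare(heads[i], heads[j])
        return c > 0 or (c == 0 and i > j)

    # tree[0] holds the overall winner; tree[1:] hold the loser of each match.
    tree = [k] * k

    def adjust(leaf: int) -> None:
        """Replay the matches on the path from one leaf up to the root."""
        winner = leaf
        node = (leaf + k) // 2
        while node > 0:
            if loses(winner, tree[node]):
                winner, tree[node] = tree[node], winner
            node //= 2
        tree[0] = winner

    for leaf in reversed(range(k)):
        adjust(leaf)
    while heads[tree[0]] is not None:
        w = tree[0]
        yield heads[w]
        heads[w] = next(iters[w], None)
        adjust(w)


# Three hypothetical inverted-index entries, each already in document order.
merged = list(loser_tree_merge([
    [(1,), (1, 2)],
    [(1, 1), (1, 2, 1), (2, 5)],
    [(1, 1, 1), (1, 3)],
]))
```

In this sketch each input sequence plays the role of one inverted-layer entry (all elements with one tag name). After each output only the matches on one leaf-to-root path are replayed, which is the usual reason for preferring a loser tree in a k-way merge.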
CN201210192018.7A 2012-06-11 2012-06-11 A kind of querying method of XML data Expired - Fee Related CN103488639B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210192018.7A CN103488639B (en) 2012-06-11 2012-06-11 A kind of querying method of XML data

Publications (2)

Publication Number Publication Date
CN103488639A (en) 2014-01-01
CN103488639B (en) 2016-12-07

Family

ID=49828879

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210192018.7A Expired - Fee Related CN103488639B (en) 2012-06-11 2012-06-11 A kind of querying method of XML data

Country Status (1)

Country Link
CN (1) CN103488639B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105677740A (en) * 2015-12-29 2016-06-15 中国民用航空上海航空器适航审定中心 Method for matching entity-based text data and XML files
CN108614808B (en) * 2016-12-12 2020-09-04 北大方正集团有限公司 Typesetting method and typesetting device for XML (extensive markup language) document

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1584884A (en) * 2003-08-20 2005-02-23 富士通株式会社 Apparatus and method for searching data of structured document
CN101010674A (en) * 2004-06-16 2007-08-01 甲骨文国际公司 Efficient extraction of XML content stored in a LOB

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6804677B2 (en) * 2001-02-26 2004-10-12 Ori Software Development Ltd. Encoding semi-structured data for efficient search and browsing

Also Published As

Publication number Publication date
CN103488639A (en) 2014-01-01

Similar Documents

Publication Publication Date Title
Meier et al. Nosql databases
US8738667B2 (en) Mapping of data from XML to SQL
JP5255605B2 (en) Registry-driven interoperability and document exchange
US6804677B2 (en) Encoding semi-structured data for efficient search and browsing
US20060218160A1 (en) Change control management of XML documents
US20070061706A1 (en) Mapping property hierarchies to schemas
US20060173865A1 (en) System and method of translating a relational database into an XML document and vice versa
US20040060006A1 (en) XML-DB transactional update scheme
US20080301168A1 (en) Generating database schemas for relational and markup language data from a conceptual model
WO2001061566A1 (en) System and method for automatic loading of an xml document defined by a document-type definition into a relational database including the generation of a relational schema therefor
CN102033954A (en) Full text retrieval inquiry index method for extensible markup language document in relational database
CN101661481A (en) XML data storing method, method and device thereof for executing XML query
US9805112B2 (en) Method and structure for managing multiple electronic forms and their records using a static database
CN102214243A (en) Version management system for x extensible business reporting language (XBRL) classification standard
CN109871473A (en) A kind of method of pair of project file and Database full-text search document
US9037553B2 (en) System and method for efficient maintenance of indexes for XML files
Koupil et al. A universal approach for multi-model schema inference
EP2425382B1 (en) Method and device for improved ontology engineering
CN103488639B (en) A kind of querying method of XML data
CN105550176A (en) Basic mapping method for relational database and XML
Koupil et al. Schema inference for multi-model data
Barbosa et al. Efficient incremental validation of XML documents after composite updates
Moro et al. Schema advisor for hybrid relational-XML DBMS
WO2010147453A1 (en) System and method for designing a gui for an application program
Cavalieri et al. On the reduction of sequences of XML document and schema update operations

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20161207

Termination date: 20190611