CN101887458A - Path coding-based XML document index method - Google Patents


Info

Publication number
CN101887458A
CN101887458A CN 201010219493 CN201010219493A CN101887458A CN 101887458 A CN101887458 A CN 101887458A CN 201010219493 CN201010219493 CN 201010219493 CN 201010219493 A CN201010219493 A CN 201010219493A CN 101887458 A CN101887458 A CN 101887458A
Authority
CN
China
Prior art keywords
path
node
xml
schema
code
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010219493
Other languages
Chinese (zh)
Inventor
宋余庆
陈健美
邹为伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu University
Original Assignee
Jiangsu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu University filed Critical Jiangsu University
Priority to CN 201010219493 priority Critical patent/CN101887458A/en
Publication of CN101887458A publication Critical patent/CN101887458A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a path-coding-based XML document indexing method in the technical field of data processing. The indexing steps are: building tree models, splitting the query path, building an element table, building a structure table, forming XML path prefix codes, forming an XML value table, determining path codes, and coupling paths to obtain the result. By introducing XML path codes, the method can quickly match a query path against XML documents and return the query result, and it is generally applicable.

Description

An XML document indexing method based on path coding
Technical field
The present invention relates to a method for indexing electronic documents, in particular an XML document indexing method based on path coding, and belongs to the field of computer data processing.
Background
XML (eXtensible Markup Language) is platform-independent, self-describing, extensible, and simple to process, and has gradually become the standard for representing and exchanging data on the Internet. As XML use has spread, querying XML data faster and more accurately has become an increasingly important problem. To improve the efficiency of XML path queries, many researchers have worked on building indexes over XML documents.
Because XML data is semi-structured, the key to querying an XML document is deciding structural relationships between elements, such as the parent-child relationship. For structural queries, one approach builds a path index over the XML document tree and uses it to speed up the query. Another approach encodes the nodes of the XML document tree and decides the structural relationship between nodes directly from their codes.
The main XML path indexes at present are DataGuide, 1-index, and A(k). A DataGuide is a concise structural summary of the paths that start at the root node: each label path, formed by concatenating edge labels, is described only once. DataGuide reduces the number of nodes that must be traversed for a path query, but it has the following shortcomings:
1) A DataGuide is an exact summary of the XML data graph. If the data graph is a general graph rather than a tree, the time needed to build the DataGuide and the space the index requires may be exponential in the size of the data graph.
2) The node sets associated with different DataGuide nodes may intersect, which can cause confusion.
The A(k) index introduces the notion of "k-similarity" between nodes. Its basic idea is to store nodes of the XML data graph whose similarity is k in the same node set of the index graph, which means that all paths of length k are stored in the index graph. However, it considers only the upward similarity of nodes and ignores their downward similarity, so it is inefficient for queries with branching paths.
The basic idea of the Index Fabric is to represent the relationships in semi-structured data as paths, encode the paths as strings, and build an index over the strings. It supports fast path queries, but it is inefficient for path queries that contain //. For example, publications//title finds every title that has a publications ancestor.
In addition, XML index methods based on encoding, such as XISS and ViST, have been proposed. The basic idea of XISS is to evaluate the query path expression in segments and then join the segments in turn, under the relational constraints between nodes, to produce the final result. It encodes the XML document with preorder and postorder traversal values, so path queries can be processed without traversing the document. However, if the query path consists of N elements or attributes, N groups of nodes must be retrieved from the index and the structural join algorithm must be invoked at least (N-1) times for each XML document. Moreover, even in simple path processing, many nodes from unrelated structures inevitably take part in the parent-child or ancestor-descendant checks. ViST encodes both XML documents and user queries as sequences of string pairs, which turns XML query processing into sequence matching. However, its query processing often produces false alarms, and its preprocessing time is too long.
A search of the prior art found the following. Chinese patent application No. 03108526.1 discloses an XML indexing method for processing regular path expression queries. When an XML file is loaded into the database, that application extracts all possible path names and stores them, with path IDs, in a path lookup table. The path lookup table serves as the index: a regular path expression entered by the user is converted into paths that actually exist in the XML, and path matching is performed in the index. Its drawbacks are that when the XML file is large, the number of paths in the lookup table grows significantly and the storage cost is high; and that it does not use the schema of the XML document, so the indexes of all XML documents must be scanned for every query path expression, which is costly and hurts query efficiency.
Chinese patent application No. 200410099272.8 discloses an efficient path indexing method for XML data. It builds a UD(k, l) index over the data graph of the source XML document and uses an automaton in the index graph to join condition paths with the main path, which supports path expressions with branches. For a target node in a branching path whose upward similarity is less than k and whose downward similarity is less than l, the query can be answered in the index graph alone. However, the choice of k and l is critical and directly affects query accuracy, and that application gives no way to determine them. When the similarity exceeds k or l, the result must still be verified in the source data graph, which greatly reduces query efficiency.
Chinese patent application No. 200910158713.X discloses a method and system for generating an index in an XML database management system. When an XML document is stored in the database, that application relies on the XML schema to build an index function for the XML and builds an index table from the key and value of each node, which it stores in the database module. To query an XML document, only the index values computed by the index function need to be scanned, not the whole database, which makes queries efficient. Its shortcoming is that it handles only simple path expressions and cannot answer regular path expression queries, which is a significant limitation.
Summary of the invention
The object of the invention is to provide an XML document indexing method based on path coding that addresses problems of the prior art such as the large amount of XML document scanning, complex branch-path join operations, and inefficient regular path expression queries.
To achieve this object, the path-coding-based XML document indexing method of the invention is carried out by an intelligent device with a central processing unit, and the indexing steps are:
Step 1, build tree models: in the conventional way, map the XML document and its Schema (schema document) to a document tree model and a Schema tree model according to the node structure of each. The document tree model usually consists of element nodes connected in ancestor-descendant order, text values attached to leaf nodes, and attribute nodes attached to their element nodes. The Schema tree model consists of element nodes connected in ancestor-descendant order and attribute nodes attached to their element nodes.
Step 2, split the query path: divide the input query path into a set of condition branch paths, each ending in a predicate, and a target branch path.
Step 3, build the element table: in the conventional way, scan the Schema tree model and store the name (name), preorder traversal value (pre), postorder traversal value (post), and document identifier (id) of every node (element nodes and attribute nodes) in the corresponding table, forming the Schema element table.
Step 4, build the structure table: list, in order, the preorder traversal value (pre) of each node together with the preorder traversal values (leaf-pre) of all leaf nodes (target nodes) on the paths through that node and the corresponding document identifier (id), forming the Schema structure table.
Step 5, form XML path prefix codes: in the document tree model, classify the outgoing edges of each node into attribute edges and element edges and assign them path codes in natural-number order, with a distinguishing mark on attribute edges; make every node of the XML document carry the same preorder traversal value (pre) as the corresponding node in its Schema. The path codes passed on the way from the root node to a node, taken in order, form that node's path prefix code.
Step 6, form the XML value table: list, one by one and in order, the preorder traversal value in the Schema of every leaf node, its path prefix code (Path-lable) from the root node to that leaf node, and the attached text value, forming the XML value table. The path prefix code is the in-order concatenation of the path codes from the root node to the leaf node.
Step 7, determine path prefix codes: for the name of the leaf node of each branch path obtained by the split, look up the corresponding preorder traversal value and document identifier in the Schema element table; with these, look up the corresponding leaf-node preorder traversal value in the Schema structure table; with that value, look up the corresponding path prefix code in the XML value table.
Step 8, couple paths to obtain the result: first compare the path prefix codes of two of the branch paths position by position from left to right. If the codes at a position are the same, store that code at the same position of the coupled path. If a code carrying the attribute-node distinguishing mark appears, do not compare it; store the marked code directly at the corresponding position of the coupled path. If one path prefix code is exhausted, store the remaining codes of the other at the following positions of the coupled path. This yields an intermediate coupled path code, and the text value that the longer of the two paths has in the XML value table is taken as the intermediate coupling result. Then, in the same way, compare the intermediate coupled path code with the path prefix code of each remaining branch path in a further left-to-right round, obtaining an updated intermediate coupled path code and intermediate coupling result, until the path codes of all branch paths have been coupled. The final coupling result is the index query result.
If, during this process, the codes at the same position differ, the two path codes cannot join their two paths; skip to the next round of coupling or exit.
A further improvement of the invention: when a new node is inserted, append a predetermined marker (for example ".") and a sequence code to the path code of the path corresponding to the new node, and use the result as the path code of the new node.
In summary, the invention has the following advantages:
1) Using the path information carried by Schema nodes, the method looks up the matching paths and target values in the XML value table, which avoids the heavy XML scanning of traditional methods. Joining path branches requires only a simple coupling step rather than many tedious node join operations, so query efficiency is significantly higher than with existing indexing methods.
2) The Schema structure information is used to match the structure of the query path. The ancestor-descendant relationship of nodes can be decided from the containment relationship between their code intervals. If a matching path exists in the Schema, the corresponding XML document is queried; if not, the corresponding XML document is not scanned at all, which avoids pointless work.
3) The Schema structure information is used to match the structure of the query path: from all the leaf nodes that correspond to a node in the Schema structure, every possible path through that node is matched, which solves the regular path query problem.
4) The corresponding XML document is scanned, each leaf node is made to carry the code identifier leaf-pre of the element at the same position in the Schema, and its path is encoded. The Schema leaf-node code leaf-pre, the path code, and the value of the leaf node are stored together in the value table, so the path information and text value of a node can be obtained from the value table without scanning the XML document.
5) The query path is split at nodes that carry text values (attribute nodes and leaf nodes), so the number of branch paths is determined by the number of text values to be queried, and query time is independent of the length of the query path.
6) When a new node is inserted, the codes of other paths in the model tree do not need to change: it suffices to append the predetermined marker to the path code of the new node's left sibling. Subsequent index queries by the method of the invention are unaffected.
In short, by introducing XML path codes, the invention provides an effective XML indexing method that can quickly match a query path against XML documents and obtain the query result, and it is generally applicable.
Description of the drawings
The invention is further described below with reference to the accompanying drawings.
Figure 1A shows the XML document of a case history.
Figure 1B shows the tree-model of XML document.
Fig. 2 A shows the Schema of XML document correspondence.
Fig. 2 B shows the tree-model of Schema.
Fig. 3 shows the Schema list of elements.
Fig. 4 shows the structural table of Schema.
Fig. 5 shows the path code of XML.
Fig. 6 shows XML value table.
Fig. 7 shows the tree model after a dynamic update of the XML.
Specific embodiment
The path-coding-based XML document indexing method of the invention is described below with a simplified embodiment.
[1] Build tree models from the XML document and its Schema
Figure 1A is an XML case-history document. Figure 1B is the tree model to which the XML document is mapped, in the conventional way, according to its node structure; the document tree model consists of element nodes connected in ancestor-descendant order, text values attached to leaf nodes, and attribute nodes attached to their element nodes. Fig. 2A is the Schema of the document in Figure 1A. It is the schema that the XML document must conform to and is used to validate the syntactic structure of the XML (that is, the validity of the document's own object units, such as the nesting of element types, whether an element is optional, attribute types, the data types of attribute values, and attribute values and their structure). Fig. 2B is the tree model to which the Schema is mapped, in the conventional way, according to its node structure; the Schema tree model consists of element nodes connected in ancestor-descendant order and attribute nodes attached to their element nodes.
Both an XML document and its Schema can be mapped to an ordered, edge-labelled tree, called the tree model and written T = (V, r, E, tag, label), where:
(1) V is the set of XML nodes.
(2) r is the root node of the tree.
(3) E is the set of edges, and: 1. V = r ∪ V_E ∪ V_A ∪ V_T, where V_E, V_A and V_T are the sets of element nodes, attribute nodes and text nodes respectively; 2. E = E_E ∪ E_A ∪ E_T, where E_E is the set of element edges, E_A is the set of attribute edges, and E_T is the set of text edges.
(4) tag = tag_E ∪ tag_A ∪ tag_T, where: 1. the function tag_E: V_E → <name, nodetype> assigns each element node a pair of strings giving the element name and node type of that node. The node type is "EE", "ET" or "EN", meaning that the content of the element node is child elements, text, or empty, respectively; 2. the function tag_A: V_A → <name, value, valuetype> assigns each attribute node a triple of strings giving its attribute name, attribute value and attribute value type; 3. the function tag_T: V_T → <text, texttype> assigns each text node a pair of strings giving the content of the text node and the type of that content.
(5) The function label: V → string assigns each node an identifier id that is unique within the document.
This logical data model defines only the main data that make up an XML document (elements, attributes and text) and ignores secondary data such as processing instructions and comments.
According to the tree model definition, the query data model of the XML document above can be written as T = (V, r, E, tag, label), where:
V_E = {patient, name, age, gop information, ...}, V_A = {case history id}; r = {case history};
E_E = {patient-name, patient-age, patient-gop information, ...}, E_A = {patient-case history id};
tag_E = {<patient, EE>, ...}, tag_A = {<case history id, ET, unsignedByte>};
label = {<1,10>, <2,9>, <3,1>, <4,2>, ...}.
[2] Split the query path
The input query path is divided into a set of simple condition branch paths, each containing exactly one predicate constraint, and one target branch path. For example, the path "case history/patient[case history id="1"]/gop information/chemical examination" can be divided into the simple condition branch path "case history/patient[case history id="1"]", which contains a single predicate constraint (case history id is an attribute node; the content of the brackets [] is the predicate constraint, and a path with a predicate constraint is a condition path), and the target branch path "case history/patient/gop information/chemical examination".
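The split described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name split_query is hypothetical, steps are assumed to be separated by "/", and a predicate is assumed never to contain "/" or nested brackets.

```python
import re

# Hypothetical helper: one step is "name" or "name[predicate]".
_STEP = re.compile(r"([^\[\]]+)(?:\[(.+)\])?")


def split_query(path):
    """Split a query path into condition branch paths and one target path.

    Each condition branch path keeps exactly one predicate, at its last step.
    The target branch path is the query path with all predicates removed.
    """
    names = []        # step names seen so far, without predicates
    conditions = []   # one entry per predicate found
    for step in path.split("/"):
        match = _STEP.fullmatch(step)
        if match is None:
            raise ValueError(f"unsupported step: {step!r}")
        name, predicate = match.group(1), match.group(2)
        if predicate is not None:
            conditions.append("/".join(names + [f"{name}[{predicate}]"]))
        names.append(name)
    return conditions, "/".join(names)


query = 'case history/patient[case history id="1"]/gop information/chemical examination'
conditions, target = split_query(query)
```

For the query in the example, conditions holds the single condition branch path and target holds the predicate-free target branch path.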
[3] Build the Schema element table
The Schema tree model in Fig. 2B uses the Dietz interval encoding, which supports a wide range of XML queries efficiently. This encoding assigns each node u the tuple (pre(u), post(u), dep(u), id), where pre(u) is the preorder traversal value of u, post(u) is its postorder traversal value, dep(u) is the depth of the node, which can help with identification or verification, and id identifies the Schema document. The interval of a descendant node is contained in the interval of its ancestor: if node u is an ancestor of node v, then pre(u) < pre(v) ∧ post(v) < post(u).
Corollary 1: for any two nodes u and v in the tree, if pre(u) < pre(v) ∧ post(v) < post(u) ∧ dep(v) - dep(u) = 1, then u is the parent of v.
In the conventional way, the Schema tree model is scanned and the name (name), preorder traversal value (pre), postorder traversal value (post), node depth (dep) and document identifier (id) of every node (element nodes and attribute nodes) are stored in the corresponding table, forming the Schema element table shown in Fig. 3. This table contains the syntactic structure information of every element in the Schema. When the user enters a query path, it is enough to find in the table the elements that appear in the query path and to decide their structural relationships from their codes in order to complete the path matching.
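The interval encoding and the two structural tests can be sketched as below. This is an illustrative sketch only: the tree is a reduced, hypothetical seven-node version of the Schema (Fig. 2B has more nodes, so the post values here differ from those in the figure), node names are assumed unique, and the function names are not from the patent.

```python
# Hypothetical reduced Schema tree: (name, [children]); the attribute node
# "case history id" is listed first among the children of "patient".
SCHEMA = ("case history", [
    ("patient", [
        ("case history id", []),
        ("name", []),
        ("age", []),
        ("gop information", [("chemical examination", [])]),
    ]),
])


def encode(tree):
    """Return {name: (pre, post, dep)} using preorder/postorder numbering."""
    table = {}
    counters = {"pre": 0, "post": 0}

    def visit(node, depth):
        name, children = node
        counters["pre"] += 1          # numbered when the node is entered
        pre = counters["pre"]
        for child in children:
            visit(child, depth + 1)
        counters["post"] += 1         # numbered when the node is left
        table[name] = (pre, counters["post"], depth)

    visit(tree, 1)
    return table


def is_ancestor(u, v):
    """u is an ancestor of v: pre(u) < pre(v) and post(v) < post(u)."""
    return u[0] < v[0] and v[1] < u[1]


def is_parent(u, v):
    """Corollary 1: ancestor test plus a depth difference of exactly 1."""
    return is_ancestor(u, v) and v[2] - u[2] == 1


element_table = encode(SCHEMA)
```

With this tree, "case history id", "name" and "chemical examination" receive the pre values 3, 4 and 7 that the embodiment uses later.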
[4] Build the Schema structure table
Fig. 4 is the Schema structure table, formed by listing in order the preorder traversal value (pre) of each node together with the preorder traversal values (leaf-pre) of all leaf nodes (target nodes) on the paths through that node and the corresponding document identifier (id). Given the pre code of a node, the structure table gives the pre values of the leaf nodes of all paths that pass through the node. Because users are interested in text values, the target node of a query path is always a leaf node. So when the Schema is matched against the query path, the leaf node of the query path is found, and when the XML is queried it is enough to find the corresponding leaf node by its pre value; the whole XML document need not be queried again.
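One way to derive such a structure table is sketched below, assuming that "the leaf nodes on the paths through a node" means the node itself when it is a leaf and its leaf descendants otherwise. The tree is the same reduced, hypothetical Schema as before, the document identifier is omitted for brevity, and the function name is illustrative.

```python
# Hypothetical reduced Schema tree: (name, [children]).
SCHEMA = ("case history", [
    ("patient", [
        ("case history id", []),
        ("name", []),
        ("age", []),
        ("gop information", [("chemical examination", [])]),
    ]),
])


def structure_table(tree):
    """Return {pre: [leaf_pre, ...]} for every node of the tree."""
    table = {}
    counter = [0]

    def visit(node):
        _, children = node
        counter[0] += 1
        pre = counter[0]
        if not children:
            leaves = [pre]                  # a leaf maps to itself
        else:
            leaves = []
            for child in children:
                leaves.extend(visit(child))  # collect leaves of each subtree
        table[pre] = leaves
        return leaves

    visit(tree)
    return table


table = structure_table(SCHEMA)
```

Looking up pre 3 or pre 7 returns the node itself, which matches the leaf-pre values 3 and 7 used in the embodiment.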
[5] Form the XML path prefix codes
Existing prefix codes all encode nodes. A prefix code preserves the path information of a node: the code of any element node (other than the root) is a prefix of the codes of its descendant element nodes, so deciding the ancestor-descendant relationship between nodes amounts to deciding a prefix-substring containment relationship. With node-based encoding methods, the node join operations needed to connect branch paths are too complex, whereas XML query techniques based on path indexes solve the single-path query problem effectively.
In the document tree model, the outgoing edges of each node are classified into attribute edges and element edges and are given path codes in natural-number order, with a distinguishing mark on attribute edges, and every node of the XML document carries the same preorder traversal value (pre) as the corresponding node in its Schema. As shown in Fig. 5, the outgoing edges of a node are numbered from left to right, attribute edges and element edges separately; these numbers are called sequence codes. The path code of an attribute edge is enclosed in [] as its distinguishing mark. The path codes passed on the way from the root node to a node, taken in order, form that node's path prefix code, so the prefix code of a long path is made up of the codes of the edges it passes through.
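The edge numbering can be sketched as follows. The document below is a hypothetical reconstruction of the case-history example: the dictionary layout and function name are assumptions, the age value "30" is a made-up placeholder, and the sketch assumes fewer than ten edges of each kind per node so that single-digit codes concatenate unambiguously.

```python
# Hypothetical in-memory document tree; "attrs" preserves insertion order.
DOC = {"name": "case history", "children": [
    {"name": "patient", "attrs": {"case history id": "1"}, "children": [
        {"name": "name", "text": "Xiao Wang"},
        {"name": "age", "text": "30"},   # placeholder value, not from the source
        {"name": "gop information", "children": [
            {"name": "chemical examination", "text": "routine urinalysis"},
        ]},
    ]},
    {"name": "patient", "attrs": {"case history id": "2"}, "children": [
        {"name": "name", "text": "Xiao Zhang"},
    ]},
]}


def path_codes(node, prefix=""):
    """Yield (path_prefix_code, leaf_name, value) for attributes and text leaves."""
    # Attribute edges are numbered separately and marked with [].
    for i, (attr, value) in enumerate(node.get("attrs", {}).items(), start=1):
        yield f"{prefix}[{i}]", attr, value
    # Element edges are numbered from left to right in natural-number order.
    for i, child in enumerate(node.get("children", []), start=1):
        code = f"{prefix}{i}"
        if "text" in child:
            yield code, child["name"], child["text"]
        yield from path_codes(child, code)


codes = {code: value for code, _, value in path_codes(DOC)}
```

Under these assumptions the sketch reproduces the codes quoted in the embodiment: 1[1] and 2[1] for the attribute, 11 and 21 for name, and 131 for chemical examination.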
[6] Form the value table of the XML document
The preorder traversal value in the Schema of every leaf node, the path prefix code (Path-lable) of the leaf node's path, and the attached text value are listed one by one in order, forming the XML document value table of Fig. 6. The path prefix code is the in-order concatenation of the path codes from the root node to the leaf node. For example, the path code of the path "case history/patient[case history id="1"]/name" is 1[1]1.
From the pre value of the leaf node obtained after Schema path matching, that is, leaf-pre, the value table gives the prefix code of the leaf node's path and the text value of the leaf node. For example, if the leaf node of the query path is "name", with pre=4 in the Schema, then the value table gives the path prefix codes 11 and 21 and the text values "Xiao Wang" and "Xiao Zhang". With the value table, the XML document need not be queried again, which greatly improves query efficiency.
[7] Determine the path prefix codes
The path code of each branch path is found in the value table from its leaf-node code leaf-pre. For example, the path "case history/patient[case history id="1"]/gop information/chemical examination" is divided into "case history/patient[case history id="1"]" and "case history/patient/gop information/chemical examination". From the Schema element table of Fig. 3, the preorder traversal values of the nodes "case history id" and "chemical examination" are found to be 3 and 7, with document identifier 1. With these preorder traversal values and the document identifier, the corresponding leaf-node preorder traversal values leaf-pre are found in the Schema structure table of Fig. 4 to be 3 and 7. Finally, from the value table of Fig. 6, the node with leaf-pre 3 has the two path codes "1[1]" and "2[1]", and leaf-pre 7 has the path code "131".
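The three lookups chain together as in the sketch below. The table contents are limited to the rows quoted in this embodiment; the dictionary and list layouts, and the function name, are assumptions made for illustration rather than the layout of Figs. 3, 4 and 6.

```python
# Element table: name -> (pre, document id). Values quoted in the embodiment.
ELEMENT_TABLE = {
    "case history id": (3, 1),
    "name": (4, 1),
    "chemical examination": (7, 1),
}

# Structure table: (pre, document id) -> leaf-pre values.
STRUCTURE_TABLE = {(3, 1): [3], (4, 1): [4], (7, 1): [7]}

# Value table rows: (leaf-pre, path prefix code, text value).
VALUE_TABLE = [
    (3, "1[1]", "1"),
    (3, "2[1]", "2"),
    (4, "11", "Xiao Wang"),
    (4, "21", "Xiao Zhang"),
    (7, "131", "routine urinalysis"),
]


def path_codes_for(leaf_name):
    """Element table -> structure table -> value table, as in step [7]."""
    key = ELEMENT_TABLE[leaf_name]       # (pre, id) of the branch's leaf node
    leaf_pres = STRUCTURE_TABLE[key]     # leaf-pre values for that node
    return [(code, text)
            for leaf_pre, code, text in VALUE_TABLE
            if leaf_pre in leaf_pres]


condition_codes = path_codes_for("case history id")
target_codes = path_codes_for("chemical examination")
```

No XML document is touched during these lookups, which is the point the embodiment makes about the value table.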
[8] Form the coupled path
After the path codes of the split query path have been found, the branch paths must be joined by path coupling. The process is as follows: the codes of the two branch paths to be joined are matched position by position, starting from the first position on the left. If the codes at a position are the same, that code is stored at the corresponding position of the coupled path and the comparison moves to the next position. If a predicate constraint [a] appears in a path, [a] does not take part in the matching; it is stored directly at the corresponding position of the coupled path, and the comparison continues with the next position. If one path is exhausted while the other still has codes left, the remaining codes are stored at the following positions of the coupled path. The text value of the longer of the two paths is the result of coupling the two paths.
For example, in this embodiment the path "case history/patient[case history id="1"]/gop information/chemical examination" is divided into "case history/patient[case history id="1"]" and "case history/patient/gop information/chemical examination". Following [3] to [7] above, their path prefix codes are found in the value table to be "1[1]", "2[1]" and "131", with text values "1", "2" and "routine urinalysis" respectively. Their path codes are then coupled, that is, 1[1] and 2[1] are each coupled with 131. When 1[1] is coupled with 131, the first position from the left is "1" in both, so "1" is stored in the intermediate coupling result. At the second position [] appears, so [] and the value it encloses are stored in the intermediate result. Moving on, 1[1] is exhausted while 131 still has "31" left, so "31" is stored directly at the following positions of the coupled path, giving the coupled path code 1[1]31. This shows that the two branch paths can be joined. The text value "routine urinalysis", which the longer path 131 has in the XML value table, is the result of joining the two paths and is output as the required query result. If the query path is divided into more than two branch paths, the coupled path code above is used as an intermediate value and compared in the same way with the path prefix code of the next branch path, giving an updated coupled path code and the text value of the longer path. This continues until the path codes of all branch paths have been coupled, and the final coupling result is the index query result.
In the coupling process above, if the codes at the same position differ, the coupling fails, which shows that the two paths cannot be joined; the process skips to the next round of coupling or exits. In this embodiment, when 2[1] is coupled with 131, the first position of 2[1] from the left is "2" and the first position of 131 is "1". The two codes differ, so the coupling fails: the two paths cannot be joined and yield no query result.
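The coupling rule can be sketched as a two-pointer merge. This is an illustrative reading of the description, with hypothetical function names; it assumes single-digit sequence codes, and because the text does not say how two bracketed codes at the same position should be treated, the sketch simply keeps both.

```python
import re


def tokens(code):
    """Split a path prefix code into positions: '[n]' or a single digit."""
    return re.findall(r"\[\d+\]|\d", code)


def couple(a, b):
    """Couple two path prefix codes; return None if they cannot be joined."""
    ta, tb = tokens(a), tokens(b)
    out, i, j = [], 0, 0
    while i < len(ta) and j < len(tb):
        if ta[i].startswith("["):
            out.append(ta[i])      # predicate code: stored without comparing
            i += 1
        elif tb[j].startswith("["):
            out.append(tb[j])
            j += 1
        elif ta[i] == tb[j]:
            out.append(ta[i])      # same code at this position: keep it
            i += 1
            j += 1
        else:
            return None            # codes differ: the paths cannot be joined
    out.extend(ta[i:])             # append whatever remains of either code
    out.extend(tb[j:])
    return "".join(out)


def couple_all(codes):
    """Couple several branch-path codes one after another."""
    result = codes[0]
    for code in codes[1:]:
        result = couple(result, code)
        if result is None:
            return None
    return result


joined = couple("1[1]", "131")
failed = couple("2[1]", "131")
```

Here joined reproduces the coupled code 1[1]31 from the worked example, and failed is None because the first positions "2" and "1" differ.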
[9] Dynamic update of the XML
Dynamic updating requires that an XML document first be validated when it is updated, so that the updated document still conforms to the structural information of its XML Schema and no illegal element or attribute nodes appear, which keeps the XML document consistent with its XML Schema.
When a new node is inserted, the new path must be given a path code. In this embodiment the new path code is formed by appending "." and a sequence code to the code of the path to the left of the new path. The sequence code of the first inserted child node is "1" and that of the second is "2". When the last digit of a sequence code is "9", a "0" is appended to it, so the ninth child has sequence code "90", the tenth has "91", and so on.
As shown in Fig. 7, when the path "gop information/chemical examination/routine blood test" is inserted, the new path under "gop information/chemical examination" must be given a path code: the left path code "1" is followed by "." and the sequence code "1", so the new path code is "(1.1)". The text value is stored in the corresponding value table. In this way, the codes of other paths in the model tree do not need to change.
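The sequence-code rule can be read in more than one way; the sketch below takes one plausible reading, in which a code is advanced by incrementing its last digit and a "0" is appended whenever the last digit reaches "9". The function names are hypothetical, and the parenthesised output format follows the single example "(1.1)" given above.

```python
def sequence_codes(n):
    """Return the sequence codes of the first n inserted children."""
    codes, code = [], "1"
    for _ in range(n):
        if code.endswith("9"):
            code += "0"            # a code never ends in 9: extend it with 0
        codes.append(code)
        # Advance by incrementing the last digit (never 9 at this point).
        code = code[:-1] + str(int(code[-1]) + 1)
    return codes


def new_path_code(left_code, sequence_code):
    """Left path code, then '.', then the sequence code, as in '(1.1)'."""
    return f"({left_code}.{sequence_code})"


first_ten = sequence_codes(10)
inserted = new_path_code("1", "1")
```

Under this reading the ninth and tenth codes come out as "90" and "91", as stated in the text, and existing path codes are never rewritten.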
It can be seen that this embodiment applies interval encoding to the Schema and uses its structural information to process the query path, which solves the branch-path query problem of XML indexing methods based on document encoding. It applies path prefix encoding to the XML document, which solves the single-path query problem effectively and overcomes problems such as complex branch-path join operations.

Claims (7)

1. A path-coding-based XML document indexing method, implemented by an intelligent device having a central processing unit, wherein the indexing comprises the following steps:
Step 1, build the tree models: according to the document node structure, map the XML document and its corresponding Schema to a document tree model and a Schema tree model, respectively;
Step 2, divide the query path: divide the input query path into a group of conditional branch paths, each ending with a predicate, and a target branch path;
Step 3, build the element table: according to the result of scanning the Schema tree model, store the name, preorder traversal value, postorder traversal value and document identifier of each node of the Schema tree model in the corresponding table, forming the Schema element table;
Step 4, build the structure table: list, in order, the preorder traversal value of each node together with the preorder traversal values of all leaf nodes of the paths on which it lies and the corresponding document identifier, forming the Schema structure table;
Step 5, form the XML path prefix codes: classify the outgoing edges of each node of the document tree model into attribute edges and element edges, and assign path codes to each class in natural-number order, with a discrimination mark distinguishing the two classes; make each node of the XML document carry the same preorder traversal value (pre) as its corresponding node in the Schema; then concatenate, in order, the path codes passed through from the root node to a given node to form the path prefix code of that node;
Step 6, form the XML value table: list, in order, the preorder traversal value in the Schema of every leaf node, the path prefix code (Path-lable) from the root node to that leaf node, and the text value attached to it, forming the XML value table, wherein the path prefix code is the in-order concatenation of the path codes from the root node to the leaf node;
Step 7, determine the path prefix codes: using the name of each leaf node of the divided branch paths, find the corresponding preorder traversal value and document identifier in the Schema element table; using this preorder traversal value and document identifier, find the corresponding leaf-node preorder traversal value in the Schema structure table; using this leaf-node preorder traversal value, find the corresponding path prefix code in the XML value table;
Step 8, couple the paths to obtain the result: first compare the path prefix codes of two of the divided branch paths position by position from left to right; if the digits at the same position are identical, store that digit at this position of the coupled path; if a digit carrying the attribute-node discrimination mark appears, do not compare it, and store the marked digit directly at the corresponding position of the coupled path; if one path prefix code ends, store the remaining digits of the other path prefix code at the subsequent positions of the coupled path, obtaining an intermediate coupled path code, and take the text value that corresponds in the XML value table to the longer of the two path prefix codes as the intermediate coupling result; then, following the same process, compare the intermediate coupled path code with the path prefix code of each remaining branch path in a further round, position by position from left to right, obtaining an updated intermediate coupled path code and intermediate coupling result, until the path codes of all branch paths have been coupled one by one; the final coupling result is the index query result.
2. The path-coding-based XML document indexing method according to claim 1, characterized in that: in step 8, if the digits at the same position differ, the two path codes cannot join the two corresponding paths, and the process skips to the next round of coupling or exits.
3. The path-coding-based XML document indexing method according to claim 2, characterized in that: the document tree model in step 1 consists of element nodes connected in ancestor-descendant order, text values connected to the leaf nodes, and attribute nodes connected to the respective element nodes; the Schema tree model consists of element nodes connected in ancestor-descendant order and attribute nodes connected to the respective element nodes.
4. The path-coding-based XML document indexing method according to claim 3, characterized in that: when a new node is inserted, a predetermined marker and a sequence code are appended to the path code of the path corresponding to the new node to form the path code of the new node.
5. The path-coding-based XML document indexing method according to claim 4, characterized in that: the Schema element table of step 3 also contains the node depth of each node.
6. The path-coding-based XML document indexing method according to claim 5, characterized in that: the discrimination mark for attribute edges in step 5 is "[]".
7. The path-coding-based XML document indexing method according to claim 6, characterized in that: the predetermined marker appended to the path code of the path corresponding to the new node is ".".
CN 201010219493 2010-07-06 2010-07-06 Path coding-based XML document index method Pending CN101887458A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010219493 CN101887458A (en) 2010-07-06 2010-07-06 Path coding-based XML document index method

Publications (1)

Publication Number Publication Date
CN101887458A true CN101887458A (en) 2010-11-17

Family

ID=43073380

Country Status (1)

Country Link
CN (1) CN101887458A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1632792A (en) * 2004-12-29 2005-06-29 复旦大学 XML data based highly effective path indexing method
CN101661481A (en) * 2008-08-29 2010-03-03 国际商业机器公司 XML data storing method, method and device thereof for executing XML query

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
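《中国优秀硕士学位论文全文数据库》 (China Master's Theses Full-text Database) 2007-01-31 许娴 (Xu Xian) Research on Schema-based XML indexing technology pp. 25-29, 33 1-7 , 2 *
《山东大学学报(理学版)》 (Journal of Shandong University, Natural Science edition) 2007-11-30 王宁 (Wang Ning) et al. A strategy for efficient XML querying using prefix coding pp. 45-48 1-7 Vol. 42, No. 11 2 *
《河南大学学报(自然科学板)》 (Journal of Henan University, Natural Science edition) 2010-01-31 杨扬 (Yang Yang) et al. A path query based on XML prefix coding pp. 85-89 1-7 Vol. 40, No. 1 2 *
《计算机工程》 (Computer Engineering) 2006-09-30 曾一 (Zeng Yi) et al. A Schema-based XML index structure pp. 64-66 1-7 Vol. 31, No. 18 2 *
《计算机应用》 (Computer Applications) 2005-12-31 张剑妹 (Zhang Jianmei) et al. A prefix coding method suitable for ordered XML trees pp. 2879-2881 1-7 Vol. 25, No. 12 2 *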

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164421A (en) * 2011-12-12 2013-06-19 中国人民解放军第二炮兵工程学院 Extensive markup language (XML) coding method based on preorder position-descendant numbers
CN103123646A (en) * 2012-12-11 2013-05-29 北京航空航天大学 Conversion method for automatically converting XML document into OML document and device
CN103123646B (en) * 2012-12-11 2015-11-04 北京航空航天大学 XML document is converted into automatically conversion method and the device of OWL document
CN104809145A (en) * 2014-01-23 2015-07-29 三星泰科威株式会社 Hierarchical data analyzing method
CN104809145B (en) * 2014-01-23 2018-05-29 韩华泰科株式会社 Hierarchy type data analysing method
CN105138524A (en) * 2014-05-30 2015-12-09 北大方正信息产业集团有限公司 Method and apparatus for creating document node path index and server
CN108334560A (en) * 2018-01-03 2018-07-27 腾讯科技(深圳)有限公司 A kind of information acquisition method and relevant device
CN110334084A (en) * 2019-05-09 2019-10-15 北京百度网讯科技有限公司 Date storage method, device, equipment and storage medium
CN111694990A (en) * 2020-06-08 2020-09-22 深圳市富中奇科技有限公司 Vehicle data processing method and device and storage medium
CN111694990B (en) * 2020-06-08 2022-12-27 深圳市富中奇科技有限公司 Vehicle data processing method and device and storage medium
CN111815175A (en) * 2020-07-08 2020-10-23 睿智合创(北京)科技有限公司 Five-layer structure XML language interactive application method in nested list form
CN113515544A (en) * 2021-06-23 2021-10-19 金蝶软件(中国)有限公司 Data attribute query method and data attribute query device
CN113515544B (en) * 2021-06-23 2022-02-15 金蝶软件(中国)有限公司 Data attribute query method and data attribute query device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20101117