CN101661481B - XML data storing method, method and device thereof for executing XML query - Google Patents


Info

Publication number
CN101661481B
Authority
CN
China
Prior art keywords
simple path
node
xml
data
warehouse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN200810212515A
Other languages
Chinese (zh)
Other versions
CN101661481A (en
Inventor
刘长浩
张国根
武硕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp
Priority to CN200810212515A
Publication of CN101661481A
Application granted
Publication of CN101661481B
Legal status: Expired - Fee Related
Anticipated expiration

Abstract

The invention provides a method for storing XML data in an XML warehouse based on simple paths. The XML warehouse comprises a simple path warehouse and a data warehouse. The method comprises the following steps: for each node in an XML document, generating a node identifier that uniquely identifies the node; generating the simple paths of the XML document; storing the simple paths in the simple path warehouse; and, using the simple paths as indexes, storing the data of each node in each simple path sequentially in the data warehouse, where the data comprises the node identifier and the value of the node. With this method, the data is stored clustered by simple path, and the data of one simple path is stored in order of node identifier. The schema of the XML document is not required, so schema-less XML data can be processed, and the query performance on the XML data is improved. The invention also provides a device for storing XML data in an XML warehouse based on simple paths, and a method and a device for executing XML queries in the XML warehouse.

Description

Method for storing XML data, and method and device for executing XML queries
Technical field
The present invention relates to the storage of XML data and the execution of XML queries, and in particular to a method and device for storing XML data in an XML warehouse based on simple paths, and to a method and device for executing XML queries in the XML warehouse.
Background
Large numbers of XML documents are usually stored in the XML warehouse of a native XML database, or in a column of XML data type in a relational database. Usually, the XML documents stored in the same XML warehouse or in the same XML-typed column have similar schemas. XML queries are often used to obtain information from these XML documents. In most cases, this information is small and scattered. XML queries usually use XPath-based languages, for example XQuery, XUpdate and XSLT.
The execution plan of an XML query depends on the storage scheme of the XML data. The existing technical solutions for storing XML data and executing XML queries are as follows:
1. The XML data is stored as files in a file system, or as CLOB objects in a relational database. In this storage scheme, querying or updating the XML data requires parsing the XML file or CLOB object to obtain the required information or to update the data, which takes a great deal of time.
2. The storage of the XML data is based on an XML schema file or a document type definition (DTD) file. Specifically, the schema file of the XML file is first analyzed and a group of relational tables is created, with reference relationships among these tables. The hierarchical and ordering relationships between the nodes of the XML data model tree corresponding to the XML file are mapped to the reference relationships between the relational tables, and the values of the nodes are also stored in the relational tables. In this way, the XML file is decomposed into relational tables and stored in the relational database. When the XML data is queried with an XPath string, the XPath string is converted into several SQL statements that operate on these relational tables; the SQL statements are executed to obtain the query results, and finally the query results are serialized into the final XML fragment and returned to the user. The problem with this solution, however, is that the XML schemas of real applications are very complex, so decomposing the XML files produces a large number of relational tables with complicated reference relationships among them, which makes queries very complex. Moreover, changes to the XML schema require extensive maintenance of the database schema, and this solution cannot handle schema-less XML data.
3. The storage of the XML data does not rely on an XML schema file or DTD file; that is, the storage scheme is schema-independent. Specifically, a labeling method, for example ORDPath (based on Dewey order) or the DLN method, is used to label the nodes of the XML file, and these labels are stored on disk together with the values of the nodes. The hierarchical and ordering relationships between the nodes of the XML tree can then easily be derived by comparing their labels. In this solution, the XML data is stored clustered by label; that is, each XML file is stored sequentially. As a result, executing an XML query requires accessing many disk pages that hold different XML files, which degrades performance.
For example, suppose there are 10000 XML files in the XML warehouse, each XML file is 4 KB, and the disk page size is also 4 KB. With the third storage scheme above, these XML files are stored sequentially on 10000 disk pages, and each file contains the full information of one article, such as the author information, the content of the article and the publication time. Fig. 1 shows an example of an XML tree. Suppose the XML query is "/article/author/name", i.e. a query for the names of all authors. Then 10000 disk pages need to be accessed, and if the size of the name field does not exceed 64 bytes, 10000 * (4 KB - 64 B) of irrelevant data is read. The waste rate reaches 98.44%, which degrades the performance of the XML query.
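The arithmetic behind the 98.44% figure can be checked with a short calculation. The following is a minimal sketch that uses only the numbers from the example above; the variable and function names are illustrative and do not come from the patent.

```python
# Figures taken from the example above; names are illustrative.
NUM_FILES = 10_000        # XML files in the warehouse, one per disk page
PAGE_SIZE = 4 * 1024      # disk page size in bytes (4 KB)
NAME_FIELD_SIZE = 64      # upper bound on the size of a name field, in bytes


def wasted_bytes(num_files, page_size, useful_bytes):
    """Bytes read from disk that do not contribute to the query result."""
    return num_files * (page_size - useful_bytes)


def waste_rate(page_size, useful_bytes):
    """Fraction of each page read that is irrelevant to the query."""
    return (page_size - useful_bytes) / page_size


total_wasted = wasted_bytes(NUM_FILES, PAGE_SIZE, NAME_FIELD_SIZE)
rate = waste_rate(PAGE_SIZE, NAME_FIELD_SIZE)
print(f"wasted: {total_wasted} bytes, waste rate: {rate:.2%}")
```

The rate depends only on the ratio of useful bytes to page size, so it stays the same however many files are stored.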
Summary of the invention
The present invention is proposed in view of the above technical problems. Its object is to provide a method and device for storing XML data in an XML warehouse based on simple paths, which can handle schema-less XML data and improve the performance of XML data queries. A further object of the present invention is to provide a method and device for executing XML queries in the XML warehouse.
According to a first aspect of the present invention, there is provided a method for storing XML data in an XML warehouse based on simple paths, the XML warehouse comprising a simple path warehouse and a data warehouse, the method comprising: for each node in an XML document, generating a node identifier for uniquely identifying the node; generating the simple paths of the XML document; storing the simple paths in the simple path warehouse; and, using the simple paths as indexes, storing the data of each node in each simple path sequentially in the data warehouse, wherein the data comprises the node identifier and the value of the node.
According to a second aspect of the present invention, there is provided a method for executing an XML query in an XML warehouse, wherein a plurality of XML documents are stored in the XML warehouse using the above method for storing XML data, the method comprising: receiving the XML query; parsing the XML query, based on the simple path warehouse of the XML warehouse, to generate an execution tree, wherein the execution tree is built from one or more main simple paths, and each main simple path is associated with zero or more secondary simple paths and predicate simple paths; and executing the execution tree, based on the data warehouse of the XML warehouse, to obtain the result of the XML query.
According to a third aspect of the present invention, there is provided a device for storing XML data in an XML warehouse based on simple paths, the XML warehouse comprising a simple path warehouse and a data warehouse, the device comprising: a node identifier generation module for generating, for each node in an XML document, a node identifier for uniquely identifying the node; and a simple path generation module for generating the simple paths of the XML document; wherein the generated simple paths are stored in the simple path warehouse, and, using the simple paths as indexes, the data of each node in each simple path is stored sequentially in the data warehouse, the data comprising the node identifier and the value of the node.
According to a fourth aspect of the present invention, there is provided a device for executing an XML query in an XML warehouse, wherein a plurality of XML documents are stored in the XML warehouse using the above device for storing XML data, the device comprising: a receiving module for receiving the XML query; a parsing module for parsing the XML query, based on the simple path warehouse of the XML warehouse, to generate an execution tree, wherein the execution tree is built from one or more main simple paths, and each main simple path is associated with zero or more secondary simple paths and predicate simple paths; and an execution module for executing the execution tree, based on the data warehouse of the XML warehouse, to obtain the result of the XML query.
Description of drawings
Fig. 1 is a schematic diagram of an example of an XML data model tree;
Fig. 2 is a flowchart of a method for storing XML data in an XML warehouse based on simple paths according to an embodiment of the invention;
Fig. 3 is a schematic diagram of an XML data model tree with coded strings;
Fig. 4 is a schematic diagram of a data warehouse using a B+ tree;
Fig. 5 is a flowchart of a method for executing an XML query in an XML warehouse according to an embodiment of the invention;
Fig. 6 is a schematic diagram of an example of an execution tree;
Fig. 7 is a schematic block diagram of a device for storing XML data in an XML warehouse based on simple paths according to an embodiment of the invention;
Fig. 8 is a schematic block diagram of a device for executing an XML query in an XML warehouse according to an embodiment of the invention.
Embodiment
The above and other objects, features and advantages of the present invention will become more apparent from the following detailed description of specific embodiments of the invention, taken in conjunction with the accompanying drawings.
Fig. 2 shows a flowchart of a method for storing XML data in an XML warehouse based on simple paths according to an embodiment of the invention. This embodiment is described in detail below with reference to the drawings.
In this embodiment, the XML warehouse comprises a simple path (Simple Path) warehouse and a data warehouse, where the simple path warehouse stores all simple paths and the data warehouse stores the data of all XML documents. The definition of a simple path is given later.
As shown in Fig. 2, in step S201, for each node in the XML document, a node identifier (DOrderPath) for uniquely identifying the node is generated. In this embodiment, the node identifier of a node comprises the document identifier (DocID) of the XML document to which the node belongs and the coded string (OrderPath) of the node. The coded string OrderPath may be a Dewey-coded string, or a string based on a Dewey-like coding method such as the ORDPath or DLN coding method.
In this embodiment, the node identifier DOrderPath identifies a unique node in a unique XML document, so each node is associated with exactly one node identifier. In the XML data model tree, the hierarchical and ordering relationships between two nodes can easily be derived from their node identifiers. Fig. 3 shows a schematic diagram of an XML data model tree with coded strings, where "article", "author" and so on are the names of nodes, and "0", "0.0" and "0.0.1" are the coded strings of the nodes "article", "author" and "age" respectively. Assuming the document identifier of this XML document is 1, the node identifier of the node "article" is "1,0"; correspondingly, the node identifier of the node "author" is "1,0.0", and the node identifier of the node "age" is "1,0.0.1".
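The assignment of node identifiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses plain Dewey codes (the root is "0", and the k-th child, counting from 0, appends ".k"), the function name `assign_node_identifiers` and the sample document are hypothetical, and attributes and text nodes are ignored.

```python
import xml.etree.ElementTree as ET


def assign_node_identifiers(xml_text, doc_id):
    """Return a list of (name, (DocID, OrderPath)) pairs in document order.

    OrderPath is a plain Dewey code: the root is "0" and the k-th child
    (counting from 0) of a node with code c gets the code c + "." + str(k).
    """
    root = ET.fromstring(xml_text)
    result = []

    def visit(element, order_path):
        result.append((element.tag, (doc_id, order_path)))
        for k, child in enumerate(element):
            visit(child, f"{order_path}.{k}")

    visit(root, "0")
    return result


# Hypothetical document shaped like the tree described for Fig. 3.
SAMPLE = """<article>
  <author><name>Tom</name><age>28</age></author>
  <chapter><title>Intro</title><text>Hello</text></chapter>
</article>"""

for name, (doc_id, order_path) in assign_node_identifiers(SAMPLE, doc_id=1):
    print(name, f"{doc_id},{order_path}")
```

Dotted strings do not sort correctly once a component reaches two digits ("0.10" sorts before "0.2" as text), so an implementation that relies on ordering would need a comparison-friendly encoding such as ORDPath, or a tuple of integers.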
Six basic operations are defined on the node identifier DOrderPath. Suppose path, path1 and path2 are three node identifiers. Then:
(1) Length(path) returns the depth of the node specified by path, which can be obtained by parsing path;
(2) Intercept(path, n) returns path if Length(path) ≤ n, and otherwise returns the prefix of path of length n;
(3) Value(path) returns the value of the node specified by path;
(4) operator>(path1, path2) returns whether a bit-by-bit (or byte-by-byte) string comparison of path1 and path2 yields path1 > path2;
(5) operator<(path1, path2) returns whether a bit-by-bit (or byte-by-byte) string comparison of path1 and path2 yields path1 < path2;
(6) operator=(path1, path2) returns whether a bit-by-bit (or byte-by-byte) string comparison of path1 and path2 yields path1 = path2.
These operations can be used in the execution of XML queries described later, for example in predicate evaluation and in the serialization of XML fragments.
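The six operations can be sketched as follows. This is a simplified illustration, not the patent's encoding: the OrderPath is held as a tuple of integers rather than a bit or byte string, so the three comparison operators are replaced by ordinary tuple comparison, and `Value` reads from a hypothetical in-memory map. The names `NodeId`, `length`, `intercept` and `value` are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True, order=True)
class NodeId:
    """DOrderPath: document identifier plus Dewey-style OrderPath."""
    doc_id: int
    order_path: Tuple[int, ...]


def length(path: NodeId) -> int:
    """Length(path): depth of the node, i.e. the number of Dewey components."""
    return len(path.order_path)


def intercept(path: NodeId, n: int) -> NodeId:
    """Intercept(path, n): path if Length(path) <= n, else its length-n prefix."""
    if length(path) <= n:
        return path
    return NodeId(path.doc_id, path.order_path[:n])


def value(path: NodeId, values: Dict[NodeId, str]) -> Optional[str]:
    """Value(path): node value, looked up in a hypothetical in-memory map."""
    return values.get(path)


# operator>, operator< and operator= map onto the generated dataclass
# comparisons, which compare (doc_id, order_path) as a tuple.
article = NodeId(1, (0,))
author = NodeId(1, (0, 0))
age = NodeId(1, (0, 0, 1))
values = {age: "28"}

print(length(age))
print(intercept(age, 2) == author)
print(article < author < age)
print(value(age, values))
```

With this representation an ancestor always sorts before its descendants, and siblings sort in document order, which is the property the later query steps rely on.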
In step S205, the simple paths of the XML document are generated. A simple path is the label path from the root node of the XML document to any node, with the nodes separated by "/". In this embodiment, the labels of a simple path are the names of the nodes. In the example shown in Fig. 3, the simple paths of the XML document are: article; article/author; article/chapter; article/author/name; article/author/age; article/chapter/title; article/chapter/text.
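Generating the simple paths amounts to a tree traversal that records each distinct root-to-node label path once. The sketch below assumes element names are used as labels, as in this embodiment; the function name `simple_paths` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def simple_paths(xml_text):
    """Return the distinct simple paths of a document, in first-seen order."""
    root = ET.fromstring(xml_text)
    seen = []

    def visit(element, prefix):
        path = f"{prefix}/{element.tag}" if prefix else element.tag
        if path not in seen:
            seen.append(path)
        for child in element:
            visit(child, path)

    visit(root, "")
    return seen


# Hypothetical document with two authors: both share the same simple paths.
SAMPLE = (
    "<article>"
    "<author><name>Tom</name><age>28</age></author>"
    "<author><name>Ann</name><age>35</age></author>"
    "<chapter><title>Intro</title><text>Hello</text></chapter>"
    "</article>"
)

for path in simple_paths(SAMPLE):
    print(path)
```

The two `author` elements contribute the same three paths, so the document yields seven simple paths regardless of how many authors it lists.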
In step S210, the simple paths generated in step S205 are stored in the simple path warehouse. In this embodiment, it is determined whether the simple path warehouse already contains these simple paths. If a simple path is already contained, it is ignored; otherwise, the simple path is stored and assigned a unique simple path identifier (PathID). Thus, for multiple XML documents with the same schema, only the simple paths of a single XML document need to be stored, and the number of simple paths is determined by the schema file or DTD file of the XML documents.
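The check-then-store behaviour of step S210 can be sketched with a dictionary that maps each simple path to its PathID. The class name `SimplePathWarehouse`, its methods and the use of consecutive integers as PathIDs are assumptions made for illustration; the embodiment itself stores the warehouse as an XML file.

```python
class SimplePathWarehouse:
    """In-memory stand-in for the simple path warehouse (hypothetical API)."""

    def __init__(self):
        self._path_ids = {}   # simple path -> PathID
        self._next_id = 0

    def store(self, path):
        """Store path if it is absent; return its PathID either way."""
        if path not in self._path_ids:
            self._path_ids[path] = self._next_id
            self._next_id += 1
        return self._path_ids[path]

    def path_id(self, path):
        """Return the PathID of a stored path, or None if it is unknown."""
        return self._path_ids.get(path)

    def __len__(self):
        return len(self._path_ids)


warehouse = SimplePathWarehouse()
doc_paths = ["article", "article/author", "article/author/name"]

first_ids = [warehouse.store(p) for p in doc_paths]    # first document
second_ids = [warehouse.store(p) for p in doc_paths]   # same schema again
print(first_ids, second_ids, len(warehouse))
```

Loading a second document with the same schema adds nothing to the warehouse, which is why its size tracks the schema rather than the number of documents.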
In this embodiment, the simple path warehouse takes the form of an XML file. An example of a simple path warehouse is given below:
<root NumOfSP="12">
<article UID="0">
<xmlns:xsi UID="1"/>
<xsi:noNamespaceSchemaLocation UID="2"/>
<author UID="4">
<name UID="5">
... ...
... ...
</name>
<age UID="8"/>
</author>
<chapter UID="10">
... ...
</chapter>
<text UID="15">
... ...
</text>
<ID UID="16"/>
</article>
</root>
In step S215, using the simple paths as indexes, the data of each node in each simple path is stored sequentially in the data warehouse, where the data of a node comprises the node identifier and the value of the node. The data of an XML document consists of two parts: value information and structural information. Value information is the values of the nodes in the XML document, which can be obtained by the method for obtaining values defined in the XML data model; structural information is the hierarchical and ordering relationships between the nodes in the XML document. In this embodiment, the structural information of the XML document is carried by the node identifiers of the nodes. Thus, in the data warehouse, the data of XML documents is stored by simple path, and the data of each simple path is stored in order of node identifier.
In this embodiment, the data warehouse takes the form of a B+ tree that uses "simple path identifier, node identifier", i.e. (PathID, DOrderPath(DocID, OrderPath)), as the key; for the nodes in the XML document that have values, the values are stored in the corresponding leaf nodes of the B+ tree. Fig. 4 shows an example of a data warehouse using a B+ tree.
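The effect of keying the data warehouse on (PathID, DocID, OrderPath) can be illustrated without a real B+ tree. The sketch below swaps in a sorted Python list maintained with the standard `bisect` module, which gives the same key ordering but none of the paging behaviour of a B+ tree; the class `DataWarehouse`, its methods and the PathID values are hypothetical.

```python
import bisect


class DataWarehouse:
    """Sorted in-memory list standing in for the B+ tree (hypothetical API)."""

    def __init__(self):
        self._keys = []     # sorted (path_id, doc_id, order_path) keys
        self._values = []   # node values, parallel to _keys

    def insert(self, path_id, doc_id, order_path, value=None):
        key = (path_id, doc_id, tuple(order_path))
        pos = bisect.bisect_left(self._keys, key)
        self._keys.insert(pos, key)
        self._values.insert(pos, value)

    def scan(self, path_id):
        """Return all (doc_id, order_path, value) entries of one simple path."""
        lo = bisect.bisect_left(self._keys, (path_id,))
        hi = bisect.bisect_left(self._keys, (path_id + 1,))
        return [
            (key[1], key[2], val)
            for key, val in zip(self._keys[lo:hi], self._values[lo:hi])
        ]


dw = DataWarehouse()
NAME, AGE = 3, 4                       # assumed PathIDs for name and age
dw.insert(NAME, 2, (0, 0, 0), "Ann")   # document 2 loaded first
dw.insert(AGE, 1, (0, 0, 1), "28")
dw.insert(NAME, 1, (0, 0, 0), "Tom")
print(dw.scan(NAME))
```

All entries of one simple path form a contiguous key range, so a query such as "/article/author/name" reads one run of keys instead of touching every document.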
As can be seen from the above description, with the method for storing XML data in an XML warehouse based on simple paths of this embodiment, the XML data is stored clustered by simple path, and the data of one simple path is stored in order of node identifier. Compared with the existing solution that decomposes and stores XML documents based on a schema file, the method of this embodiment does not need the schema of the XML document and can handle schema-less XML data. It uses node identifiers to preserve the structure of the XML document; identifiers can easily be added to nodes while the XML document is being loaded, and the complexity of creating and maintaining relational tables is avoided. Compared with the existing solution that stores XML data clustered by label, the method of this embodiment stores the XML data clustered by simple path, so XML queries can be executed efficiently and the performance of XML data queries is improved.
Under the same inventive concept, Fig. 5 is a flowchart of a method for executing an XML query in an XML warehouse according to an embodiment of the invention; descriptions of the parts that are the same as in the previous embodiment are omitted where appropriate. This embodiment is described in detail below with reference to the drawings.
In this embodiment, a plurality of XML documents are stored in the XML warehouse using the method for storing XML data shown in Fig. 2. That is, all simple paths are stored in the simple path warehouse of the XML warehouse, and the data of all XML documents, comprising value information and structural information, is stored in the data warehouse. Preferably, these XML documents have similar schemas.
As shown in Fig. 5, in step 501, an XML query $path is received. As mentioned above, in this embodiment the XML query is based on XPath, for example $path = '/article/author[age < 30]', which queries the article authors whose age is less than 30.
Then, in step 505, the received XML query is parsed, based on the simple path warehouse, to generate an execution tree. In this embodiment, the execution tree is built from one or more main simple paths, and each main simple path is associated with zero or more secondary simple paths and predicate simple paths.
The meanings of main simple path, secondary simple path and predicate simple path are described below.
A main simple path (Primary Simple Path, denoted PriSP($path)) is a simple path whose nodes are the root nodes of the XML fragments that form the result of the XML query. There may be one or more main simple paths.
For a main simple path $pripath, there may be zero or more secondary simple paths (Secondary Simple Path, denoted SecSP($path, $pripath)) and predicate simple paths (Predicate Simple Path, denoted PredSP($path, $pripath)) belonging to it. In this embodiment, a secondary simple path is a simple path whose nodes are descendants of the nodes of the main simple path, and a predicate simple path is a simple path whose nodes and node values are used to filter the main simple path and the secondary simple paths belonging to it.
Specifically, in step 505, after the XML query is received, a name test is first performed in the simple path warehouse on the first node in the XML query, to find the nodes that match the first node, and the simple paths containing the matching nodes are obtained in order to build the execution tree for the first node. Then, the other tests on the first node (including the node type test and the node data type test) are recorded in the execution tree for the first node. It is then determined whether the first node has a predicate. If it does, a name test is performed on the predicate node, the positional relationship between the predicate node and the first node and the value expression of the predicate are recorded in the execution tree for the first node, and the execution tree for the first node is then stored. The above processing is repeated for each subsequent node in the XML query, and the execution tree built for that node is used to update the execution tree for the previous node. In this way, once all nodes in the XML query have been processed, the resulting execution tree is the execution tree required to execute the XML query.
For the example XML query $path above, in the execution tree generated by parsing $path, the main simple path is PriSP($path) = {'/article/author'}; the secondary simple paths are SecSP($path, '/article/author') = {'/article/author/name', '/article/author/age'}; and, correspondingly, the predicate simple path is PredSP($path, '/article/author') = {'/article/author/age'}. As another example, if the XML query is $path = '/article/*', there are multiple main simple paths, namely PriSP($path) = {'/article/author', '/article/chapter'}, each of which has its own secondary simple paths and predicate simple paths.
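The shape of the execution tree for these two examples can be sketched as a small data structure. This is a heavily restricted illustration, not the parsing procedure of step 505: it only handles absolute paths whose steps are names or "*", with at most one child-name predicate on the last step, and it writes simple paths without the leading slash. The names `MainPath` and `build_execution_tree` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class MainPath:
    """One main simple path with its secondary and predicate simple paths."""
    path: str
    secondary: List[str] = field(default_factory=list)
    predicate: List[str] = field(default_factory=list)


def build_execution_tree(known_paths, main_pattern, predicate_child=None):
    """Build the list of MainPath entries for a restricted query form.

    main_pattern is an absolute path whose steps are names or "*";
    predicate_child is the name of a child element tested in a predicate
    on the last step, or None.
    """
    steps = main_pattern.strip("/").split("/")
    tree = []
    for candidate in known_paths:
        parts = candidate.split("/")
        if len(parts) != len(steps):
            continue
        if all(s == "*" or s == p for s, p in zip(steps, parts)):
            # Secondary simple paths: every stored path below the main path.
            secondary = [p for p in known_paths if p.startswith(candidate + "/")]
            predicate = []
            if predicate_child is not None:
                wanted = f"{candidate}/{predicate_child}"
                predicate = [p for p in known_paths if p == wanted]
            tree.append(MainPath(candidate, secondary, predicate))
    return tree


# Contents of a hypothetical simple path warehouse.
KNOWN = [
    "article", "article/author", "article/chapter",
    "article/author/name", "article/author/age",
    "article/chapter/title", "article/chapter/text",
]

filtered = build_execution_tree(KNOWN, "/article/author", predicate_child="age")
wildcard = build_execution_tree(KNOWN, "/article/*")
print(filtered)
print([m.path for m in wildcard])
```

Because only the simple path warehouse is consulted, the tree is built without reading any document data; the data warehouse is touched only when the tree is executed.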
In addition, for a predicate simple path $predpath, two kinds of predicate are defined: the structure predicate (Structure Predicate, denoted StructurePred($pripath, $predpath)) and the value predicate (Value Predicate, denoted ValuePred($pripath, $predpath)). In this embodiment, the structure predicate describes the relationship between a node of the main simple path and a node of a secondary simple path belonging to that main simple path or a node of the predicate simple path; the value predicate is the literal value part of the predicate in the XML query $path. In the above example, the structure predicate is operator=(lh, Intercept(rh, Length(rh)-1)), and the value predicate is "Value('/article/author/age') < 30".
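The structure predicate operator=(lh, Intercept(rh, Length(rh)-1)) tests whether the predicate node rh is a direct child of the main node lh. A minimal sketch follows, assuming node identifiers are held as (DocID, OrderPath) tuples with the OrderPath as a tuple of integers, and taking the comparison in the value predicate from the example query '/article/author[age < 30]'. All function names are illustrative.

```python
def length(node_id):
    """Length: number of Dewey components in the OrderPath."""
    return len(node_id[1])


def intercept(node_id, n):
    """Intercept: node_id itself if it is short enough, else its length-n prefix."""
    doc_id, order_path = node_id
    if len(order_path) <= n:
        return node_id
    return (doc_id, order_path[:n])


def structure_pred(lh, rh):
    """operator=(lh, Intercept(rh, Length(rh) - 1)): is rh a child of lh?"""
    return lh == intercept(rh, length(rh) - 1)


def value_pred(age_value):
    """Value predicate of the example query: the age is less than 30."""
    return int(age_value) < 30


author_doc1 = (1, (0, 0))
age_doc1 = (1, (0, 0, 1))
age_doc2 = (2, (0, 0, 1))

print(structure_pred(author_doc1, age_doc1))   # same document, parent and child
print(structure_pred(author_doc1, age_doc2))   # different documents
print(value_pred("28"))
```

Because the document identifier is part of the node identifier, the same test also rejects an age node that belongs to an author in another document.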
Fig. 6 shows an example of an execution tree. In a predicate simple path, the "position information" records the structure predicate of that predicate simple path, and the "expression information" records its value predicate; in a secondary simple path, the "position information" records the relationship between that secondary simple path and the main simple path.
In step 510, based on the data warehouse of the XML warehouse, the execution tree generated in step 505 is executed to obtain the result of the XML query.
In this embodiment, for each PriSP[i] (i ≤ n) among the n main simple paths PriSP in the execution tree, the data warehouse is accessed according to its simple path identifier to obtain the data of all nodes of that main simple path, including node identifiers and values. For each SecSPi[j] (j ≤ m) among the m secondary simple paths SecSPi belonging to a main simple path PriSP[i], the data warehouse is accessed according to its simple path identifier to obtain the data, or a handle to the data, of all nodes of that secondary simple path. For each PredSPi[p] (p ≤ k) among the k predicate simple paths PredSPi belonging to a main simple path PriSP[i], the data warehouse is accessed according to its simple path identifier to obtain the data, or a handle to the data, of all nodes of that predicate simple path.
Then, for the data of each node in each main simple path, the predicate operations are performed in turn using the data of all nodes of each predicate simple path belonging to that main simple path. When the result of the predicate operation on the data of the node is true, the data of the descendants of that node is taken from the data of all nodes of the secondary simple paths belonging to the main simple path, and the data taken is serialized into an XML fragment that forms part of the result of the XML query. After all main simple paths have been processed in this way, all XML fragments are merged to obtain the final result of the XML query.
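The execution step can be sketched for the example query '/article/author[age < 30]'. Several simplifications are assumed: the data warehouse is a dictionary from simple path to a list of ((DocID, OrderPath), value) entries, there is a single predicate simple path, the secondary nodes are leaf children so serialization stays flat, and a nested scan is used instead of the linear merge that the sorted storage allows. The function names and the sample data are hypothetical.

```python
def is_descendant(ancestor, node):
    """True if node lies strictly below ancestor in the same document."""
    (a_doc, a_path), (n_doc, n_path) = ancestor, node
    return (
        a_doc == n_doc
        and len(n_path) > len(a_path)
        and n_path[:len(a_path)] == a_path
    )


def serialize(main_path, descendants):
    """Serialize one result fragment; assumes descendants are leaf children."""
    tag = main_path.rsplit("/", 1)[-1]
    inner = "".join(
        "<{0}>{1}</{0}>".format(path.rsplit("/", 1)[-1], val)
        for _, path, val in descendants
    )
    return f"<{tag}>{inner}</{tag}>"


def execute(data, main_path, secondary_paths, predicate_path, value_test):
    """Evaluate one main simple path with a single child-value predicate."""
    fragments = []
    for main_id, _ in data[main_path]:
        # Structure predicate: the predicate node is a child of the main node.
        # Value predicate: value_test holds for the predicate node's value.
        passed = any(
            pred_id[0] == main_id[0]
            and pred_id[1][:-1] == main_id[1]
            and value_test(pred_value)
            for pred_id, pred_value in data[predicate_path]
        )
        if not passed:
            continue
        # Take the descendants' data from the secondary simple paths,
        # ordered by node identifier to restore document order.
        descendants = sorted(
            (node_id, path, val)
            for path in secondary_paths
            for node_id, val in data[path]
            if is_descendant(main_id, node_id)
        )
        fragments.append(serialize(main_path, descendants))
    return fragments


DATA = {
    "article/author": [((1, (0, 0)), None), ((2, (0, 0)), None)],
    "article/author/name": [((1, (0, 0, 0)), "Tom"), ((2, (0, 0, 0)), "Ann")],
    "article/author/age": [((1, (0, 0, 1)), "28"), ((2, (0, 0, 1)), "35")],
}

result = execute(
    DATA,
    main_path="article/author",
    secondary_paths=["article/author/name", "article/author/age"],
    predicate_path="article/author/age",
    value_test=lambda v: int(v) < 30,
)
print(result)
```

Only the three simple paths named in the execution tree are read; data stored under other simple paths, such as chapter text, is never touched.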
As can be seen from the above description, with the method for executing an XML query in an XML warehouse of this embodiment, because the XML data in the XML warehouse is stored clustered by simple path, the data belonging to a simple path can be obtained quickly through that simple path, which reduces the number of I/O operations when querying data from the XML warehouse. At the same time, because the node data belonging to a simple path is stored in order of node identifier, the predicate operations and serialization operations performed during execution of the XML query can be guaranteed to run in linear time, which improves the performance of XML data queries.
Under the same inventive concept, Fig. 7 is a schematic block diagram of a device for storing XML data in an XML warehouse based on simple paths according to an embodiment of the invention. This embodiment is described in detail below with reference to the drawings.
As mentioned above, in this embodiment the XML warehouse comprises a simple path warehouse and a data warehouse.
As shown in Fig. 7, the device 700 for storing XML data in an XML warehouse based on simple paths of this embodiment comprises: a node identifier generation module 701, which generates, for each node in the XML document, a node identifier for uniquely identifying the node; and a simple path generation module 702, which generates the simple paths of the XML document. The generated simple paths are stored in the simple path warehouse 703, and, using the simple paths as indexes, the data of each node in each simple path is stored sequentially in the data warehouse 704, where the data of a node comprises the node identifier and the value of the node.
When an XML document needs to be stored in the XML warehouse, the node identifier generation module 701 generates a node identifier for each node of the XML document. Preferably, the node identifier comprises the document identifier of the XML document and the coded string of the node. Specifically, in the node identifier generation module 701, a document identifier allocation unit assigns a document identifier to the XML document, and a coded string generation unit generates the coded string for the node. In this embodiment, the coded string generation unit uses the Dewey coding method to generate the coded string.
Then, the simple path generation module 702 generates the simple paths of the XML document. Simple paths were introduced in the detailed description of the embodiment shown in Fig. 2, so their explanation is omitted here. The generated simple paths are then stored in the simple path warehouse, and, using the simple paths as indexes, the data of each node of each simple path is stored sequentially in the data warehouse.
Further, the device 700 for storing XML data in an XML warehouse based on simple paths of this embodiment may also comprise: a determination module 705, which determines whether the simple path warehouse 703 contains the simple paths generated by the simple path generation module 702; and a simple path identifier allocation module 706, which assigns a unique simple path identifier to each stored simple path. Thus, when the generated simple paths are stored in the simple path warehouse, the determination module 705 determines whether each generated simple path is already contained. If a simple path is already contained, it is ignored; if it is not, the simple path is stored and the simple path identifier allocation module 706 assigns it a unique simple path identifier.
In this embodiment, the simple path warehouse takes the form of an XML file, and the data warehouse may take the form of a B+ tree. The simple path warehouse using an XML file and the data warehouse using a B+ tree were described above, so their explanation is omitted here.
It should be noted that the device 700 for storing XML data in an XML warehouse based on simple paths of this embodiment can, in operation, implement the method for storing XML data in an XML warehouse based on simple paths shown in Fig. 2.
Under the same inventive concept, Fig. 8 is a schematic block diagram of a device for executing an XML query in an XML warehouse according to an embodiment of the invention; descriptions of the parts that are the same as in the previous embodiments are omitted where appropriate.
In this embodiment, a plurality of XML documents are stored in the XML warehouse using the device 700 for storing XML data shown in Fig. 7. Preferably, these XML documents have similar schemas.
As shown in Fig. 8, the device 800 for executing an XML query in an XML warehouse of this embodiment comprises: a receiving module 801, which receives the XML query; a parsing module 802, which parses the received XML query, based on the simple path warehouse of the XML warehouse, to generate an execution tree, wherein the execution tree is built from one or more main simple paths, and each main simple path is associated with zero or more secondary simple paths and predicate simple paths; and an execution module 803, which executes the execution tree generated by the parsing module 802, based on the data warehouse of the XML warehouse, to obtain the result of the XML query.
Main simple paths, secondary simple paths and predicate simple paths were described in the previous embodiment, so their explanation is omitted here where appropriate.
In this embodiment, when the receiving module 801 receives an XML query, for example an XPath-based query, it sends the query to the parsing module 802 for parsing. In the parsing module 802, for each node in the XML query, a name test unit 8021 first performs a name test on the node to find the nodes that match it. Then, an execution tree construction unit 8022 obtains the simple paths containing the matching nodes in order to build the execution tree for this node, and a recording unit 8023 records the other tests on this node (including the node type test and the node data type test) in the execution tree for this node; at the same time, a predicate judging unit 8024 determines whether this node has a predicate. When the predicate judging unit 8024 determines that the node has a predicate, the name test unit 8021 performs a name test on the predicate node, and the recording unit 8023 records the positional relationship between the predicate node and this node and the value expression of the predicate in the execution tree for this node. An updating unit 8025 uses the execution tree for this node to update the execution tree for the previous node. In this embodiment, the parsing module 802 can be obtained by making the following simple modifications to an existing XPath parser: the parsing module 802 performs only the name test on the current node, and the other tests are recorded directly in the generated execution tree; if the current node has a predicate, only the name test is performed on the predicate node as well, and the relationship between the predicate node and the current node and the value expression of the predicate are also recorded in the execution tree. The execution tree output by the parsing module 802 is as shown in Fig. 6.
Further, the parsing module may also comprise a predicate determination unit, which determines the structure predicate and the value predicate for each predicate simple path. The structure predicate and the value predicate were described in the preceding embodiments, so their explanation is omitted here.
After the execution tree has been obtained, the execution module 803 executes it against the data warehouse to obtain the result of the XML query. In the execution module 803, the access unit 8031 uses the simple path identifier of each main simple path, each secondary simple path, and each predicate simple path to access the data warehouse of the XML warehouse and obtain the data of all nodes of the main simple path, of the secondary simple path, and of the predicate simple path. Then, for the data of each node in each main simple path in turn, the predicate operation unit 8032 performs the predicate operation using the data of all nodes of each predicate simple path that belongs to this main simple path. When the result of the predicate operation on the data of a node is true, the data retrieval unit 8033 takes the data of that node's descendant nodes out of the data of all nodes of the secondary simple paths belonging to this main simple path, and the serialization unit 8034 serializes the retrieved data into an XML fragment. After all main simple paths have been processed, the merging unit 8035 merges all XML fragments to obtain the result of the XML query.
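The execution stage can be sketched in Python along the same lines. This is an illustration under stated assumptions, not the patented implementation: `store` is a hypothetical dict from simple path identifier to a list of `(node identifier, value)` pairs, a node identifier is a `(document id, Dewey code tuple)` pair, `names` maps a simple path identifier to its element name, predicate values are treated as numbers, an operator of `None` stands for a purely structural (existence) predicate, and serialization assumes the descendants are direct children holding text values.

```python
import operator
from xml.sax.saxutils import escape

OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq}

def is_descendant(ancestor, node):
    """True if node lies below ancestor: same document, and the ancestor's
    Dewey code is a proper prefix of the node's Dewey code."""
    a_doc, a_code = ancestor
    n_doc, n_code = node
    return (a_doc == n_doc and len(n_code) > len(a_code)
            and n_code[:len(a_code)] == a_code)

def predicate_holds(node_id, pred, store):
    pred_path_id, op, const = pred
    for cand_id, value in store.get(pred_path_id, []):
        if not is_descendant(node_id, cand_id):
            continue
        # op None: structure predicate (existence); otherwise value predicate.
        if op is None or OPS[op](float(value), float(const)):
            return True
    return False

def execute(tree, store, names):
    fragments = []
    for entry in tree:
        tag = names[entry["main"]]
        for node_id, _ in store.get(entry["main"], []):
            if not all(predicate_holds(node_id, p, store)
                       for p in entry["predicates"]):
                continue
            # Take the descendants out of the secondary simple paths and
            # restore document order by sorting on the node identifier.
            children = sorted(
                (cand_id, names[sid], value)
                for sid in entry["secondary"]
                for cand_id, value in store.get(sid, [])
                if is_descendant(node_id, cand_id)
            )
            body = "".join(f"<{n}>{escape(v)}</{n}>" for _, n, v in children)
            fragments.append(f"<{tag}>{body}</{tag}>")
    return "\n".join(fragments)  # merge all XML fragments
```

Sorting on the node identifier is what recovers document order here, which is one reason a prefix-based code such as Dewey is convenient: ancestry becomes a prefix check and order becomes a tuple comparison.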
It should be noted that, in operation, the device 800 of the present embodiment for executing an XML query in an XML warehouse can implement the method for executing an XML query in an XML warehouse shown in Figure 5.
It should be understood that the device for storing XML data in an XML warehouse based on simple paths and the device for executing an XML query in an XML warehouse in the above embodiments, and their components, may be implemented by hardware circuits such as very-large-scale integrated (VLSI) circuits or gate arrays, semiconductors such as logic chips and transistors, or programmable hardware devices such as field-programmable gate arrays and programmable logic devices; they may also be implemented by software executed by various types of processors, or by a combination of such hardware circuits and software.
Although the method and device for storing XML data in an XML warehouse based on simple paths and the method and device for executing an XML query in an XML warehouse of the present invention have been described in detail above through certain exemplary embodiments, these embodiments are not exhaustive, and those skilled in the art can make variations and modifications within the spirit and scope of the present invention. Therefore, the present invention is not limited to these embodiments; its scope is defined only by the appended claims.

Claims (22)

1. A method for storing XML data in an XML warehouse based on simple paths, said XML warehouse comprising a simple path warehouse and a data warehouse, said method comprising:
For each node in an XML document, generating a node identifier that uniquely identifies this node;
Generating the simple paths of said XML document;
Storing said simple paths in said simple path warehouse;
Storing, with the simple path as index, the data of each node in each said simple path into said data warehouse in order, wherein said data comprises the node identifier and the value of this node;
Wherein said data warehouse takes the form of a B+ tree, said B+ tree uses "simple path identifier, node identifier" as its key, and the value of said node is contained in the corresponding leaf node of said B+ tree.
2. The method according to claim 1, wherein said node identifier comprises: the document identifier of said XML document and the coded string of this node.
3. The method according to claim 2, wherein said coded string is generated based on the Dewey coding method.
4. The method according to claim 1, wherein said storing said simple paths in said simple path warehouse comprises:
Determining whether said simple path warehouse already contains said simple path;
If said simple path is already contained, ignoring said simple path; otherwise, storing said simple path; and
Assigning a simple path identifier to the stored simple path.
5. The method according to any one of claims 1 to 4, wherein said simple path warehouse takes the form of an XML file.
6. A method for executing an XML query in an XML warehouse, wherein a plurality of XML documents are stored in said XML warehouse using the method for storing XML data according to any one of claims 1 to 4, said method comprising:
Receiving said XML query;
Parsing said XML query, based on the simple path warehouse of said XML warehouse, to generate an execution tree, wherein said execution tree is built from one or more main simple paths, and each main simple path is associated with zero or more secondary simple paths and predicate simple paths; and
Executing said execution tree, based on the data warehouse of said XML warehouse, to obtain the result of said XML query.
7. The method according to claim 6, wherein said step of parsing said XML query to generate an execution tree comprises: for each node in said XML query,
Performing a name test on this node to find the nodes that match this node;
Obtaining the simple paths that contain said matching nodes, to build the execution tree associated with this node;
Recording the other tests on this node in said execution tree associated with this node;
Judging whether this node has a predicate operation and, if so, performing a name test on the predicate node and recording, in said execution tree associated with this node, the positional relationship between said predicate node and this node and the value expression of said predicate operation; and
Updating the execution tree associated with the previous node with said execution tree associated with this node.
8. The method according to claim 7, wherein said step of parsing said XML query to generate an execution tree further comprises: for each said predicate simple path, determining a structure predicate and a value predicate.
9. The method according to claim 8, wherein said step of executing said execution tree comprises:
Accessing the data warehouse of said XML warehouse according to the simple path identifier of each said main simple path, the simple path identifier of each said secondary simple path, and the simple path identifier of each said predicate simple path, to obtain the data of all nodes of this main simple path, the data of all nodes of this secondary simple path, and the data of all nodes of this predicate simple path;
For the data of each said node in each said main simple path,
Performing the predicate operation in turn using the data of all nodes of each said predicate simple path that belongs to this main simple path;
When the result of said predicate operation on the data of this node is true, taking the data of the descendant nodes of this node out of the data of all nodes of the secondary simple paths that belong to this main simple path;
Serializing the retrieved data into an XML fragment; and
Merging all said XML fragments to obtain the result of said XML query.
10. The method according to claim 6, wherein said XML query is based on XPath.
11. The method according to claim 6, wherein said XML documents have similar schemas.
12. A device for storing XML data in an XML warehouse based on simple paths, said XML warehouse comprising a simple path warehouse and a data warehouse, said device comprising:
A node identifier generation module, configured to generate, for each node in an XML document, a node identifier that uniquely identifies this node;
A simple path generation module, configured to generate the simple paths of said XML document;
Wherein the generated simple paths are stored in said simple path warehouse, and, with the simple path as index, the data of each node in each said simple path is stored into said data warehouse in order, wherein said data comprises the node identifier and the value of this node;
Wherein said data warehouse takes the form of a B+ tree, said B+ tree uses "simple path identifier, node identifier" as its key, and the value of said node is contained in the corresponding leaf node of said B+ tree.
13. The device according to claim 12, wherein said node identifier comprises: the document identifier of said XML document and the coded string of this node;
Said node identifier generation module comprises:
A document identifier allocation unit, configured to assign a document identifier to said XML document; and
A coded string generation unit, configured to generate a coded string for said node.
14. The device according to claim 13, wherein said coded string generation unit uses the Dewey coding method.
15. The device according to any one of claims 12 to 14, further comprising:
A determination module, configured to determine whether said simple path warehouse already contains said simple path; and
A simple path identifier allocation module, configured to assign a simple path identifier to the stored simple path;
Wherein, when said determination module determines that said simple path is already contained, said simple path is ignored; otherwise, said simple path is stored in said simple path warehouse.
16. The device according to any one of claims 12 to 14, wherein said simple path warehouse takes the form of an XML file.
17. A device for executing an XML query in an XML warehouse, wherein a plurality of XML documents are stored in said XML warehouse using the device for storing XML data according to any one of claims 12 to 14, said device comprising:
A receiver module, configured to receive said XML query;
A parsing module, configured to parse said XML query, based on the simple path warehouse of said XML warehouse, to generate an execution tree, wherein said execution tree is built from one or more main simple paths, and each main simple path is associated with zero or more secondary simple paths and predicate simple paths; and
An execution module, configured to execute said execution tree, based on the data warehouse of said XML warehouse, to obtain the result of said XML query.
18. The device according to claim 17, wherein said parsing module comprises:
A name test unit, configured to perform, for each node in said XML query, a name test on this node to find the nodes that match this node;
An execution tree construction unit, configured to obtain the simple paths that contain said matching nodes, to build the execution tree associated with this node;
A record unit, configured to record the other tests on this node in said execution tree associated with this node;
A predicate operation judging unit, configured to judge whether this node has a predicate operation, wherein, when said predicate operation judging unit judges that this node has a predicate operation, said name test unit performs a name test on the predicate node, and said record unit records, in said execution tree associated with this node, the positional relationship between said predicate node and this node and the value expression of said predicate operation; and
An updating unit, configured to update the execution tree associated with the previous node with said execution tree associated with this node.
19. The device according to claim 18, wherein said parsing module further comprises: a predicate determination unit, configured to determine, for each said predicate simple path, a structure predicate and a value predicate.
20. The device according to claim 17, wherein said execution module comprises:
An access unit, configured to access the data warehouse of said XML warehouse according to the simple path identifier of each said main simple path, the simple path identifier of each said secondary simple path, and the simple path identifier of each said predicate simple path, to obtain the data of all nodes of this main simple path, the data of all nodes of this secondary simple path, and the data of all nodes of this predicate simple path;
A predicate operation unit, configured to perform, for the data of each said node in each said main simple path, the predicate operation in turn using the data of all nodes of each said predicate simple path that belongs to this main simple path;
A data retrieval unit, configured to take, when the result of said predicate operation on the data of this node is true, the data of the descendant nodes of this node out of the data of all nodes of the secondary simple paths that belong to this main simple path;
A serialization unit, configured to serialize the retrieved data into an XML fragment; and
A merging unit, configured to merge all said XML fragments to obtain the result of said XML query.
21. The device according to claim 17, wherein said XML query is based on XPath.
22. The device according to claim 17, wherein said XML documents have similar schemas.
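As an illustration of the storage method of claims 1 to 4, the following Python sketch walks an XML document, assigns each element a node identifier made of a document identifier and a Dewey code, registers each distinct simple path once, and keeps the data ordered by the key (simple path identifier, node identifier). Several points are assumptions of this sketch rather than statements of the patent: the class and method names are hypothetical, a sorted in-memory list with binary search stands in for the B+ tree, only element nodes and their leading text are stored, and the root element receives the Dewey code `(1,)`.

```python
import bisect
import xml.etree.ElementTree as ET

class XmlWarehouse:
    """Toy stand-in for the simple path warehouse plus data warehouse."""

    def __init__(self):
        self.simple_paths = {}  # simple path string -> simple path identifier
        self.data = []          # sorted list of ((path id, node id), value)

    def path_id(self, path):
        # Store a simple path only if it is not already present, and
        # assign it an identifier when it is first stored.
        if path not in self.simple_paths:
            self.simple_paths[path] = len(self.simple_paths) + 1
        return self.simple_paths[path]

    def store_document(self, doc_id, xml_text):
        self._store(doc_id, ET.fromstring(xml_text), "", (1,))

    def _store(self, doc_id, elem, parent_path, dewey):
        path = parent_path + "/" + elem.tag
        key = (self.path_id(path), (doc_id, dewey))
        # Keeping the list sorted by key clusters the data by simple path
        # and, within a path, by node identifier.
        bisect.insort(self.data, (key, (elem.text or "").strip()))
        for i, child in enumerate(elem, start=1):
            self._store(doc_id, child, path, dewey + (i,))  # Dewey: parent code + position

    def scan(self, path):
        """Return [(node id, value)] for one simple path via a range scan."""
        pid = self.simple_paths.get(path)
        if pid is None:
            return []
        lo = bisect.bisect_left(self.data, ((pid,),))
        hi = bisect.bisect_left(self.data, ((pid + 1,),))
        return [(key[1], value) for key, value in self.data[lo:hi]]
```

Because all entries of one simple path are contiguous under this key, reading every node of a path is a single range scan, which is the access pattern the execution stage relies on.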

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810212515A CN101661481B (en) 2008-08-29 2008-08-29 XML data storing method, method and device thereof for executing XML query

Publications (2)

Publication Number Publication Date
CN101661481A CN101661481A (en) 2010-03-03
CN101661481B true CN101661481B (en) 2012-09-26

Family

ID=41789509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810212515A Expired - Fee Related CN101661481B (en) 2008-08-29 2008-08-29 XML data storing method, method and device thereof for executing XML query

Country Status (1)

Country Link
CN (1) CN101661481B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1965316A (en) * 2004-04-09 2007-05-16 甲骨文国际公司 Index for accessing XML data
JP2006018584A (en) * 2004-07-01 2006-01-19 Toshiba Corp Structured document management system, and method and program for generating value-index
CN1949225A (en) * 2006-11-23 2007-04-18 金蝶软件(中国)有限公司 XML file preprocessing method, apparatus, file structure, reading method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120926

Termination date: 20150829

EXPY Termination of patent right or utility model