CN107256218A - Fast query method for XML stream data - Google Patents
- Publication number
- CN107256218A (application CN201710256675.6A)
- Authority
- CN
- China
- Prior art keywords
- label
- node
- pat
- inquiry
- xml
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/83—Querying
- G06F16/835—Query processing
- G06F16/8373—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/83—Querying
- G06F16/832—Query formulation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a fast query method for XML stream data. It performs hierarchy-fused queries, composed of multiple tag paths with complex hierarchical relationships, over XML stream data, and constructs a fast query model for this problem: Quick XML Multiple Tags Query (QXMTQ). The QXMTQ model is built on a concise query interface (QI), a query-navigation PAT tree data structure (QGPATT), and a quick query engine (QQE). The QI takes only the tag paths to be queried; the model adapts to the complex hierarchical relationships among those paths and accepts an optional predicate-expression parameter. In the QGPATT, the shared tag-navigation structure and the auxiliary fast-search PAT (Patricia trie) structure let the QQE match target tags more quickly and accurately, filter out irrelevant branches, and retrieve the relevant tag values. Testing indicates that the QXMTQ model achieves high time and space efficiency when querying large-scale XML stream data with complex multi-level tag paths.
Description
Technical field
The invention belongs to the field of information exchange and query, and relates in particular to a fast query method for XML stream data. Specifically, it performs fast hierarchy-fused queries, composed of multiple tag paths with complex structural relationships, over XML stream data and returns a two-dimensional set of result values.
Background
XML (Extensible Markup Language) is a specification for defining semantic markup. It provides a uniform way to describe and exchange structured data independently of any application. With the rapid growth of network applications, XML has become a mainstream format for Internet-based data exchange. XML stream data arrives online, continuously, and in real time, and must be parsed and processed immediately. Fast query processing over large-scale XML stream data is therefore an active research topic in continuous-query models.
XML data is naturally hierarchical, that is, tree-structured, so in many application scenarios queries over XML data are hierarchical too. A common hierarchical query in a traditional relational database conditionally retrieves several fields from tables at different levels. Assume the relations Department (department number, department name) and Employee (employee number, department number, employee name, position, gender, age), with a one-to-many (tree) relationship between department and employee, abbreviated Dept (dep_id, dep_name) and Emp (emp_id, dep_id, name, title, gender, age). The department table and the employee table sit at different levels, the latter being a child table of the former. To retrieve "the name, position, and department name of every employee older than 40", the corresponding SQL statement joins Emp to Dept on dep_id and filters on age > 40.
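The join described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the statement from the original filing: the table and column names follow the Dept/Emp abbreviations in the text, while the sample rows and the table aliases are assumptions.

```python
import sqlite3

# In-memory database with the two relations from the text (sample rows are invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (dep_id INTEGER PRIMARY KEY, dep_name TEXT);
    CREATE TABLE Emp (emp_id INTEGER PRIMARY KEY,
                      dep_id INTEGER REFERENCES Dept(dep_id),
                      name TEXT, title TEXT, gender TEXT, age INTEGER);
    INSERT INTO Dept VALUES (1, 'R&D'), (2, 'Sales');
    INSERT INTO Emp VALUES (10, 1, 'Li', 'Engineer', 'F', 45),
                           (11, 1, 'Wang', 'Intern', 'M', 23),
                           (12, 2, 'Zhao', 'Manager', 'M', 52);
""")

# Hierarchical join: the employee level drives the loop, and the department
# name is fused into each employee row through the dep_id join condition.
rows = conn.execute("""
    SELECT e.name, e.title, d.dep_name
    FROM Emp AS e JOIN Dept AS d ON e.dep_id = d.dep_id
    WHERE e.age > 40
    ORDER BY e.emp_id
""").fetchall()
print(rows)
```

Each returned row mixes two employee-level fields with one department-level field, which is the "fused" shape the rest of the document tries to obtain from XML.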
This hierarchical join (fusion) query has the following characteristics:
1) Loop level of the data: the result set loops over the employee level, not the department level. SQL implicitly uses the lowest level as the loop center.
2) The employee table needs a department number column to mark each employee's department, which takes up space in the employee table.
3) The join expresses both association and isolation between the levels: employees with the same department number (in the employee table) share the same department information (in the department table), while different departments have different employees. An employee who works across departments appears once per department, with a different department number each time (a many-to-one relationship).
4) The queried fields are themselves fused across levels: besides the employee's own information, each row includes the name of the employee's department, which is stored in the department table, not the employee table.
As a primary vehicle for data exchange, XML has a strongly self-describing hierarchical structure. The two tables with a "parent-child" relationship above (department and employee) convert easily into XML data with two nested loop levels (department and employee): lower-level branches are nested inside a loop branch of the upper level, so the employee branches of one department are embedded in that department's branch. The join condition between levels is then natural, or implicit. The employee level does not need to carry a department number, because the department number is present as an ancestor node of the embedded employee branch. This avoids characteristic 2) above and saves storage. It is therefore natural to pose the same hierarchy-fused query against such XML data. Can current XML query processing techniques handle it well?
XPath is a language for locating information in XML documents and is a W3C Recommendation. To date, most academic work on XML query processing has centered on XPath. XPath uses path expressions to select nodes, node sets, atomic values, and mixtures of nodes and atomic values from an XML document. The relevant nodes are selected by following a location path, which is made up of steps.
However, because of limits in its grammar, XPath generally returns a one-dimensional result set whose elements all stand side by side as peers. Even forcibly merging the value sets of two expressions with the union operator "|" does not yield a correct hierarchy-fused result set. XPath alone therefore cannot directly return the value sets of multiple fields (tag paths) that stand in different hierarchical relationships.
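A small sketch of this limitation, using Python's xml.etree.ElementTree. ElementTree implements only a subset of XPath and has no union operator, so the two path expressions are run separately here; the sample document and its tag names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

company = ET.fromstring(
    "<company>"
    "<dept><dname>R&amp;D</dname>"
    "<emp><name>Li</name></emp><emp><name>Wang</name></emp></dept>"
    "<dept><dname>Sales</dname><emp><name>Zhao</name></emp></dept>"
    "</company>"
)

# Each path expression yields a flat, one-dimensional list.
dept_names = [e.text for e in company.findall("./dept/dname")]
emp_names = [e.text for e in company.findall("./dept/emp/name")]
# The lists have different lengths, and nothing in them says which
# department each employee belongs to.

# The hierarchy-fused result needs explicit navigation over both levels.
fused = [
    {"dname": dept.findtext("dname"), "name": emp.findtext("name")}
    for dept in company.findall("dept")
    for emp in dept.findall("emp")
]
print(dept_names, emp_names, fused)
```

The hand-written nested loop at the end is the kind of "secondary programming" the method described in this document aims to make unnecessary.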
XQuery is an XML query language built on XPath expressions; it became a W3C Recommendation after XPath. XQuery supports XPath natively as part of its grammar, so it can do anything XPath can.
Because XQuery is Turing-complete, it can be regarded as a general-purpose language and readily overcomes many of XPath's limitations. It provides a set of important built-in functions and operators, and it can apply arbitrary transformations to a result set. But XQuery is considerably more complex to use. Returning a result set that spans different hierarchical levels usually requires an intricate, multiply nested XQuery script, sometimes with help from a host programming language, and the time and space efficiency of the script depends on the XQuery engine chosen.
Summary of the invention
To solve the problems above, the invention provides a fast query method for XML stream data. It uses QXMTQ (Quick XML Multiple Tags Query), a fast hierarchy-fused query model for XML stream data, which meets the requirements of query adaptivity and high time and space efficiency.
To this end, the invention adopts the following technical solution:
A fast query method for XML stream data, comprising the following steps:
Step 1, XML Schema definition preprocessing: build the query-navigation PAT tree.
Step 101: find the root element and create the root node of the query tree from its element information.
Step 102: check whether the last element of the Schema definition document has been reached. If so, end this preprocessing; otherwise go to step 103.
Step 103: find all child elements of the current element, build the corresponding child label nodes, and add them to the query tree. Store each child element's details in its child label node, and store navigation information in the parent and child label nodes.
Step 104: for each node and all of its direct child nodes, build a Patricia trie as an auxiliary fast-search structure. Each intermediate PAT node records the length of the "common" part; each leaf PAT node points to the corresponding direct child label node.
Step 2, query parameter preprocessing: build the predicate-expression syntax evaluation tree and the query-navigation PAT tree.
Step 201: if there is no predicate expression, go to step 204; otherwise continue with step 202.
Step 202: parse the predicate condition expression according to the expression's EBNF grammar. Operands become leaf nodes and the operator that relates them becomes their parent node, and so on, until the predicate syntax evaluation tree is built.
Step 203: append the tag paths in each conditional sub-expression to the query tag path list.
Step 204: check whether the end of the query tag path list has been reached. If not, go to step 205; otherwise end query parameter preprocessing.
Step 205: split the current query tag path into a label sequence and process the labels in order. Mark the corresponding label node in the query-navigation PAT tree as a query focus: set the label node's state to "pass", and set the state of every PAT node on the way from the parent label node to this label node to "pass"; all other PAT nodes are in state "reject". Then move to the next label in the sequence. When the sequence ends, return to step 204.
Step 3, query processing: produce the two-dimensional result set.
Step 301: parse the target XML stream data file. During parsing, each event invokes a different callback: "StartElement" runs step 302, "Characters" runs step 303, and "EndElement" runs step 304.
Step 302: match the incoming label against the query-navigation PAT tree. Starting from the parent label node of the current label node, use that node's auxiliary PAT structure and its fast search algorithm to check quickly whether navigation reaches the correct child label node. Update the context state according to the match result: "accept" or "reject".
Step 303: collect the label value carried by the event and put it in the cache.
Step 304: if the label node marked as the predicate evaluation position has been reached, extract the value of each tag path in the expression and evaluate the expression following the predicate syntax evaluation tree. If the result is true, go to step 305. If it is false, discard all label values collected in the branch rooted at this label node.
Step 305: if the result is true and the label node is in state "accept", collect the label's value and put it in the cache.
Step 306: gather the per-label result sets from all caches, merge them into a two-dimensional label result set, end query processing, and return the two-dimensional result set.
Preferably, in step 3 the XML stream data drives query processing and the query-navigation PAT tree plays an auxiliary, guiding role, while the system's state or context changes are recorded. Suppose the current stream data node element Nd has a child node Ndc with label Tdc. The object to search is TLqc, the list of labels of all child label nodes under the corresponding node Nq in the query tree. Once the child label node for Tdc is found in the query tree, it is also checked for being a query-focused label node. TLqc is the complete set of child labels under the current node, and the corresponding list of query-focused child labels is TLqfc, so TLqc contains TLqfc. Matching uses the Patricia trie search algorithm and falls into the following cases:
a) TLqfc is empty: the data branch rooted at Ndc needs no further query processing and can be pruned.
b) TLqc equals TLqfc: Ndc is necessarily a label the user's query focuses on, so no matching is needed and subsequent query processing continues.
c) TLqfc is non-empty and TLqc properly contains TLqfc: the Patricia search algorithm is used. Each PAT node records the length of the common part of the labels that can be skipped, and each branch corresponds to a different target label. The search proceeds downward from the PAT root until it reaches an "accept" or "reject" PAT node. A "reject" node means the data branch rooted at Ndc needs no further query processing and can be pruned. An "accept" node means Ndc is a query-focused label.
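The three cases can be condensed into a small decision function. This is only a sketch: the function name `decide` and its return values are hypothetical, and a plain set membership test stands in for the Patricia search of case c).

```python
def decide(child_label, all_children, focused_children):
    """Return "prune" or "continue" for a stream child node labelled child_label.

    all_children     -- TLqc: every child label the schema allows under Nq
    focused_children -- TLqfc: the subset of those labels that the query focuses on
    """
    focused = set(focused_children)
    if not focused:
        # Case a: nothing below Nq is queried, so the whole branch is cut.
        return "prune"
    if focused == set(all_children):
        # Case b: every possible child is queried, so no matching is needed.
        return "continue"
    # Case c: a real implementation would walk the PAT index here and stop
    # at an "accept" or "reject" PAT node; membership is a stand-in for that.
    return "continue" if child_label in focused else "prune"


print(decide("age", ["name", "age", "title"], ["name"]))   # case c, not focused
print(decide("name", ["name", "age", "title"], ["name"]))  # case c, focused
```

Case b) relies on the stream conforming to the schema: if every permitted child is focused, any child that actually arrives must be one of them.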
The query method of the invention performs hierarchy-fused queries, composed of multiple tag paths with complex hierarchical relationships, over XML stream data, and constructs a fast query model for this problem: Quick XML Multiple Tags Query (QXMTQ). The QXMTQ model is built on a concise query interface (QI), a query-navigation PAT tree data structure (QGPATT), and a quick query engine (QQE). The QI takes only the tag paths to be queried; the model adapts to the complex hierarchical relationships among those paths and accepts an optional predicate-expression parameter. In the QGPATT, the shared tag-navigation structure and the auxiliary fast-search PAT structure let the QQE match target tags more quickly and accurately, filter out irrelevant branches, and retrieve the relevant tag values. Testing indicates that the QXMTQ model achieves high time and space efficiency when querying large-scale XML stream data with complex multi-level tag paths.
Brief description of the drawings
Fig. 1: flowchart of XML Schema definition preprocessing;
Fig. 2: flowchart of query parameter preprocessing;
Fig. 3: flowchart of query processing.
Detailed description of the embodiments
The invention provides a fast query method for XML stream data that uses QXMTQ (Quick XML Multiple Tags Query), a fast hierarchy-fused query model for XML stream data. The model accepts multiple query tag paths whose mutual relationships can be complex, including "parent-child", "sibling", "uncle-nephew", "ancestor-descendant", and "granduncle-grandnephew" relationships. The result set extracted by parsing and querying is two-dimensional. The first dimension holds one result subset per specified "family" (branch), in the order the branches are stored in the XML data. Each subset in the second dimension is a mapping table whose keys are tag paths and whose values are the fused result values for those tag paths. The specific technical solution is as follows:
1. Provide a simple query interface
The user only supplies the list of query tag paths, the tag path of the loop point, and an optional predicate expression. The user does not need to attend to the complex structural relationships among the query tag paths, intervene in the query process, or do any "secondary programming" or "second search" to assemble results. The system encapsulates and adapts to the complex hierarchical relationships among the query tag paths, runs the query, and extracts the corresponding result values automatically, so the user can concentrate on business requirements and adapt quickly to business changes.
The XML definition file, the query request parameters, and the XML data change at different rates. The XML definition file usually hardly changes, and preprocessing of the query request parameters is relatively independent, so neither the definition file nor the query parameters need to be processed again each time XML data is processed. The whole query process is therefore divided into the following three parts:
2. Preprocessing of the XML definition file
1) Build the XML definition full tree. The XML Schema describes the overall structure of the XML document, including element definitions, attribute definitions, child element definitions, and child element occurrence counts. All incoming XML stream data follows this definition. Using this information to guide XML parsing and querying, and to filter out irrelevant branches, speeds up processing. From the elements defined in the XML Schema and their structural relationships (elements and attributes become nodes; structural relationships become navigation pointers), build the XML definition full tree. This tree is the precursor of the query-navigation PAT full tree and contains only label nodes; a node of this tree is called a label node.
2) Build the Patricia trie index over "parent-child" nodes. To speed up finding the child node with a given tag name from its parent node, build a Patricia trie index for each set of "parent-child" nodes as a static auxiliary index for match searching. The "common" parts of all child tag strings become PAT structure nodes and the "different" parts become PAT branches, and so on, until the query-navigation PAT full tree with Patricia trie structure is complete. A node of this index structure is called a PAT node.
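The two preprocessing steps above can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the class and function names (`LabelNode`, `PatNode`, `build_label_tree`, `build_pat`, `pat_search`) are hypothetical, the sample schema is invented, and only inline anonymous complex types with `xs:sequence` are handled. The PAT index is built over characters: internal nodes store the length of the shared prefix to skip, branches are keyed by the next character, and the full label is verified only at the leaf.

```python
import os
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"

XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
 <xs:element name="emp"><xs:complexType><xs:sequence>
  <xs:element name="name" type="xs:string"/>
  <xs:element name="nationality" type="xs:string"/>
  <xs:element name="title" type="xs:string"/>
  <xs:element name="age" type="xs:int"/>
 </xs:sequence></xs:complexType></xs:element>
</xs:schema>"""


class LabelNode:
    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent   # "child -> parent" navigation pointer
        self.children = {}     # "parent -> child" navigation pointers
        self.pat = None        # PAT index over the labels in self.children


class PatNode:
    def __init__(self, skip, target=None):
        self.skip = skip       # length of the prefix shared by all labels below
        self.branches = {}     # next character ("" = label ends here) -> PatNode
        self.target = target   # leaf only: the child LabelNode


def build_pat(labels, targets):
    """Build a PAT index over distinct labels; targets maps label -> LabelNode."""
    if len(labels) == 1:
        return PatNode(len(labels[0]), targets[labels[0]])
    skip = len(os.path.commonprefix(labels))
    node = PatNode(skip)
    groups = {}
    for label in labels:
        groups.setdefault(label[skip:skip + 1], []).append(label)
    for char, group in groups.items():
        node.branches[char] = build_pat(group, targets)
    return node


def build_label_tree(xsd_element, parent=None):
    """Steps 101-104: one LabelNode per element, plus a PAT index per parent."""
    node = LabelNode(xsd_element.get("name"), parent)
    for child in xsd_element.findall(f"{XS}complexType/{XS}sequence/{XS}element"):
        node.children[child.get("name")] = build_label_tree(child, node)
    if node.children:
        node.pat = build_pat(sorted(node.children), node.children)
    return node


def pat_search(pat, label):
    """Follow one branch per level, then confirm the whole label at the leaf."""
    while pat is not None and pat.target is None:
        pat = pat.branches.get(label[pat.skip:pat.skip + 1])
    if pat is not None and pat.target.label == label:
        return pat.target
    return None


root = build_label_tree(ET.fromstring(XSD).find(f"{XS}element"))
print(pat_search(root.pat, "nationality").label, pat_search(root.pat, "salary"))
```

Because skipped characters are never compared on the way down, a lookup inspects one character per PAT level and does a single full string comparison at the end.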
3. Preprocessing of the query request parameters
3.1 Build the predicate syntax evaluation tree
Parse the input predicate expression. After the predicate condition expression is decomposed, build the predicate syntax evaluation tree: operands become leaf nodes and the operator that relates them becomes their parent node, and so on. Operands include numbers, character strings, TRUE, FALSE, and tag paths (a tag path is treated as a script variable, and the corresponding label value is substituted before evaluation). Then analyze the evaluation tree according to the semantics of the conditional expression and organize it into a computable sequence of conditional sub-expressions. Finally, append the tag paths that appear in each conditional sub-expression to the query tag path list.
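A minimal sketch of such a predicate tree, assuming a small hypothetical expression syntax (comparison operators, `and`, `or`, parentheses, quoted strings, and tag paths written with leading slashes). The source does not give the actual EBNF, so the grammar, the tuple-based tree, and the function names here are all assumptions.

```python
import operator
import re

COMPARE = {"=": operator.eq, "!=": operator.ne, ">": operator.gt,
           "<": operator.lt, ">=": operator.ge, "<=": operator.le}
TOKEN = re.compile(
    r"\s*(?:(?P<num>\d+(?:\.\d+)?)|'(?P<str>[^']*)'|(?P<path>/[\w/]+)"
    r"|(?P<op>>=|<=|!=|=|>|<|\(|\))|(?P<word>[A-Za-z]+))")


def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        kind = match.lastgroup
        value = match.group(kind)
        tokens.append((kind, float(value) if kind == "num" else value))
        pos = match.end()
    return tokens


def parse(tokens):
    """Recursive descent: operands are leaves, operators are their parents."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def keyword(word):
        kind, value = peek()
        return kind == "word" and value.lower() == word

    def operand():
        kind, value = take()
        if (kind, value) == ("op", "("):
            node = disjunction()
            take()  # closing parenthesis
            return node
        if kind == "word" and value.upper() in ("TRUE", "FALSE"):
            return ("bool", value.upper() == "TRUE")
        if kind in ("num", "str", "path"):
            return (kind, value)
        raise SyntaxError(f"unexpected token {value!r}")

    def comparison():
        left = operand()
        if peek()[0] == "op" and peek()[1] in COMPARE:
            return (take()[1], left, operand())
        return left

    def conjunction():
        node = comparison()
        while keyword("and"):
            take()
            node = ("and", node, comparison())
        return node

    def disjunction():
        node = conjunction()
        while keyword("or"):
            take()
            node = ("or", node, conjunction())
        return node

    tree = disjunction()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens")
    return tree


def tag_paths(node):
    """Tag paths used by the predicate, to append to the query path list."""
    if node[0] == "path":
        return [node[1]]
    if node[0] in ("num", "str", "bool"):
        return []
    return tag_paths(node[1]) + tag_paths(node[2])


def evaluate(node, values):
    """values maps tag path -> label value (text) taken from the stream."""
    kind = node[0]
    if kind in ("num", "str", "bool"):
        return node[1]
    if kind == "path":
        raw = values[node[1]]
        try:
            return float(raw)   # numeric label values compare as numbers
        except ValueError:
            return raw
    if kind == "and":
        return bool(evaluate(node[1], values)) and bool(evaluate(node[2], values))
    if kind == "or":
        return bool(evaluate(node[1], values)) or bool(evaluate(node[2], values))
    return COMPARE[kind](evaluate(node[1], values), evaluate(node[2], values))


AGE, GENDER = "/company/dept/emp/age", "/company/dept/emp/gender"
tree = parse(tokenize(f"{AGE} > 40 and {GENDER} = 'F'"))
print(tree, tag_paths(tree), evaluate(tree, {AGE: "45", GENDER: "F"}))
```

Keeping tag paths as ordinary leaves makes the last step of 3.1 a simple tree walk: `tag_paths` returns exactly the paths that must be added to the query list so their values are available when the predicate is evaluated.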
3.2 Focus the query tag paths in the query-navigation PAT full tree
1) Every path in the input query tag path list is one the user cares about, and the system must extract the relevant label values from the XML stream data along it. So, in the XML-definition navigation PAT full tree, mark each intermediate label node on a query tag path "reject" or "pass" and each leaf label node "reject" or "accept"; all other label nodes are "reject". The Patricia trie index between each pair of "parent-child" label nodes on a query tag path needs matching marks: each PAT node on the PAT chain of a queried label is marked "pass", the last PAT node is marked "accept", and all other PAT nodes are "reject". The tree that results from focusing the query tag paths is called the query-navigation PAT tree.
2) Besides label information, each node of the query tree carries detailed navigation information to make traversal of the tree easy, and the positions at which the predicate expression can be evaluated are also marked.
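The focusing step can be sketched on a plain label tree. This is a simplified illustration: the PAT-node marks are omitted, the schema is given as a nested dict, and the names (`LabelNode`, `build`, `focus`, `states`, `predicate_point`) as well as the choice to mark the loop-point node as the predicate evaluation position are assumptions.

```python
class LabelNode:
    def __init__(self, label):
        self.label = label
        self.children = {}
        self.state = "reject"         # default for every node in the full tree
        self.predicate_point = False  # where the predicate may be evaluated


def build(label, schema):
    """Build the full label tree from a nested dict {child_label: {...}}."""
    node = LabelNode(label)
    for child_label, sub_schema in schema.items():
        node.children[child_label] = build(child_label, sub_schema)
    return node


def walk(root, path):
    """Return the chain of label nodes from the root along a tag path."""
    labels = path.strip("/").split("/")
    if labels[0] != root.label:
        raise ValueError(f"{path} does not start at /{root.label}")
    chain = [root]
    for label in labels[1:]:
        chain.append(chain[-1].children[label])
    return chain


def focus(root, query_paths, loop_path):
    for path in query_paths:
        chain = walk(root, path)
        for node in chain[:-1]:
            if node.state == "reject":
                node.state = "pass"   # intermediate node on a query path
        chain[-1].state = "accept"    # leaf of the query path
    loop_chain = walk(root, loop_path)
    for node in loop_chain:
        if node.state == "reject":
            node.state = "pass"
    loop_chain[-1].predicate_point = True


def states(node, prefix=""):
    path = f"{prefix}/{node.label}"
    result = {path: node.state}
    for child in node.children.values():
        result.update(states(child, path))
    return result


SCHEMA = {"dept": {"dname": {}, "location": {},
                   "emp": {"name": {}, "title": {}, "age": {}}}}
root = build("company", SCHEMA)
focus(root, ["/company/dept/dname", "/company/dept/emp/name"], "/company/dept/emp")
print(states(root))
```

After focusing, anything still marked "reject" (here `location`, `title`, and `age`) can be pruned as soon as its start tag arrives in the stream.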
4. Query processing and result collection
1) Read the XML data stream document using SAX (Simple API for XML). SAX is a lightweight XML parsing approach that works roughly as follows: the reader first reads part of the XML stream, and subsequent parsing is event-driven, with the events StartDocument, EndDocument, StartElement, EndElement, and Characters. This repeats until the XML stream ends. Unlike DOM parsing, SAX parses while reading and does not load the whole document into memory, which suits large XML documents. The invention additionally filters inside the event callbacks, restricted to the query tag paths, which further reduces the amount of XML data that must be matched, saves storage, shortens parsing time, and improves time and space efficiency.
2) The XML stream data drives query processing and the query-navigation PAT tree plays an auxiliary, guiding role, while the system's state or context changes are recorded.
Suppose the current stream data node element Nd has a child node Ndc with label Tdc. The object to search is TLqc, the list of labels of all child label nodes under the corresponding node Nq in the query tree. Once the child label node for Tdc is found in the query tree, it is also checked for being a query-focused label node. TLqc is the complete set of child labels under the current node, and the corresponding list of query-focused child labels is TLqfc, so TLqc contains TLqfc. To speed up matching, the invention uses the efficient Patricia trie search algorithm, which falls into the following cases:
a) TLqfc is empty: the data branch rooted at Ndc needs no further query processing and can be pruned.
b) TLqc equals TLqfc: Ndc is necessarily a label the user's query focuses on, so no matching is needed and subsequent query processing continues.
c) TLqfc is non-empty and TLqc properly contains TLqfc: the Patricia search algorithm is used. Each PAT node records the length of the common part of the labels that can be skipped, and each branch corresponds to a different target label. The search proceeds downward from the PAT root until it reaches an "accept" or "reject" PAT node. A "reject" node means the data branch rooted at Ndc needs no further query processing and can be pruned. An "accept" node means Ndc is a query-focused label.
3) Evaluate the predicate expression at the marked evaluation point. If the result is true, continue query processing for this branch; otherwise skip the branch and jump to the next loop branch.
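The query loop described in this section can be sketched with Python's xml.sax module. This is a simplified illustration rather than the patented engine: plain set lookups on full tag paths stand in for the label tree and PAT search, the predicate is a Python callable instead of a parsed expression tree, and the class name `FusionQuery`, its parameters, and the sample document are assumptions. It also assumes upper-level values (such as the department name) arrive in the stream before the loop-point elements that use them.

```python
import xml.sax


def ancestors(path):
    parts = path.strip("/").split("/")
    return {"/" + "/".join(parts[:i]) for i in range(1, len(parts))}


class FusionQuery(xml.sax.ContentHandler):
    """Emits one {tag path: value} row per loop-point element."""

    def __init__(self, paths, loop_path, predicate=None):
        super().__init__()
        self.accept = set(paths)                      # focused leaf paths
        self.passing = {loop_path} | ancestors(loop_path)
        for path in self.accept:
            self.passing |= ancestors(path)           # nodes to walk through
        self.loop_path, self.predicate = loop_path, predicate
        self.stack, self.text, self.skipping = [], [], 0
        self.context, self.rows, self.pruned = {}, [], []

    def startElement(self, name, attrs):
        if self.skipping:                             # inside a pruned branch
            self.skipping += 1
            return
        path = "/" + "/".join(self.stack + [name])
        if path in self.accept or path in self.passing:
            self.stack.append(name)
            self.text = []
        else:                                         # "reject": cut the branch
            self.skipping = 1
            self.pruned.append(path)

    def characters(self, content):
        if not self.skipping and "/" + "/".join(self.stack) in self.accept:
            self.text.append(content)                 # cache the label value

    def endElement(self, name):
        if self.skipping:
            self.skipping -= 1
            return
        path = "/" + "/".join(self.stack)
        if path in self.accept:
            self.context[path] = "".join(self.text).strip()
        if path == self.loop_path:                    # predicate evaluation point
            row = dict(self.context)
            if self.predicate is None or self.predicate(row):
                self.rows.append(row)
        # Values collected below this element belong to the branch that just closed.
        for key in [k for k in self.context if k.startswith(path + "/")]:
            del self.context[key]
        self.stack.pop()
        self.text = []


XML = """<company>
 <dept><dname>R&amp;D</dname><location>Beijing</location>
  <emp><name>Li</name><title>Engineer</title><age>45</age></emp>
  <emp><name>Wang</name><title>Intern</title><age>23</age></emp>
 </dept>
 <dept><dname>Sales</dname>
  <emp><name>Zhao</name><title>Manager</title><age>52</age></emp>
 </dept>
</company>"""

DNAME, EMP = "/company/dept/dname", "/company/dept/emp"
NAME, TITLE, AGE = EMP + "/name", EMP + "/title", EMP + "/age"
query = FusionQuery([DNAME, NAME, TITLE, AGE], EMP,
                    predicate=lambda row: int(row[AGE]) > 40)
xml.sax.parseString(XML.encode("utf-8"), query)
for row in query.rows:
    print(row)
```

The rows form the two-dimensional result: the outer list follows the order of employee branches in the stream, and each inner mapping fuses the employee's own values with the department name inherited from the enclosing branch.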
As shown in Figures 1, 2, and 3, the specific processing flow of the fast query method for XML stream data comprises the following steps:
Step 1, XML Schema definition preprocessing: build the query-navigation PAT tree.
Step 101: find the root element and create the root node of the query tree from its element information.
Step 102: check whether the last element of the Schema definition document has been reached. If so, end this preprocessing; otherwise go to step 103.
Step 103: find all child elements of the current element, build the corresponding child label nodes, and add them to the query tree. Store each child element's details in its child label node, and store navigation information in the parent and child label nodes, including the "parent-to-child" pointer and the "child-to-parent" pointer.
Step 104: for each node and all of its direct child nodes, build a Patricia trie as an auxiliary fast-search structure. Each intermediate PAT node records the length of the "common" part; each leaf PAT node points to the corresponding direct child label node.
Step 2, query parameter preprocessing: build the predicate-expression syntax evaluation tree and the query-navigation PAT tree.
Step 201: if there is no predicate expression, go to step 204; otherwise continue with step 202.
Step 202: parse the predicate condition expression according to the expression's EBNF grammar. Operands become leaf nodes and the operator that relates them becomes their parent node, and so on, until the predicate syntax evaluation tree is built.
Step 203: append the tag paths in each conditional sub-expression to the query tag path list.
Step 204: check whether the end of the query tag path list has been reached. If not, go to step 205; otherwise end query parameter preprocessing.
Step 205: split the current query tag path into a label sequence and process the labels in order. Mark the corresponding label node in the query-navigation PAT tree as a query focus: set the label node's state to "pass", and set the state of every PAT node on the way from the parent label node to this label node to "pass"; all other PAT nodes are in state "reject". Then move to the next label in the sequence and repeat. When the sequence ends, return to step 204.
Step 3, query processing: produce the two-dimensional result set.
Step 301: parse the target XML stream data file. During parsing, each event invokes a different callback: "StartElement" runs step 302, "Characters" runs step 303, and "EndElement" runs step 304. Query filtering starts directly during parsing, which reduces the amount of target XML stream data that must be matched, saves storage, shortens parsing time, and improves time and space efficiency.
Step 302: match the incoming label against the query-navigation PAT tree. Starting from the parent label node of the current label node, use that node's auxiliary PAT structure and its fast search algorithm to check quickly whether navigation reaches the correct child label node. Update the context state according to the match result: "accept" or "reject".
Step 303: collect the label value carried by the event and put it in the cache.
Step 304: if the label node marked as the predicate evaluation position has been reached, extract the value of each tag path in the expression and evaluate the expression following the predicate syntax evaluation tree. If the result is true, go to step 305. If it is false, discard all label values collected in the branch rooted at this label node.
Step 305: if the result is true and the label node is in state "accept", collect the label's value and put it in the cache.
Step 306: gather the per-label result sets from all caches, merge them into a two-dimensional label result set, end query processing, and return the two-dimensional result set.
Claims (2)
1. a kind of method for quickly querying of XML stream data, it is characterised in that comprise the following steps:
Step 1, XML Schema definition pretreatments:Build enquiry navigation PAT trees
Step 101, search root element and the root node that query tree is created according to its element information;
Step 102, the most end element determined whether in Schema definition documents, if terminating this pretreatment, are otherwise redirected
To step 103;
Step 103, find all daughter elements by currentElement, build corresponding subtab node and be simultaneously put into query tree,
This specific details of daughter element label is put into the subtab node, while being put into navigation in parent-child label node
Information;
Step 104, according to node and its it is all build Patricia tries directly under child nodes and aid in fast search structure, in
Between PAT nodes records " common " part length, leaf PAT nodes point to it is corresponding directly under child's label node;
Step 2, query parameter preprocessing: build the predicate expression syntax evaluation tree and the query
navigation PAT tree.
Step 201: if no predicate expression exists, go to step 204; if one exists, proceed to step 202;
Step 202: parse the predicate condition expression according to the EBNF grammar of the expression, taking
operands as leaf nodes and the associated operator as their parent node, and so on, to construct the predicate
syntax evaluation tree;
Step 203: append the tag path in each conditional sub-expression to the query tag path list;
Step 204: determine whether the end of the query tag path list has been reached; if not, perform step 205,
otherwise end query parameter preprocessing;
Step 205: for each query tag path, first split it into a tag sequence and process each tag in order, marking
the corresponding tag node in the query navigation PAT tree structure as a query focus: the state of this tag
node is set to "pass", and every PAT node leading from the parent tag node to this tag node is also marked
"pass", while all other PAT nodes are in the "reject" state; then process the next tag in the sequence until
the tag sequence ends, and return to step 204;
Step 3, query processing, producing the two-dimensional result set.
Step 301: parse the target XML stream data file; during parsing, a different callback method is invoked
according to the event generated, where the event "StartElement" performs step 302, the event "Characters"
performs step 303, and the event "EndElement" performs step 304;
Step 302: match the input tag by searching the query navigation PAT tree, starting from the parent tag node of
the current tag node; using the PAT auxiliary structure and its fast search algorithm, it can be quickly
determined whether navigation reaches the correct child node tag, and the context state is updated according
to the match result, the states including "accept" and "reject";
Step 303: collect the tag value corresponding to this event and put it into the cache;
Step 304: if the tag node marked as the predicate evaluation position is reached, extract the value
corresponding to each tag path in the expression, then evaluate the expression according to the predicate
expression syntax evaluation tree; if the result is true, perform step 305; if the result is false, discard
all tag values collected in the branch rooted at this tag node;
Step 305: if the evaluation result is true and this tag node is in the "accept" state, collect the value
corresponding to this tag and put it into the cache;
Step 306: collect the result set for each tag held in the caches, merge them into a two-dimensional tag
result set, end query processing, and return the two-dimensional result set.
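The event-driven flow of steps 301 to 306 can be sketched roughly as follows, using Python's standard `xml.sax` callbacks in place of the "StartElement", "Characters" and "EndElement" events. This is a minimal sketch under stated assumptions, not the patented implementation: a plain set of focused tag paths stands in for the query navigation PAT tree, the predicate tree is a nested tuple with numeric operands, and the names `QueryHandler`, `evaluate` and `run_query` as well as the sample document are hypothetical.

```python
import operator
import xml.sax

# Operators allowed as inner nodes of the (assumed) predicate tree.
OPS = {">": operator.gt, "<": operator.lt, "=": operator.eq,
       "and": lambda a, b: a and b, "or": lambda a, b: a or b}


def evaluate(node, values):
    """Evaluate a predicate tree; leaves are ('path', p) or ('const', c).

    Assumes every referenced tag path was collected and holds a number.
    """
    kind = node[0]
    if kind == "path":
        return float(values[node[1]])
    if kind == "const":
        return node[1]
    return OPS[kind](evaluate(node[1], values), evaluate(node[2], values))


class QueryHandler(xml.sax.ContentHandler):
    def __init__(self, focus_paths, predicate_node, predicate):
        super().__init__()
        self.focus = set(focus_paths)         # stand-in for "accept" states
        self.predicate_node = predicate_node  # path where the predicate is evaluated
        self.predicate = predicate
        self.stack, self.text = [], []
        self.record, self.rows = {}, []

    def startElement(self, name, attrs):      # step 302: update context
        self.stack.append(name)
        self.text = []

    def characters(self, content):            # step 303: cache focused values
        if "/".join(self.stack) in self.focus:
            self.text.append(content)

    def endElement(self, name):               # steps 304 and 305
        path = "/".join(self.stack)
        if path in self.focus:
            self.record[path] = "".join(self.text).strip()
        if path == self.predicate_node:
            if self.predicate is None or evaluate(self.predicate, self.record):
                self.rows.append(dict(self.record))
            self.record = {}                  # a false result drops the branch values
        self.text = []
        self.stack.pop()


def run_query(xml_bytes, focus_paths, predicate_node, predicate=None):
    handler = QueryHandler(focus_paths, predicate_node, predicate)
    xml.sax.parseString(xml_bytes, handler)
    # Step 306: merge cached records into a two-dimensional result set.
    return [[row.get(p) for p in focus_paths] for row in handler.rows]


DOC = (b"<catalog><book><title>XML Basics</title><price>30</price></book>"
       b"<book><title>Streams</title><price>45</price></book></catalog>")
PRED = (">", ("path", "catalog/book/price"), ("const", 35.0))
rows = run_query(DOC, ["catalog/book/title", "catalog/book/price"],
                 "catalog/book", PRED)
```

With the sample document, only the second `book` branch satisfies the predicate, so `rows` contains a single row. A fuller version would replace the set lookup with navigation through the PAT tree and prune rejected branches without inspecting their descendants.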
2. The fast query method for XML stream data according to claim 1, characterized in that the query processing
in step 3 is driven by the XML stream data, with the query navigation PAT tree playing an auxiliary guiding
role, while the state changes or context changes of the system are recorded. Suppose a child node Ndc of the
current stream data element node Nd has the tag Tdc; the object to be searched and matched is the tag list
TLqc of all child tag nodes under the corresponding node Nq in the query tree; the child tag node
corresponding to Tdc is found in the query tree and, at the same time, it is checked whether it is a tag node
on which the query focuses. Because the tag list TLqc is the complete set of child nodes under the current
node, and the corresponding list of child node tags on which the query focuses is TLqfc, the relation between
the two is that TLqc contains TLqfc. Matching uses the Patricia trie match search algorithm and falls into the
following cases:
A) the query focus tag list TLqfc is empty, which means query processing need not continue on the data branch
rooted at Ndc, and this data branch can be pruned;
B) TLqc equals TLqfc, which means the data node Ndc is necessarily a tag on which the user query focuses,
without any matching, and subsequent query processing can continue;
C) TLqfc is not empty and TLqc properly contains TLqfc; the Patricia search algorithm is used, in which each
PAT node records the length of the common part of the tag that is skipped and the different branches
correspond to the target tags to be matched; the search proceeds downward from the PAT root node until an
"accept" or "reject" PAT node is reached; if it is a "reject" PAT node, query processing need not continue on
the data branch rooted at Ndc, and this data branch can be pruned; if it is an "accept" PAT node, Ndc is a tag
on which the query focuses.
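The three cases of claim 2 can be illustrated with a small Patricia-style trie over the child tag list TLqc. The sketch below is only one possible reading: inner nodes store the number of common characters to skip and branch on the next character, and the "accept"/"reject" flag is kept on the leaves (a simplification of marking every PAT node along the path). The names `build`, `search` and `classify`, and the sample tags, are hypothetical; in practice the trie would be built once during preprocessing rather than per call.

```python
import os


def build(labels, focus):
    """Build a Patricia-style trie over distinct tag labels.

    Inner node: ("inner", skip, children) where skip is the common prefix
    length and children maps the next character to a subtree.
    Leaf node:  ("leaf", label, accept) where accept marks a focused tag.
    """
    labels = sorted(set(labels))
    if len(labels) == 1:
        return ("leaf", labels[0], labels[0] in focus)
    skip = len(os.path.commonprefix(labels))
    groups = {}
    for lab in labels:
        # "" is the branch key for a label that ends exactly at the prefix.
        groups.setdefault(lab[skip:skip + 1], []).append(lab)
    return ("inner", skip, {ch: build(g, focus) for ch, g in groups.items()})


def search(node, label):
    """Walk down from the root until an accept or reject decision is reached."""
    while node[0] == "inner":
        _, skip, children = node
        node = children.get(label[skip:skip + 1])
        if node is None:
            return "reject"
    _, stored, accept = node
    # Skipped characters were never compared, so verify the whole label here.
    return "accept" if stored == label and accept else "reject"


def classify(tdc, tlqc, tlqfc):
    """Return "accept" to continue processing Ndc, or "prune" to cut its branch."""
    if not tlqfc:                      # case A: nothing is focused
        return "prune"
    if set(tlqc) == set(tlqfc):        # case B: every child tag is focused
        return "accept"
    trie = build(tlqc, set(tlqfc))     # case C: Patricia search
    return "accept" if search(trie, tdc) == "accept" else "prune"


TLQC = ["title", "price", "publisher", "author"]
TLQFC = ["title", "price"]
decisions = {tag: classify(tag, TLQC, TLQFC) for tag in ["title", "publisher", "pages"]}
```

Storing only the skip length keeps inner nodes small, at the cost of one full comparison at the leaf to confirm that the skipped characters really match.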
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710256675.6A CN107256218B (en) | 2017-04-19 | 2017-04-19 | Quick query method of XML (Extensible Markup Language) stream data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710256675.6A CN107256218B (en) | 2017-04-19 | 2017-04-19 | Quick query method of XML (Extensible Markup Language) stream data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107256218A (en) | 2017-10-17 |
CN107256218B (en) | 2021-01-05 |
Family
ID=60027537
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710256675.6A, Active, CN107256218B (en) | Quick query method of XML (Extensible Markup Language) stream data | 2017-04-19 | 2017-04-19 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107256218B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1941743A (en) * | 2006-09-21 | 2007-04-04 | 复旦大学 | Method for inquiring and matching XML-flow data complex small-branch mode |
US7228296B2 (en) * | 2003-03-27 | 2007-06-05 | Fujitsu Limited | Devices for interpreting and retrieving XML documents, methods of interpreting and retrieving XML documents, and computer product |
CN101025760A (en) * | 2007-01-31 | 2007-08-29 | 王宏源 | Method for digitalizing family tree |
CN101089851A (en) * | 2007-07-12 | 2007-12-19 | 复旦大学 | XML flow buffer store manage method based on partial binary prefix code |
CN101093493A (en) * | 2006-06-23 | 2007-12-26 | 国际商业机器公司 | Speech conversion method for database inquiry, converter, and database inquiry system |
CN101247279A (en) * | 2007-10-23 | 2008-08-20 | 北京邮电大学 | Internet content safety detecting system |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7228296B2 (en) * | 2003-03-27 | 2007-06-05 | Fujitsu Limited | Devices for interpreting and retrieving XML documents, methods of interpreting and retrieving XML documents, and computer product |
CN101093493A (en) * | 2006-06-23 | 2007-12-26 | 国际商业机器公司 | Speech conversion method for database inquiry, converter, and database inquiry system |
CN1941743A (en) * | 2006-09-21 | 2007-04-04 | 复旦大学 | Method for inquiring and matching XML-flow data complex small-branch mode |
CN101025760A (en) * | 2007-01-31 | 2007-08-29 | 王宏源 | Method for digitalizing family tree |
CN101089851A (en) * | 2007-07-12 | 2007-12-19 | 复旦大学 | XML flow buffer store manage method based on partial binary prefix code |
CN101247279A (en) * | 2007-10-23 | 2008-08-20 | 北京邮电大学 | Internet content safety detecting system |
Also Published As
Publication number | Publication date |
---|---|
CN107256218B (en) | 2021-01-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11481439B2 (en) | Evaluating XML full text search | |
CN103646032B (en) | Database query method based on ontology and restricted natural language processing | |
US7739257B2 (en) | Search engine | |
CN107256217B (en) | Quick query method of XML data | |
US20060206466A1 (en) | Evaluating relevance of results in a semi-structured data-base system | |
CN101719156B (en) | System of seamless integrated pure XML query engine in relational database | |
US8972377B2 (en) | Efficient method of using XML value indexes without exact path information to filter XML documents for more specific XPath queries | |
CN101710318A (en) | Knowledge intelligent acquiring system of vegetable supply chains | |
KR102157218B1 (en) | Data transformation method for spatial data's semantic annotation | |
CN105117397A (en) | Method for searching semantic association of medical documents based on ontology | |
US20060161525A1 (en) | Method and system for supporting structured aggregation operations on semi-structured data | |
CN102819600B (en) | Keyword search methodology towards relational database of power production management system | |
Sanz et al. | Fragment-based approximate retrieval in highly heterogeneous XML collections | |
CN107256218A (en) | A kind of method for quickly querying of XML stream data | |
Panzeri et al. | An approach to define flexible structural constraints in xquery | |
Leela et al. | Schema-conscious XML indexing | |
Van de Maele et al. | An ontology-based crawler for the semantic web | |
Qtaish et al. | Query mapping techniques for XML documents: A comparative study | |
Finelli et al. | Semantic Search in Relational Databases | |
Córcoles et al. | A Spatio-Temporal Query Language for a data model based on XML. | |
Huang et al. | Accelerating XML Query Processing on Views | |
Liu et al. | A simple implementation of distributed vertical search and information integration technology | |
Özsu et al. | Web Data Management | |
Hu et al. | Query XML data in RDBMS | |
Georgiadis et al. | Efficient Physical Operators for a cost-based XPath Execution Engine |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||