CN107256218A - A fast query method for XML stream data

Info

Publication number
CN107256218A
Authority
CN
China
Prior art keywords
label
node
pat
inquiry
xml
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710256675.6A
Other languages
Chinese (zh)
Other versions
CN107256218B (en)
Inventor
谷晓钢
黄玲琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Normal University
Original Assignee
Jiangsu Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Normal University
Priority to CN201710256675.6A
Publication of CN107256218A
Application granted
Publication of CN107256218B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G06F16/835 Query processing
    • G06F16/8373 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G06F16/832 Query formulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a fast query method for XML stream data. It performs hierarchical fusion queries, each composed of multiple tag paths with complex hierarchical relationships, over XML stream data, and constructs a fast query model for this problem: Quick XML Multiple Tags Query (QXMTQ). The QXMTQ model is based on a concise query interface (QI), a query navigation PAT tree data structure model (QGPATT), and a fast query processing engine (QQE). The user supplies only the tag paths to be queried in the QI; the model adapts to the complex hierarchical relationships between the tag paths and supports an optional predicate expression parameter. QGPATT combines a shared tag navigation structure with an auxiliary fast-search structure (PAT), which lets QQE match target tags faster and more accurately, filter out irrelevant branches, and retrieve the relevant tag values. Tests show that the QXMTQ model achieves very good time and space efficiency when querying multiple tag paths with complex hierarchy over large-scale XML stream data.

Description

A fast query method for XML stream data
Technical field
The invention belongs to the field of information exchange and query, and in particular relates to a fast query method for XML stream data. Specifically, it performs fast hierarchical fusion queries, composed of multiple tag paths with complex structural relationships, over XML stream data and returns a two-dimensional set of result values.
Background
XML (Extensible Markup Language) is a specification for defining semantic markup. XML provides a uniform way to describe and exchange structured data independently of applications. With the rapid development of network applications, XML has become the mainstream format for Internet-based data exchange. XML stream data arrives online, continuously and in real time in the form of a stream, and must be parsed and processed immediately; fast query processing for large-scale XML stream data is therefore a research focus for continuous query models.
XML data has a natural hierarchical (tree) structure, so in many application scenarios queries over XML data are hierarchical as well. A common hierarchical query in a traditional relational database conditionally retrieves several fields from tables at different levels. Assume the following relations: Department (department number, department name) and Employee (employee number, department number, employee name, position, gender, age), with a one-to-many (tree) relationship between department and employee, abbreviated as Dept (dep_id, dep_name) and Emp (emp_id, dep_id, name, title, gender, age). The department table and the employee table sit at different levels, the latter being a child table of the former. To retrieve "the name, position, and department name of every employee older than 40", the corresponding SQL statement joins Emp to Dept on dep_id and filters on age.
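The SQL statement itself does not appear in this text. The following is a minimal sketch of what such a join could look like, run through Python's built-in sqlite3 module. The table and column names follow the Dept/Emp abbreviations above; the sample rows are invented for illustration.

```python
import sqlite3

# Hypothetical sample data; only the Dept/Emp column names come from the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (dep_id INTEGER PRIMARY KEY, dep_name TEXT);
    CREATE TABLE Emp (
        emp_id INTEGER PRIMARY KEY,
        dep_id INTEGER REFERENCES Dept(dep_id),
        name TEXT, title TEXT, gender TEXT, age INTEGER
    );
    INSERT INTO Dept VALUES (1, 'Research'), (2, 'Sales');
    INSERT INTO Emp VALUES
        (10, 1, 'Li',   'Engineer', 'F', 45),
        (11, 1, 'Wang', 'Analyst',  'M', 32),
        (12, 2, 'Zhao', 'Manager',  'M', 51);
""")

# Join the child table to its parent and filter on the child-level condition.
rows = conn.execute("""
    SELECT e.name, e.title, d.dep_name
    FROM Emp AS e
    JOIN Dept AS d ON e.dep_id = d.dep_id
    WHERE e.age > 40
    ORDER BY e.emp_id
""").fetchall()

for row in rows:
    print(row)
```

Each result row mixes employee-level fields with a department-level field, which is the "fusion" across levels that the rest of the document carries over to XML.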
This hierarchical join (fusion) query has the following characteristics:
1) The loop level of the data: the result set is a looped data set centered on the employee level rather than the department level; SQL by default uses the lowest level as the loop center;
2) The employee table must carry a department number to mark each employee's department, which takes up space in the employee table;
3) The join expresses the association between levels while also keeping branches isolated: employees (in the employee table) with the same department number share the same department information (in the department table), while different departments have different employees; even a cross-department employee appears once per department (with a different department number each time, a many-to-one relationship);
4) The queried fields are also fused across levels: besides the employee's own information, each result row includes the name of the employee's department (held in the department table, not in the employee table).
As the main vehicle for data exchange, XML is strongly self-describing in its hierarchical structure. The two tables above with a "parent-child" relationship (department table and employee table) can easily be converted into XML data with a two-level loop (department and employee): lower-level XML branches are nested inside a loop branch of the upper level, that is, the employee branches of one department are embedded in that department's branch. The level join condition is then natural, or implicit: the employee level does not need to carry a department number (it exists implicitly as the ancestor node of the embedded employee branch), so characteristic 2) above is avoided and storage space is saved. Given such XML data, hierarchical fusion queries like the one above arise naturally. Can current XML query processing techniques handle them well?
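To make the nesting concrete, here is a small sketch of the two-level XML form and of the fused rows the example query should yield. The element names (company, dept, emp) and the sample values are assumptions chosen for illustration. The code loads the whole document with ElementTree, so it only shows the target result, not the streaming method of the invention.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-level document: employees are nested inside their department,
# so no department number is needed at the employee level.
XML_DOC = """
<company>
  <dept>
    <dep_name>Research</dep_name>
    <emp><name>Li</name><title>Engineer</title><age>45</age></emp>
    <emp><name>Wang</name><title>Analyst</title><age>32</age></emp>
  </dept>
  <dept>
    <dep_name>Sales</dep_name>
    <emp><name>Zhao</name><title>Manager</title><age>51</age></emp>
  </dept>
</company>
"""

root = ET.fromstring(XML_DOC)
fused = []
for dept in root.findall("dept"):
    dep_name = dept.findtext("dep_name")          # upper-level value
    for emp in dept.findall("emp"):               # lower-level loop branch
        if int(emp.findtext("age")) > 40:
            fused.append((emp.findtext("name"), emp.findtext("title"), dep_name))

print(fused)
```

The join condition is implicit in the nesting: the department of an employee is simply the enclosing dept element.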
XPath is a language for finding information in XML documents and is a W3C Recommendation. To date, the XML processing discussed in academia has centered on XPath. XPath uses path expressions to select nodes, node sets, atomic values, and mixtures of nodes and atomic values in an XML document, choosing the relevant nodes by following a location path or its steps.
Because of grammatical limitations, however, XPath generally returns a one-dimensional result set whose elements are all siblings of one another. Even if the union operator "|" is used to merge the value sets returned by two expressions, the correct hierarchical fusion result set cannot be obtained. XPath alone therefore cannot directly return the value sets of multiple fields (tag paths) related by a multi-level structure.
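The flattening effect can be illustrated with a short sketch. Python's ElementTree supports only a subset of XPath and has no "|" operator, so the union is emulated here by selecting both node sets and listing the selected nodes in document order, which is how an XPath union orders its result. The document and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_DOC = """
<company>
  <dept>
    <dep_name>Research</dep_name>
    <emp><name>Li</name></emp>
    <emp><name>Wang</name></emp>
  </dept>
  <dept>
    <dep_name>Sales</dep_name>
    <emp><name>Zhao</name></emp>
  </dept>
</company>
"""

root = ET.fromstring(XML_DOC)

# Emulated union of /company/dept/dep_name and /company/dept/emp/name.
selected = set(root.findall("./dept/dep_name")) | set(root.findall("./dept/emp/name"))
union_values = [elem.text for elem in root.iter() if elem in selected]

print(union_values)
```

The result is a single flat list, ['Research', 'Li', 'Wang', 'Sales', 'Zhao']: nothing in it records which names belong to which department, so the caller would have to rebuild the pairing separately.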
XQuery is a query language for XML data built on XPath expressions; it became a W3C Recommendation after XPath. XQuery natively supports XPath as part of its grammar, so XQuery can complete any task that XPath can.
Because XQuery is Turing-complete, it can be regarded as a general-purpose language and easily overcomes many of XPath's limitations. XQuery provides a set of important built-in functions and operators, and can express arbitrary transformations of a result set. But XQuery is considerably more complex to use: returning a result set with a multi-level structure usually requires writing a very complex, deeply nested XQuery script, or even help from a programming language, to complete a fusion query, and the execution time and space efficiency of the script depend on the chosen XQuery query engine.
Summary of the invention
To solve the above problems, the present invention provides a fast query method for XML stream data. It uses QXMTQ (Quick XML multiple tags query), a fast hierarchical fusion query model for XML stream data, which meets the requirements of query adaptivity and high time and space efficiency.
To achieve the above object, the present invention adopts the following technical solution:
A fast query method for XML stream data, comprising the following steps:
Step 1, XML Schema definition preprocessing: build the query navigation PAT tree
Step 101, find the root element and create the root node of the query tree from its element information;
Step 102, determine whether the last element in the Schema definition document has been reached; if so, end this preprocessing, otherwise jump to step 103;
Step 103, find all child elements of the current element, build the corresponding child tag nodes and put them into the query tree, store the details of each child element in its child tag node, and store the parent-child navigation information in the tag nodes;
Step 104, for each node and all of its direct child nodes, build a Patricia trie as an auxiliary fast-search structure, in which intermediate PAT nodes record the length of the "common" part and leaf PAT nodes point to the corresponding direct child tag nodes;
Step 2, query parameter preprocessing: build the predicate syntax tree and the query navigation PAT tree
Step 201, if no predicate expression is present, jump to step 204; otherwise continue with step 202;
Step 202, parse the predicate condition expression according to the EBNF grammar of expressions, taking operands as leaf nodes and the related operators as their parent nodes, and so on, to construct the predicate syntax tree;
Step 203, append the tag paths in each condition sub-item to the query tag path list;
Step 204, determine whether the end of the query tag path list has been reached; if not, perform step 205, otherwise end query parameter preprocessing;
Step 205, for each query tag path, first split it into a tag sequence and process each tag in order, marking the corresponding tag node in the query navigation PAT tree as query-focused: the state of this tag node is "pass", every PAT node on the way from the parent tag node to this tag node is also marked "pass", and the state of all other PAT nodes is "reject"; then process the next tag in order until the tag sequence ends, and jump back to step 204;
Step 3, query processing and return of the two-dimensional result set
Step 301, parse the target XML stream data file; during parsing, a different callback method is invoked for each kind of event: the event "StartElement" performs step 302, the event "Characters" performs step 303, and the event "EndElement" performs step 304;
Step 302, match the input tag against the query navigation PAT tree: starting from the parent tag node of the current tag node, use the corresponding PAT auxiliary structure and its fast search algorithm to check quickly whether navigation reaches the correct child node tag, and update the context state according to the match result, including the "accept" and "reject" states;
Step 303, collect the tag value delivered by this event and put it into the cache;
Step 304, if the tag node marked as the predicate evaluation position is reached, extract the value of each tag path in the expression and evaluate the expression according to the predicate syntax tree; if the result is true, perform step 305; if the result is false, reject all tag values collected in the branch rooted at this tag node;
Step 305, if the result is true and this tag node is in the "accept" state, collect the value of this tag and put it into the cache;
Step 306, gather the per-tag results in all caches, merge them into a two-dimensional tag result set, end query processing, and return the two-dimensional result set.
Preferably, query processing in step 3 is driven by the XML stream data, with the query navigation PAT tree providing guidance, while the system's state changes (context changes) are recorded. Suppose a child node Ndc under the current stream data element node Nd has tag Tdc. The objects to be searched for a match are the tags of all child tag nodes, TLqc, under the corresponding node Nq in the query tree; once the child tag node corresponding to Tdc is found in the query tree, it is also checked whether it is a query-focused tag node. This tag list is the full set of child nodes under the current node, and the corresponding list of query-focused child node tags is TLqfc, so TLqc contains TLqfc. Matching uses the Patricia trie matching search algorithm, with the following cases:
A) The query-focused tag list TLqfc is empty: the data branch rooted at Ndc needs no further query processing and can be pruned;
B) TLqc equals TLqfc: the data node Ndc is necessarily a query-focused tag, so no matching is needed and subsequent query processing continues;
C) TLqfc is not empty and TLqc properly contains TLqfc: use the Patricia search algorithm, in which each PAT node records the length of the common tag part that is skipped and each branch corresponds to a different target tag, searching downward from the PAT root node until an "accept" or "reject" PAT node is reached; a "reject" PAT node means that the data branch rooted at Ndc needs no further query processing and can be pruned; an "accept" PAT node means that Ndc is a query-focused tag.
The query method for XML stream data of the present invention performs hierarchical fusion queries, each composed of multiple tag paths with complex hierarchical relationships, over XML stream data, and constructs a fast query model for this problem: Quick XML Multiple Tags Query (QXMTQ). The QXMTQ model is based on a concise query interface (QI), a query navigation PAT tree data structure model (QGPATT), and a fast query processing engine (QQE). The user supplies only the tag paths to be queried in the QI; the model adapts to the complex hierarchical relationships between the tag paths and supports an optional predicate expression parameter. QGPATT combines a shared tag navigation structure with an auxiliary fast-search structure (PAT), which lets QQE match target tags faster and more accurately, filter out irrelevant branches, and retrieve the relevant tag values. Tests show that the QXMTQ model achieves very good time and space efficiency when querying multiple tag paths with complex hierarchy over large-scale XML stream data.
Brief description of the drawings
Fig. 1: flowchart of XML Schema definition preprocessing;
Fig. 2: flowchart of query parameter preprocessing;
Fig. 3: flowchart of query processing.
Detailed description
The present invention provides a fast query method for XML stream data using QXMTQ (Quick XML multiple tags query), a fast hierarchical fusion query model for XML stream data. The model accepts multiple query tag paths whose mutual relationships can be complex, including parent-child, sibling, uncle-nephew, ancestor-descendant, granduncle-grandnephew, and so on. The result set extracted by parsing and querying is a two-dimensional set: the first dimension holds, in the order in which they are stored in the XML data, one result subset per specified "family" (branch); each result subset in the second dimension is a mapping table whose keys are tag paths and whose values are the fused result values designated by those tag paths. The specific technical solution is as follows:
1. Provide a simple query interface
The user only needs to supply a list of query tag paths, a designated loop-point tag path, and an optional predicate expression. The user does not need to pay attention to the complex structural relationships among the query tag paths, intervene in the query process, or do any "secondary programming" or "second search" to assemble the results. The system encapsulates and adapts to the complex hierarchical relationships between the query tag paths, runs the query, and automatically extracts the corresponding set of result values, so that users can concentrate on their own business requirements and adapt quickly to business changes.
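As an illustration of how little the caller has to specify, the sketch below models the three inputs as a small Python data class and shows the shape of the two-dimensional result a caller might expect back. The class name QueryRequest, its field names, the path syntax, and the sample values are all assumptions made for this example; the patent text does not define a concrete API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class QueryRequest:
    """Hypothetical container for the three query interface inputs."""
    tag_paths: List[str]             # tag paths whose values should be returned
    loop_path: str                   # loop-point tag path: one result row per branch
    predicate: Optional[str] = None  # optional predicate expression

    def __post_init__(self):
        if not self.tag_paths:
            raise ValueError("at least one query tag path is required")
        for path in [*self.tag_paths, self.loop_path]:
            if not path.startswith("/"):
                raise ValueError(f"tag path must be absolute: {path!r}")

request = QueryRequest(
    tag_paths=[
        "/company/dept/dep_name",
        "/company/dept/emp/name",
        "/company/dept/emp/title",
    ],
    loop_path="/company/dept/emp",
    predicate="/company/dept/emp/age > 40",
)

# Shape of the two-dimensional result: a list (first dimension, one entry per
# loop branch in document order) of mappings from tag path to fused value.
expected_shape: List[Dict[str, str]] = [
    {"/company/dept/dep_name": "Research",
     "/company/dept/emp/name": "Li",
     "/company/dept/emp/title": "Engineer"},
]
```

The request says nothing about how dep_name relates to emp; working that out from the paths is left to the model.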
Because the XML definition file, the query request parameters, and the XML data change at different frequencies (the XML definition file, in general, hardly changes at all), and because preprocessing of the query request parameters is relatively independent, the XML definition file and the query parameters do not need to be reprocessed before every XML data run. The whole query process is therefore divided into the following three parts:
2. Preprocessing of the XML definition file
1) Build the XML definition full tree. XML Schema describes the overall structure of an XML document, including element definitions, attribute definitions, child element definitions, and child element occurrence counts. All incoming XML stream data follows this definition; using this information to guide XML parsing and querying while filtering out irrelevant branches speeds up processing. Based on the elements defined in the XML Schema and their structural relationships (elements and attributes as nodes, structural relationships as navigation pointers), build the XML definition full tree (the precursor of the query navigation PAT full tree, containing only tag nodes). Nodes of this tree are called tag nodes;
2) Build the Patricia trie index over "parent-child" nodes. To speed up finding the child node with a given tag name from its parent node, build a Patricia trie index for each "parent-child" node group as a static auxiliary index structure for matching searches. The "common" parts of all child tag strings become PAT structure nodes and the "differing" parts become PAT branches, and so on recursively, finally yielding the query navigation PAT full tree with Patricia trie structures. Nodes in this index structure are called PAT nodes.
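The index over one parent's child tags can be sketched as a character-level Patricia-style trie. In this sketch, which is an illustration and not the patented data structure, an inner node stores only the length of the common part (skip) and branches on the first differing character, and a leaf stores the full tag, which is verified once at the end of the search. The class and function names are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Union

@dataclass
class PatLeaf:
    key: str   # full child tag name, verified at the end of a search

@dataclass
class PatInner:
    skip: int  # length of the common part shared by every key below this node
    branches: Dict[str, Union["PatInner", PatLeaf]]

PatNode = Union[PatInner, PatLeaf]

def build_pat(tags: Iterable[str]) -> PatNode:
    """Build a static trie over the child tag names of one parent node."""
    keys = sorted(set(tags))
    if not keys:
        raise ValueError("at least one child tag is required")
    if len(keys) == 1:
        return PatLeaf(keys[0])
    skip = len(os.path.commonprefix(keys))
    groups: Dict[str, list] = {}
    for key in keys:
        # "" marks a key that ends exactly where the common part ends.
        groups.setdefault(key[skip:skip + 1], []).append(key)
    return PatInner(skip, {ch: build_pat(group) for ch, group in groups.items()})

def pat_search(node: PatNode, tag: str) -> Optional[PatLeaf]:
    """Follow one discriminating character per inner node, then verify the tag."""
    while isinstance(node, PatInner):
        child = node.branches.get(tag[node.skip:node.skip + 1])
        if child is None:
            return None
        node = child
    return node if node.key == tag else None

index = build_pat(["dep_name", "dep_id", "emp", "employee"])
print(pat_search(index, "emp"))
print(pat_search(index, "dept"))
```

Because skipped characters are not compared on the way down, each lookup inspects one character per inner node plus one full comparison at the leaf. In the method described above, each leaf would additionally point to the corresponding child tag node.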
3. Preprocessing of the query request parameters
3.1 Build the predicate syntax tree
Parse the input predicate expression: after the predicate condition expression is decomposed, construct the predicate syntax tree with operands as leaf nodes and the related operators as their parent nodes, and so on. Operands include numbers, character strings, TRUE, FALSE, and tag paths (a tag path is treated as a script variable; before evaluation, its corresponding tag value is substituted for it). Then analyze the predicate syntax tree according to the semantics of the condition expression and organize it into a computable sequence of condition sub-items. Finally, append the tag paths involved in each condition sub-item to the query tag path list.
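The following sketch shows one way to build and evaluate such a tree with a small recursive-descent parser. The expression grammar used here (comparison operators, lowercase and/or, parentheses, single-quoted strings, absolute tag paths) is an assumption for illustration, since the patent text does not give its EBNF. Tree nodes are plain tuples with the operator first and the operands after it.

```python
import operator
import re

TOKEN = re.compile(r"""\s*(?:
    (?P<num>\d+(?:\.\d+)?)        |
    '(?P<str>[^']*)'              |
    (?P<path>/[\w/]+)             |
    (?P<op><=|>=|!=|=|<|>|\(|\))  |
    (?P<word>[A-Za-z]+)
)""", re.VERBOSE)

COMPARE = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
           "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise SyntaxError(f"unexpected input: {text[pos:]!r}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens

def parse(tokens):
    pos = 0
    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)
    def take():
        nonlocal pos
        token = peek()
        pos += 1
        return token
    def operand():                       # leaves: number, string, boolean, tag path
        kind, value = take()
        if kind == "num":
            return ("num", float(value))
        if kind in ("str", "path"):
            return (kind, value)
        if kind == "word" and value.upper() in ("TRUE", "FALSE"):
            return ("bool", value.upper() == "TRUE")
        if (kind, value) == ("op", "("):
            node = disjunction()
            if take() != ("op", ")"):
                raise SyntaxError("missing closing parenthesis")
            return node
        raise SyntaxError(f"unexpected token: {value!r}")
    def comparison():                    # operator becomes the parent of two operands
        left = operand()
        kind, value = peek()
        if kind == "op" and value in COMPARE:
            take()
            return (value, left, operand())
        return left
    def conjunction():
        node = comparison()
        while peek() == ("word", "and"):
            take()
            node = ("and", node, comparison())
        return node
    def disjunction():
        node = conjunction()
        while peek() == ("word", "or"):
            take()
            node = ("or", node, conjunction())
        return node
    tree = disjunction()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after expression")
    return tree

def evaluate(node, values):
    """Evaluate the tree; tag paths are replaced by the collected tag values."""
    kind = node[0]
    if kind in ("num", "str", "bool"):
        return node[1]
    if kind == "path":
        raw = values[node[1]]
        try:
            return float(raw)            # numeric tag values compare as numbers
        except ValueError:
            return raw
    if kind == "and":
        return bool(evaluate(node[1], values) and evaluate(node[2], values))
    if kind == "or":
        return bool(evaluate(node[1], values) or evaluate(node[2], values))
    return COMPARE[kind](evaluate(node[1], values), evaluate(node[2], values))

tree = parse(tokenize("/company/dept/emp/age > 40 and /company/dept/emp/gender = 'M'"))
print(tree)
```

The tag paths that occur as leaves of the tree are the ones that would be appended to the query tag path list, so that their values are available when the predicate is evaluated.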
3.2 Focus the query tag paths in the query navigation PAT full tree
1) Every path in the input query tag path list is of interest to the user, and the system must query the XML stream data along these paths and extract the related tag values. Therefore, in the XML definition navigation PAT full tree, each intermediate tag node on a query tag path is marked "reject" or "pass", each leaf tag node is marked "reject" or "accept", and all other tag nodes are "reject". The Patricia trie index of each "parent-child" tag node group on a query tag path is marked as well: each PAT node on the PAT chain of a query tag is marked "pass", the last PAT node is marked "accept", and all other PAT nodes are "reject". The tree structure obtained after focusing on the query tag paths is called the query navigation PAT tree.
2) Besides tag information, each node of the query tree also carries detailed navigation information to make traversal of the query tree convenient, and the positions at which the predicate expression can be evaluated are marked.
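A minimal sketch of the focusing step on the tag nodes follows. It marks only tag nodes, not the PAT nodes inside each per-parent index, and the TagNode class, the make_tree helper, and the sample tree are hypothetical stand-ins for the definition tree built from the schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class TagNode:
    tag: str
    state: str = "reject"   # "reject", "pass" (intermediate), or "accept" (target)
    children: Dict[str, "TagNode"] = field(default_factory=dict)

def make_tree(spec) -> TagNode:
    """Build a tag tree from a nested (tag, [children]) tuple (hypothetical helper)."""
    tag, kids = spec
    node = TagNode(tag)
    for kid in kids:
        child = make_tree(kid)
        node.children[child.tag] = child
    return node

def focus(root: TagNode, tag_paths: List[str]) -> None:
    """Mark every node on each query tag path; untouched nodes stay 'reject'."""
    for path in tag_paths:
        tags = path.strip("/").split("/")
        if tags[0] != root.tag:
            raise KeyError(f"path does not start at the root element: {path}")
        node = root
        for depth, tag in enumerate(tags):
            if depth > 0:
                node = node.children[tag]   # KeyError if the schema lacks this tag
            if depth == len(tags) - 1:
                node.state = "accept"
            elif node.state == "reject":
                node.state = "pass"

tree = make_tree(("company", [
    ("dept", [
        ("dep_name", []),
        ("address", [("city", [])]),
        ("emp", [("name", []), ("age", [])]),
    ]),
]))
focus(tree, ["/company/dept/dep_name", "/company/dept/emp/name"])
```

After focusing, address and everything below it remain "reject", so at query time the whole address branch of the stream can be skipped without looking at its children.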
4. Query processing and result set collection
1) Read in the XML data stream document using SAX (Simple API for XML). SAX is a lightweight XML parsing method. It works roughly as follows: the reader first reads in part of the XML stream data; subsequent parsing is event-driven, with events including StartDocument, EndDocument, StartElement, EndElement, and Characters; this process repeats until the XML stream ends. In contrast to DOM parsing, SAX parses while reading and does not need to load the whole document into memory, which makes it particularly suitable for large XML documents. The present invention also adds query filtering for the query tag paths inside the event callback methods, which further reduces the amount of XML data that must be matched, saves storage space, shortens parsing time, and improves time and space efficiency.
2) Query processing is driven by the XML stream data, with the query navigation PAT tree providing guidance, while the system's state changes (context changes) are recorded.
Suppose a child node Ndc under the current stream data element node Nd has tag Tdc. The objects to be searched for a match are the tags of all child tag nodes, TLqc, under the corresponding node Nq in the query tree; once the child tag node corresponding to Tdc is found in the query tree, it is also checked whether it is a query-focused tag node. This tag list is the full set of child nodes under the current node, and the corresponding list of query-focused child node tags is TLqfc, so TLqc contains TLqfc. To speed up matching, the present invention uses the efficient Patricia trie matching search algorithm, with the following cases:
A) The query-focused tag list TLqfc is empty: the data branch rooted at Ndc needs no further query processing and can be pruned;
B) TLqc equals TLqfc: the data node Ndc is necessarily a query-focused tag, so no matching is needed and subsequent query processing continues;
C) TLqfc is not empty and TLqc properly contains TLqfc: use the Patricia search algorithm, in which each PAT node records the length of the common tag part that is skipped and each branch corresponds to a different target tag, searching downward from the PAT root node until an "accept" or "reject" PAT node is reached. A "reject" PAT node means that the data branch rooted at Ndc needs no further query processing and can be pruned. An "accept" PAT node means that Ndc is a query-focused tag.
3) Evaluate the predicate expression at the marked evaluation position. If the result is true, continue query processing of the branch; otherwise skip the branch and jump to the next loop branch.
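The three cases A) to C) amount to a small decision function. In the sketch below, the function name and parameters are hypothetical, and plain set membership stands in for the Patricia search of case C) unless a search callable is supplied. Case B) relies on the stream being valid against the schema, as the text assumes.

```python
from typing import Callable, Optional, Set

def should_descend(tag: str,
                   all_children: Set[str],   # TLqc: every child tag defined under Nq
                   focused: Set[str],        # TLqfc: the query-focused subset
                   pat_search: Optional[Callable[[str], bool]] = None) -> bool:
    """Return True to keep processing the branch rooted at this tag, False to prune it."""
    if not focused:
        return False                  # case A: nothing below is of interest
    if focused == all_children:
        return True                   # case B: every valid child is focused, no matching
    if pat_search is not None:
        return pat_search(tag)        # case C: search ends in "accept" or "reject"
    return tag in focused             # case C with set membership as a stand-in

children = {"dep_name", "address", "emp"}
print(should_descend("address", children, set()))
print(should_descend("address", children, children))
print(should_descend("address", children, {"dep_name", "emp"}))
```

Only case C) costs a search; cases A) and B) are decided from the two lists alone, which is where much of the saving on unfocused or fully focused parents would come from.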
As shown in Figs. 1, 2, and 3, the fast query method for XML stream data of the present invention comprises the following specific processing steps:
Step 1, XML Schema definition preprocessing: build the query navigation PAT tree
Step 101, find the root element and create the root node of the query tree from its element information.
Step 102, determine whether the last element in the Schema definition document has been reached; if so, end this preprocessing, otherwise jump to step 103.
Step 103, find all child elements of the current element, build the corresponding child tag nodes and put them into the query tree, store the details of each child element in its child tag node, and store the parent-child navigation information in the tag nodes, including the "parent-to-child" pointer, the "child-to-parent" pointer, and so on.
Step 104, for each node and all of its direct child nodes, build a Patricia trie as an auxiliary fast-search structure, in which intermediate PAT nodes record the length of the "common" part and leaf PAT nodes point to the corresponding direct child tag nodes.
Step 2, query parameter preprocessing: build the predicate syntax tree and the query navigation PAT tree
Step 201, if no predicate expression is present, jump to step 204; otherwise continue with step 202.
Step 202, parse the predicate condition expression according to the EBNF grammar of expressions, taking operands as leaf nodes and the related operators as their parent nodes, and so on, to construct the predicate syntax tree.
Step 203, append the tag paths in each condition sub-item to the query tag path list.
Step 204, determine whether the end of the query tag path list has been reached; if not, perform step 205, otherwise end query parameter preprocessing.
Steps 205 to 207, for each query tag path, first split it into a tag sequence and process each tag in order, marking the corresponding tag node in the query navigation PAT tree as query-focused: the state of this tag node is "pass", every PAT node on the way from the parent tag node to this tag node is also marked "pass", and the state of all other PAT nodes is "reject". Then process the next tag in order, repeating step 207 until the tag sequence ends, and jump back to step 204.
Step 3, query processing and return of the two-dimensional result set
Step 301, parse the target XML stream data file; during parsing, a different callback method is invoked for each kind of event: the event "StartElement" performs step 302, the event "Characters" performs step 303, and the event "EndElement" performs step 304. Query filtering starts directly during parsing, which reduces the amount of target XML stream data that must be matched, saves storage space, shortens parsing time, and improves time and space efficiency.
Step 302, match the input tag against the query navigation PAT tree: starting from the parent tag node of the current tag node, use the corresponding PAT auxiliary structure and its fast search algorithm to check quickly whether navigation reaches the correct child node tag, and update the context state according to the match result, including the "accept" and "reject" states.
Step 303, collect the tag value delivered by this event and put it into the cache.
Step 304, if the tag node marked as the predicate evaluation position is reached, extract the value of each tag path in the expression and evaluate the expression according to the predicate syntax tree; if the result is true, perform step 305; if the result is false, reject all tag values collected in the branch rooted at this tag node.
Step 305, if the result is true and this tag node is in the "accept" state, collect the value of this tag and put it into the cache.
Step 306, gather the per-tag results in all caches, merge them into a two-dimensional tag result set, end query processing, and return the two-dimensional result set.
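Steps 301 to 306 can be sketched end to end with Python's xml.sax module. This is a simplified illustration under several stated assumptions, not the patented engine: sets of path strings stand in for the query navigation PAT tree, a Python callable stands in for the predicate syntax tree, the predicate is evaluated when the loop-point element closes, and upper-level values (such as the department name) are assumed to appear in the stream before the loop branches that use them. All class, function, and element names are hypothetical.

```python
import xml.sax

class QueryHandler(xml.sax.ContentHandler):
    def __init__(self, tag_paths, loop_path, predicate=None, predicate_paths=()):
        super().__init__()
        self.tag_paths = list(tag_paths)
        self.loop_path = loop_path
        self.predicate = predicate
        self.accept = set(tag_paths) | set(predicate_paths)
        # Every prefix of a wanted path (and of the loop path) may be entered.
        self.passing = set()
        for path in self.accept | {loop_path}:
            parts = path.strip("/").split("/")
            for i in range(1, len(parts) + 1):
                self.passing.add("/" + "/".join(parts[:i]))
        self.stack = []       # tags of the open, non-pruned elements
        self.skip_depth = 0   # > 0 while inside a rejected branch
        self.text = []        # character buffer for the current accepted tag
        self.values = {}      # tag path -> value visible in the current context
        self.results = []     # two-dimensional result set

    def _path(self):
        return "/" + "/".join(self.stack)

    def startElement(self, name, attrs):            # step 302
        if self.skip_depth:
            self.skip_depth += 1
            return
        candidate = self._path().rstrip("/") + "/" + name
        if candidate not in self.passing:
            self.skip_depth = 1                      # reject: prune the whole branch
            return
        self.stack.append(name)
        self.text = []

    def characters(self, content):                  # step 303
        if not self.skip_depth and self._path() in self.accept:
            self.text.append(content)

    def endElement(self, name):                     # steps 304 and 305
        if self.skip_depth:
            self.skip_depth -= 1
            return
        path = self._path()
        if path in self.accept:
            self.values[path] = "".join(self.text).strip()
        if path == self.loop_path:
            if self.predicate is None or self.predicate(self.values):
                self.results.append(
                    {p: self.values[p] for p in self.tag_paths if p in self.values})
        # Values collected below this element go out of scope when it closes.
        for key in [k for k in self.values if k.startswith(path + "/")]:
            del self.values[key]
        self.stack.pop()
        self.text = []

def run_query(xml_bytes, tag_paths, loop_path, predicate=None, predicate_paths=()):
    handler = QueryHandler(tag_paths, loop_path, predicate, predicate_paths)
    xml.sax.parseString(xml_bytes, handler)          # step 301
    return handler.results                           # step 306

SAMPLE_XML = b"""<company>
  <dept>
    <dep_name>Research</dep_name>
    <address><city>Xuzhou</city></address>
    <emp><name>Li</name><title>Engineer</title><age>45</age></emp>
    <emp><name>Wang</name><title>Analyst</title><age>32</age></emp>
  </dept>
  <dept>
    <dep_name>Sales</dep_name>
    <emp><name>Zhao</name><title>Manager</title><age>51</age></emp>
  </dept>
</company>"""

AGE = "/company/dept/emp/age"
rows = run_query(
    SAMPLE_XML,
    tag_paths=["/company/dept/dep_name", "/company/dept/emp/name", "/company/dept/emp/title"],
    loop_path="/company/dept/emp",
    predicate=lambda values: int(values[AGE]) > 40,
    predicate_paths=[AGE],
)
for row in rows:
    print(row)
```

In this run the address branch is pruned at its start tag, the department name stays in scope until its dept element closes, and one fused row is emitted for each employee over 40, giving the XML counterpart of the relational join discussed in the background.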

Claims (2)

1. A fast query method for XML stream data, characterized in that it comprises the following steps:
Step 1, XML Schema definition preprocessing: build the query navigation PAT tree
Step 101, find the root element and create the root node of the query tree from its element information;
Step 102, determine whether the last element in the Schema definition document has been reached; if so, end this preprocessing, otherwise jump to step 103;
Step 103, find all child elements of the current element, build the corresponding child tag nodes and put them into the query tree, store the details of each child element in its child tag node, and store the parent-child navigation information in the tag nodes;
Step 104, for each node and all of its direct child nodes, build a Patricia trie as an auxiliary fast-search structure, in which intermediate PAT nodes record the length of the "common" part and leaf PAT nodes point to the corresponding direct child tag nodes;
Step 2, query argument pretreatment:Build predicate expressions grammer and calculate tree, enquiry navigation PAT trees
Step 201, if predicate expressions are not present, jump to step 204, if it does, order perform next step 202;
Step 202, according to expression formula EBNF normal forms, syntactic analysis predicate conditions expression formula, and using operand as leaf node, Related operator as its father node, the like construction predicate grammer calculate tree;
Step 203, the tag path in each conditional expression subitem is appended in inquiry tag routing table;
Step 204, judge whether inquiry tag path list end, if not step 205 is performed, otherwise terminate query argument Pretreatment;
Step 205, for each inquiry tag path, a sequence label is split into first, handle in sequence each mark Label, the corresponding label node label in enquiry navigation PAT tree constructions focuses on to need to inquire about:This label node state is " logical Cross ", while be " passing through " each PAT node labels state that father's label node leads to the label node, other PAT sections The state of point is " refusal ";Then the next label of sequential search, until this sequence label terminates, redirects execution step 204;
Step 3, query processing simultaneously provide two-dimensional result collection
According to the event call-back distinct methods of generation, wherein thing in step 301, parsing target XML stream data file, resolving Part " StartElement " performs step 302, event " Characters " and performs step 303, event " EndElement " execution Step 304;
I.e. in step 302, input label matching search inquiry navigation PAT trees since current label node father's label node By PAT supplementary structures fast search algorithm corresponding with its, it can quickly check and determine whether to navigate to correct child's section On point label, context state is updated according to matching result, including:" receiving " and " refusal " state;
Step 303, collect the corresponding label value of this event and be put into caching;
If step 304, the label node for reaching mark Predicate evaluation position, each tag path correspondence in expression formula is extracted Value, then starts to carry out calculation expression according to predicate expressions grammer calculating tree construction, is as a result very then to perform step 305, knot Fruit is all label value collection of the false then refusal collection by the branch of root node of this label node;
If step 305, result of calculation are that true and this label node is " receiving " state, this secondary label respective value is collected simultaneously It is put into caching;
Step 306, the label correspondence result set collected in all cachings, merge composition two-dimensional tag result set, terminate Directory Enquiries Reason, and return to two-dimensional result collection.
2. The fast query method for XML stream data according to claim 1, characterized in that the query processing in step 3 is driven by the XML stream data, with the query navigation PAT tree serving as an auxiliary guide, while state changes of the system (context changes) are recorded. Suppose that a child node Ndc under the current stream data node element Nd has label Tdc, and that the objects to be searched for a match are the list TLqc of all child label node labels under the corresponding node Nq in the query tree. Once the child label node corresponding to label Tdc is found in the query tree, it is also checked whether that node is a label node the query focuses on. Since TLqc is the complete set of all child nodes under the current node and TLqfc is the list of child node labels the query focuses on, TLqc contains TLqfc. Matching uses the Patricia trie search algorithm and falls into the following cases:
A) The focused label list TLqfc is empty, which means the data branch rooted at Ndc needs no further query processing and can be pruned;
B) TLqc equals TLqfc, which means the data node Ndc is necessarily a label the user's query focuses on, without any matching, and subsequent query processing can continue;
C) TLqfc is non-empty and TLqc properly contains TLqfc. The Patricia search algorithm is used: each PAT node records the length of the common part of the label that is skipped, and each branch corresponds to a different target label. The search proceeds downward from the PAT root node until it reaches an "accept" or "reject" PAT node. A "reject" PAT node means the data branch rooted at Ndc needs no further query processing and can be pruned; an "accept" PAT node means Ndc is a label the query focuses on.
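The three matching cases can be illustrated with a small Python sketch. It assumes labels are plain strings and uses a simplified, character-indexed Patricia-style trie: internal nodes store the character position to test (so the common part is skipped), and leaves store the full label together with an "accept" or "reject" state. The names PatNode, build_pat, pat_lookup, and match_child are hypothetical, and the trie is rebuilt on each call only for brevity; in the method described above it would be built once during preprocessing.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

ACCEPT, REJECT = "accept", "reject"


@dataclass
class PatNode:
    index: int = 0                                  # character position tested here
    children: Dict[str, "PatNode"] = field(default_factory=dict)
    label: Optional[str] = None                     # set on leaf nodes only
    state: str = REJECT                             # meaningful on leaf nodes only


def build_pat(labels, focused, start=0):
    """Build a Patricia-style trie over labels; focused labels become ACCEPT leaves."""
    labels = sorted(set(labels))
    if len(labels) == 1:
        lab = labels[0]
        return PatNode(label=lab, state=ACCEPT if lab in focused else REJECT)
    # The common prefix of the first and last sorted labels is common to all.
    first, last = labels[0], labels[-1]
    i = start
    while i < len(first) and i < len(last) and first[i] == last[i]:
        i += 1
    node = PatNode(index=i)
    groups = {}
    for lab in labels:
        groups.setdefault(lab[i:i + 1], []).append(lab)   # "" means the label ends here
    for ch, group in groups.items():
        node.children[ch] = build_pat(group, focused, i + 1)
    return node


def pat_lookup(root, label):
    """Walk down from the root until a leaf is reached, then verify the full label."""
    node = root
    while node.label is None:
        node = node.children.get(label[node.index:node.index + 1])
        if node is None:
            return REJECT
    # Skipped characters were never compared, so check the whole label at the leaf.
    return node.state if node.label == label else REJECT


def match_child(tdc, tlqc, tlqfc):
    """Decide whether stream child label tdc is focused (cases A, B, C)."""
    if not tlqfc:                       # case A: nothing focused, prune the branch
        return REJECT
    if set(tlqfc) == set(tlqc):         # case B: every child is focused, no matching
        return ACCEPT                   # (assumes the stream conforms to the schema)
    return pat_lookup(build_pat(tlqc, set(tlqfc)), tdc)   # case C: trie search
```

For example, with TLqc = ["title", "tags", "price", "publisher"] and TLqfc = ["title", "price"], the label "title" reaches an "accept" leaf, while "tags" and an unknown label such as "isbn" are rejected.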
CN201710256675.6A 2017-04-19 2017-04-19 Quick query method of XML (Extensible Markup Language) stream data Active CN107256218B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710256675.6A CN107256218B (en) 2017-04-19 2017-04-19 Quick query method of XML (extensive Makeup language) stream data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710256675.6A CN107256218B (en) 2017-04-19 2017-04-19 Quick query method of XML (extensive Makeup language) stream data

Publications (2)

Publication Number Publication Date
CN107256218A (en) 2017-10-17
CN107256218B (en) 2021-01-05

Family

ID=60027537

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710256675.6A Active CN107256218B (en) 2017-04-19 2017-04-19 Quick query method of XML (Extensible Markup Language) stream data

Country Status (1)

Country Link
CN (1) CN107256218B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1941743A (en) * 2006-09-21 2007-04-04 复旦大学 Method for inquiring and matching XML-flow data complex small-branch mode
US7228296B2 (en) * 2003-03-27 2007-06-05 Fujitsu Limited Devices for interpreting and retrieving XML documents, methods of interpreting and retrieving XML documents, and computer product
CN101025760A (en) * 2007-01-31 2007-08-29 王宏源 Method for digitalizing family tree
CN101089851A (en) * 2007-07-12 2007-12-19 复旦大学 XML flow buffer store manage method based on partial binary prefix code
CN101093493A (en) * 2006-06-23 2007-12-26 国际商业机器公司 Speech conversion method for database inquiry, converter, and database inquiry system
CN101247279A (en) * 2007-10-23 2008-08-20 北京邮电大学 Internet content safety detecting system

Also Published As

Publication number Publication date
CN107256218B (en) 2021-01-05

Similar Documents

Publication Publication Date Title
US11481439B2 (en) Evaluating XML full text search
CN103646032B (en) A kind of based on body with the data base query method of limited natural language processing
US7739257B2 (en) Search engine
CN107256217B (en) Quick query method of XML data
US20060206466A1 (en) Evaluating relevance of results in a semi-structured data-base system
CN101719156B (en) System of seamless integrated pure XML query engine in relational database
US8972377B2 (en) Efficient method of using XML value indexes without exact path information to filter XML documents for more specific XPath queries
CN101710318A (en) Knowledge intelligent acquiring system of vegetable supply chains
KR102157218B1 (en) Data transformation method for spatial data's semantic annotation
CN105117397A (en) Method for searching semantic association of medical documents based on ontology
US20060161525A1 (en) Method and system for supporting structured aggregation operations on semi-structured data
CN102819600B (en) Keyword search methodology towards relational database of power production management system
Sanz et al. Fragment-based approximate retrieval in highly heterogeneous XML collections
CN107256218A (en) A kind of method for quickly querying of XML stream data
Panzeri et al. An approach to define flexible structural constraints in xquery
Leela et al. Schema-conscious XML indexing
Van de Maele et al. An ontology-based crawler for the semantic web
Qtaish et al. Query mapping techniques for XML documents: A comparative study
Finelli et al. Semantic Search in Relational Databases
Córcoles et al. A Spatio-Temporal Query Language for a data model based on XML.
Huang et al. Accelerating XML Query Processing on Views
Liu et al. A simple implementation of distributed vertical search and information integration technology
Özsu et al. Web Data Management
Hu et al. Query XML data in RDBMS
Georgiadis et al. Efficient Physical Operators for a cost-based XPath Execution Engine

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant