CN108427676A - A method for quickly locating and processing XML tags - Google Patents
A method for quickly locating and processing XML tags
- Publication number
- CN108427676A (application number CN201710076346.3A)
- Authority
- CN
- China
- Prior art keywords
- xml
- node
- processing
- context
- quickly
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/81—Indexing, e.g. XML tags; Data structures therefor; Storage structures
Abstract
The invention discloses a method for quickly locating and processing XML tags. Quick location of an XML node involves creating a document context handler and setting it on the parser. Quick processing of an XML node involves efficient matching of XML element nodes, efficient retrieval of XML element attributes, and a mechanism that exits immediately once processing is done. With this method, the XML node to be processed can be located quickly, and once it has been handled the entire XML parse can be ended at once. Compared with a traditional XML SAX parser, this saves a large amount of parsing time (namely the time spent before the node is located and the idle time after the node has been processed), while also providing convenient, flexible XML node handling that makes the program easy to maintain and extend.
Description
Technical field
The present invention relates to the technical field of XML SAX parsing in computing, and specifically to a method for quickly locating and processing XML tags.
Background art
XML tag: also called XML markup. There are three basic kinds of XML tag: XML element tags (XML Element), XML attribute tags (XML Attribute), and XML text content tags (XML Text Content, usually the text value of an element tag). An XML element tag has opening and closing delimiters, i.e. "<" and "/>". An XML tag can usually be understood simply as an XML tag name, and an XML tag can likewise be understood as an XML node (XML Node). A tag name or node name (also called a QName) usually consists of a namespace prefix, a colon, and a local name (also called an LName). For example:
<uof:Font states uof:locID="u0041" uof:attrList="identifier title type family replacement font" uof:Identifier="Song typeface" uof:Title="Song typeface" uof:Type family="auto"/>
Here the element tag is named "uof:Font states" and there are five attribute tags: "uof:locID", "uof:attrList", "uof:Identifier", "uof:Title", and "uof:Type family". The text in double quotes after each equals sign ("=") is the value of that attribute. Together these attributes form a list, which is the attribute list of the element. A simple XML document consists of a number of XML elements, their attribute lists, and any text strings those elements may have, but the whole document may have only one root element node, and every other element is a descendant of it. The entire XML document is thus a tree of XML nodes, much like a family tree.
Unordered XML attributes: the attributes in an XML attribute list may appear in any order. For example:
<style:font-face style:name="Song typeface" svg:font-family="Song typeface" style:font-pitch="variable"/>
<style:font-face svg:font-family="Song typeface" style:font-pitch="variable" style:name="Song typeface"/>
This example has three XML attributes: style:name, svg:font-family, and style:font-pitch. They may appear in any position within the XML element and still conform to the XML specification. The attribute list of the XML element in this example (style:font-face) is the list of all the attributes it has, and an XML SAX parser uses this list to manage the attributes belonging to each XML element.
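The order independence described above can be checked with a short sketch. It uses Python's standard xml.sax module as a stand-in for the parser discussed in this document; the class name AttributeCollector, the wrapper element, the namespace URIs, and the shortened attribute values are all illustrative assumptions.

```python
import xml.sax


class AttributeCollector(xml.sax.ContentHandler):
    """Records the attributes of every element as a plain dict (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # attrs is looked up by attribute name, never by position.
        self.seen.append(dict(attrs.items()))


def collect_attributes(xml_text):
    handler = AttributeCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.seen


# The two font-face elements carry the same attributes in a different order.
DOC = (
    '<fonts xmlns:style="urn:example:style" xmlns:svg="urn:example:svg">'
    '<style:font-face style:name="Song" svg:font-family="Song" style:font-pitch="variable"/>'
    '<style:font-face svg:font-family="Song" style:font-pitch="variable" style:name="Song"/>'
    "</fonts>"
)
attrs = collect_attributes(DOC)
same = attrs[1] == attrs[2]  # index 0 is the wrapper element
```

Because the handler reads attributes by name, both elements yield the same mapping regardless of the order in which the attributes were written.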
XML SAX parser: for reading (i.e. parsing) XML documents, especially large ones, there is a de facto standard, the XML SAX specification. Its parsing principle is as follows. The size of the chunk of the XML document stream to be read on each pass is preset. Starting from the beginning of the document, the parser sequentially (random access is not possible) reads a chunk of bytes of that preset fixed size. Whenever it encounters a start tag, end tag, or other markup defined by the XML specification in these bytes, it maps them to the corresponding operations and processes those operations, repeating until the entire XML document ends. The three basic operations are:
●startElement(char *element, char **attrList);
●characters(); // usually delivers the text value of an XML element
●endElement(char *element);
In actual programming, the programmer must provide callback functions corresponding to the operations above.
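As a minimal sketch of these three callbacks, the following uses Python's standard xml.sax module, whose ContentHandler happens to use the same three method names. The EventLogger class and the sample document are assumptions made for illustration and are not part of the patent.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Logs the three basic SAX events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Skip whitespace-only text so the log stays short.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def log_events(xml_text):
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


events = log_events('<a id="1"><b>hi</b></a>')
```

The parser drives the handler: the program never asks for a node, it only reacts to whatever the stream delivers next.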
In practice the following problem often arises: how to quickly obtain a single piece of data from somewhere in an XML document and stop as soon as the result has been obtained, when the work must be done with an XML SAX parser. As noted, an XML SAX parser is stream-based (it parses sequentially from beginning to end and cannot jump to an arbitrary position), so it always runs from the start of the document to the end. Even if nothing is done for any of the nodes before or after the XML node of interest, CPU time is still consumed until the whole document has been read. There is also no good way to quickly locate the position of the data to be obtained.
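The cost described here can be made visible with a small sketch that counts how many element callbacks a plain SAX handler receives before and after the one element it cares about. It uses Python's standard xml.sax module; the CountingHandler class, the element names, and the sample document are illustrative assumptions.

```python
import xml.sax


class CountingHandler(xml.sax.ContentHandler):
    """Counts startElement callbacks before and after the first target element."""

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.found = False
        self.before = 0
        self.after = 0

    def startElement(self, name, attrs):
        if name == self.target and not self.found:
            self.found = True
        elif self.found:
            self.after += 1   # work done although the data is already in hand
        else:
            self.before += 1  # work done before the target is reached


def count_callbacks(xml_text, target):
    handler = CountingHandler(target)
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.before, handler.after


DOC = "<root>" + "<pad/>" * 3 + "<wanted/>" + "<pad/>" * 5 + "</root>"
before, after = count_callbacks(DOC, "wanted")
```

In this sample the handler is still called for the five trailing elements even though nothing remains to be done, which is the idle time the method aims to remove.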
Summary of the invention
The purpose of the present invention is to provide a method for quickly locating and processing XML tags.
To achieve the above purpose, the present invention provides the following technical solution:
A method for quickly locating and processing XML tags, comprising creating an XML element node context handler and an exit mechanism. Creating the XML element node context handler includes: encapsulating, on top of an existing XML SAX parser, an XML root node context handler, i.e. the XML document handler. Creating the exit mechanism includes: encapsulating, on top of the existing XML SAX parser, an exit exception that actively ends parsing.
As a further aspect of the present invention: the context handler includes at least the three basic operation interfaces provided by the existing XML SAX parser, and each of them supports throwing exceptions.
As a further aspect of the present invention: the context handler further includes a child context handler, i.e. it is defined recursively.
As a further aspect of the present invention: the exit mechanism further includes catching the exception.
As a further aspect of the present invention: catching the exception further includes setting it up in the at least three basic operation interfaces provided by the context handler.
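A minimal sketch of the exit mechanism, assuming Python's standard xml.sax module in place of the parser described here: a dedicated exception is raised from a callback as soon as the wanted data has been read, and the caller catches it. The names ExitParsing, FirstAttributeGrabber, and grab are hypothetical.

```python
import xml.sax


class ExitParsing(Exception):
    """Raised by a handler to end parsing on purpose (hypothetical name)."""


class FirstAttributeGrabber(xml.sax.ContentHandler):
    """Reads one attribute from the first matching element, then stops the parse."""

    def __init__(self, element, attribute):
        super().__init__()
        self.element = element
        self.attribute = attribute
        self.value = None
        self.elements_seen = 0

    def startElement(self, name, attrs):
        self.elements_seen += 1
        if name == self.element:
            self.value = attrs.get(self.attribute)
            raise ExitParsing()  # nothing left to do, leave the parser


def grab(xml_text, element, attribute):
    handler = FirstAttributeGrabber(element, attribute)
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), handler)
    except ExitParsing:
        pass  # expected: the handler ended the parse early
    return handler.value, handler.elements_seen


DOC = '<root><pad/><font name="Song"/>' + "<pad/>" * 100 + "</root>"
value, seen = grab(DOC, "font", "name")
```

Using a dedicated exception type keeps the deliberate exit distinguishable from genuine parse errors, which still propagate as usual.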
Compared with the prior art, the beneficial effects of the present invention are:
Compared with a traditional XML SAX parser, a large amount of parsing time is saved (namely the time spent before the node is located and the idle time after the node has been processed), while convenient, flexible XML node handling is also provided, making the program easy to maintain and extend.
Description of the drawings
Fig. 1 is a schematic diagram of quick XML node location and exit processing according to the present invention;
Fig. 2 is a schematic comparison of the XML parser of the present invention (capable of quickly locating XML nodes) with a conventional parser;
Fig. 3 is a schematic comparison of the XML node parsing process of the present invention with that of a conventional parser;
Fig. 4 is a schematic comparison of the XML node context handler of the present invention with the basic parsing unit of a conventional parser;
Fig. 5 is a code diagram of XML node data retrieval and quick exit according to the present invention;
Fig. 6 is a code diagram of quick XML node location according to the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to those embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic diagram of quick XML node location and exit processing according to the present invention. Based on the principle of the XML SAX parser, the present invention provides a parsing method for quickly locating and processing XML tags. The implementation steps are:
(1) Create the XML document handler (XuofDocumentHandler) and set it on the XML SAX parser;
(11) The XML document handler is created as follows:
(111) Predefine the XML node context handler (XuofContextHandler), defined in the interface description language IDL as follows:
(1111) Define the XML element node start interface:
void startElement([in] long nElement, [in] XuofAttributeList Attribs) raises(SAXException);
Here a parameter marked [in] is an input parameter, and nElement is the unique integer corresponding to the tokenized XML element tag string. The same applies below and is not repeated.
(11111) The XML attribute list handler (XuofAttributeList) interface is defined as follows:
string getValueByIndex([in] long idx) raises(SAXException);
Here idx identifies an attribute in the attribute list of an XML element node: it can be the index in the attribute list, or the unique integer corresponding to the tokenized attribute string.
(1112) Define the XML element node end interface:
void endElement([in] long nElement) raises(SAXException);
(1113) Define the XML text node interface:
void characters([in] string aChars) raises(SAXException);
Here aChars is the XML text string data, i.e. the text string of an ordinary element node; it is null if the element node has no text string.
(1114) Define the start interface for unknown XML element nodes (i.e. XML tags that were not tokenized in advance):
void startUnkownElement([in] string Namespace, [in] string Name, [in] XuofAttributeList Attribs) raises(SAXException);
Here Namespace is the namespace URL of the XML element and Name is the XML element name. The same applies below and is not repeated.
(1115) Define the end interface for unknown XML element nodes (i.e. XML tags that were not tokenized in advance):
void endUnkownElement([in] string Namespace, [in] string Name) raises(SAXException);
(1116) Define the XML element child node context handler interface:
XuofContextHandler createChildContext([in] long nElement, [in] XuofAttributeList Attribs) raises(SAXException);
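The six operations of the context handler can be mirrored as an abstract base class. This is a rough Python rendering for illustration only, not the IDL above: the method names are snake_case equivalents, the attribute list is simplified to a mapping from attribute token to value, and the FontContext subclass with its token numbers is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Mapping, Optional


class ContextHandler(ABC):
    """Python rendering of the six context handler operations (sketch)."""

    @abstractmethod
    def start_element(self, element: int, attribs: Mapping[int, str]) -> None:
        """Called with the integer token of a known element and its attributes."""

    @abstractmethod
    def end_element(self, element: int) -> None:
        """Called when a known element closes."""

    @abstractmethod
    def characters(self, chars: str) -> None:
        """Called with the text content of the current element."""

    def start_unknown_element(self, namespace: str, name: str,
                              attribs: Mapping[int, str]) -> None:
        """Called for elements that have no token; ignored by default."""

    def end_unknown_element(self, namespace: str, name: str) -> None:
        """Called when an element without a token closes; ignored by default."""

    def create_child_context(self, element: int,
                             attribs: Mapping[int, str]) -> Optional["ContextHandler"]:
        """Return the handler for a child element, or None to skip it."""
        return None


class FontContext(ContextHandler):
    """Hypothetical concrete handler that keeps one attribute value."""

    NAME_TOKEN = 3  # assumed token of the attribute of interest

    def __init__(self):
        self.value = None

    def start_element(self, element, attribs):
        self.value = attribs.get(self.NAME_TOKEN)

    def end_element(self, element):
        pass

    def characters(self, chars):
        pass


ctx = FontContext()
ctx.start_element(7, {3: "Song"})
```

Giving the unknown-element and child-context operations default bodies means a concrete handler only has to implement what it actually uses.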
(112) Predefine the XML document handler (XuofDocumentHandler), defined in the interface description language IDL as follows:
(1121) Define the XML document start interface:
void startDocument() raises(SAXException);
(1122) Define the XML document end interface:
void endDocument() raises(SAXException);
(1123) Define the line/column locator interface used for XML document parse error handling:
void setDocumentLocator(sax::XLocator xloc) raises(SAXException);
(1124) Inherit the six interfaces of the XML node context handler defined in (111) above.
(113) Implement the XML document handler (XuofDocumentHandler) of (112) above, i.e. implement all of the interfaces it defines.
The two interfaces startElement and createChildContext deserve particular attention. In the implementation, for the XML node whose data is to be obtained, the required data is retrieved in startElement, and after the last piece of data has been obtained an exit exception can be actively thrown so that parsing of the entire document ends early at that node. At the same time, createChildContext returns an implemented XuofContextHandler for the XML node whose data is to be obtained and returns NULL for all other nodes. In this way the XML node is located quickly and parsing is exited actively.
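The interplay of createChildContext and the exit exception can be sketched as a stack of contexts driven by ordinary SAX events. This is an assumption-laden illustration built on Python's standard xml.sax module, using element names rather than integer tokens; the classes Context, DocumentContext, FontContext, Dispatcher, the exception ExitParsing, and the sample document are all hypothetical.

```python
import xml.sax


class ExitParsing(Exception):
    """Raised to end parsing once the wanted data is in hand (hypothetical name)."""


class Context:
    """Base context: ignores everything and has no interesting children."""

    def create_child_context(self, name, attrs):
        return None

    def start_element(self, name, attrs):
        pass

    def characters(self, text):
        pass

    def end_element(self, name):
        pass


class FontContext(Context):
    """Handles the target node: reads its data, then asks to stop."""

    def __init__(self, result):
        self.result = result

    def start_element(self, name, attrs):
        self.result["name"] = attrs.get("name")
        raise ExitParsing()


class DocumentContext(Context):
    """Root context: only the path doc -> fonts -> font is followed."""

    def __init__(self):
        self.result = {}

    def create_child_context(self, name, attrs):
        if name in ("doc", "fonts"):
            return self                      # keep descending with this handler
        if name == "font":
            return FontContext(self.result)  # brand-new handler for the target
        return None                          # every other subtree is skipped


class Dispatcher(xml.sax.ContentHandler):
    """Maps SAX events onto a stack of contexts; None marks a skipped subtree."""

    def __init__(self, root):
        super().__init__()
        self.stack = [root]
        self.dispatched = 0

    def startElement(self, name, attrs):
        parent = self.stack[-1]
        child = parent.create_child_context(name, attrs) if parent is not None else None
        self.stack.append(child)
        if child is not None:
            self.dispatched += 1
            child.start_element(name, attrs)

    def characters(self, content):
        if self.stack[-1] is not None:
            self.stack[-1].characters(content)

    def endElement(self, name):
        context = self.stack.pop()
        if context is not None:
            context.end_element(name)


def locate(xml_text):
    root = DocumentContext()
    dispatcher = Dispatcher(root)
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), dispatcher)
    except ExitParsing:
        pass  # the target context ended the parse early
    return root.result, dispatcher.dispatched


DOC = ('<doc><meta><font name="ignored"/></meta>'
       '<fonts><font name="Song"/></fonts><tail/></doc>')
result, dispatched = locate(DOC)
```

Elements under a skipped parent never reach application code, so the font inside meta is ignored, and the exception prevents any callback from running for the elements that follow the target.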
(12) The context handler is set on the XML SAX parser as follows:
(121) Predefine the XML SAX parser (XuofParser), defined in the interface description language IDL as follows:
(1211) Define the XML document stream parsing interface:
void parseStream([in] sax::InputSource) raises(SAXException, IOException);
sax::InputSource is not described in detail here.
(1212) Define the interface for setting the XML document handler:
void setDocumentHandler([in] XuofDocumentHandler);
XuofDocumentHandler is described in (112) above.
(1213) Define the interface for setting the XML document error handler:
void setErrorHandler([in] sax::XErrorHandler);
sax::XErrorHandler is not described in detail here.
(1214) Define the interface for setting the XML document tokenization handler:
void setTokenHandler([in] XuofTokenHandler);
(12141) The XML tokenization handler (XuofTokenHandler) interface is defined as follows:
(121411) Define the string tokenization interface:
long getToken([in] string xmlTag);
Here xmlTag is an XML element or attribute tag string. This interface is used when parsing an XML document.
(121412) Define the string de-tokenization interface:
sequence<byte> getUTF8([in] long nToken);
Here nToken is the integer value of a tokenized string, and the result is the UTF-8 encoded byte sequence. This interface is used when serializing (writing) an XML document.
(121413) Define the namespace registration interface required for tokenizing XML strings:
void registerNamespace([in] string NamespaceURL, [in] long NamespaceToken) raises(illegalArgumentException);
The NamespaceToken value is usually related to the nToken value above in some way.
(121414) Define the optional interface for historical versions of XML document tag data:
void setVersion([in] string version);
This is used for compatibility with different historical versions of the XML document.
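One possible reading of the tokenization interfaces is sketched here in Python. The text does not say how NamespaceToken and nToken are related, so the bit layout used here (namespace token in the high bits, local-name index in the low 16 bits) is purely an assumption, as are the class name TokenHandler, the two-argument get_token signature, and the sample namespace URL.

```python
class TokenHandler:
    """Maps (namespace URL, local name) pairs to integers and back (sketch)."""

    UNKNOWN = -1
    LOCAL_BITS = 16  # assumed layout: low 16 bits hold the local-name index

    def __init__(self, known_local_names):
        self._names = list(known_local_names)
        self._local_index = {name: i for i, name in enumerate(self._names)}
        self._namespaces = {}  # namespace URL -> namespace token

    def register_namespace(self, url, namespace_token):
        if namespace_token <= 0:
            raise ValueError("namespace token must be a positive integer")
        self._namespaces[url] = namespace_token

    def get_token(self, url, local_name):
        """Return the integer token, or UNKNOWN for tags that were never registered."""
        namespace_token = self._namespaces.get(url)
        index = self._local_index.get(local_name)
        if namespace_token is None or index is None:
            return self.UNKNOWN
        return (namespace_token << self.LOCAL_BITS) | index

    def get_utf8(self, token):
        """Return the UTF-8 bytes of the local name encoded in a token."""
        index = token & ((1 << self.LOCAL_BITS) - 1)
        return self._names[index].encode("utf-8")


tokens = TokenHandler(["font", "name"])
tokens.register_namespace("urn:example:uof", 1)
name_token = tokens.get_token("urn:example:uof", "name")
unknown = tokens.get_token("urn:example:other", "name")
```

Comparing integers instead of strings is what makes element matching cheap in the handlers, and the UNKNOWN result corresponds to the tags that are routed to the unknown-element interfaces.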
(122) Implement the XML SAX parser described in (121), i.e. implement all of the interfaces defined by XuofParser.
(123) Set the XML document handler (XuofDocumentHandler) described in (113). The steps are as follows:
(1231) Initialize the XuofParser parser of (122) above;
(1232) Initialize and construct the XML tag tokenization handler (XuofTokenHandler), optionally initializing it for a particular version of the XML tags so that the tokenization handler processes the XML tags of that version. Then call setTokenHandler to set the initialized XuofTokenHandler on the XuofParser;
(1233) Call setDocumentHandler to set the XML document handler (XuofDocumentHandler) implemented in (113) above on the XuofParser.
(2) Call the parseStream interface of the XML SAX parser (XuofParser) to parse the XML document stream, and catch the exit exception raised during parsing. During parsing, the processing of every XML node in step (2) is taken over by the handlers set up in step (1), which includes at least:
(21) Creation of nested child context handlers: one of three kinds of context handler is returned (none, the handler itself, or a brand-new context handler). This is the key to quickly locating an XML node; see the description in (113) above.
(22) XML element node start processing: i.e. startElement([in] long nElement, [in] XuofAttributeList Attribs) raises(SAXException), catching the exit exception raised during this processing and rethrowing it;
(23) Text node processing: i.e. characters([in] string aChars) raises(SAXException), catching the exit exception raised during this processing and rethrowing it;
(24) XML element node end processing: i.e. endElement([in] long nElement) raises(SAXException), catching the exit exception raised during this processing and rethrowing it.
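Step (2) can be sketched as a thin wrapper that parses a stream and reports whether a handler ended the parse early. Python's standard xml.sax module stands in for XuofParser; the names StreamParser, TitleReader, ExitParsing, and the sample documents are hypothetical.

```python
import io
import xml.sax


class ExitParsing(Exception):
    """Raised by a handler to end parsing on purpose (hypothetical name)."""


class TitleReader(xml.sax.ContentHandler):
    """Collects the text of the first <title> element, then stops the parse."""

    def __init__(self):
        super().__init__()
        self.title = None
        self._buffer = None

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.title = "".join(self._buffer)
            raise ExitParsing()


class StreamParser:
    """Small facade: set a document handler, then parse a byte stream."""

    def __init__(self):
        self._handler = None

    def set_document_handler(self, handler):
        self._handler = handler

    def parse_stream(self, stream):
        """Return True if the whole document was read, False on an early exit."""
        try:
            xml.sax.parse(stream, self._handler)
        except ExitParsing:
            return False
        return True


DOC = b"<doc><title>Report</title><body>" + b"<p/>" * 1000 + b"</body></doc>"
reader = TitleReader()
parser = StreamParser()
parser.set_document_handler(reader)
completed = parser.parse_stream(io.BytesIO(DOC))
```

Only the exit exception is swallowed by the wrapper, so malformed input still surfaces as a normal SAX parse error.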
Figs. 5 and 6 show the key example code of one specific embodiment of the present invention, which identifies the "标文通" (UOF) document type.
It will be obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential characteristics. The embodiments should therefore be regarded in every respect as illustrative and not restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all changes that fall within the meaning and range of equivalency of the claims are intended to be embraced by the present invention.
In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for clarity. Those skilled in the art should treat the specification as a whole, and the technical solutions in the various embodiments may also be suitably combined to form other embodiments that those skilled in the art can understand.
Claims (5)
1. A method for quickly locating and processing XML tags, characterized by creating an XML element node context handler and an exit mechanism, wherein creating the XML element node context handler includes: encapsulating, on top of an existing XML SAX parser, an XML root node context handler, i.e. the XML document handler; and wherein creating the exit mechanism includes: encapsulating, on top of the existing XML SAX parser, an exit exception that actively ends parsing.
2. The method for quickly locating and processing XML tags according to claim 1, characterized in that the context handler includes at least the three basic operation interfaces provided by the existing XML SAX parser, each of which supports throwing exceptions.
3. The method for quickly locating and processing XML tags according to claim 1, characterized in that the context handler further includes a child context handler, i.e. it is defined recursively.
4. The method for quickly locating and processing XML tags according to claim 1, characterized in that the exit mechanism further includes catching the exception.
5. The method for quickly locating and processing XML tags according to claim 4, characterized in that catching the exception further includes setting it up in the at least three basic operation interfaces provided by the context handler.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710076346.3A CN108427676A (en) | 2017-02-13 | 2017-02-13 | A method for quickly locating and processing XML tags |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108427676A true CN108427676A (en) | 2018-08-21 |
Family
ID=63154995
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710076346.3A Pending CN108427676A (en) | 2017-02-13 | 2017-02-13 | A method for quickly locating and processing XML tags |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108427676A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109325149A (en) * | 2018-09-30 | 2019-02-12 | 中国银行股份有限公司 | XML message search method and device |
CN109325149B (en) * | 2018-09-30 | 2020-08-11 | 中国银行股份有限公司 | XML message retrieval method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20180821 |