CN108427676A - Method for quickly locating and processing XML tags - Google Patents


Info

Publication number
CN108427676A
Authority
CN
China
Prior art keywords
xml
node
processing
context
quickly
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710076346.3A
Other languages
Chinese (zh)
Inventor
王长胜
李新冬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing New Cloud Technology Co Ltd
Original Assignee
Beijing New Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing New Cloud Technology Co Ltd filed Critical Beijing New Cloud Technology Co Ltd
Priority to CN201710076346.3A priority Critical patent/CN108427676A/en
Publication of CN108427676A publication Critical patent/CN108427676A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81Indexing, e.g. XML tags; Data structures therefor; Storage structures

Abstract

The invention discloses a method for quickly locating and processing XML tags. Fast XML node location consists of creating and installing a document context handler. Fast XML node processing consists of efficient matching of XML element nodes, efficient retrieval of XML element attributes, and a mechanism that exits immediately after processing. With this method, the XML node to be processed can be located quickly, and once it has been processed the entire XML parse can be ended at once. Compared with a traditional XML SAX parser, the method saves a large amount of parsing time (namely the time spent before the node is located and the idle time after the node has been processed). It also provides convenient, flexible XML node processing, which makes the program easy to maintain and extend.

Description

Method for quickly locating and processing XML tags
Technical field
The present invention relates to the technical field of XML SAX parsing in computing, and specifically to a method for quickly locating and processing XML tags.
Background
XML tag: an XML tag is also called XML markup. There are three basic kinds of XML tag: the XML element tag (XML Element), the XML attribute tag (XML Attribute), and the XML text content tag (XML Text Content, usually the text value of an element tag). An XML element tag has start and end delimiters, namely "<" and "/>". An XML tag can usually be understood simply as its tag name, and one XML tag can be regarded as one XML node (XML Node). The tag name or node name (also called the QName) usually consists of a namespace prefix, a colon, and a local name (also called the LName). For example:
<uof:Font states uof:locID="u0041" uof:attrList="identifier title type family replacement font" uof:Identifier="Song typeface" uof:Title="Song typeface" uof:Type family="auto"/>
Here the element tag is named "uof:Font states" and has five attribute tags: "uof:locID", "uof:attrList", "uof:Identifier", "uof:Title", and "uof:Type family". The double-quoted text after each equals sign ("=") is the value of that attribute. Together these attributes form a list, which is the attribute list of the element. A simple XML document consists of a number of XML elements, their attribute lists, and any text strings the elements contain, but the whole document may have only one root element node, and all other elements are its descendants. The entire XML document is therefore a tree of XML nodes, much like a family tree.
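The structure of a qualified name described above can be illustrated with a short Python sketch. The function name split_qname is hypothetical and does not appear in the patent; the sketch only shows how a QName divides into a namespace prefix and a local name.

```python
def split_qname(qname):
    """Split a qualified name into (prefix, local_name).

    The prefix is the empty string when the name has no colon.
    """
    prefix, colon, local_name = qname.partition(":")
    if not colon:
        # No colon: the whole string is the local name.
        return "", qname
    return prefix, local_name


if __name__ == "__main__":
    print(split_qname("uof:locID"))
    print(split_qname("root"))
```

A real parser would additionally resolve the prefix to a namespace URL through the xmlns declarations in scope; that step is omitted here.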
Arbitrary order of XML attributes: the attributes in an XML attribute list may appear in any order. For example:
<style:font-face style:name="Song typeface" svg:font-family="Song typeface" style:font-pitch="variable"/>
<style:font-face svg:font-family="Song typeface" style:font-pitch="variable" style:name="Song typeface"/>
This example has three XML attributes: style:name, svg:font-family, and style:font-pitch. They may appear in any position within the XML element and still conform to the XML specification. The attribute list of the XML element in this example (style:font-face) is the list of all attributes the element has, and an XML SAX parser uses this list to manage the attributes that belong to each XML element.
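The order independence described above can be checked with Python's standard xml.sax module. This is a minimal sketch: the class AttrCollector, the function collect, the wrapper element root, and the urn:example namespace URLs are all assumptions made for illustration.

```python
import xml.sax


class AttrCollector(xml.sax.ContentHandler):
    """Records each element name together with its attributes as a dict."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def startElement(self, name, attrs):
        # A dict ignores the order in which the attributes were written.
        self.seen.append((name, dict(attrs.items())))


def collect(xml_text):
    handler = AttrCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.seen


# Hypothetical wrapper that declares the two prefixes used in the example.
WRAPPER = ('<root xmlns:style="urn:example:style" '
           'xmlns:svg="urn:example:svg">{}</root>')

DOC_A = WRAPPER.format(
    '<style:font-face style:name="Song typeface" '
    'svg:font-family="Song typeface" style:font-pitch="variable"/>')
DOC_B = WRAPPER.format(
    '<style:font-face svg:font-family="Song typeface" '
    'style:font-pitch="variable" style:name="Song typeface"/>')

if __name__ == "__main__":
    print(collect(DOC_A) == collect(DOC_B))
```

Because the handler receives the attributes as a lookup structure rather than as a positional sequence, code that reads attributes by name is unaffected by the order in which they were written.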
For reading (that is, parsing) XML documents, especially large ones, there is a de facto standard, the XML SAX specification. An XML SAX parser works as follows. A size is preset for each sequential read of the XML document stream. Starting at the beginning of the document, the parser reads a byte stream of that fixed size each time, strictly in sequence (random access is not possible). Whenever it encounters markup defined by the XML specification in these bytes, such as a start or end tag, it maps the markup to the corresponding operation. It handles these operations repeatedly until the whole XML document ends. The three basic operations are:
● startElement(char* element, char** attrList);
● characters(); // usually represents the text value of an XML element
● endElement(char* element);
In actual programming, the programmer must supply callback functions that correspond to the operations above.
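The three callbacks listed above have direct counterparts in Python's standard xml.sax.ContentHandler. The sketch below is illustrative only; the class EventLogger and the function log_events are hypothetical names, and the Python signatures differ from the C-style ones in the text.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Logs the three basic SAX operations in the order they occur."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # Skip whitespace-only text between elements.
        if content.strip():
            self.events.append(("chars", content))

    def endElement(self, name):
        self.events.append(("end", name))


def log_events(xml_text):
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in log_events('<a x="1"><b>hi</b></a>'):
        print(event)
```

Note that the parser drives the handler: the application never asks for the next node, it only reacts to the callbacks, which is why a plain SAX handler cannot skip ahead in the document.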
A problem often arises in practice: how to fetch one piece of data from somewhere in an XML document quickly and stop as soon as the result is obtained, when the work must be done with an XML SAX parser. An XML SAX parser is stream-based: it parses sequentially from start to finish and cannot jump to an arbitrary position. It therefore always runs to the end of the document. Even if nothing is done for any node before or after the XML node to be processed, CPU time is consumed until the whole document ends. Nor is there any good way to reach quickly the position from which the data is to be obtained.
Summary of the invention
The object of the present invention is to provide a method for quickly locating and processing XML tags.
To achieve this object, the present invention provides the following technical solution:
A method for quickly locating and processing XML tags comprises creating an XML element node context handler and an exit mechanism. Creating the XML element node context handler includes encapsulating, on top of an existing XML SAX parser, an XML root node context handler, that is, an XML document handler. Creating the exit mechanism includes encapsulating, on top of the existing XML SAX parser, an exit exception that actively ends parsing.
As a further aspect of the present invention, the context handler provides at least the three basic operation interfaces of the existing XML SAX parser, each of which may also throw exceptions.
As a further aspect of the present invention, the context handler further includes a child context handler; that is, it is defined recursively.
As a further aspect of the present invention, the exit mechanism further includes catching the exception.
As a further aspect of the present invention, catching the exception is also set up in the at least three basic operation interfaces provided by the context handler.
Compared with the prior art, the beneficial effects of the present invention are as follows:
Compared with a traditional XML SAX parser, a large amount of parsing time is saved (namely the time spent before the node is located and the idle time after the node has been processed). Convenient, flexible XML node processing is also provided, which makes the program easy to maintain and extend.
Description of the drawings
Fig. 1 is a schematic diagram of fast XML node location and exit processing according to the present invention;
Fig. 2 is a schematic diagram comparing the XML parser of the present invention (which can locate XML nodes quickly) with a conventional parser;
Fig. 3 is a schematic diagram comparing the XML node parsing process of the present invention with that of a conventional parser;
Fig. 4 is a schematic diagram comparing the XML node context handler of the present invention with the basic parsing unit of a conventional parser;
Fig. 5 is a code diagram of XML node data acquisition and fast exit according to the present invention;
Fig. 6 is a code diagram of fast XML node location according to the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic diagram of fast XML node location and exit processing according to the present invention. Based on the principle of the XML SAX parser, the present invention provides a parsing method for quickly locating and processing XML tags. The implementation steps are:
(1) Create an XML document handler (XuofDocumentHandler) and set it on the XML SAX parser;
(11) The XML document handler is created as follows:
(111) First define the XML node context handler (XuofContextHandler), using the interface description language IDL, as follows:
(1111) Define the XML element node start interface:
void startElement([in] long nElement, [in] XuofAttributeList Attribs) raises(SAXException);
Here [in] marks the parameter that follows it as an input parameter, and nElement is the unique integer that corresponds to the XML element tag string after tokenization. The same applies below and is not repeated.
(11111) The XML attribute list handler (XuofAttributeList) interface is defined as follows:
string getValueByIndex([in] long idx) raises(SAXException);
Here idx identifies an attribute in the attribute list of an XML element node. It can be the index of the attribute in the list, or the unique integer that corresponds to the attribute string after tokenization.
(1112) Define the XML element node end interface:
void endElement([in] long nElement) raises(SAXException);
(1113) Define the XML text node interface:
void characters([in] string aChars) raises(SAXException);
Here aChars is the XML text string data, that is, the text string of an ordinary element node. It is null if the element node has no text string.
(1114) Define the start interface for unknown XML element nodes (that is, XML tags that were not tokenized in advance):
void startUnkownElement([in] string Namespace, [in] string Name, [in] XuofAttributeList Attribs) raises(SAXException);
Here Namespace is the namespace URL of the XML element and Name is the name of the XML element. The same applies below and is not repeated.
(1115) Define the end interface for unknown XML element nodes (that is, XML tags that were not tokenized in advance):
void endUnkownElement([in] string Namespace, [in] string Name) raises(SAXException);
(1116) Define the XML element child node context handler interface:
XuofContextHandler createChildContext([in] long nElement, [in] XuofAttributeList Attribs) raises(SAXException);
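The six interfaces above can be rendered as a Python base class to make their shape concrete. This is a sketch under stated assumptions, not the patent's implementation: the Python names AttributeList and ContextHandler, the snake_case method names, the no-op defaults, and the choice to key attributes by token only are all assumptions.

```python
from typing import Dict, Optional


class AttributeList:
    """Hypothetical stand-in for XuofAttributeList, keyed by attribute token."""

    def __init__(self, values: Dict[int, str]):
        self._values = values

    def get_value_by_index(self, idx: int) -> str:
        # The text allows idx to be a list index or an attribute token;
        # this sketch supports the token form only.
        return self._values[idx]


class ContextHandler:
    """Hypothetical analogue of XuofContextHandler with no-op defaults."""

    def start_element(self, n_element: int, attribs: AttributeList) -> None:
        pass

    def end_element(self, n_element: int) -> None:
        pass

    def characters(self, chars: str) -> None:
        pass

    def start_unknown_element(self, namespace: str, name: str,
                              attribs: AttributeList) -> None:
        pass

    def end_unknown_element(self, namespace: str, name: str) -> None:
        pass

    def create_child_context(self, n_element: int,
                             attribs: AttributeList
                             ) -> Optional["ContextHandler"]:
        # Returning None means "no handler for this child".
        return None
```

With no-op defaults, a concrete handler only overrides the callbacks for the nodes it cares about, which keeps each handler small.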
(112) First define the XML document handler (XuofDocumentHandler), using the interface description language IDL, as follows:
(1121) Define the XML document start interface:
void startDocument() raises(SAXException);
(1122) Define the XML document end interface:
void endDocument() raises(SAXException);
(1123) Define the line and column locator interface used for error handling in XML document parsing:
void setDocumentLocator(sax::XLocator xloc) raises(SAXException);
(1124) Inherit the six interfaces of the XML node context handler defined in (111) above.
(113) Implement the XML document handler (XuofDocumentHandler) of (112) above, that is, implement all the interfaces it defines.
Particular care should be taken to coordinate the two interfaces startElement and createChildContext. In an implementation, startElement obtains the required data for the XML node from which data is wanted, and after the last piece of data has been obtained it may actively throw the exit exception so that parsing of the whole document ends early at that node. Meanwhile, createChildContext returns an implemented XuofContextHandler for the XML node from which data is wanted and uniformly returns NULL for all other nodes. In this way the XML node is located quickly and parsing is exited actively.
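The interplay of startElement and createChildContext described above can be sketched on top of Python's standard xml.sax parser. Everything named here is an assumption for illustration: the classes ExitParsing, Context, FontContext, DocumentContext, and Dispatcher, the element names doc, fonts, and font, and the use of element name strings instead of integer tokens.

```python
import xml.sax


class ExitParsing(Exception):
    """Hypothetical exit exception: raised to end the parse early."""


class Context:
    """Default context: ignores its element and skips every child."""

    def start_element(self, name, attrs):
        pass

    def end_element(self, name):
        pass

    def create_child_context(self, name, attrs):
        return None


class FontContext(Context):
    """Collects the attributes of the target element, then stops the parse."""

    def __init__(self, result):
        self.result = result

    def start_element(self, name, attrs):
        self.result.update(attrs.items())
        raise ExitParsing()


class DocumentContext(Context):
    """Root context: follows doc/fonts/font and skips everything else."""

    PATH = ("doc", "fonts")

    def __init__(self):
        self.result = {}

    def create_child_context(self, name, attrs):
        if name in self.PATH:
            return self                      # keep using this context
        if name == "font":
            return FontContext(self.result)  # new context for the target
        return None                          # no handler: subtree is skipped


class Dispatcher(xml.sax.ContentHandler):
    """Adapts SAX callbacks to a stack of contexts."""

    def __init__(self, root):
        super().__init__()
        self.stack = [root]
        self.started = 0  # number of startElement events actually handled

    def startElement(self, name, attrs):
        self.started += 1
        parent = self.stack[-1]
        child = None
        if parent is not None:
            child = parent.create_child_context(name, attrs)
        self.stack.append(child)
        if child is not None:
            child.start_element(name, attrs)

    def endElement(self, name):
        context = self.stack.pop()
        if context is not None:
            context.end_element(name)


def find_font(xml_text):
    root = DocumentContext()
    dispatcher = Dispatcher(root)
    try:
        xml.sax.parseString(xml_text.encode("utf-8"), dispatcher)
    except ExitParsing:
        pass  # expected: the data was found and the rest is left unparsed
    return root.result, dispatcher.started
```

The stack makes skipping cheap: once a context returns None for an element, every descendant of that element sees a None parent and is passed over without any handler call.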
(12) The process of setting the context handler on the XML SAX parser is as follows:
(121) First define the XML SAX parser (XuofParser), using the interface description language IDL, as follows:
(1211) Define the XML document stream parsing interface:
void parseStream([in] sax::InputSource) raises(SAXException, IOException);
sax::InputSource is not described in detail here.
(1212) Define the interface for setting the XML document handler:
void setDocumentHandler([in] XuofDocumentHandler);
XuofDocumentHandler is described in (112) above.
(1213) Define the interface for setting the XML document error handler:
void setErrorHandler([in] sax::XErrorHandler);
sax::XErrorHandler is not described in detail here.
(1214) Define the interface for setting the XML document tokenization handler:
void setTokenHandler([in] XuofTokenHandler);
(12141) The XML tokenization handler (XuofTokenHandler) interface is defined as follows:
(121411) Define the string tokenization interface:
long getToken([in] string xmlTag);
Here xmlTag is an XML element or attribute tag string. This interface is used when parsing an XML document.
(121412) Define the string de-tokenization interface:
sequence<byte> getUTF8([in] long nToken);
Here nToken is the integer value of a tokenized string, and the result is the UTF-8 encoded byte sequence. This interface is used when serializing (writing) an XML document.
(121413) Define the namespace registration interface required for tokenizing XML strings:
void registerNamespace([in] string NamespaceURL, [in] long NamespaceToken) raises(illegalArgumentException);
The NamespaceToken value is usually associated with the nToken value above by some technique.
(121414) Define the optional interface for historical versions of XML document tag data:
void setVersion([in] string version);
This provides compatibility with different historical versions of an XML document.
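The tokenization handler above maps tag strings to integers and back. The patent does not say how tokens are assigned, so the following Python sketch makes several assumptions: tags are registered in advance through a hypothetical add_tag helper, unknown tags are reported as -1, and namespace tokens are simply stored without being combined with tag tokens.

```python
UNKNOWN_TOKEN = -1  # assumption: the text does not define this value


class TokenHandler:
    """Hypothetical analogue of XuofTokenHandler."""

    def __init__(self):
        self._token_by_tag = {}
        self._tag_by_token = {}
        self._namespaces = {}
        self.version = None

    def register_namespace(self, namespace_url, namespace_token):
        if namespace_token < 0:
            raise ValueError("namespace token must be non-negative")
        self._namespaces[namespace_url] = namespace_token

    def add_tag(self, tag, token):
        """Hypothetical helper: register one tag string under an integer."""
        self._token_by_tag[tag] = token
        self._tag_by_token[token] = tag

    def get_token(self, xml_tag):
        # Used while parsing: string to integer.
        return self._token_by_tag.get(xml_tag, UNKNOWN_TOKEN)

    def get_utf8(self, token):
        # Used while serializing: integer back to UTF-8 bytes.
        return self._tag_by_token[token].encode("utf-8")

    def set_version(self, version):
        self.version = version
```

Comparing integers instead of strings is presumably what the abstract means by efficient matching of element nodes: a handler can switch on a token rather than compare tag names character by character.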
(122) Implement the XML SAX parser described in (121), that is, implement all the interfaces defined by XuofParser.
(123) Set the XML document handler (XuofDocumentHandler) described in (113). The steps are as follows:
(1231) Initialize the XuofParser parser of (122) above;
(1232) Initialize and construct the XML tag tokenization handler (XuofTokenHandler), and optionally initialize it with the XML tags of a particular version so that the tokenization handler processes the XML tags of that version. Then call setTokenHandler to set the initialized XuofTokenHandler on the XuofParser;
(1233) Call setDocumentHandler to set the XML document handler (XuofDocumentHandler) implemented in (113) above on the XuofParser.
(2) Call the parseStream interface of the XML SAX parser (XuofParser) to parse the XML document stream, and catch the exit exception raised during parsing. During the parse in step (2), the processing of every XML node is taken over by the handler set in step (1), which includes at least:
(21) Creation of nested child context handlers: one of three kinds of context handler is returned (none, the handler itself, or a brand-new context handler). This is the key to locating an XML node quickly; see the description in (113) above.
(22) Start of XML element node processing, that is, startElement([in] long nElement, [in] XuofAttributeList Attribs) raises(SAXException). The exit exception raised during this processing is caught and then thrown onward;
(23) Text node processing, that is, characters([in] string aChars) raises(SAXException). The exit exception raised during this processing is caught and then thrown onward;
(24) End of XML element node processing, that is, endElement([in] long nElement) raises(SAXException). The exit exception raised during this processing is caught and then thrown onward.
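Step (2) can be sketched with the incremental interface of Python's standard xml.sax parser, which shows that input after the target node is never read. The names ExitParsing, StopAtTitle, and read_title, the element title, and the attribute value are hypothetical and chosen only for this example.

```python
import io
import xml.sax


class ExitParsing(Exception):
    """Hypothetical exit exception: raised to end the parse early."""


class StopAtTitle(xml.sax.ContentHandler):
    """Reads one attribute from the first <title> element, then exits."""

    def __init__(self):
        super().__init__()
        self.title = None

    def startElement(self, name, attrs):
        if name == "title":
            self.title = attrs.get("value")
            raise ExitParsing()


def read_title(stream, chunk_size=64):
    """Feed the stream in fixed-size chunks until the title is found."""
    handler = StopAtTitle()
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    consumed = 0
    try:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            consumed += len(chunk)
            parser.feed(chunk)
        parser.close()
    except ExitParsing:
        pass  # expected: stop reading as soon as the data is available
    return handler.title, consumed


if __name__ == "__main__":
    document = b'<doc><title value="T"/>' + b"<p/>" * 1000 + b"</doc>"
    title, consumed = read_title(io.BytesIO(document))
    print(title, consumed, len(document))
```

Catching the exception at the outermost call, around the parse itself, is what turns an ordinary error path into a controlled early exit.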
Figs. 5 and 6 show the key example code of one embodiment of the present invention, which identifies the "mark text is logical" document type.
It is obvious to a person skilled in the art that the invention is not limited to the details of the above exemplary embodiments, and that the present invention can be realized in other specific forms without departing from its spirit or essential attributes. The embodiments are therefore to be regarded in every respect as illustrative and not restrictive. The scope of the present invention is defined by the appended claims rather than by the above description, and all variations that fall within the meaning and range of equivalents of the claims are intended to be included in the present invention.
In addition, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution. The specification is written this way only for the sake of clarity. A person skilled in the art should consider the specification as a whole; the technical solutions in the various embodiments may also be suitably combined to form other embodiments that a person skilled in the art can understand.

Claims (5)

1. A method for quickly locating and processing XML tags, characterized by creating an XML element node context handler and an exit mechanism, wherein creating the XML element node context handler includes: encapsulating, on top of an existing XML SAX parser, an XML root node context handler, that is, an XML document handler; and wherein creating the exit mechanism includes: encapsulating, on top of the existing XML SAX parser, an exit exception that actively ends parsing.
2. The method for quickly locating and processing XML tags according to claim 1, characterized in that the context handler includes at least the three basic operation interfaces provided by the existing XML SAX parser, each of which also supports throwing exceptions.
3. The method for quickly locating and processing XML tags according to claim 1, characterized in that the context handler further includes a child context handler, that is, it is defined recursively.
4. The method for quickly locating and processing XML tags according to claim 1, characterized in that the exit mechanism further includes catching the exception.
5. The method for quickly locating and processing XML tags according to claim 4, characterized in that catching the exception is also set up in the at least three basic operation interfaces provided by the context handler.
CN201710076346.3A 2017-02-13 2017-02-13 Method for quickly locating and processing XML tags Pending CN108427676A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710076346.3A CN108427676A (en) 2017-02-13 2017-02-13 Method for quickly locating and processing XML tags

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710076346.3A CN108427676A (en) 2017-02-13 2017-02-13 Method for quickly locating and processing XML tags

Publications (1)

Publication Number Publication Date
CN108427676A true CN108427676A (en) 2018-08-21

Family

ID=63154995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710076346.3A Pending CN108427676A (en) 2017-02-13 2017-02-13 Method for quickly locating and processing XML tags

Country Status (1)

Country Link
CN (1) CN108427676A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1499368A (en) * 2002-11-11 2004-05-26 华为技术有限公司 Method for creating interface and generation system based on description
CN1896992A (en) * 2006-06-15 2007-01-17 Ut斯达康通讯有限公司 Method and device for analyzing XML file based on applied customization
CN102006512A (en) * 2010-10-29 2011-04-06 广东星海数字家庭产业技术研究院有限公司 Digital television HSML (Hypertext Service Markup Language) analysis method and system applying SAX (The Simple API for XML) analysis engine
CN102662725A (en) * 2012-03-15 2012-09-12 中国科学院软件研究所 Event-driven high concurrent process virtual machine realization method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张迪: "基于SAX的XML解析与应用", 《计算机与数字工程》 *
美斯坦福(中国)IT教育: "《SCM高级3G/4G通信工程师 项目实训》", 30 August 2012 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325149A (en) * 2018-09-30 2019-02-12 中国银行股份有限公司 XML message search method and device
CN109325149B (en) * 2018-09-30 2020-08-11 中国银行股份有限公司 XML message retrieval method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180821