CN106469137A

CN106469137A - XML document analysis method and device

Info

Publication number: CN106469137A
Application number: CN201510512712.6A
Authority: CN
Inventors: 马志远; 郭汉磊; 毛伟; 邢志杰; 高雷; 卢文哲; 马迪; 王伟; 童小海
Original assignee: BEILONG KNET (BEIJING) TECHNOLOGY Co Ltd; INTERNET DOMAIN NAME SYSTEM BEIJING ENGINEERING RESEARCH CENTER LLC
Current assignee: BEILONG KNET (BEIJING) TECHNOLOGY Co Ltd; INTERNET DOMAIN NAME SYSTEM BEIJING ENGINEERING RESEARCH CENTER LLC
Priority date: 2015-08-19
Filing date: 2015-08-19
Publication date: 2017-03-01

Abstract

Embodiments of the present invention provide an XML document parsing method and device. The method includes: obtaining a read instruction for an XML document, the read instruction including at least one row identifier to be read; reading, from the XML document and according to the at least one row identifier to be read, at least one row of data corresponding to the at least one row identifier to be read; converting the at least one row of data into a node tree, in which the elements and attributes in the at least one row of data become nodes of the node tree; and parsing the nodes of the node tree in turn to obtain a parsing result for the XML document. Only the rows named in the read instruction need to be read, not the whole document, which greatly reduces memory consumption and avoids memory overflow. In addition, because only the rows that were read are converted into a node tree and parsed, parsing efficiency can also improve.

Description

XML document analysis method and device

Technical field

The present invention relates to language parsing technology, and in particular to an XML document parsing method and device.

Background

At present, Extensible Markup Language (XML) is widely used, and XML parsing is the key to XML applications. XML itself is merely a format that encodes data as plain text. To make use of XML, that is, to use the data encoded in an XML file, the data must first be parsed out of the plain text. A parser that can recognize the information in an XML document is therefore needed to interpret the document and extract its data. Depending on the data-extraction requirements, several parsing approaches exist, each with its own advantages, disadvantages, and suitable environments. Choosing a suitable XML parsing technique can effectively improve the overall performance of an application system.

An XML parsing technique commonly used in the prior art is the Document Object Model (DOM). When an XML document is parsed with DOM, the whole XML document must be read first, and only then is the whole document parsed.

However, parsing an XML document with the existing DOM technique occupies a large amount of computer memory, and for a large XML document it can even cause memory overflow.

Summary of the invention

The present invention provides an XML document parsing method and device to solve the problem that existing methods of parsing XML documents occupy too much memory.

A first aspect of the present invention provides an XML document parsing method, including:

obtaining a read instruction for an Extensible Markup Language (XML) document, the read instruction including at least one row identifier to be read;

reading, from the XML document and according to the at least one row identifier to be read, at least one row of data corresponding to the at least one row identifier to be read;

converting the at least one row of data into a node tree, wherein the elements and attributes in the at least one row of data become nodes of the node tree; and

parsing the nodes of the node tree in turn to obtain a parsing result for the XML document.

A second aspect of the present invention provides an XML document parsing device, including:

an acquisition module, configured to obtain a read instruction for an Extensible Markup Language (XML) document, the read instruction including at least one row identifier to be read;

a reading module, configured to read, from the XML document and according to the at least one row identifier to be read, at least one row of data corresponding to the at least one row identifier to be read;

a conversion module, configured to convert the at least one row of data into a node tree, wherein the elements and attributes in the at least one row of data become nodes of the node tree; and

a parsing module, configured to parse the nodes of the node tree in turn to obtain a parsing result for the XML document.

With the XML document parsing method and device provided by the present invention, a read instruction for an XML document is obtained, the read instruction including at least one row identifier to be read. According to the at least one row identifier to be read, the corresponding at least one row of data is read from the XML document and converted into a node tree, and the nodes of the node tree are parsed to obtain a parsing result. Only the rows named in the read instruction need to be read, not the whole document, which greatly reduces memory consumption and avoids memory overflow. In addition, because only the rows that were read are converted into a node tree and parsed, parsing efficiency can also improve.

Brief description of the drawings

To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of Embodiment 1 of the XML document parsing method provided by the present invention;

Fig. 2 is a schematic structural diagram of Embodiment 1 of the XML document parsing device provided by the present invention;

Fig. 3 is a schematic structural diagram of Embodiment 2 of the XML document parsing device provided by the present invention.

Detailed description

To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.

Fig. 1 is a schematic flowchart of Embodiment 1 of the XML document parsing method provided by the present invention. As shown in Fig. 1, the method includes:

S101: Obtain a read instruction for an XML document, the read instruction including at least one row identifier to be read.

Typically, the read instruction for an XML document can be a piece of program code that indicates which part of the XML document needs to be read. Specifically, that part can be indicated by row identifiers to be read.

S102: According to the at least one row identifier to be read, read from the XML document the at least one row of data corresponding to the at least one row identifier to be read.

It should be noted that an XML document consists of multiple rows of data, and the line number or a keyword of each row can serve as its identifier. During reading, the row identifiers to be read in the read instruction are matched against the identifier of each row of data, so that the at least one row of data corresponding to the at least one row identifier to be read is read out.

Specifically, the at least one row of data that is read out can first be cached in a string buffer (stringbuffer) space.
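The matching described for S102 can be sketched as follows. This is only an illustration under stated assumptions: the patent does not define the form of a row identifier beyond "line number or keyword", so here an integer is treated as a line number and a string as a keyword that must occur in the row. The function names `row_matches` and `read_rows` and the sample document are hypothetical, and Python's `io.StringIO` stands in for the string buffer space.

```python
import io

def row_matches(line_number, line_text, identifiers):
    """Return True if a row is selected by a line number (int) or a keyword (str)."""
    for ident in identifiers:
        if isinstance(ident, int) and ident == line_number:
            return True
        if isinstance(ident, str) and ident in line_text:
            return True
    return False

def read_rows(source, identifiers):
    """Collect only the selected rows in a string buffer, one row at a time."""
    buffer = io.StringIO()  # stands in for the stringbuffer space
    for line_number, line_text in enumerate(source, start=1):
        if row_matches(line_number, line_text, identifiers):
            buffer.write(line_text)
    return buffer.getvalue()

# Hypothetical sample document; any iterable of lines (such as an open file) works.
document = io.StringIO(
    '<zone>\n'
    '<record name="a" type="A"/>\n'
    '<record name="b" type="MX"/>\n'
    '</zone>\n'
)
selected = read_rows(document, [2, 'type="MX"'])
```

Because the source is consumed row by row, only the selected rows are held in memory at any time.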

S103: Convert the at least one row of data into a node tree, wherein the elements and attributes in the at least one row of data become nodes of the node tree.

After all of the at least one row of data corresponding to the at least one row identifier to be read has been read, the data in the stringbuffer is built into a node tree. The tree can be built directly from the original logical structure of the XML document; that is, the parent-child relationships, sibling relationships, and so on among the elements and attributes of the XML document are represented as a node tree.
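A minimal sketch of this conversion step, using Python's standard `xml.etree.ElementTree`, is shown below. It rests on two assumptions that are not from the patent: each buffered row is a well-formed XML fragment, and the rows are wrapped in a synthetic root element (the hypothetical `wrapper_tag`) so that several top-level rows can form one tree. Note also that ElementTree keeps attributes as a mapping on each element rather than as separate attribute nodes.

```python
import xml.etree.ElementTree as ET

def build_node_tree(buffered_rows, wrapper_tag="rows"):
    """Parse buffered rows into a tree under a synthetic root element."""
    # wrapper_tag is a hypothetical name; it only gives the fragment a single root.
    return ET.fromstring(f"<{wrapper_tag}>{buffered_rows}</{wrapper_tag}>")

# Hypothetical buffered rows, as they might be left in the string buffer.
buffered = (
    '<record name="a"><ttl>300</ttl></record>\n'
    '<record name="b"/>\n'
)
tree = build_node_tree(buffered)
first = tree[0]  # first <record>; <ttl> is its child, the second <record> its sibling
```

The parent-child and sibling relationships of the selected rows are preserved in the resulting tree, while rows that were never read do not appear in it at all.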

S104: Parse the nodes of the node tree in turn to obtain the parsing result for the XML document. The specific process of parsing a node is not limited here.

In this embodiment, a read instruction for an XML document is obtained, the read instruction including at least one row identifier to be read. According to the at least one row identifier to be read, the corresponding at least one row of data is read from the XML document and converted into a node tree, and the nodes of the node tree are parsed to obtain a parsing result. Only the rows named in the read instruction need to be read, not the whole document, which greatly reduces memory consumption and avoids memory overflow. In addition, because only the rows that were read are converted into a node tree and parsed, parsing efficiency can also improve.

Specifically, above-mentioned according at least one line identifier to be read above-mentioned, read out in XML document The corresponding at least data line of above-mentioned at least one line identifier to be read, Ke Yishi, according to above-mentioned at least one Individual line identifier to be read, from the beginning of the first row data of above-mentioned XML document, travels through this XML literary composition line by line Shelves, read the corresponding at least a line of at least one line identifier to be read above-mentioned successively from this XML document Data.In concrete reading process, when reading certain row data, see the mark and above-mentioned at least of the row data Certain line identifier to be read in individual line identifier to be read is identical, then read out this row data, be temporarily stored into In stringbuffer space.More specifically, often reading data line, just this row data is inserted stringbuffer In space, until corresponding at least one line identifier to be read above-mentioned at least data line has all been read Then stop reading.

Further, parsing the nodes of the node tree in turn to obtain the parsing result for the XML document can be done as follows: traverse all nodes of the node tree and parse each node in turn to obtain the parsing result for the XML document. In a specific implementation, the element or attribute corresponding to each node is parsed, and an object is generated and stored in memory.
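One possible reading of this traversal is sketched below with `xml.etree.ElementTree`. The patent leaves the per-node parsing open, so the `ParsedNode` class, its fields, and the function `parse_nodes` are assumptions made for illustration; each visited element simply yields one in-memory object holding its tag, attributes, and text.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

@dataclass
class ParsedNode:
    """Hypothetical in-memory object generated for one node."""
    tag: str
    attributes: dict = field(default_factory=dict)
    text: str = ""

def parse_nodes(root):
    """Visit every element in document order and build one object per element."""
    results = []
    for element in root.iter():  # iter() yields the root first, then its descendants
        results.append(ParsedNode(
            tag=element.tag,
            attributes=dict(element.attrib),
            text=(element.text or "").strip(),
        ))
    return results

# Hypothetical node tree built from previously buffered rows.
root = ET.fromstring('<rows><record name="a"><ttl>300</ttl></record></rows>')
parsed = parse_nodes(root)
```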

To save memory further, after the nodes of the node tree have been parsed in turn and the parsing result for the XML document has been obtained, the at least one row of data that was read is released. Specifically, the at least one row of data temporarily stored in the stringbuffer space is released to free the space.
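The release step might look like the following sketch, in which `io.StringIO` stands in for the stringbuffer space and closing it discards the buffered rows. The function name `parse_then_release` and the synthetic `rows` wrapper element are hypothetical. Releasing inside `finally` also frees the buffer when parsing fails, which goes slightly beyond what the text describes.

```python
import io
import xml.etree.ElementTree as ET

def parse_then_release(rows):
    """Buffer rows, parse them, and release the buffer once the result exists."""
    buffer = io.StringIO()
    try:
        for row in rows:
            buffer.write(row)
        # "rows" is a hypothetical wrapper element so the fragment has one root.
        root = ET.fromstring(f"<rows>{buffer.getvalue()}</rows>")
        result = [(element.tag, dict(element.attrib)) for element in root]
    finally:
        # Release the temporarily stored rows whether or not parsing succeeded.
        buffer.close()
    return result, buffer.closed

result, released = parse_then_release(['<record name="a"/>\n', '<record name="b"/>\n'])
```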

Fig. 2 is a schematic structural diagram of Embodiment 1 of the XML document parsing device provided by the present invention. As shown in Fig. 2, the device includes an acquisition module 201, a reading module 202, a conversion module 203, and a parsing module 204, wherein:

The acquisition module 201 is configured to obtain a read instruction for an Extensible Markup Language (XML) document, the read instruction including at least one row identifier to be read.

The reading module 202 is configured to read, from the XML document and according to the at least one row identifier to be read, the at least one row of data corresponding to the at least one row identifier to be read.

The conversion module 203 is configured to convert the at least one row of data into a node tree, wherein the elements and attributes in the at least one row of data become nodes of the node tree.

The parsing module 204 is configured to parse the nodes of the node tree in turn to obtain the parsing result for the XML document.

Further, the reading module 202 is specifically configured to traverse the XML document row by row according to the at least one row identifier to be read, starting from the first row of data of the XML document, and to read from the XML document, one after another, the at least one row of data corresponding to the at least one row identifier to be read.

The parsing module 204 is specifically configured to traverse all nodes of the node tree and parse each node in turn to obtain the parsing result for the XML document.

Fig. 3 is a schematic structural diagram of Embodiment 2 of the XML document parsing device provided by the present invention. As shown in Fig. 3, on the basis of Fig. 2, the device further includes a release module 301.

The release module 301 is configured to release the at least one row of data that was read, after the parsing module 204 has parsed the nodes of the node tree in turn and obtained the parsing result for the XML document.

This device is used to execute the foregoing method embodiments. Its implementation principle and technical effect are similar and are not repeated here.
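How the modules of Fig. 2 and Fig. 3 might cooperate can be sketched as a single class with one method per module. This is an illustrative arrangement only, not the device as claimed: the class name `XmlRowParser`, the dictionary form of the read instruction, the use of line numbers as row identifiers, and the synthetic `rows` wrapper element are all assumptions.

```python
import io
import xml.etree.ElementTree as ET

class XmlRowParser:
    """Hypothetical device with one method per module."""

    def __init__(self):
        self._buffer = io.StringIO()  # stands in for the stringbuffer space

    def acquire(self, instruction):
        # Acquisition module 201: extract the row identifiers (line numbers here).
        return set(instruction["rows"])

    def read(self, source, wanted):
        # Reading module 202: traverse row by row and buffer only the wanted rows.
        for line_number, line_text in enumerate(source, start=1):
            if line_number in wanted:
                self._buffer.write(line_text)

    def convert(self):
        # Conversion module 203: build the node tree under a synthetic root.
        return ET.fromstring(f"<rows>{self._buffer.getvalue()}</rows>")

    def parse(self, root):
        # Parsing module 204: visit every node below the synthetic root.
        return [(el.tag, dict(el.attrib)) for el in root.iter() if el is not root]

    def release(self):
        # Release module 301: discard the buffered rows.
        self._buffer.close()
        self._buffer = io.StringIO()

    def run(self, source, instruction):
        wanted = self.acquire(instruction)
        self.read(source, wanted)
        try:
            return self.parse(self.convert())
        finally:
            self.release()

device = XmlRowParser()
document = io.StringIO('<zone>\n<record name="a"/>\n<record name="b"/>\n</zone>\n')
output = device.run(document, {"rows": [2, 3]})
```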

It should be understood that the devices and methods disclosed in the embodiments provided by the present invention can be implemented in other ways. For example, the device embodiments described above are only illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices, or units, and can be electrical, mechanical, or of another form.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they can be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, the functional units in the embodiments of the present invention can be integrated in one processing unit, each unit can exist physically on its own, or two or more units can be integrated in one unit. The integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.

An integrated unit implemented in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions that cause a computer device (which can be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a portable hard drive, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments, or replace some or all of their technical features with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims

1. An XML document parsing method, characterized by comprising:

2. The method according to claim 1, characterized in that reading, from the XML document and according to the at least one row identifier to be read, the at least one row of data corresponding to the at least one row identifier to be read comprises:

traversing the XML document row by row according to the at least one row identifier to be read, starting from the first row of data of the XML document, and reading from the XML document, one after another, the at least one row of data corresponding to the at least one row identifier to be read.

3. The method according to claim 1, characterized in that parsing the nodes of the node tree in turn to obtain the parsing result for the XML document comprises:

traversing all nodes of the node tree and parsing each node in turn to obtain the parsing result for the XML document.

4. The method according to any one of claims 1 to 3, characterized in that, after parsing the nodes of the node tree in turn to obtain the parsing result for the XML document, the method further comprises:

releasing the at least one row of data that was read.

5. An XML document parsing device, characterized by comprising:

6. The device according to claim 5, characterized in that the reading module is specifically configured to traverse the XML document row by row according to the at least one row identifier to be read, starting from the first row of data of the XML document, and to read from the XML document, one after another, the at least one row of data corresponding to the at least one row identifier to be read.

7. The device according to claim 5, characterized in that the parsing module is specifically configured to traverse all nodes of the node tree and parse each node in turn to obtain the parsing result for the XML document.

8. The device according to any one of claims 5 to 7, characterized in that the device further comprises a release module;

the release module is configured to release the at least one row of data that was read, after the parsing module has parsed the nodes of the node tree in turn and obtained the parsing result for the XML document.