CN106469137A - XML document analysis method and device - Google Patents


Info

Publication number
CN106469137A
Authority
CN
China
Prior art keywords
read
xml document
node
line
data line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510512712.6A
Other languages
Chinese (zh)
Inventor
马志远
郭汉磊
毛伟
邢志杰
高雷
卢文哲
马迪
王伟
童小海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEILONG KNET (BEIJING) TECHNOLOGY Co Ltd
INTERNET DOMAIN NAME SYSTEM BEIJING ENGINEERING RESEARCH CENTER LLC
Original Assignee
BEILONG KNET (BEIJING) TECHNOLOGY Co Ltd
INTERNET DOMAIN NAME SYSTEM BEIJING ENGINEERING RESEARCH CENTER LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEILONG KNET (BEIJING) TECHNOLOGY Co Ltd, INTERNET DOMAIN NAME SYSTEM BEIJING ENGINEERING RESEARCH CENTER LLC
Priority to CN201510512712.6A
Publication of CN106469137A
Legal status: Pending


Abstract

Embodiments of the present invention provide an XML document parsing method and device. The method includes: obtaining an XML document read instruction, the read instruction including at least one to-be-read line identifier; reading, from the XML document and according to the at least one to-be-read line identifier, the at least one line of data corresponding to the at least one to-be-read line identifier; converting the at least one line of data into a node tree, wherein the elements and attributes in the at least one line of data become nodes of the node tree; and parsing the nodes of the node tree in turn to obtain a parsing result for the XML document. Only the lines of data named by the read instruction need to be read, rather than the whole document, which greatly reduces the consumption of computer memory and avoids memory overflow. In addition, because only the lines that were read need to be converted into a node tree and parsed, parsing efficiency can also be improved.

Description

XML document analysis method and device
Technical field
The present invention relates to language parsing technology, and in particular to an XML document parsing method and device.
Background art
At present, Extensible Markup Language (XML) is widely used, and XML parsing technology is the key to XML applications. XML itself is merely a form of encoding data as plain text. To use XML, that is, to use the data encoded in an XML file, the data must first be parsed out of the plain text. A parser capable of recognizing the information in an XML document is therefore needed to interpret the document and extract the data in it. Depending on the data extraction requirements, several parsing approaches exist, each with its own advantages, disadvantages, and suitable environments. Selecting a suitable XML parsing technology can effectively improve the overall performance of an application system.
An XML parsing technology commonly used in the prior art is the Document Object Model (DOM). When an XML document is parsed with DOM, the whole XML document must first be read, and the whole document is then parsed.
However, parsing an XML document with the existing DOM technology occupies a large amount of computer memory, and for a large XML document it can even cause memory overflow.
Summary of the invention
The present invention provides an XML document parsing method and device to solve the problem that existing methods of parsing XML documents occupy too much memory.
A first aspect of the present invention provides an XML document parsing method, including:
obtaining an Extensible Markup Language (XML) document read instruction, the read instruction including at least one to-be-read line identifier;
reading, from the XML document and according to the at least one to-be-read line identifier, the at least one line of data corresponding to the at least one to-be-read line identifier;
converting the at least one line of data into a node tree, wherein the elements and attributes in the at least one line of data become nodes of the node tree; and
parsing the nodes of the node tree in turn to obtain a parsing result for the XML document.
A second aspect of the present invention provides an XML document parsing device, including:
an acquisition module, configured to obtain an Extensible Markup Language (XML) document read instruction, the read instruction including at least one to-be-read line identifier;
a read module, configured to read, from the XML document and according to the at least one to-be-read line identifier, the at least one line of data corresponding to the at least one to-be-read line identifier;
a conversion module, configured to convert the at least one line of data into a node tree, wherein the elements and attributes in the at least one line of data become nodes of the node tree; and
a parsing module, configured to parse the nodes of the node tree in turn to obtain a parsing result for the XML document.
In the XML document parsing method and device provided by the present invention, an XML document read instruction including at least one to-be-read line identifier is obtained; the at least one line of data corresponding to the at least one to-be-read line identifier is read from the XML document according to that identifier; the at least one line of data is converted into a node tree; and the nodes of the node tree are parsed to obtain a parsing result. Only the lines of data named by the read instruction need to be read, rather than the whole document, which greatly reduces the consumption of computer memory and avoids memory overflow. In addition, because only the lines that were read need to be converted into a node tree and parsed, parsing efficiency can also be improved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of Embodiment 1 of the XML document parsing method provided by the present invention;
Fig. 2 is a schematic structural diagram of Embodiment 1 of the XML document parsing device provided by the present invention;
Fig. 3 is a schematic structural diagram of Embodiment 2 of the XML document parsing device provided by the present invention.
Detailed description
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Fig. 1 is a schematic flowchart of Embodiment 1 of the XML document parsing method provided by the present invention. As shown in Fig. 1, the method includes:
S101: Obtain an XML document read instruction, the read instruction including at least one to-be-read line identifier.
Generally, the read instruction for reading an XML document may be a piece of program code that indicates which part of the XML document is to be read. Specifically, that part may be indicated by the to-be-read line identifiers.
S102: According to the at least one to-be-read line identifier, read from the XML document the at least one line of data corresponding to the at least one to-be-read line identifier.
It should be noted that an XML document consists of multiple lines of data, and the line number of each line, or a keyword, may serve as the identifier of that line. During reading, the to-be-read line identifiers in the read instruction are matched against the identifier of each line of data, so as to read out the at least one line of data corresponding to the at least one to-be-read line identifier.
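As a non-limiting illustration of the matching described above, the following Python sketch treats an integer identifier as a 1-based line number and a string identifier as a keyword that must occur in the line. The function name `line_matches` and this interpretation of "keyword" are assumptions made for the example; the description does not fix either.

```python
from typing import Set, Union

# Hypothetical identifier type: an int is a 1-based line number,
# a str is a keyword expected to occur somewhere in the line.
LineId = Union[int, str]


def line_matches(line_number: int, line_text: str, wanted: Set[LineId]) -> bool:
    """Return True if the line is named by any identifier in `wanted`."""
    for ident in wanted:
        if isinstance(ident, int):
            # Line-number identifier: compare with the current line number.
            if ident == line_number:
                return True
        elif ident in line_text:
            # Keyword identifier: simple substring match (an assumption).
            return True
    return False
```

A read instruction could then carry a set such as `{2, "domain"}`, meaning line 2 and every line containing the keyword `domain`.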
Specifically, the at least one line of data that is read out for the at least one to-be-read line identifier may first be cached in a string buffer (stringbuffer) space.
S103: Convert the at least one line of data into a node tree, wherein the elements and attributes in the at least one line of data become nodes of the node tree.
After all of the at least one line of data corresponding to the at least one to-be-read line identifier has been read, the data in the stringbuffer is built into a node tree. The tree may be built directly according to the original logical relationships of the XML document; that is, the parent-child relationships, sibling relationships, and so on between the elements and attributes in the XML document are represented as a node tree.
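One possible way to build such a node tree is sketched below in Python, using the standard `xml.etree.ElementTree` module. The `Node` class, the function names, and the synthetic `<fragment>` wrapper are assumptions of this example: the wrapper is only there so that several sibling lines form one well-formed document, and the sketch assumes the buffered lines contain complete elements and no XML declaration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical tree node; both elements and attributes become nodes."""
    kind: str                      # "element" or "attribute"
    name: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


def to_node(elem: ET.Element) -> Node:
    """Convert an element, its attributes, and its children into Node objects."""
    node = Node("element", elem.tag, (elem.text or "").strip())
    # Attributes become child nodes of the element that carries them.
    for name, value in elem.attrib.items():
        node.children.append(Node("attribute", name, value))
    # Child elements keep the parent-child and sibling order of the source.
    for child in elem:
        node.children.append(to_node(child))
    return node


def build_node_tree(buffered: str) -> Node:
    """Build a node tree from the buffered lines (wrapped in a synthetic root)."""
    root = ET.fromstring("<fragment>" + buffered + "</fragment>")
    return to_node(root)
```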
S104: Parse the nodes of the node tree in turn to obtain the parsing result of the XML document. The specific process of parsing a node is not limited here.
In this embodiment, an XML document read instruction including at least one to-be-read line identifier is obtained; the at least one line of data corresponding to the at least one to-be-read line identifier is read from the XML document according to that identifier; the at least one line of data is converted into a node tree; and the nodes of the node tree are parsed to obtain a parsing result. Only the lines of data named by the read instruction need to be read, rather than the whole document, which greatly reduces the consumption of computer memory and avoids memory overflow. In addition, because only the lines that were read need to be converted into a node tree and parsed, parsing efficiency can also be improved.
Specifically, reading from the XML document, according to the at least one to-be-read line identifier, the at least one line of data corresponding to the at least one to-be-read line identifier may be: according to the at least one to-be-read line identifier, traversing the XML document line by line starting from its first line of data, and reading from the XML document, in turn, the at least one line of data corresponding to the at least one to-be-read line identifier. In the reading process, when a line of data is reached, if the identifier of that line is the same as one of the at least one to-be-read line identifier, the line is read out and temporarily stored in the stringbuffer space. More specifically, each time a line of data is read out, it is put into the stringbuffer space, and reading stops once all of the at least one line of data corresponding to the at least one to-be-read line identifier has been read.
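The line-by-line traversal with early termination can be sketched as follows. This is a minimal Python illustration that assumes line numbers are used as identifiers and uses `io.StringIO` as a stand-in for the stringbuffer space; the function name `read_selected_lines` is hypothetical.

```python
import io
from typing import Iterable, Set


def read_selected_lines(lines: Iterable[str], wanted: Set[int]) -> str:
    """Scan from the first line and buffer only the lines whose numbers are wanted."""
    remaining = set(wanted)
    if not remaining:
        return ""
    buffer = io.StringIO()          # plays the role of the stringbuffer space
    for number, text in enumerate(lines, start=1):
        if number in remaining:
            buffer.write(text)      # put the matching line into the buffer
            remaining.discard(number)
            if not remaining:
                break               # every requested line was read: stop early
    return buffer.getvalue()
```

Because `lines` can be an open file object, which yields one line at a time, the rest of the document after the last requested line is never loaded.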
Further, parsing the nodes of the node tree in turn to obtain the parsing result of the XML document may specifically be: traversing all nodes of the node tree and parsing each node in turn to obtain the parsing result of the XML document. In a specific implementation, the element or attribute corresponding to each node is parsed, and an object is generated and stored in memory.
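A minimal Python sketch of such a traversal is given below. It assumes the node tree is held as an `xml.etree.ElementTree` element and that the "object" generated for each node is a plain dictionary; both choices are assumptions of this example, since the description does not limit how a node is parsed.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_nodes(root: ET.Element) -> List[Dict[str, object]]:
    """Visit every node in document order and generate one object per node."""
    results = []
    for elem in root.iter():        # iter() walks the root and all descendants
        results.append({
            "tag": elem.tag,
            "attributes": dict(elem.attrib),
            "text": (elem.text or "").strip(),
        })
    return results
```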
To better save memory resources, after the nodes of the node tree have been parsed in turn and the parsing result of the XML document has been obtained, the at least one line of data that was read is released. Specifically, the at least one line of data temporarily stored in the stringbuffer space is released to save space.
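The release step might look like the following Python sketch, in which an `io.StringIO` object stands in for the stringbuffer space and is closed once parsing has finished. The function name and the choice to return only tag names are assumptions made to keep the example short.

```python
import io
import xml.etree.ElementTree as ET
from typing import List


def parse_and_release(buffer: io.StringIO) -> List[str]:
    """Parse the buffered lines, then release the buffer even if parsing fails."""
    try:
        root = ET.fromstring(buffer.getvalue())
        return [elem.tag for elem in root.iter()]
    finally:
        buffer.close()              # discards the temporarily stored line data
```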
Fig. 2 is a schematic structural diagram of Embodiment 1 of the XML document parsing device provided by the present invention. As shown in Fig. 2, the device includes an acquisition module 201, a read module 202, a conversion module 203, and a parsing module 204, wherein:
The acquisition module 201 is configured to obtain an Extensible Markup Language (XML) document read instruction, the read instruction including at least one to-be-read line identifier.
The read module 202 is configured to read, from the XML document and according to the at least one to-be-read line identifier, the at least one line of data corresponding to the at least one to-be-read line identifier.
The conversion module 203 is configured to convert the at least one line of data into a node tree, wherein the elements and attributes in the at least one line of data become nodes of the node tree.
The parsing module 204 is configured to parse the nodes of the node tree in turn to obtain the parsing result of the XML document.
In this embodiment, an XML document read instruction including at least one to-be-read line identifier is obtained; the at least one line of data corresponding to the at least one to-be-read line identifier is read from the XML document according to that identifier; the at least one line of data is converted into a node tree; and the nodes of the node tree are parsed to obtain a parsing result. Only the lines of data named by the read instruction need to be read, rather than the whole document, which greatly reduces the consumption of computer memory and avoids memory overflow. In addition, because only the lines that were read need to be converted into a node tree and parsed, parsing efficiency can also be improved.
Further, the read module 202 is specifically configured to traverse the XML document line by line according to the at least one to-be-read line identifier, starting from the first line of data of the XML document, and to read from the XML document, in turn, the at least one line of data corresponding to the at least one to-be-read line identifier.
The parsing module 204 is specifically configured to traverse all nodes of the node tree and parse each node in turn to obtain the parsing result of the XML document.
Fig. 3 is a schematic structural diagram of Embodiment 2 of the XML document parsing device provided by the present invention. As shown in Fig. 3, on the basis of Fig. 2, the device further includes a release module 301.
The release module 301 is configured to release the at least one line of data that was read, after the parsing module 204 has parsed the nodes of the node tree in turn and obtained the parsing result of the XML document.
The device is used to execute the foregoing method embodiment; its implementation principle and technical effect are similar and are not repeated here.
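For illustration only, the modules 201 to 204 and 301 could be composed in software roughly as in the Python sketch below. The class name `XmlLineParserDevice`, the dictionary form of the read instruction, the use of line numbers as identifiers, and the synthetic `<fragment>` wrapper are all assumptions of this example rather than features stated in the embodiments.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Set


class XmlLineParserDevice:
    """Hypothetical software composition of modules 201-204 and 301."""

    def __init__(self) -> None:
        self._buffer = io.StringIO()     # stand-in for the stringbuffer space

    def acquire(self, instruction: Dict[str, List[int]]) -> Set[int]:
        # Acquisition module 201: extract the to-be-read line identifiers.
        return set(instruction["lines"])

    def read(self, lines: Iterable[str], wanted: Set[int]) -> None:
        # Read module 202: scan from the first line, buffer matching lines.
        remaining = set(wanted)
        for number, text in enumerate(lines, start=1):
            if number in remaining:
                self._buffer.write(text)
                remaining.discard(number)
                if not remaining:
                    break

    def convert(self) -> ET.Element:
        # Conversion module 203: wrap the lines so they form one tree.
        return ET.fromstring("<fragment>" + self._buffer.getvalue() + "</fragment>")

    def parse(self, root: ET.Element) -> List[dict]:
        # Parsing module 204: one object per node, skipping the synthetic root.
        return [{"tag": e.tag, "attributes": dict(e.attrib)}
                for e in root.iter() if e is not root]

    def release(self) -> None:
        # Release module 301: discard the buffered line data.
        self._buffer.close()

    def run(self, instruction: Dict[str, List[int]], lines: Iterable[str]) -> List[dict]:
        self.read(lines, self.acquire(instruction))
        try:
            return self.parse(self.convert())
        finally:
            self.release()
```

Calling `XmlLineParserDevice().run({"lines": [2, 3]}, open_file)` with a file object whose lines 2 and 3 each hold a complete element would parse only those two lines.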
It should be understood that the devices and methods disclosed in the embodiments provided by the present invention may be implemented in other ways. For example, the device embodiments described above are merely illustrative: the division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. An XML document parsing method, characterized by including:
obtaining an Extensible Markup Language (XML) document read instruction, the read instruction including at least one to-be-read line identifier;
reading, from the XML document and according to the at least one to-be-read line identifier, the at least one line of data corresponding to the at least one to-be-read line identifier;
converting the at least one line of data into a node tree, wherein the elements and attributes in the at least one line of data become nodes of the node tree; and
parsing the nodes of the node tree in turn to obtain a parsing result for the XML document.
2. The method according to claim 1, characterized in that reading, from the XML document and according to the at least one to-be-read line identifier, the at least one line of data corresponding to the at least one to-be-read line identifier includes:
according to the at least one to-be-read line identifier, traversing the XML document line by line starting from the first line of data of the XML document, and reading from the XML document, in turn, the at least one line of data corresponding to the at least one to-be-read line identifier.
3. The method according to claim 1, characterized in that parsing the nodes of the node tree in turn to obtain the parsing result of the XML document includes:
traversing all nodes of the node tree and parsing each node in turn to obtain the parsing result of the XML document.
4. The method according to any one of claims 1-3, characterized in that, after parsing the nodes of the node tree in turn to obtain the parsing result of the XML document, the method further includes:
releasing the at least one line of data that was read.
5. An XML document parsing device, characterized by including:
an acquisition module, configured to obtain an Extensible Markup Language (XML) document read instruction, the read instruction including at least one to-be-read line identifier;
a read module, configured to read, from the XML document and according to the at least one to-be-read line identifier, the at least one line of data corresponding to the at least one to-be-read line identifier;
a conversion module, configured to convert the at least one line of data into a node tree, wherein the elements and attributes in the at least one line of data become nodes of the node tree; and
a parsing module, configured to parse the nodes of the node tree in turn to obtain a parsing result for the XML document.
6. The device according to claim 5, characterized in that the read module is specifically configured to traverse the XML document line by line according to the at least one to-be-read line identifier, starting from the first line of data of the XML document, and to read from the XML document, in turn, the at least one line of data corresponding to the at least one to-be-read line identifier.
7. The device according to claim 5, characterized in that the parsing module is specifically configured to traverse all nodes of the node tree and parse each node in turn to obtain the parsing result of the XML document.
8. The device according to any one of claims 5-7, characterized in that the device further includes a release module;
the release module is configured to release the at least one line of data that was read, after the parsing module has parsed the nodes of the node tree in turn and obtained the parsing result of the XML document.
CN201510512712.6A 2015-08-19 2015-08-19 XML document analysis method and device Pending CN106469137A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510512712.6A CN106469137A (en) 2015-08-19 2015-08-19 XML document analysis method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510512712.6A CN106469137A (en) 2015-08-19 2015-08-19 XML document analysis method and device

Publications (1)

Publication Number Publication Date
CN106469137A true CN106469137A (en) 2017-03-01

Family

ID=58228759

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510512712.6A Pending CN106469137A (en) 2015-08-19 2015-08-19 XML document analysis method and device

Country Status (1)

Country Link
CN (1) CN106469137A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108255494A (en) * 2018-01-30 2018-07-06 平安科技(深圳)有限公司 A kind of XML file analytic method, device, computer equipment and storage medium
CN110750960A (en) * 2018-07-05 2020-02-04 武汉斗鱼网络科技有限公司 Configuration file analysis method, storage medium, electronic device and system
CN111651406A (en) * 2020-05-21 2020-09-11 杭州明讯软件技术有限公司 Automatic carrier scheduling system file reading method and device
CN113128178A (en) * 2019-12-31 2021-07-16 安徽佰通教育科技发展有限公司 Method for analyzing office file through xml document

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101777045A (en) * 2008-09-01 2010-07-14 西北工业大学 Method for analyzing XML file by indexing
CN102195959A (en) * 2010-03-11 2011-09-21 中兴通讯股份有限公司 Method and device for resolving extensible markup language (XML) data of session initiation protocol (SIP) signaling
CN102841886A (en) * 2011-06-21 2012-12-26 北大方正集团有限公司 Method and device for splitting document
CN103635897A (en) * 2011-06-23 2014-03-12 微软公司 Dynamically updating a running page
CN102411602A (en) * 2011-08-15 2012-04-11 浙江大学 Extensive makeup language (XML) parallel speculation analysis method realized on basis of field programmable gate array (FPGA)
CN104391796A (en) * 2014-12-05 2015-03-04 上海斐讯数据通信技术有限公司 Method for parsing test cases
CN104636265A (en) * 2015-01-21 2015-05-20 广东电网有限责任公司电力科学研究院 Access method for efficient memory model organization of CIMXML document

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
范书义 等: "XML文件解析中SAX和DOM的结合应用", 《微型电脑应用》 *
达尔吉 等: "《无线传感器网络基础 理论和实践》", 31 January 2014 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108255494A (en) * 2018-01-30 2018-07-06 平安科技(深圳)有限公司 A kind of XML file analytic method, device, computer equipment and storage medium
WO2019148671A1 (en) * 2018-01-30 2019-08-08 平安科技(深圳)有限公司 Xml file parsing method, device, computer apparatus, and storage medium
CN110750960A (en) * 2018-07-05 2020-02-04 武汉斗鱼网络科技有限公司 Configuration file analysis method, storage medium, electronic device and system
CN113128178A (en) * 2019-12-31 2021-07-16 安徽佰通教育科技发展有限公司 Method for analyzing office file through xml document
CN111651406A (en) * 2020-05-21 2020-09-11 杭州明讯软件技术有限公司 Automatic carrier scheduling system file reading method and device

Similar Documents

Publication Publication Date Title
US10664660B2 (en) Method and device for extracting entity relation based on deep learning, and server
KR102170929B1 (en) User keyword extraction device, method, and computer-readable storage medium
JP6936888B2 (en) Training corpus generation methods, devices, equipment and storage media
US8375061B2 (en) Graphical models for representing text documents for computer analysis
US9396172B2 (en) Method for data chunk partitioning in XML parsing and method for XML parsing
CN106469137A (en) XML document analysis method and device
KR101617696B1 (en) Method and device for mining data regular expression
CN106682036A (en) Data exchange system and exchange method thereof
US20200193083A1 (en) Analyzing Document Content and Generating an Appendix
CN103995885A (en) Method and device for recognizing entity names
CN110196884A (en) Method for writing data, storage medium and electronic equipment based on distributed data base
CN102999480A (en) Method and system for editing document
CN108021632A (en) Unstructured data and the mutual conversion process method of structural data
CN110347390B (en) Method, storage medium, equipment and system for rapidly generating WEB page
CN112528013A (en) Text abstract extraction method and device, electronic equipment and storage medium
CN109901978A (en) A kind of Hadoop log lossless compression method and system
CN106844313A (en) A kind of method and apparatus that Word file is converted into html file
CN106293862B (en) A kind of analysis method and device of expandable mark language XML data
CN110119410A (en) Processing method and processing device, computer equipment and the storage medium of reference book data
CN105488171A (en) SSH (Secure Shell)-based batch uploading method for test questions of online education website
CN104536947A (en) Layout document processing method and device
KR101331383B1 (en) Method and apparatus for processing data
CN103377187A (en) Method, device and program for paragraph segmentation
CN109491679B (en) CPLD online upgrading method and device
CN104268093A (en) Memory allocation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170301
