CN106469137A - XML document analysis method and device - Google Patents
- Publication number
- CN106469137A (application CN201510512712.6A)
- Authority
- CN
- China
- Prior art keywords
- read
- xml document
- node
- line
- data line
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
Embodiments of the present invention provide an XML document parsing method and device. The method includes: obtaining an XML document read instruction, where the read instruction includes at least one to-be-read line identifier; reading, from the XML document and according to the at least one to-be-read line identifier, at least one line of data corresponding to the at least one to-be-read line identifier; converting the at least one line of data into a node tree, where the elements and attributes in the at least one line of data become nodes of the node tree; and parsing the nodes of the node tree in sequence to obtain a parsing result for the XML document. Because only the lines named in the read instruction are read, rather than the whole document, memory consumption is greatly reduced and memory overflow is avoided. In addition, because only the lines that were read are converted into a node tree and parsed, parsing efficiency can also be improved.
Description
Technical field
The present invention relates to language parsing technology, and in particular to an XML document parsing method and device.
Background art
At present, Extensible Markup Language (XML) is widely used, and XML parsing is the key to XML applications. XML itself is merely a format that encodes data as plain text. To use XML, that is, to use the data encoded in an XML file, the data must first be parsed out of the plain text. A parser that can recognize the information in an XML document is therefore needed to interpret the document and extract the data in it. Depending on the data-extraction requirements, several parsing approaches exist, each with its own advantages, disadvantages, and suitable environments. Choosing a suitable XML parsing technique can effectively improve the overall performance of an application system.
One XML parsing technique commonly used in the prior art is the Document Object Model (DOM). To parse an XML document with DOM, the whole XML document must first be read, and only then is the whole document parsed. Parsing an XML document with existing DOM technology therefore occupies a large amount of computer memory, and for a large XML document it can even cause a memory overflow.
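The whole-document behaviour described above can be illustrated with a short sketch. It uses Python's standard `xml.dom.minidom` module as a stand-in for a generic DOM parser; the sample document and the variable names are assumptions made for illustration and do not come from the patent.

```python
from xml.dom import minidom

# Hypothetical sample document; a real document could be far larger.
SAMPLE = """<?xml version="1.0"?>
<orders>
  <order id="1"><item>pen</item></order>
  <order id="2"><item>ink</item></order>
</orders>"""

# DOM parsing builds every element, attribute, and text node in memory
# before any of them can be queried.
document = minidom.parseString(SAMPLE)
orders = document.getElementsByTagName("order")

# Only one order is wanted, yet the whole tree had to be built to reach it.
wanted = [order for order in orders if order.getAttribute("id") == "2"][0]
item_text = wanted.getElementsByTagName("item")[0].firstChild.data
```

Here a single `order` element is needed, but the parser still holds the complete tree, which is the memory cost the background section attributes to DOM.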
Summary of the invention
The present invention provides an XML document parsing method and device, to solve the problem that existing methods of parsing XML documents occupy too much memory.
A first aspect of the present invention provides an XML document parsing method, including:
obtaining an Extensible Markup Language (XML) document read instruction, where the read instruction includes at least one to-be-read line identifier;
reading, from the XML document and according to the at least one to-be-read line identifier, at least one line of data corresponding to the at least one to-be-read line identifier;
converting the at least one line of data into a node tree, where the elements and attributes in the at least one line of data become nodes of the node tree; and
parsing the nodes of the node tree in sequence to obtain a parsing result for the XML document.
A second aspect of the present invention provides an XML document parsing device, including:
an acquisition module, configured to obtain an Extensible Markup Language (XML) document read instruction, where the read instruction includes at least one to-be-read line identifier;
a read module, configured to read, from the XML document and according to the at least one to-be-read line identifier, at least one line of data corresponding to the at least one to-be-read line identifier;
a conversion module, configured to convert the at least one line of data into a node tree, where the elements and attributes in the at least one line of data become nodes of the node tree; and
a parsing module, configured to parse the nodes of the node tree in sequence to obtain a parsing result for the XML document.
In the XML document parsing method and device provided by the present invention, an XML document read instruction that includes at least one to-be-read line identifier is obtained; at least one line of data corresponding to the at least one to-be-read line identifier is read from the XML document; the at least one line of data is converted into a node tree; and the nodes of the node tree are parsed to obtain a parsing result. Only the lines named in the read instruction are read, and the whole document need not be read, which greatly reduces memory consumption and avoids memory overflow. In addition, only the lines that were read are converted into a node tree and parsed, which can also improve parsing efficiency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of Embodiment 1 of the XML document parsing method provided by the present invention;
Fig. 2 is a schematic structural diagram of Embodiment 1 of the XML document parsing device provided by the present invention;
Fig. 3 is a schematic structural diagram of Embodiment 2 of the XML document parsing device provided by the present invention.
Detailed description of the embodiments
To make the purpose, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings in the embodiments. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Fig. 1 is a schematic flowchart of Embodiment 1 of the XML document parsing method provided by the present invention. As shown in Fig. 1, the method includes:
S101: Obtain an XML document read instruction, where the read instruction includes at least one to-be-read line identifier.
Generally, the read instruction for reading the XML document may be a piece of program code that indicates which part of the XML document is to be read. Specifically, that part may be indicated by to-be-read line identifiers.
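One possible shape for such a read instruction is sketched below. The patent does not define a data structure, so the `ReadInstruction` class, its field name, and the choice to allow either a line number or a keyword as an identifier are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Assumption: an identifier is either a 1-based line number or a keyword.
LineIdentifier = Union[int, str]


@dataclass(frozen=True)
class ReadInstruction:
    """Hypothetical container for the identifiers of the lines to be read."""

    line_identifiers: Tuple[LineIdentifier, ...]

    def __post_init__(self) -> None:
        # The method requires at least one to-be-read line identifier.
        if not self.line_identifiers:
            raise ValueError("a read instruction needs at least one line identifier")


# Read line 3 and any line that contains the keyword "customer".
instruction = ReadInstruction(line_identifiers=(3, "customer"))
```

Rejecting an empty instruction at construction time mirrors the "at least one" wording of step S101.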
S102, according at least one line identifier to be read above-mentioned, read out in XML document above-mentioned extremely
The corresponding at least data line of a few line identifier to be read.
It should be noted that XML document is made up of multirow data, can be with the line number of every row or key
Word, as mark, is entered reading the line identifier to be read in instruction with the mark of each row data when reading
Row coupling, to read out the corresponding at least data line of at least one line identifier to be read above-mentioned.
Specifically, can be by corresponding at least one line number of at least one line identifier to be read above-mentioned reading out
Cache in (stringbuffer) space according to being first buffered in character string.
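The matching and buffering in step S102 might look like the following sketch. Python's `io.StringIO` stands in for the string buffer named in the text; the function names, the substring test used for keyword matching, and the sample lines are assumptions rather than details given by the patent.

```python
import io
from typing import Iterable, Union

LineIdentifier = Union[int, str]


def line_matches(line_number: int, text: str, identifier: LineIdentifier) -> bool:
    """Return True if the line's number or content matches the identifier."""
    if isinstance(identifier, int):
        return line_number == identifier
    # Assumption: a keyword matches when it occurs anywhere in the line.
    return identifier in text


def read_selected_lines(
    lines: Iterable[str], identifiers: Iterable[LineIdentifier]
) -> io.StringIO:
    """Copy every matching line into an in-memory text buffer."""
    wanted = list(identifiers)
    buffer = io.StringIO()  # stands in for the string buffer in the text
    for number, text in enumerate(lines, start=1):
        if any(line_matches(number, text, ident) for ident in wanted):
            buffer.write(text)
    return buffer


SAMPLE_LINES = [
    '<order id="1"><item>pen</item></order>\n',
    '<order id="2"><item>ink</item></order>\n',
    '<customer name="Li"/>\n',
]
selected = read_selected_lines(SAMPLE_LINES, [2, "customer"])
```

Only the second and third sample lines end up in the buffer; the first line is inspected but never stored.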
S103: Convert the at least one line of data into a node tree, where the elements and attributes in the at least one line of data become nodes of the node tree.
After all of the lines of data corresponding to the at least one to-be-read line identifier have been read, the data in the stringbuffer is built into a node tree. The tree can be built directly from the original logical structure of the XML document; that is, the parent-child relationships, sibling relationships, and so on among the elements and attributes in the XML document are represented as a node tree.
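A minimal sketch of the conversion in step S103 follows. It assumes that the buffered lines are well-formed XML fragments, so that wrapping them in a synthetic root element (here called `selection`, a hypothetical name) yields parseable XML. The `Node` class is also an assumption; it represents both elements and attributes as tree nodes, as the text describes.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    kind: str  # "element" or "attribute"
    name: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


def element_to_node(element: ET.Element) -> Node:
    """Turn an element, its attributes, and its child elements into nodes."""
    node = Node("element", element.tag, (element.text or "").strip())
    # Attributes become child nodes of the element that carries them.
    for attr_name, attr_value in element.attrib.items():
        node.children.append(Node("attribute", attr_name, attr_value))
    # Child elements keep their parent-child and sibling order.
    for child in element:
        node.children.append(element_to_node(child))
    return node


def build_node_tree(buffered_text: str) -> Node:
    # Assumption: the buffered lines are well-formed fragments.
    root = ET.fromstring("<selection>" + buffered_text + "</selection>")
    return element_to_node(root)


tree = build_node_tree(
    '<order id="2"><item>ink</item></order>\n<customer name="Li"/>\n'
)
```

If a selected line contains only an opening tag whose closing tag sits on an unselected line, this sketch raises a parse error; how the patent's method handles that case is not stated in the text.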
S104: Parse the nodes of the node tree in sequence to obtain a parsing result for the XML document. The specific process of parsing a node is not limited here.
In this embodiment, an XML document read instruction that includes at least one to-be-read line identifier is obtained; at least one line of data corresponding to the at least one to-be-read line identifier is read from the XML document; the at least one line of data is converted into a node tree; and the nodes of the node tree are parsed to obtain a parsing result. Only the lines named in the read instruction are read, and the whole document need not be read, which greatly reduces memory consumption and avoids memory overflow. In addition, only the lines that were read are converted into a node tree and parsed, which can also improve parsing efficiency.
Specifically, above-mentioned according at least one line identifier to be read above-mentioned, read out in XML document
The corresponding at least data line of above-mentioned at least one line identifier to be read, Ke Yishi, according to above-mentioned at least one
Individual line identifier to be read, from the beginning of the first row data of above-mentioned XML document, travels through this XML literary composition line by line
Shelves, read the corresponding at least a line of at least one line identifier to be read above-mentioned successively from this XML document
Data.In concrete reading process, when reading certain row data, see the mark and above-mentioned at least of the row data
Certain line identifier to be read in individual line identifier to be read is identical, then read out this row data, be temporarily stored into
In stringbuffer space.More specifically, often reading data line, just this row data is inserted stringbuffer
In space, until corresponding at least one line identifier to be read above-mentioned at least data line has all been read
Then stop reading.
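The line-by-line traversal with early termination can be sketched as follows. For simplicity the sketch assumes that identifiers are 1-based line numbers; the function name and the sample input are hypothetical.

```python
import io
from typing import Iterable, Set, TextIO


def read_lines_by_number(source: TextIO, wanted_numbers: Iterable[int]) -> io.StringIO:
    """Scan from the first line and stop once every wanted line is buffered."""
    remaining: Set[int] = set(wanted_numbers)
    buffer = io.StringIO()
    if not remaining:
        return buffer
    for number, text in enumerate(source, start=1):
        if number in remaining:
            buffer.write(text)
            remaining.discard(number)
            if not remaining:
                # All requested lines were found; the rest is never read.
                break
    return buffer


source = io.StringIO("<a/>\n<b/>\n<c/>\n<d/>\n")
buffer = read_lines_by_number(source, [1, 3])
unread = source.read()  # whatever the scan left untouched
```

Because the loop breaks as soon as the last wanted line is found, the fourth line of the sample input is still unread when the function returns, which is the saving the paragraph above describes.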
Further, parsing the nodes of the node tree in sequence to obtain the parsing result for the XML document may specifically be: traverse all nodes of the node tree and parse each node in turn to obtain the parsing result for the XML document. In a specific implementation, the element or attribute corresponding to each node is parsed, and an object is generated and stored in memory.
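The traversal can be sketched as below. The text does not fix a traversal order or an object type, so the depth-first order, the `ParsedItem` class, and the path strings are assumptions chosen for illustration; the `Node` class is likewise hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List


@dataclass
class Node:
    kind: str  # "element" or "attribute"
    name: str
    value: str = ""
    children: List["Node"] = field(default_factory=list)


@dataclass(frozen=True)
class ParsedItem:
    """Hypothetical in-memory object produced for each node."""

    path: str
    kind: str
    value: str


def walk(node: Node, prefix: str = "") -> Iterator[ParsedItem]:
    """Visit the node, then each of its children, in depth-first order."""
    path = f"{prefix}/{node.name}"
    yield ParsedItem(path, node.kind, node.value)
    for child in node.children:
        yield from walk(child, path)


tree = Node(
    "element",
    "order",
    children=[Node("attribute", "id", "2"), Node("element", "item", "ink")],
)
result = list(walk(tree))
```

Each node yields exactly one object, so the length of `result` equals the number of element and attribute nodes in the tree.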
To save more memory, after the nodes of the node tree have been parsed in sequence and the parsing result for the XML document has been obtained, the at least one line of data that was read is released. Specifically, the lines of data temporarily stored in the stringbuffer space are released to free that space.
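Releasing the temporary buffer after parsing might be expressed as in the sketch below, again with `io.StringIO` standing in for the string buffer. The variable names are hypothetical and the "parsing" step is reduced to a placeholder.

```python
import io

buffer = io.StringIO()
buffer.write('<order id="2"><item>ink</item></order>\n')

# Placeholder for the parsing step: the result no longer depends on the buffer.
parsed_length = len(buffer.getvalue())

# Closing the buffer discards its contents; later access raises ValueError.
buffer.close()
```

The parsing result is kept while the raw line data is dropped, so only the objects that are still needed remain in memory.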
Fig. 2 is a schematic structural diagram of Embodiment 1 of the XML document parsing device provided by the present invention. As shown in Fig. 2, the device includes an acquisition module 201, a read module 202, a conversion module 203, and a parsing module 204, where:
The acquisition module 201 is configured to obtain an Extensible Markup Language (XML) document read instruction, where the read instruction includes at least one to-be-read line identifier.
The read module 202 is configured to read, from the XML document and according to the at least one to-be-read line identifier, at least one line of data corresponding to the at least one to-be-read line identifier.
The conversion module 203 is configured to convert the at least one line of data into a node tree, where the elements and attributes in the at least one line of data become nodes of the node tree.
The parsing module 204 is configured to parse the nodes of the node tree in sequence to obtain a parsing result for the XML document.
In this embodiment, an XML document read instruction that includes at least one to-be-read line identifier is obtained; at least one line of data corresponding to the at least one to-be-read line identifier is read from the XML document; the at least one line of data is converted into a node tree; and the nodes of the node tree are parsed to obtain a parsing result. Only the lines named in the read instruction are read, and the whole document need not be read, which greatly reduces memory consumption and avoids memory overflow. In addition, only the lines that were read are converted into a node tree and parsed, which can also improve parsing efficiency.
Further, the read module 202 is specifically configured to traverse the XML document line by line according to the at least one to-be-read line identifier, starting from the first line of data of the XML document, and to read from the XML document in turn the at least one line of data corresponding to the at least one to-be-read line identifier.
The parsing module 204 is specifically configured to traverse all nodes of the node tree and parse each node in turn to obtain the parsing result for the XML document.
Fig. 3 is a schematic structural diagram of Embodiment 2 of the XML document parsing device provided by the present invention. As shown in Fig. 3, on the basis of Fig. 2, the device further includes a release module 301.
The release module 301 is configured to release the at least one line of data that was read, after the parsing module 204 has parsed the nodes of the node tree in sequence and obtained the parsing result for the XML document.
The device is used to carry out the foregoing method embodiments. Its implementation principle and technical effect are similar and are not repeated here.
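The module structure of the device can be pictured as one class whose methods correspond to the acquisition, read, conversion, parsing, and release modules. This is only an illustrative sketch: the class and method names are hypothetical, identifiers are assumed to be 1-based line numbers, the selected lines are assumed to be well-formed fragments, and attributes are kept as dictionaries on their elements (a simplification of the node tree, in which attributes are nodes in their own right).

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterable, List, Tuple


class XmlLineParser:
    """Hypothetical device whose methods mirror the five modules."""

    def __init__(self) -> None:
        self._buffer = io.StringIO()

    def acquire(self, line_numbers: Iterable[int]) -> Tuple[int, ...]:
        # Acquisition module: validate and hold the read instruction.
        numbers = tuple(line_numbers)
        if not numbers:
            raise ValueError("at least one line identifier is required")
        return numbers

    def read(self, lines: Iterable[str], numbers: Tuple[int, ...]) -> None:
        # Read module: scan from the first line, stop when all are found.
        remaining = set(numbers)
        for number, text in enumerate(lines, start=1):
            if number in remaining:
                self._buffer.write(text)
                remaining.discard(number)
                if not remaining:
                    break

    def convert(self) -> ET.Element:
        # Conversion module: wrap the fragments in a synthetic root element.
        return ET.fromstring("<selection>" + self._buffer.getvalue() + "</selection>")

    def parse(self, root: ET.Element) -> List[Tuple[str, Dict[str, str]]]:
        # Parsing module: visit every element below the synthetic root.
        return [(el.tag, dict(el.attrib)) for el in root.iter() if el is not root]

    def release(self) -> None:
        # Release module: discard the buffered line data.
        self._buffer.close()

    def run(self, lines: Iterable[str], line_numbers: Iterable[int]):
        numbers = self.acquire(line_numbers)
        self.read(lines, numbers)
        try:
            return self.parse(self.convert())
        finally:
            self.release()


lines = ['<a x="1"/>\n', "<b/>\n", '<c><d y="2"/></c>\n']
result = XmlLineParser().run(lines, [1, 3])
```

Calling `release` in a `finally` clause means the buffer is freed even if conversion or parsing fails, which is a design choice of this sketch rather than something the patent specifies.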
It should be understood that the devices and methods disclosed in the embodiments provided by the present invention may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods described in the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a portable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (8)
1. An XML document parsing method, characterized by including:
obtaining an Extensible Markup Language (XML) document read instruction, where the read instruction includes at least one to-be-read line identifier;
reading, from the XML document and according to the at least one to-be-read line identifier, at least one line of data corresponding to the at least one to-be-read line identifier;
converting the at least one line of data into a node tree, where the elements and attributes in the at least one line of data become nodes of the node tree; and
parsing the nodes of the node tree in sequence to obtain a parsing result for the XML document.
2. The method according to claim 1, characterized in that reading, from the XML document and according to the at least one to-be-read line identifier, the at least one line of data corresponding to the at least one to-be-read line identifier includes:
according to the at least one to-be-read line identifier, traversing the XML document line by line, starting from the first line of data of the XML document, and reading from the XML document in turn the at least one line of data corresponding to the at least one to-be-read line identifier.
3. The method according to claim 1, characterized in that parsing the nodes of the node tree in sequence to obtain the parsing result for the XML document includes:
traversing all nodes of the node tree and parsing each node in turn to obtain the parsing result for the XML document.
4. The method according to any one of claims 1-3, characterized in that, after parsing the nodes of the node tree in sequence to obtain the parsing result for the XML document, the method further includes:
releasing the at least one line of data that was read.
5. An XML document parsing device, characterized by including:
an acquisition module, configured to obtain an Extensible Markup Language (XML) document read instruction, where the read instruction includes at least one to-be-read line identifier;
a read module, configured to read, from the XML document and according to the at least one to-be-read line identifier, at least one line of data corresponding to the at least one to-be-read line identifier;
a conversion module, configured to convert the at least one line of data into a node tree, where the elements and attributes in the at least one line of data become nodes of the node tree; and
a parsing module, configured to parse the nodes of the node tree in sequence to obtain a parsing result for the XML document.
6. The device according to claim 5, characterized in that the read module is specifically configured to traverse the XML document line by line according to the at least one to-be-read line identifier, starting from the first line of data of the XML document, and to read from the XML document in turn the at least one line of data corresponding to the at least one to-be-read line identifier.
7. The device according to claim 5, characterized in that the parsing module is specifically configured to traverse all nodes of the node tree and parse each node in turn to obtain the parsing result for the XML document.
8. The device according to any one of claims 5-7, characterized in that the device further includes a release module;
the release module is configured to release the at least one line of data that was read, after the parsing module has parsed the nodes of the node tree in sequence and obtained the parsing result for the XML document.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510512712.6A CN106469137A (en) | 2015-08-19 | 2015-08-19 | XML document analysis method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510512712.6A CN106469137A (en) | 2015-08-19 | 2015-08-19 | XML document analysis method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106469137A true CN106469137A (en) | 2017-03-01 |
Family
ID=58228759
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510512712.6A Pending CN106469137A (en) | 2015-08-19 | 2015-08-19 | XML document analysis method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106469137A (en) |
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101777045A (en) * | 2008-09-01 | 2010-07-14 | 西北工业大学 | Method for analyzing XML file by indexing |
CN102195959A (en) * | 2010-03-11 | 2011-09-21 | 中兴通讯股份有限公司 | Method and device for resolving extensible markup language (XML) data of session initiation protocol (SIP) signaling |
CN102841886A (en) * | 2011-06-21 | 2012-12-26 | 北大方正集团有限公司 | Method and device for splitting document |
CN103635897A (en) * | 2011-06-23 | 2014-03-12 | 微软公司 | Dynamically updating a running page |
CN102411602A (en) * | 2011-08-15 | 2012-04-11 | 浙江大学 | Extensive makeup language (XML) parallel speculation analysis method realized on basis of field programmable gate array (FPGA) |
CN104391796A (en) * | 2014-12-05 | 2015-03-04 | 上海斐讯数据通信技术有限公司 | Method for parsing test cases |
CN104636265A (en) * | 2015-01-21 | 2015-05-20 | 广东电网有限责任公司电力科学研究院 | Access method for efficient memory model organization of CIMXML document |
Non-Patent Citations (2)
Title |
---|
范书义 等: "XML文件解析中SAX和DOM的结合应用", 《微型电脑应用》 * |
达尔吉 等: "《无线传感器网络基础 理论和实践》", 31 January 2014 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108255494A (en) * | 2018-01-30 | 2018-07-06 | 平安科技(深圳)有限公司 | A kind of XML file analytic method, device, computer equipment and storage medium |
WO2019148671A1 (en) * | 2018-01-30 | 2019-08-08 | 平安科技(深圳)有限公司 | Xml file parsing method, device, computer apparatus, and storage medium |
CN110750960A (en) * | 2018-07-05 | 2020-02-04 | 武汉斗鱼网络科技有限公司 | Configuration file analysis method, storage medium, electronic device and system |
CN113128178A (en) * | 2019-12-31 | 2021-07-16 | 安徽佰通教育科技发展有限公司 | Method for analyzing office file through xml document |
CN111651406A (en) * | 2020-05-21 | 2020-09-11 | 杭州明讯软件技术有限公司 | Automatic carrier scheduling system file reading method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10664660B2 (en) | Method and device for extracting entity relation based on deep learning, and server | |
KR102170929B1 (en) | User keyword extraction device, method, and computer-readable storage medium | |
JP6936888B2 (en) | Training corpus generation methods, devices, equipment and storage media | |
US8375061B2 (en) | Graphical models for representing text documents for computer analysis | |
US9396172B2 (en) | Method for data chunk partitioning in XML parsing and method for XML parsing | |
CN106469137A (en) | XML document analysis method and device | |
KR101617696B1 (en) | Method and device for mining data regular expression | |
CN106682036A (en) | Data exchange system and exchange method thereof | |
US20200193083A1 (en) | Analyzing Document Content and Generating an Appendix | |
CN103995885A (en) | Method and device for recognizing entity names | |
CN110196884A (en) | Method for writing data, storage medium and electronic equipment based on distributed data base | |
CN102999480A (en) | Method and system for editing document | |
CN108021632A (en) | Unstructured data and the mutual conversion process method of structural data | |
CN110347390B (en) | Method, storage medium, equipment and system for rapidly generating WEB page | |
CN112528013A (en) | Text abstract extraction method and device, electronic equipment and storage medium | |
CN109901978A (en) | A kind of Hadoop log lossless compression method and system | |
CN106844313A (en) | A kind of method and apparatus that Word file is converted into html file | |
CN106293862B (en) | A kind of analysis method and device of expandable mark language XML data | |
CN110119410A (en) | Processing method and processing device, computer equipment and the storage medium of reference book data | |
CN105488171A (en) | SSH (Secure Shell)-based batch uploading method for test questions of online education website | |
CN104536947A (en) | Layout document processing method and device | |
KR101331383B1 (en) | Method and apparatus for processing data | |
CN103377187A (en) | Method, device and program for paragraph segmentation | |
CN109491679B (en) | CPLD online upgrading method and device | |
CN104268093A (en) | Memory allocation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170301 |