CN106855866A - XML document storage method and device - Google Patents

Info

Publication number
CN106855866A
Authority
CN
China
Prior art keywords
fragment
xml document
target
decomposed
searched
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510906388.6A
Other languages
Chinese (zh)
Inventor
刘雨洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
FOUNDER DIGITAL PUBLISHING TECHNOLOGY (SHANGHAI) CO LTD
Peking University Founder Information Industry Group Co Ltd
Peking University Founder Group Co Ltd
Original Assignee
FOUNDER DIGITAL PUBLISHING TECHNOLOGY (SHANGHAI) CO LTD
Peking University Founder Information Industry Group Co Ltd
Peking University Founder Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by FOUNDER DIGITAL PUBLISHING TECHNOLOGY (SHANGHAI) CO LTD, Peking University Founder Information Industry Group Co Ltd, Peking University Founder Group Co Ltd filed Critical FOUNDER DIGITAL PUBLISHING TECHNOLOGY (SHANGHAI) CO LTD
Priority to CN201510906388.6A priority Critical patent/CN106855866A/en
Publication of CN106855866A publication Critical patent/CN106855866A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/93Document management systems

Abstract

Embodiments of the present invention provide an XML document storage method and device. The method includes: decomposing an XML document into multiple fragments, where each fragment includes multiple nodes and the start tag and end tag of each node are in the same fragment; storing the fragments separately, so that each fragment serves as one storage space; and looking up a target fragment among the fragments, then looking up a target node within the target fragment. Because the XML document is decomposed into multiple fragments, each containing multiple nodes, and the fragment is the unit of query, a lookup only has to search the target fragment for the target node. This improves lookup speed compared with scanning the whole XML document, and saves a large amount of storage space compared with storing the XML document as individual nodes.

Description

XML document storage method and device
Technical field
Embodiments of the present invention relate to the field of computer technology, and in particular to an XML document storage method and device.
Background
Currently, XML documents are either stored as binary data in traditional relational databases, typified by Oracle and PostgreSQL, or broken down into nodes and stored in databases developed specifically for XML, typified by Founder XML DB and Berkeley XML DB.
In an Oracle database, an XML document is stored as a whole, so looking up a node in the document requires scanning the entire document, which makes lookups slow. In a Founder XML DB database, an XML document is stored as individual nodes, which occupies a large amount of database storage space.
The prior-art storage forms for XML documents therefore suffer from slow lookups and heavy use of storage space.
Summary of the invention
Embodiments of the present invention provide an XML document storage method and device that improve the lookup speed of XML documents and save storage space.
One aspect of the embodiments of the present invention provides an XML document storage method, including:
decomposing an XML document into multiple fragments, where each fragment includes multiple nodes and the start tag and end tag of each node are in the same fragment;
storing the multiple fragments separately, so that each fragment serves as one storage space; and
looking up a target fragment among the multiple fragments, and looking up a target node within the target fragment.
Another aspect of the embodiments of the present invention provides an XML document storage device, including:
a decomposing module, configured to decompose an XML document into multiple fragments, where each fragment includes multiple nodes and the start tag and end tag of each node are in the same fragment;
a storage module, configured to store the multiple fragments separately, so that each fragment serves as one storage space; and
a lookup module, configured to look up a target fragment among the multiple fragments and to look up a target node within the target fragment.
With the XML document storage method and device provided by the embodiments of the present invention, the XML document is decomposed into multiple fragments, each containing multiple nodes, and the fragment is the unit of query, so a lookup only searches the target fragment for the target node. This improves lookup speed compared with scanning the whole XML document, and saves a large amount of storage space compared with storing the XML document as individual nodes.
Brief description of the drawings
Fig. 1 is a flow chart of the XML document storage method provided by an embodiment of the present invention;
Fig. 2 is a structural diagram of the XML document storage device provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of the XML document storage device provided by another embodiment of the present invention.
Detailed description
Fig. 1 is a flow chart of the XML document storage method provided by an embodiment of the present invention. To address the slow lookups and heavy storage use of existing storage forms for XML documents, the embodiment provides an XML document storage method with the following steps:
Step S101: decompose the XML document into multiple fragments, where each fragment includes multiple nodes and the start tag and end tag of each node are in the same fragment.
In this embodiment, the whole XML document is decomposed into multiple fragments, each of which includes multiple nodes. The start tag and end tag of a node must not be split across two different fragments. For example, if an XML document contains <bid_tuple>……</bid_tuple>, then <bid_tuple> must lie entirely within one fragment and </bid_tuple> must likewise lie entirely within one fragment; it must not happen that "<bid_" falls in one fragment and "tuple>" in another.
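The constraint in the `<bid_tuple>` example can be illustrated with a short sketch. It is not the patented procedure, only one possible way to cut a document into pieces of roughly a chosen size while never cutting through a tag. The function name `split_without_cutting_tags` is hypothetical, sizes are counted in characters, and the sketch ignores comments, CDATA sections, and `>` characters inside attribute values. It also only keeps each individual tag intact; keeping a node's start tag and end tag together would need additional logic.

```python
def split_without_cutting_tags(xml_text: str, fragment_size: int) -> list[str]:
    """Cut xml_text into pieces of about fragment_size characters,
    moving each cut forward so that it never lands inside a tag."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    fragments = []
    start = 0
    while start < len(xml_text):
        end = min(start + fragment_size, len(xml_text))
        # If the last '<' in this piece comes after the last '>',
        # the cut would fall inside a tag such as "<bid_" | "tuple>".
        last_open = xml_text.rfind("<", start, end)
        last_close = xml_text.rfind(">", start, end)
        if last_open > last_close:
            close = xml_text.find(">", end)
            # Extend the piece to include the closing '>' of that tag.
            end = len(xml_text) if close == -1 else close + 1
        fragments.append(xml_text[start:end])
        start = end
    return fragments


document = "<bids><bid_tuple>a</bid_tuple><bid_tuple>b</bid_tuple></bids>"
pieces = split_without_cutting_tags(document, 10)
```

Joining `pieces` back together reproduces the original document, and every piece contains only whole tags.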
Step S102: store the multiple fragments separately, so that each fragment serves as one storage space.
The fragments are stored separately with each fragment as the unit of storage, so that each fragment occupies one storage space.
Step S103: look up a target fragment among the multiple fragments, and look up a target node within the target fragment.
After the XML document is decomposed into multiple fragments, each fragment is assigned an identification number, and the corresponding fragment can be found by that number. Specifically, the fragmentation information of the XML document is stored in a fragment table. The fragment table records multiple entries, and each entry includes the identification number of a fragment, the size of the fragment, and the start symbol and end symbol of the fragment. The target fragment can be found in the fragment table by its identification number, and parsing the target fragment yields the target node it contains.
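The fragment table and the two-stage lookup can be sketched as follows. This is a minimal illustration under several assumptions rather than the patent's implementation: the names `FragmentEntry`, `build_fragment_table`, and `find_target_nodes` are hypothetical, the "start symbol" and "end symbol" are interpreted as character offsets into the original document, fragments are kept in an in-memory dictionary instead of a database, and each fragment is assumed to hold only complete elements so that it can be parsed after being wrapped in a synthetic `_fragment` element.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class FragmentEntry:
    fragment_id: int  # identification number of the fragment
    size: int         # fragment size in characters
    start: int        # assumed meaning: offset of the first character
    end: int          # assumed meaning: offset one past the last character


def build_fragment_table(fragments: list[str]):
    """Assign an identification number to each fragment and record its entry."""
    table: dict[int, FragmentEntry] = {}
    store: dict[int, str] = {}
    offset = 0
    for fragment_id, text in enumerate(fragments, start=1):
        table[fragment_id] = FragmentEntry(
            fragment_id, len(text), offset, offset + len(text)
        )
        store[fragment_id] = text  # each fragment is stored on its own
        offset += len(text)
    return table, store


def find_target_nodes(table, store, fragment_id: int, tag: str):
    """Find the target fragment by its number, then parse only that fragment."""
    entry = table[fragment_id]  # raises KeyError for an unknown number
    text = store[entry.fragment_id]
    wrapper = ET.fromstring(f"<_fragment>{text}</_fragment>")
    return list(wrapper.iter(tag))


fragments = [
    "<bid_tuple><user>u1</user></bid_tuple>",
    "<bid_tuple><user>u2</user></bid_tuple>",
]
table, store = build_fragment_table(fragments)
users = find_target_nodes(table, store, 2, "user")
```

Only fragment 2 is parsed in the final call; fragment 1 is never touched, which is the source of the lookup speed-up the text describes.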
In this embodiment, the XML document is decomposed into multiple fragments, each containing multiple nodes, and the fragment is the unit of query, so a lookup only searches the target fragment for the target node. This improves lookup speed compared with scanning the whole XML document, and saves a large amount of storage space compared with storing the XML document as individual nodes.
On the basis of the above embodiment, decomposing the XML document into multiple fragments includes: dividing the XML document evenly into multiple fragments according to the size of the XML document and a fragment size set by the user.
In this embodiment, the size of each fragment is determined by the fragment size set by the user, and the number of fragments obtained by decomposing the XML document is determined by the size of the XML document and the user-set fragment size. Specifically, the number of fragments equals the size of the XML document divided by the fragment size set by the user. In addition, after the XML document is decomposed into multiple fragments, the method further includes: updating the content of a fragment.
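The fragment-count rule can be written out as a small calculation. The sketch below assumes both sizes are measured in the same unit (for example bytes) and rounds up when the document size is not an exact multiple of the fragment size. The rounding rule and the name `fragment_count` are assumptions, since the text only states the division.

```python
import math


def fragment_count(document_size: int, fragment_size: int) -> int:
    """Number of fragments = document size / user-set fragment size,
    rounded up so that a trailing partial fragment is also counted."""
    if fragment_size <= 0:
        raise ValueError("fragment_size must be positive")
    return math.ceil(document_size / fragment_size)


exact = fragment_count(1000, 250)    # document divides evenly
partial = fragment_count(1001, 250)  # one extra, shorter fragment
```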
In this embodiment, to update the content of the XML document, only the fragment that contains the content needs to be updated; the whole XML document does not need to be updated.
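A fragment-level update can be sketched as replacing a single entry in the fragment store. The dictionary-based store and the names `update_fragment` and `reassemble` are hypothetical. The point illustrated is only that the other fragments are left untouched.

```python
def update_fragment(store: dict[int, str], fragment_id: int, new_text: str) -> None:
    """Replace the content of one fragment; all other fragments stay as they are."""
    if fragment_id not in store:
        raise KeyError(f"unknown fragment {fragment_id}")
    store[fragment_id] = new_text


def reassemble(store: dict[int, str]) -> str:
    """Rebuild the document text by joining fragments in identification-number order."""
    return "".join(store[fragment_id] for fragment_id in sorted(store))


store = {1: "<a>1</a>", 2: "<b>2</b>", 3: "<c>3</c>"}
update_fragment(store, 2, "<b>two</b>")
document = reassemble(store)
```

Here only the entry for fragment 2 is rewritten, so the cost of the update depends on the fragment size rather than on the size of the whole document.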
After the XML document is decomposed into multiple fragments, the method further includes: setting an identification number for each of the multiple fragments.
That is, after the XML document is decomposed into multiple fragments, an identification number is set for each fragment, so that each fragment corresponds to exactly one identification number.
Looking up a target fragment among the multiple fragments and looking up a target node within the target fragment includes: looking up the target fragment among the multiple fragments according to the identification number of the target fragment; and parsing the target fragment to obtain the target node.
After the XML document is decomposed into multiple fragments, each fragment is assigned an identification number, and the corresponding fragment can be found by that number. Specifically, the fragmentation information of the XML document is stored in a fragment table. The fragment table records multiple entries, and each entry includes the identification number of a fragment, the size of the fragment, and the start symbol and end symbol of the fragment. The target fragment can be found in the fragment table by its identification number, and parsing the target fragment yields the target node it contains.
By updating only the content of a fragment, this embodiment avoids updating the whole XML document and makes update operations on the XML document more efficient. By setting an identification number for each fragment and searching for the target node only within the fragment that the identification number indicates, it improves query efficiency.
Fig. 2 is a structural diagram of the XML document storage device provided by an embodiment of the present invention. The device can perform the processing flow provided by the XML document storage method embodiment. As shown in Fig. 2, the XML document storage device 20 includes a decomposing module 21, a storage module 22, and a lookup module 23. The decomposing module 21 is configured to decompose an XML document into multiple fragments, where each fragment includes multiple nodes and the start tag and end tag of each node are in the same fragment. The storage module 22 is configured to store the multiple fragments separately, so that each fragment serves as one storage space. The lookup module 23 is configured to look up a target fragment among the multiple fragments and to look up a target node within the target fragment.
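The three modules of device 20 can be mirrored by a single class with one method per module. This is a hedged sketch, not the device itself: the class name `XmlDocumentStore` and its method names are hypothetical, fragments are formed by grouping a fixed number of the root element's child nodes (an assumption made so that every node's start tag and end tag land in the same fragment), the root element's own tags are not stored, and storage is an in-memory dictionary.

```python
import xml.etree.ElementTree as ET


class XmlDocumentStore:
    """Illustrative counterpart of modules 21 (decompose), 22 (store), 23 (lookup)."""

    def __init__(self, nodes_per_fragment: int):
        if nodes_per_fragment <= 0:
            raise ValueError("nodes_per_fragment must be positive")
        self.nodes_per_fragment = nodes_per_fragment
        self.fragments: dict[int, str] = {}

    def decompose(self, xml_text: str) -> list[str]:
        # Serialize each child of the root as a whole element, then group them,
        # so a node's start tag and end tag are always in the same fragment.
        root = ET.fromstring(xml_text)
        children = [ET.tostring(child, encoding="unicode") for child in root]
        step = self.nodes_per_fragment
        return ["".join(children[i:i + step]) for i in range(0, len(children), step)]

    def store(self, fragments: list[str]) -> None:
        # Each fragment gets its own slot, keyed by an identification number.
        for fragment_id, text in enumerate(fragments, start=1):
            self.fragments[fragment_id] = text

    def lookup(self, fragment_id: int, tag: str):
        # Parse only the target fragment, wrapped in a synthetic root element.
        text = self.fragments[fragment_id]
        wrapper = ET.fromstring(f"<_fragment>{text}</_fragment>")
        return list(wrapper.iter(tag))


doc = (
    "<bids>"
    "<bid_tuple><user>u1</user></bid_tuple>"
    "<bid_tuple><user>u2</user></bid_tuple>"
    "<bid_tuple><user>u3</user></bid_tuple>"
    "</bids>"
)
device = XmlDocumentStore(nodes_per_fragment=2)
device.store(device.decompose(doc))
users = device.lookup(2, "user")
```

Grouping by node count rather than by a byte size is a simplification chosen to keep the sketch short; a size-based grouping would accumulate serialized children until the user-set fragment size is reached.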
In this embodiment, the XML document is decomposed into multiple fragments, each containing multiple nodes, and the fragment is the unit of query, so a lookup only searches the target fragment for the target node. This improves lookup speed compared with scanning the whole XML document, and saves a large amount of storage space compared with storing the XML document as individual nodes.
Fig. 3 is a structural diagram of the XML document storage device provided by another embodiment of the present invention. On the basis of the above embodiment, the decomposing module 21 is specifically configured to divide the XML document evenly into multiple fragments according to the size of the XML document and a fragment size set by the user.
The XML document storage device 20 further includes an update module 24, configured to update the content of a fragment.
The XML document storage device 20 further includes a marking module 25, configured to set an identification number for each of the multiple fragments.
The lookup module 23 is specifically configured to look up the target fragment among the multiple fragments according to the identification number of the target fragment, and to parse the target fragment to obtain the target node.
The XML document storage device provided by this embodiment can be used to perform the method embodiment of Fig. 1 above; its specific functions are not repeated here.
By updating only the content of a fragment, this embodiment avoids updating the whole XML document and makes update operations on the XML document more efficient. By setting an identification number for each fragment and searching for the target node only within the fragment that the identification number indicates, it improves query efficiency.
In summary, the embodiments of the present invention decompose the XML document into multiple fragments, each containing multiple nodes, and use the fragment as the unit of query, searching only the target fragment for the target node. This improves lookup speed compared with scanning the whole XML document, and saves a large amount of storage space compared with storing the XML document as individual nodes. Updating only the content of a fragment avoids updating the whole XML document and makes update operations more efficient, and setting an identification number for each fragment, then searching only the fragment that the number indicates, improves query efficiency.
In the several embodiments provided by the present invention, it should be understood that the disclosed device and method may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection between devices or units through some interfaces, and may be electrical, mechanical, or of another form.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of an embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in hardware, or in hardware plus software functional units.
An integrated unit implemented as a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods of the embodiments of the present invention. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional modules is only an example. In practice, the functions may be assigned to different functional modules as needed; that is, the internal structure of the device may be divided into different functional modules to perform all or part of the functions described above. For the specific working process of the device described above, refer to the corresponding process in the method embodiments; it is not repeated here.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. An XML document storage method, characterized by comprising:
decomposing an XML document into multiple fragments, wherein each fragment includes multiple nodes and the start tag and end tag of each node are in the same fragment;
storing the multiple fragments separately, so that each fragment serves as one storage space; and
looking up a target fragment among the multiple fragments, and looking up a target node within the target fragment.
2. The method according to claim 1, characterized in that decomposing the XML document into multiple fragments comprises:
dividing the XML document evenly into multiple fragments according to the size of the XML document and a fragment size set by the user.
3. The method according to claim 2, characterized in that, after decomposing the XML document into multiple fragments, the method further comprises:
updating the content of a fragment.
4. The method according to claim 3, characterized in that, after decomposing the XML document into multiple fragments, the method further comprises:
setting an identification number for each of the multiple fragments.
5. The method according to claim 4, characterized in that looking up a target fragment among the multiple fragments and looking up a target node within the target fragment comprises:
looking up the target fragment among the multiple fragments according to the identification number of the target fragment; and
parsing the target fragment to obtain the target node.
6. An XML document storage device, characterized by comprising:
a decomposing module, configured to decompose an XML document into multiple fragments, wherein each fragment includes multiple nodes and the start tag and end tag of each node are in the same fragment;
a storage module, configured to store the multiple fragments separately, so that each fragment serves as one storage space; and
a lookup module, configured to look up a target fragment among the multiple fragments and to look up a target node within the target fragment.
7. The XML document storage device according to claim 6, characterized in that the decomposing module is specifically configured to divide the XML document evenly into multiple fragments according to the size of the XML document and a fragment size set by the user.
8. The XML document storage device according to claim 7, characterized by further comprising:
an update module, configured to update the content of a fragment.
9. The XML document storage device according to claim 8, characterized by further comprising:
a marking module, configured to set an identification number for each of the multiple fragments.
10. The XML document storage device according to claim 9, characterized in that the lookup module is specifically configured to look up the target fragment among the multiple fragments according to the identification number of the target fragment, and to parse the target fragment to obtain the target node.
CN201510906388.6A 2015-12-09 2015-12-09 XML document storage method and device Pending CN106855866A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510906388.6A CN106855866A (en) 2015-12-09 2015-12-09 XML document storage method and device

Publications (1)

Publication Number Publication Date
CN106855866A true CN106855866A (en) 2017-06-16

Family

ID=59132058

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510906388.6A Pending CN106855866A (en) 2015-12-09 2015-12-09 XML document storage method and device

Country Status (1)

Country Link
CN (1) CN106855866A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918369A (en) * 2017-12-13 2019-06-21 中兴通讯股份有限公司 Date storage method and device
CN110297947A (en) * 2019-05-17 2019-10-01 深圳市元征科技股份有限公司 A kind of data calling method, device and electronic equipment
CN111563065A (en) * 2020-07-09 2020-08-21 北京联想协同科技有限公司 Document storage method and device and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1627297A (en) * 2003-12-13 2005-06-15 三星电子株式会社 Method and apparatus for managing data written in markup language
CN101196916A (en) * 2007-12-27 2008-06-11 腾讯科技(深圳)有限公司 Method and device for fragment storage file
CN101369268A (en) * 2007-08-15 2009-02-18 北京书生国际信息技术有限公司 Storage method for document data in document warehouse system
CN102325161A (en) * 2011-07-18 2012-01-18 北京航空航天大学 A kind of XML sharding method based on the estimation of query amount

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109918369A (en) * 2017-12-13 2019-06-21 中兴通讯股份有限公司 Date storage method and device
CN109918369B (en) * 2017-12-13 2024-01-23 金篆信科有限责任公司 Data storage method and device
CN110297947A (en) * 2019-05-17 2019-10-01 深圳市元征科技股份有限公司 A kind of data calling method, device and electronic equipment
CN111563065A (en) * 2020-07-09 2020-08-21 北京联想协同科技有限公司 Document storage method and device and computer readable storage medium
CN111563065B (en) * 2020-07-09 2020-12-11 北京联想协同科技有限公司 Document storage method and device and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170616