CN100462973C - XML file preprocessing method, apparatus, file structure, reading method and device - Google Patents

Info

Publication number
CN100462973C
Authority
CN
China
Prior art keywords
xml
parameter
xml file
file
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CNB200610145652XA
Other languages
Chinese (zh)
Other versions
CN1949225A (en
Inventor
林志贤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kingdee Software China Co Ltd
Original Assignee
Kingdee Software China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kingdee Software China Co Ltd filed Critical Kingdee Software China Co Ltd
Priority to CNB200610145652XA priority Critical patent/CN100462973C/en
Publication of CN1949225A publication Critical patent/CN1949225A/en
Application granted granted Critical
Publication of CN100462973C publication Critical patent/CN100462973C/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses an XML file preprocessing method and device. The method comprises the following steps: obtaining, from the XML file, parameters indicating the physical position and size of each XML node; storing the parameters in a preset data structure and storing that data structure in the XML file; and setting, at a predefined position in the XML file, a pointer to the data structure. The invention also discloses an XML file reading method and device. The reading method comprises the following steps: obtaining the parameters indicating the physical position and size of the index unit; reading the index unit as indicated by those parameters; obtaining the parameters indicating the physical position and size of each XML node; and reading the XML node as indicated by those parameters. The invention also discloses an XML file structure. The invention avoids the management difficulty caused by having many index files alongside many XML files when multiple XML files are read.

Description

XML file preprocessing method and device, and reading method and device
Technical field
The present invention relates to the field of computer technology and, more particularly, to an XML file preprocessing method and device, an XML file structure, and an XML file reading method and device.
Background art
Traditional computer data files are mainly stored and read in binary form. Such files can only be read, written, and modified by specific programs, which creates considerable difficulty in data representation, extensibility, user management, and Web applications. As computing has developed and Web applications have become more widespread, binary data files, because of their weaknesses in Web interaction and presentation, have gradually been replaced by a newer technology: XML (Extensible Markup Language). XML is a meta-markup language. It provides a format for describing structured data that can be widely implemented and is easy to deploy. The format is easy to understand and manage, and it separates structured data from its presentation, so it is flexible in use, easy to extend, and able to integrate data from many sources seamlessly. The technology is now applied in fields such as advanced database search, online banking, medicine, law, and e-commerce, where it plays an important role.
An XML file consists of a series of tags and the content data inside those tags, as shown in Example 1.
Example 1:
<document>
<section1>
<paragraph>    ---- node 1
Years are steamed grey hair,
</paragraph>
<paragraph>    ---- node 2
Double-edged sword is still bright.
</paragraph>
</section1>
<section2>
<paragraph>    ---- node 3
Warm blood is washed the battle field,
</paragraph>
<paragraph>    ---- node 4
Rivers are returned the native place.
</paragraph>
</section2>
</document>
Here document, section, and paragraph are user-defined tags, and nodes 1, 2, 3, and 4 are the content data inside the tags. Reading the content of this XML file requires parsing the tags, and two parsing techniques are commonly used: DOM (Document Object Model) and SAX (Simple API for XML).
DOM is an object-based parsing approach. With DOM, the data is read into memory as a tree structure that persists in memory, so its structure and content can be modified and any position can be read by random access. In Example 1, any of nodes 1, 2, 3, and 4 can be read directly. This approach carries a memory cost, however: even when only one of nodes 1, 2, 3, and 4 is wanted, all four must be read into memory. For large files, building the DOM tree can take a long time, so efficiency is low and overhead is high.
SAX is an event-based parsing approach that reads the XML file sequentially. Each time it finds a tag in the XML file it triggers a start-element event and reads the content data inside the tag. This processing does not require reading all of the content into memory, but it does not allow random access. In Example 1, if only node 3 is wanted, SAX must first read nodes 1 and 2 in turn before it can read node 3, so efficiency is low and flexibility is limited.
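The contrast between the two parsers can be illustrated with a minimal Python sketch using the standard library modules xml.dom.minidom and xml.sax. The document string, the function names, and the handler class are illustrative assumptions and are not part of the patent; the paragraph text is shortened to "node 1" through "node 4".

```python
import xml.dom.minidom
import xml.sax

# Shortened stand-in for Example 1 (assumption: text replaced by node labels).
DOC = (
    "<document>"
    "<section1><paragraph>node 1</paragraph><paragraph>node 2</paragraph></section1>"
    "<section2><paragraph>node 3</paragraph><paragraph>node 4</paragraph></section2>"
    "</document>"
)


def dom_read(index):
    """DOM: the whole tree is built in memory, then any paragraph is reachable."""
    tree = xml.dom.minidom.parseString(DOC)
    paragraphs = tree.getElementsByTagName("paragraph")
    return paragraphs[index].firstChild.data


class ParagraphCollector(xml.sax.ContentHandler):
    """SAX: events arrive strictly in document order."""

    def __init__(self):
        super().__init__()
        self.seen = []        # text of every paragraph visited so far
        self._inside = False  # True while between <paragraph> and </paragraph>

    def startElement(self, name, attrs):
        if name == "paragraph":
            self._inside = True
            self.seen.append("")

    def characters(self, content):
        if self._inside:
            self.seen[-1] += content

    def endElement(self, name):
        if name == "paragraph":
            self._inside = False


def sax_read_all():
    """Return paragraph texts in the order SAX reported them."""
    handler = ParagraphCollector()
    xml.sax.parseString(DOC.encode("utf-8"), handler)
    return handler.seen
```

With DOM, dom_read(2) reaches node 3 directly, but only after the full tree exists in memory. With SAX, node 3 appears in handler.seen only after nodes 1 and 2 have been reported.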
Chinese patent application No. 200510132306.3 discloses a method for accessing data in an XML document, comprising the following steps: determining whether an index file exists for the XML file to be accessed and, if not, creating it first; reading the index file of the XML file into memory; searching the index file for identification information according to a predefined rule and obtaining the positioning parameters of that identification information; and extracting the corresponding data object or element from the XML file according to those positioning parameters. By building an index file, this method quickly and efficiently locates the data object or element to be operated on and speeds up data access in large XML files. The method is inconvenient to manage, however. Each XML file must have a corresponding index file, so when multiple XML files are read there are multiple index files, which makes file management cumbersome and error-prone. In addition, after the XML content has been written to form the XML file, the XML must be traversed again to build the index file, which is tedious.
Summary of the invention
In view of this, the object of the present invention is to provide an XML file preprocessing method and device, together with an XML file reading method and device based on the preprocessing method, so that large XML files can be read simply and efficiently and file management is made easier.
The XML file preprocessing method of the present invention is implemented as follows:
An XML file preprocessing method comprises the steps of:
obtaining, from the XML file, the identification information of each XML node and the parameters indicating the physical position and size of each XML node;
storing the parameters in a preset data structure, in which a key field matching the identification information of the XML node is set at the position corresponding to the position of the parameters, and placing the data structure at the tail of the XML file;
setting, at a predefined position in the XML file, a pointer to the data structure.
Further, the predefined position is the head of the XML file.
The data structure is a string table.
The invention also discloses an XML file preprocessing device, comprising:
an acquiring unit, configured to obtain from the XML file the identification information of each XML node and the parameters indicating the physical position and size of each XML node;
a first configuration unit, configured to store the parameters in a preset data structure and place the data structure in the XML file;
a keyword setting unit, configured to set, in the data structure at the position corresponding to the position of the parameters, a key field matching the name of the XML node;
a second configuration unit, configured to set, at a predefined position in the XML file, a pointer to the data structure;
a setting unit, configured to place the data structure at the tail of the XML file after the first configuration unit has configured it, and to place the pointer at the head of the XML file after the second configuration unit has configured it.
The data structure is a string table.
The invention further discloses an XML file reading method, comprising the steps of:
reading the index unit according to the parameters indicating its physical position and size, to obtain the parameters indicating the physical position and size of each XML node; searching the index unit, according to a predefined rule, for the key field corresponding to the identification information of the XML node, to obtain the parameters indicating the physical position and size of that XML node;
reading the XML node as indicated by those parameters.
Searching the index unit for the key field corresponding to the identification information of the XML node according to the predefined rule specifically comprises: using string matching to find, in the index unit, the string that matches the identification information of the XML node.
The parameters indicating the physical position and size of the index unit are obtained as follows:
the parameters indicating the physical position and size of the index unit are obtained at the head of the XML file or at a preset position in it.
The invention also discloses an XML file reading device, comprising:
an indicating unit, configured to search the index unit, according to a predefined rule, for the key field corresponding to the identification information of the XML node, obtain the parameters indicating the physical position and size of the XML node, and provide them to the reading unit;
a reading unit, configured to read the pointer unit and, as indicated by the parameters provided by the indicating unit, read the index unit and each XML node.
The indicating unit comprises: a first indicating unit, configured to provide the reading unit with the parameters indicating the physical position and size of the index unit; and a second indicating unit, configured to provide the reading unit with the parameters indicating the physical position and size of each XML node.
As the above technical solution shows, compared with the prior art the present invention has the following features and advantages:
1. No separate index file is needed; instead, the index node is placed inside the XML file. This avoids the management inconvenience that arises, when multiple XML files are read, from having multiple index files alongside multiple XML files.
2. Once the XML content has been written, its size and physical position are known, so the index node can be built and placed at the end of the XML file, and a pointer holding the parameters that indicate the physical position and size of the data structure can be placed at the head of the XML file, forming the XML file. This avoids having to traverse the content again to generate an index file after the XML file has been written, which reduces processing steps and improves efficiency.
Description of drawings
Fig. 1 is a flowchart of the XML file preprocessing method of the present invention;
Fig. 2 is a flowchart of embodiment one of the XML file preprocessing method of the present invention;
Fig. 3 is a structural diagram of the XML file preprocessing device of the present invention;
Fig. 4 is a structural diagram of embodiment one of the XML file preprocessing device of the present invention;
Fig. 5 is a structural diagram of the XML file structure of the present invention;
Fig. 6 is a structural diagram of embodiment one of the XML file structure of the present invention;
Fig. 7 is a flowchart of the XML file reading method of the present invention;
Fig. 8 is a flowchart of embodiment one of the XML file reading method of the present invention;
Fig. 9 is a structural diagram of the XML file reading device of the present invention.
Embodiment
The core idea of the XML file preprocessing method of the present invention is: obtain, from the XML file, the parameters indicating the physical position and size of each XML node in the file; store the parameters in a preset data structure and place the data structure in the XML file; and set, at a predefined position in the XML file, a pointer to the data structure.
So that those skilled in the art can fully understand the technical solution of the present invention, it is described in further detail below with reference to specific embodiments and the drawings.
Refer to Fig. 1, a flowchart of the XML file preprocessing method of the present invention.
First, step S101 is entered.
Step S101: obtain, from the XML file, the parameters indicating the physical position and size of each XML node in the file.
Step S102: store the parameters in a preset data structure, and place the data structure in the XML file.
Step S103: set, at a predefined position in the XML file, a pointer to the data structure.
The pointer contains information indicating the physical position and size of the data structure in the XML file. The predefined position can be anywhere in the XML file. It is preferably the head of the file, so that the data structure can be read quickly and accurately when the file is read. It can also be a position at a fixed offset from the start of the XML file, in which case the data structure is read by way of that fixed offset.
So that the target content can be found quickly and accurately when the file content is read, key field information is set in the data structure at the position corresponding to the position of the parameters; this key field information matches the identification information of the XML node.
The technical solution of the present invention is further described below with reference to Fig. 2 and embodiment one.
Fig. 2 is a flowchart of embodiment one of the XML file preprocessing method of the present invention.
Step S201: obtain the identification information of each XML node in the XML file and the parameters indicating the physical position and size of each XML node.
The identification information generally means name information, such as section1 and section2 in Example 1. The parameters comprise a physical position value and a storage occupancy value: the physical position value is the offset of the XML node from the start of the file, and the storage occupancy value is the size of the XML node.
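As a minimal sketch of step S201, the following Python function computes the two parameters for one node. It assumes, beyond what the patent states, that offsets and sizes are counted in bytes, that node names are unique, and that opening tags carry no attributes; the function name node_params is hypothetical.

```python
def node_params(data, name):
    """Return (offset, size) in bytes of the element called `name` in `data`.

    Assumptions for this sketch: `data` is the file content as bytes, the
    element name occurs once, and its opening tag has no attributes.
    """
    open_tag = ("<" + name + ">").encode("utf-8")
    close_tag = ("</" + name + ">").encode("utf-8")
    start = data.index(open_tag)                       # physical position value
    end = data.index(close_tag, start) + len(close_tag)
    return start, end - start                          # storage occupancy value
```

For instance, slicing data[start:start + size] with the returned values yields exactly the bytes of the node, which is the property the later reading steps rely on.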
Step S202: configure the string table.
The string table is configured from the parameters and the identification information. It contains multiple strings, each made up of a key field, a start position value, and a value giving the size, for example:
<index key="section1" start="3" length="8"/>, where section1 denotes the XML node named "section1" in the XML file, start="3" means that the start of this XML node is at offset 3 from the start of the XML file, and length="8" means that the size of this XML node is 8.
Multiple strings together form the string table, for example:
<indexs>    ---- header of the string table
<index key="section1" start="3" length="8"/>
<index key="section2" start="11" length="8"/>
……
</indexs>    ---- end of the string table
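Assembling such a string table can be sketched in Python as follows. The function name build_string_table and the input shape (a list of key, start, length tuples) are assumptions made for illustration; quoteattr from the standard library is used only to quote attribute values safely.

```python
from xml.sax.saxutils import quoteattr


def build_string_table(entries):
    """Build the <indexs> table text from (key, start, length) tuples."""
    lines = ["<indexs>"]
    for key, start, length in entries:
        attrs = " ".join([
            "key=" + quoteattr(key),              # key field: node name
            "start=" + quoteattr(str(start)),     # offset from start of file
            "length=" + quoteattr(str(length)),   # size of the node
        ])
        lines.append("<index " + attrs + "/>")
    lines.append("</indexs>")
    return "\n".join(lines)
```

Calling build_string_table([("section1", 3, 8), ("section2", 11, 8)]) produces the two index entries shown above, wrapped in the indexs header and end lines.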
Step S203: place the string table at the end of the XML file.
The string table is placed at the end of the XML file mainly because, while the XML content is being written to the file, the amount of XML content cannot be known until all of it has been output. The length of the string table is determined by the XML content, so it too is known only when the end of the output is reached. The string table is then written at the end of the file, at which point its position and size are also determined.
Step S204: configure the pointer that indexes the string table.
The pointer that indexes the string table is set according to the name, physical position, and size of the string table. The pointer holds string information matching the string table information, together with the parameters indicating the physical position and size of the string table.
For example: <indexOfIndex start="20" length="2"/>
Here indexOfIndex is the name of the pointer that indexes the string table indexs, and start="20" length="2" means that the string table indexs starts at offset 20 from the start of the XML file and has a size of 2.
Step S205: place the pointer that indexes the string table at the head of the XML file.
A node-sized space (typically a 32-bit long integer) is reserved in advance in the header of the XML file to hold the pointer. When the string table indexs is written at the end of the file, its offset from the start of the XML file becomes known, and the pointer recording the size and physical position of the string table indexs is written into the reserved location in the header. In this way the positions of the content already written do not change.
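Steps S201 to S205 can be sketched together in Python as a single write pass that reserves the header, streams the nodes, appends the string table, and then patches the header. This is only an illustration under stated assumptions: the reserved slot here is a fixed-width text field of POINTER_WIDTH bytes rather than the 32-bit integer the text mentions, offsets are in bytes, node names are plain ASCII, and the names write_indexed and POINTER_WIDTH are hypothetical.

```python
import io

POINTER_WIDTH = 64  # hypothetical size of the reserved header slot, in bytes


def write_indexed(nodes):
    """nodes: list of (name, xml_bytes). Returns the complete file as bytes."""
    out = io.BytesIO()
    # Reserve the header slot first so later writes never shift content.
    out.write(b" " * (POINTER_WIDTH - 1) + b"\n")

    entries = []
    for name, xml_bytes in nodes:
        # The offset of each node is known only at the moment it is written.
        entries.append((name, out.tell(), len(xml_bytes)))
        out.write(xml_bytes)

    # All content is out, so the string table can now be built and appended.
    table_start = out.tell()
    table = "<indexs>" + "".join(
        '<index key="%s" start="%d" length="%d"/>' % entry for entry in entries
    ) + "</indexs>"
    table_bytes = table.encode("utf-8")
    out.write(table_bytes)

    # Patch the reserved slot with the pointer to the string table.
    pointer = ('<indexOfIndex start="%d" length="%d"/>'
               % (table_start, len(table_bytes))).encode("utf-8")
    if len(pointer) >= POINTER_WIDTH:
        raise ValueError("pointer does not fit in the reserved header slot")
    out.seek(0)
    out.write(pointer)  # overwrites only bytes inside the reserved slot
    return out.getvalue()
```

Because the pointer overwrites padding inside the reserved slot, the recorded offsets of the nodes and of the string table stay valid after the header is patched.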
This yields an XML file with an index function, as shown in Example 2:
Example 2
<indexOfIndex start="20" length="2"/>    ---- pointer
<document>
<section1>    ---- node 1
<paragraph11>
Years are steamed grey hair,
</paragraph11>
<paragraph12>
Double-edged sword is still bright.
</paragraph12>
</section1>
<section2>    ---- node 2
<paragraph21>
Warm blood is washed the battle field,
</paragraph21>
<paragraph22>
Rivers are returned the native place.
</paragraph22>
</section2>
</document>
<indexs>    ---- string table
<index key="section1" start="3" length="8"/>
<index key="section2" start="11" length="8"/>
</indexs>
Refer to Fig. 3, a structural diagram of the XML file preprocessing device of the present invention.
An XML file preprocessing device 100 comprises: an acquiring unit 111, configured to obtain from the XML file the parameters indicating the physical position and size of each XML node in the file; a first configuration unit 112, configured to store the parameters in a preset data structure and place the data structure in the XML file; and a second configuration unit 113, configured to set, at a predefined position in the XML file, a pointer to the data structure.
The preprocessing device 100 operates as follows:
The acquiring unit 111 obtains the parameters indicating the physical position and size of each XML node in the XML file. The first configuration unit 112 stores the parameters in the preset data structure and places the data structure in the XML file. Once the data structure has been placed, its physical position and size in the XML file are determined. Based on that position and size, the second configuration unit 113 sets, at the predefined position in the XML file, a pointer to the data structure; the pointer stores the parameters indicating the physical position and size of the data structure in the XML file.
The parameters comprise a physical position value and a storage occupancy value: the physical position value is the offset of the data structure from the start of the file, and the storage occupancy value is the size of the data structure.
So that the target content can be found quickly and accurately when the file content is read, the device may include a keyword setting unit 114, configured to set a key field in the data structure at the position corresponding to the position of the parameters; the key field matches the identification information of the XML node.
The preset data structure may be a string table containing multiple strings, each made up of a key field, a start position value, and a value giving the size, for example:
<index key="section1" start="3" length="8"/>, where section1 denotes the XML node named "section1" in the XML file, start="3" means that the start of this XML node is at offset 3 from the start of the XML file, and length="8" means that the size of this XML node is 8.
Multiple strings together form the string table, for example:
<indexs>    ---- header of the string table
<index key="section1" start="3" length="8"/>
<index key="section2" start="11" length="8"/>
……
</indexs>    ---- end of the string table
The data structure may also take other forms, such as a multi-level index table.
When many XML nodes need to be indexed, a multi-level index table is suitable, for example:
Figure C200610145652D00131
In this index table, to read paragraph1, the start position of paragraph1 can be obtained through the path document/section1/paragraph1: an offset of 20 from the start of the XML file. paragraph1 is then read from that start position.
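The figure of the multi-level index table is not reproduced in this text, so its exact layout is unknown. As a purely illustrative sketch, the path lookup described above can be modelled in Python with nested dictionaries, one level per path component. The structure of INDEX and the name lookup are assumptions; only the path document/section1/paragraph1 and the offset 20 come from the text.

```python
# Hypothetical multi-level index: one dictionary level per path component.
INDEX = {
    "document": {
        "section1": {
            "paragraph1": {"start": 20},  # offset from the start of the file
        },
    },
}


def lookup(index, path):
    """Walk the index along a '/'-separated path and return the start offset."""
    node = index
    for part in path.split("/"):
        node = node[part]  # raises KeyError if the path is not indexed
    return node["start"]
```

Each lookup step narrows the search to one level, so a reader does not need to scan every entry when many nodes are indexed.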
Fig. 4 is a structural diagram of embodiment one of the XML file preprocessing device of the present invention.
The preprocessing device 100 further comprises a position setting unit 115, configured to place the data structure at the tail of the XML file after the first configuration unit 112 has configured it, and to place the pointer at the head of the XML file after the second configuration unit 113 has configured the pointer that indexes the data structure.
The data structure is placed at the end of the XML file mainly because, while the XML content is being written to the file, the amount of XML content cannot be known until all of it has been output. The length of the data structure is determined by the XML content, so it too is known only when the end of the output is reached. The data structure is then written at the end of the file, at which point its position is also determined.
A node-sized space (typically a 32-bit long integer) is reserved in advance in the header of the XML file to hold the pointer. When the data structure is written at the end of the file, its offset from the start of the XML file becomes known, and the pointer recording the size and physical position of the data structure is written into the reserved location in the header. In this way the positions of the content already written do not change.
Refer to Fig. 5, a structural diagram of the XML file structure of the present invention.
An XML file consists of an index unit 10, a pointer unit 20, and at least one XML node 30.
The index unit 10 is placed at a first preset position in the XML file and holds the parameters indicating the physical position and size of each XML node 30. The pointer unit 20 is placed at a second preset position in the XML file and holds the parameters indicating the physical position and size of the index unit 10. The XML node 30 records the XML content information.
The index unit 10 may be a string table, for example:
<indexs>    ---- header of the string table
<index key="section1" start="3" length="8"/>
<index key="section2" start="11" length="8"/>
……
</indexs>    ---- end of the string table
The index unit 10 may also take other forms, such as a multi-level index table.
When many XML nodes need to be indexed, a multi-level index table is suitable, for example:
Figure C200610145652D00151
In this index table, to read paragraph1, the start position of paragraph1 can be obtained through the path document/section1/paragraph1: an offset of 20 from the start of the XML file. paragraph1 is then read from that start position.
The first preset position can be anywhere in the XML file, but is preferably the tail of the file. The main reason is that, while the XML content is being written to the file, the amount of XML content cannot be known until all of it has been output. The length of the index unit 10 is determined by the XML content, so it too is known only when the end of the output is reached. Writing the index unit 10 directly at the tail at that point also fixes its position, which is more convenient.
The second preset position can be anywhere in the XML file, but is preferably the head of the file. A node-sized space (typically a 32-bit long integer) is reserved in advance in the header of the XML file to hold the pointer unit 20. When the index unit 10 is written at the end of the file, its offset from the start of the XML file becomes known, and the pointer recording the size and physical position of the index unit 10 is written into the reserved location in the header. In this way the positions of the content already written do not change.
The index unit 10 can also, as required, be placed at a fixed offset from the start of the XML file. Alternatively, the index unit 10 can be placed at the head of the XML file with the pointer unit 20 at the head of the index unit, as shown in Fig. 6, a structural diagram of embodiment one of the XML file structure of the present invention. In this case the pointer unit 20 stores only the parameter indicating the size of the index unit 10. When the XML file is read, the pointer unit 20 at the head of the index unit 10, which is itself at the head of the file, is read to obtain the parameter indicating the size of the index unit 10, and the index unit can then be read.
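The head-of-file layout of Fig. 6 can be sketched in Python as follows. The encoding of the pointer unit as a 4-byte big-endian unsigned integer is an assumption made for this sketch (the text says only that the pointer stores the size of the index unit), and the function names are hypothetical.

```python
import struct


def pack_head_index(index_bytes, body):
    """Lay out the file as: size-only pointer, index unit, XML content."""
    return struct.pack(">I", len(index_bytes)) + index_bytes + body


def read_head_index(data):
    """Read the size from the pointer unit, then read the index unit after it."""
    (size,) = struct.unpack_from(">I", data, 0)
    return data[4:4 + size]
```

Because the index unit always begins immediately after the pointer, its position is implicit and only its size has to be stored.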
Refer to Fig. 7, a flowchart of the XML file reading method of the present invention.
Step S301: obtain the parameters indicating the physical position and size of the index unit.
The parameters comprise a physical position value and a storage occupancy value: the physical position value is the offset of the data structure from the start of the file, and the storage occupancy value is the size of the data structure.
Step S302: read the index unit as indicated by those parameters, and obtain the parameters indicating the physical position and size of each XML node.
Step S303: read the XML node into memory as indicated by the parameters in the index unit.
The XML file reading method of the present invention is further described below by way of an embodiment.
Refer to Fig. 8, a flowchart of embodiment one of the XML file reading method of the present invention.
Step S401: read the pointer.
The pointer is read at the head of the file, or at a fixed offset from the start of the file, to obtain the parameters stored in it that indicate the physical position and size of the index unit in the XML file.
The pointer may take the form <indexOfIndexs start="20" length="2"/>, where start="20" length="2" means that the index unit starts at offset 20 from the start of the file and has a size of 2.
Step S402: read the index unit as indicated by those parameters.
Step S403: search the index unit, according to a predefined rule, for the key field corresponding to the identification information of the XML node, and obtain the parameters at the position corresponding to that key field.
The predefined rule may be string matching: an XML file describes data in plain text, and string matching is commonly applied to plain text, so string matching is used here. The predefined rule may vary with the way the index unit is constructed and is not limited to string matching.
The parameters may be placed on the same line as the key field, adjacent to it.
These parameters are the information indicating the physical position and size of the XML node. For example, <index key="section1" start="3" length="8"/> means that the XML node is named section1, starts at offset 3 from the start of the file, and has a size of 8.
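Step S403 can be sketched in Python with a regular expression as one possible form of string matching. The pattern assumes the index entries follow exactly the key, start, length attribute order shown above; the names ENTRY and find_entry are hypothetical.

```python
import re

# Matches one index entry of the form shown in the text (attribute order fixed).
ENTRY = re.compile(r'<index\s+key="([^"]*)"\s+start="(\d+)"\s+length="(\d+)"\s*/>')


def find_entry(index_text, key):
    """Return (start, length) for the entry whose key field equals `key`."""
    for match in ENTRY.finditer(index_text):
        if match.group(1) == key:
            return int(match.group(2)), int(match.group(3))
    return None  # no entry with that key in the index unit
```

Keeping the parameters on the same line as the key field, as the text suggests, is what allows a single pattern to capture the key and its parameters together.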
Step S404:, read described XM L node according to the indication of described parameter.
As information, be 3 place value in the off-set value of distance file first address with node size be that node 3, that name is called section1 reads according to key=" section1 " start=" 3 " length=" 8 ".
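Steps S401 to S404 can be sketched in Python as follows. This is only an illustrative sketch, not the patented implementation: the function name read_node is hypothetical, start and length are assumed to be zero-based byte offsets and byte counts, the pointer is assumed to occupy the first line of the file, and the pointer tag is accepted in either spelling used in the text (indexOfIndex or indexOfIndexs).

```python
import re


def read_node(data: bytes, key: str) -> bytes:
    """Return the raw bytes of the XML node registered under `key`.

    Assumption: `start` is a zero-based byte offset from the beginning
    of the file and `length` is a byte count.
    """
    # S401: read the pointer, assumed here to be the first line of the file.
    head = data.split(b"\n", 1)[0]
    m = re.search(rb'<indexOfIndexs?\s+start="(\d+)"\s+length="(\d+)"\s*/>', head)
    if m is None:
        raise ValueError("no pointer found at the head of the file")
    idx_start, idx_len = int(m.group(1)), int(m.group(2))

    # S402: read the index unit as indicated by the pointer parameters.
    index_unit = data[idx_start:idx_start + idx_len]

    # S403: string-match the key field and take the adjacent parameters.
    pattern = (rb'<index\s+key="' + re.escape(key.encode())
               + rb'"\s+start="(\d+)"\s+length="(\d+)"\s*/>')
    m = re.search(pattern, index_unit)
    if m is None:
        raise KeyError(key)
    start, length = int(m.group(1)), int(m.group(2))

    # S404: read the XML node as indicated by those parameters.
    return data[start:start + length]
```

Only the pointer line, the index unit and the requested node are inspected; the other nodes are never parsed, which is the point of keeping the index inside the file.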
Please refer to Fig. 9, which is a structural diagram of an XML file reading device of the present invention.
This XML file reading device operates mainly on XML files that have been preprocessed.
The preprocessing is as follows: after the XML content has been written to the file, parameters indicating the physical position and size of each XML node in the XML file are obtained from the XML file; the parameters are stored in a preset data structure, and the data structure is placed in the XML file; and a pointer to the data structure is set at a predefined position in the XML file.
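The preprocessing just described can be sketched in Python as below. This is a sketch under stated assumptions, not the patent's own code: the names preprocess, POINTER_FMT and POINTER_SIZE are hypothetical, offsets are assumed to be zero-based byte offsets, and the pointer is written with zero-padded fixed-width numbers so that its size is known before the node offsets are computed.

```python
# Fixed-width pointer: placing it at the head must not shift the node
# offsets recorded in the index, so its size has to be known up front.
POINTER_FMT = b'<indexOfIndex start="%010d" length="%010d"/>\n'
POINTER_SIZE = len(POINTER_FMT % (0, 0))


def preprocess(nodes, root=b"document"):
    """Build an indexed XML file from a list of (name, serialized_bytes).

    Returns the whole file as bytes: pointer at the head, XML content in
    the middle, index unit (the "preset data structure") at the tail.
    """
    body = b"<" + root + b">\n"
    entries = []
    for name, raw in nodes:
        # Physical position of this node = pointer size + bytes written so far.
        start = POINTER_SIZE + len(body)
        entries.append(b'<index key="%s" start="%d" length="%d"/>\n'
                       % (name.encode(), start, len(raw)))
        body += raw + b"\n"
    body += b"</" + root + b">\n"

    # The index unit goes at the tail of the file.
    index_unit = b"<indexs>\n" + b"".join(entries) + b"</indexs>\n"

    # The pointer records where the index unit starts and how large it is.
    pointer = POINTER_FMT % (POINTER_SIZE + len(body), len(index_unit))
    return pointer + body + index_unit
```

Because each offset is recorded while the content is being assembled, no second pass over the finished file is needed to build the index.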
The preprocessed XML file is an XML file with an index function; it comprises an index unit, a pointer unit and at least one XML node.
The XML file reading device 200 comprises:
An indicating unit 211, configured to provide the reading unit 212 with the parameters indicating the physical position and size of the index unit and of each XML node; and a reading unit 212, configured to read the pointer unit and to read the index unit and each XML node as indicated by the parameters provided by the indicating unit.
The indicating unit 211 comprises: a first indicating unit 2111, configured to provide the reading unit 212 with the parameters indicating the physical position and size of the index unit; and a second indicating unit 2112, configured to provide the reading unit 212 with the physical position and size of each XML node.
The parameters comprise a physical position value and a storage space occupancy value. The physical position value is the offset of the data structure from the start address of the file; the storage space occupancy value is the size of the data structure.
The XML file reading device works as follows. After the reading unit 212 reads the pointer set at the file head, the first indicating unit 2111 identifies the parameters indicating the physical position and size of the index unit and provides them to the reading unit 212. After the reading unit 212 reads the index unit as indicated by those parameters, the second indicating unit 2112 identifies, in the index unit and according to the predefined rule, the parameters indicating the physical position and size of the target XML node and provides them to the reading unit 212, which then reads the target XML node as indicated.
The target XML node is the XML node that is to be read; it may be any one of the XML nodes in the XML file.
The predefined rule may be string matching: an XML file describes data in plain text, and string matching is commonly applied to plain text, so string matching is adopted.
Of course, the predefined rule may change with the way the index unit is constructed and is not limited to string matching.
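As one illustration of a predefined rule other than string matching, the index unit could be parsed as XML and searched by attribute. The sketch below uses Python's standard xml.etree.ElementTree module; the function name lookup_by_parsing is hypothetical, and the index layout is assumed to be the <indexs>/<index key start length> form used in this description.

```python
import xml.etree.ElementTree as ET


def lookup_by_parsing(index_unit: str, key: str):
    """Return (start, length) for `key` by parsing the index unit as XML.

    This is an alternative to string matching: the index unit is small,
    so parsing it does not require parsing the rest of the file.
    """
    root = ET.fromstring(index_unit)
    for entry in root.iter("index"):
        if entry.get("key") == key:
            return int(entry.get("start")), int(entry.get("length"))
    raise KeyError(key)
```

For the two index entries used in this description, lookup_by_parsing(unit, "section2") yields the pair (11, 8).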
An XML file with an index function is shown in Example 3:
Example 3
<indexOfIndex start="20" length="2"/>
<document>
<section1> ------ node 1
<paragraph11>
Years are steamed grey hair,
</paragraph11>
<paragraph12>
Double-edged sword is still bright.
</paragraph12>
</section1>
<section2> ------ node 2
<paragraph21>
Warm blood is washed the battle field,
</paragraph21>
<paragraph22>
Rivers are returned the native place.
</paragraph22>
</section2>
</document>
<indexs>
<index key="section1" start="3" length="8"/>
<index key="section2" start="11" length="8"/>
</indexs>
When the XML file shown in Example 3 is read, the reading unit 212 first reads the header information of the XML file. The first indicating unit 2111 recognizes from the key field indexOfIndex that the information read is a pointer, and provides the information adjacent to that key field in the pointer, start="20" length="2", to the reading unit 212, which reads the index unit as indicated. Once the target XML node has been determined to be section2, the second indicating unit 2112 searches the index unit for the key field matching section2, namely key="section2", and provides the information adjacent to that key field, start="11" length="8", to the reading unit 212, which reads the section2 node accordingly.
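The figures in Example 3 can be checked with a short Python sketch. It rests on an assumption that the text does not state: that start and length in this example count lines, with the first line of the file numbered 1. The name read_lines is hypothetical, and the node annotations are omitted from the embedded copy of the example.

```python
EXAMPLE_3 = """\
<indexOfIndex start="20" length="2"/>
<document>
<section1>
<paragraph11>
Years are steamed grey hair,
</paragraph11>
<paragraph12>
Double-edged sword is still bright.
</paragraph12>
</section1>
<section2>
<paragraph21>
Warm blood is washed the battle field,
</paragraph21>
<paragraph22>
Rivers are returned the native place.
</paragraph22>
</section2>
</document>
<indexs>
<index key="section1" start="3" length="8"/>
<index key="section2" start="11" length="8"/>
</indexs>
"""


def read_lines(lines, start, length):
    """Return `length` lines beginning at 1-based line number `start`."""
    return lines[start - 1:start - 1 + length]


lines = EXAMPLE_3.splitlines()
section2 = read_lines(lines, 11, 8)  # lines 11..18 of the example
```

Under this reading, start="11" length="8" spans exactly the lines from <section2> to </section2>, and start="3" length="8" spans section1. The pointer's start="20" lands on the <indexs> line, while its length of 2 covers fewer lines than the index unit as printed here, so the numbers in the example are best treated as illustrative.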
With this technical solution, no separate index file needs to be added. Instead, the parameters indicating the physical position and size of the XML nodes are stored in a preset data structure, and the data structure is placed in the XML file. This avoids the management difficulties that arise when multiple XML files are read and multiple index files exist alongside multiple XML files.
In addition, once the XML content has been written, its size and physical position can be determined. The parameters indicating physical position and size are set and stored in the preset data structure, which is placed at the end of the XML file; a pointer holding the parameters that indicate the physical position and size of the data structure is placed at the head of the XML file, forming the XML file. This avoids having to traverse the content of the XML file again, after it has been written, in order to generate an index file; it removes an intermediate processing step and improves efficiency.
The above discloses only preferred embodiments of the present invention, but the present invention is not limited to them. Any variation that a person skilled in the art can conceive without inventive effort, and any improvement or refinement made without departing from the principle of the present invention, shall fall within the scope of protection of the present invention.

Claims (8)

1. An XML file preprocessing method, characterized in that it comprises the steps of:
obtaining, from the XML file, the identification information of each XML node and the parameters indicating the physical position and size of each XML node;
storing the parameters in a preset data structure and, in that data structure, setting at the position corresponding to the position of the parameters a key field that matches the identification information of the XML node; and placing the data structure at the tail of the XML file;
setting a pointer to the data structure at a predefined position in the XML file.
2. The XML file preprocessing method according to claim 1, characterized in that the predefined position is the head of the XML file, and the data structure is a string table.
3. An XML file preprocessing device, characterized in that the device comprises:
an acquiring unit, configured to obtain, from the XML file, the identification information of each XML node and the parameters indicating the physical position and size of each XML node;
a first configuration unit, configured to store the parameters in a preset data structure and to place the data structure in the XML file;
a key setting unit, configured to set, in the data structure at the position corresponding to the position of the parameters, a key field that matches the name of the XML node;
a second configuration unit, configured to set a pointer to the data structure at a predefined position in the XML file;
a setting unit, configured to place the data structure at the tail of the XML file after the first configuration unit has configured the data structure, and to place the pointer at the head of the XML file after the second configuration unit has configured the pointer.
4. An XML file reading method, characterized in that it comprises the steps of:
searching, in the index unit and according to a predefined rule, for the key field corresponding to the identification information of the XML node, and obtaining the parameters indicating the physical position and size of the XML node;
reading the index unit as indicated by the parameters, and obtaining the parameters indicating the physical position and size of each XML node;
reading the XML node as indicated by the parameters.
5. The XML file reading method according to claim 4, characterized in that searching, in the index unit and according to the predefined rule, for the key field corresponding to the identification information of the XML node specifically comprises: using string matching to search the index unit for the character string that matches the identification information of the XML node.
6. The XML file reading method according to claim 5, characterized in that the parameters indicating the physical position and size of the index node are obtained specifically by: obtaining, at the head of the XML file or at a preset position, the parameters indicating the physical position and size of the index unit.
7. An XML file reading device, characterized in that it comprises:
an indicating unit, configured to search, in the index unit and according to a predefined rule, for the key field corresponding to the identification information of the XML node, to obtain the parameters indicating the physical position and size of the XML node, and to provide them to the reading unit;
a reading unit, configured to read the pointer unit and to read the index unit and each XML node as indicated by the parameters provided by the indicating unit.
8. The XML file reading device according to claim 7, characterized in that the indicating unit comprises: a first indicating unit, configured to provide the reading unit with the parameters indicating the physical position and size of the index unit; and a second indicating unit, configured to provide the reading unit with the parameters indicating the physical position and size of each XML node.
CNB200610145652XA 2006-11-23 2006-11-23 XML file preprocessing method, apparatus, file structure, reading method and device Active CN100462973C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB200610145652XA CN100462973C (en) 2006-11-23 2006-11-23 XML file preprocessing method, apparatus, file structure, reading method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB200610145652XA CN100462973C (en) 2006-11-23 2006-11-23 XML file preprocessing method, apparatus, file structure, reading method and device

Publications (2)

Publication Number Publication Date
CN1949225A CN1949225A (en) 2007-04-18
CN100462973C true CN100462973C (en) 2009-02-18

Family

ID=38018740

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB200610145652XA Active CN100462973C (en) 2006-11-23 2006-11-23 XML file preprocessing method, apparatus, file structure, reading method and device

Country Status (1)

Country Link
CN (1) CN100462973C (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100557611C (en) * 2007-11-15 2009-11-04 深圳华为通信技术有限公司 A kind of disposal route of file and device
CN101661481B (en) * 2008-08-29 2012-09-26 国际商业机器公司 XML data storing method, method and device thereof for executing XML query
CN101996251B (en) * 2010-11-17 2012-09-05 浙江省电力试验研究院 Rapid processing method of large SCL (substation configuration language) document
CN104036026B (en) * 2014-06-27 2018-02-23 吴涛军 Storage and location structure document choose the method and system of content
CN107885492B (en) * 2017-11-14 2021-03-12 中国银行股份有限公司 Method and device for dynamically generating data structure in host
CN113505269B (en) * 2021-07-02 2024-03-29 卡斯柯信号(成都)有限公司 Binary file detection method and device based on XML

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020046205A1 (en) * 2000-10-13 2002-04-18 Neocore Inc. Method of operating a hierarchical data document system having a duplicate tree structure
JP2005018811A (en) * 2000-10-25 2005-01-20 Matsushita Electric Ind Co Ltd Character string retrieval device
CN1790335A (en) * 2005-12-19 2006-06-21 无锡永中科技有限公司 XML file data access method
CN1825306A (en) * 2005-10-31 2006-08-30 北京神舟航天软件技术有限公司 XML data storage and access method based on relational database
CN1831828A (en) * 2006-04-10 2006-09-13 无锡永中科技有限公司 Method for saving XML file

Also Published As

Publication number Publication date
CN1949225A (en) 2007-04-18

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant