CN100347704C - Converting method for processor of spatial information issuing forms - Google Patents

Info

Publication number
CN100347704C
Authority
CN
China
Prior art keywords
style sheet
document
processor
dom tree
gml
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2004100611853A
Other languages
Chinese (zh)
Other versions
CN1614592A (en)
Inventor
关佶红
周水庚
边馥苓
陈俊鹏
张俊
张建华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CNB2004100611853A
Publication of CN1614592A
Application granted
Publication of CN100347704C
Expired - Fee Related
Anticipated expiration

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention relates to a new conversion method for a style sheet processor used in publishing spatial information. The GML style sheet file is parsed in DOM mode and the GML source file is parsed in SAX mode. The style sheet is divided into locating information and structural information, producing two style sheet trees. While parsing the source document, the SAX parser traverses the two style sheet trees once each time it encounters an element tag; if the element is needed, its data is extracted, and otherwise it is ignored. The method converts between XML documents quickly and can convert large volumes of GML spatial information to SVG for publication.

Description

A new conversion method for a spatial information publishing style sheet processor
Technical field
The invention belongs to the field of spatial information acquisition and processing within information technology, and in particular relates to a new conversion method for a spatial information publishing style sheet processor.
Background art
A Geography Markup Language (GML) document is an eXtensible Markup Language (XML) document containing geographic information. It encodes spatial and non-spatial information and can be used to integrate heterogeneous spatial data. GML itself, however, was not designed for display, and displaying geographic data is a very important part of publishing spatial information, so a GML document must be converted to a displayable format before it is published. We choose SVG (Scalable Vector Graphics), the vector format standard recommended by the W3C. SVG is itself based on XML, so converting a GML document to SVG amounts to converting an XML document in one format into an XML document in another. This can be done with an XSLT (XSL Transformations) style sheet: the written style sheet and the GML source document are passed to an XSL processor, which carries out the conversion.
Professional technical websites on XML, XSL, and XSLT (such as the W3C at http://www.w3c.org/Style/XSL/, http://www.w3c.org/DOM/, and http://www.w3c.org/XML/, and IBM at http://www-900.ibm.com/developerWorks/cn/xml/index.shtml) show that an XML format conversion must first parse two input files: the source document and the style sheet file. A traditional XSLT processor converts both files into DOM trees held in memory. An in-memory DOM tree is generally 10 or more times the size of the raw data. This matters little for small XML data. The style sheet file, for example, is usually small, no more than a few hundred kilobytes, so converting it to a DOM tree costs little memory or time and is not the efficiency bottleneck. The GML source document, however, may be very large, tens or even hundreds of megabytes. Converting it to a DOM tree takes too many system resources and easily causes memory overflow. Once the CPU cost of building the DOM tree is added, users find the resulting conversion efficiency hard to accept; this is the efficiency bottleneck in XML file conversion. Therefore, when the source document is very large, the DOM approach is not a good choice.
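The difference between the two parsing modes can be illustrated with a minimal Python sketch. It is not part of the patent; the sample document and the function names are assumptions made for illustration. The SAX version keeps only a counter, while the DOM version builds the whole tree in memory before it can answer the same question.

```python
import xml.sax
from xml.dom import minidom

# Hypothetical, tiny stand-in for a GML source document.
GML = (b"<FeatureCollection>"
       b"<featureMember><Road><name>A1</name></Road></featureMember>"
       b"<featureMember><Road><name>B2</name></Road></featureMember>"
       b"</FeatureCollection>")


class CountingHandler(xml.sax.ContentHandler):
    """Counts start tags as they stream past; nothing else is retained."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_with_sax(data: bytes) -> int:
    # Streaming: memory use does not grow with the size of the document.
    handler = CountingHandler()
    xml.sax.parseString(data, handler)
    return handler.count


def count_with_dom(data: bytes) -> int:
    # Tree-building: the whole document is held in memory as nodes.
    doc = minidom.parseString(data)
    return len(doc.getElementsByTagName("*"))
```

Both functions return the same count for the same input; only the amount of state held during parsing differs.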
Traditionally, large files (tens or hundreds of megabytes) are not processed with XSLT directly. Two other methods are used instead. The first splits the data into blocks, converts each block with XSLT, and finally merges the result blocks. This is relatively easy to program and widely applicable, but if the data is too large and there are too many blocks, it takes too long and remains inefficient for large GML data. The second is to write an application against the SAX interface that implements the same function as the XSLT style sheet. It runs many times faster than the first method and is very efficient, but programming against the SAX interface is complex and the result extends poorly.
The XSLT processors in common use today that are based on the above techniques are mainly the Xalan-Java processor from Apache (http://xml.apache.org/index.html#xalan, last accessed October 25, 2004) and the Saxon processor by Michael H. Kay (http://www.saxonica.com/, last accessed October 25, 2004). Experimental test data, however, show that there is a limit to the data volume these processors can handle: Xalan cannot process GML documents of 20 MB or more, and Saxon cannot process GML documents of 40 MB or more. When the GML document is too large, the processor runs out of memory during conversion and the conversion fails.
How to design an XSLT processor that handles large documents efficiently has therefore become the key to efficient GML-to-SVG conversion.
Summary of the invention
To address the above problems, the invention provides a new conversion method for a spatial information publishing style sheet processor: an XSLT processor that does not need to convert the source document into a DOM tree. It avoids the time and space cost of building the DOM tree and the resulting memory overflow, greatly improves conversion efficiency for large XML data, and is widely applicable.
To solve these problems, the conversion method of the processor is as follows. For converting a GML document to SVG, the style sheet file is parsed in DOM mode and the source document is parsed in SAX mode. The style sheet is divided into locating information and structural information, producing two style sheet trees. While the SAX parser parses the source document, the two style sheet trees are traversed once each time an element tag is reached; if the element is needed, its data is extracted, and otherwise it is ignored.
The specific conversion method is:
First step: split the style sheet into fragments according to its definitions and number the fragments in order;
Second step: divide the fragments into two classes by definition: the first part (containing locating information) and the second part (containing the structural information of the output XML file);
Third step: read each fragment belonging to the first part into a document stream, one document stream per fragment, producing a document stream array;
Fourth step: generate a DOM tree for each fragment belonging to the second part, producing a DOM tree array, and allocate a document stream for each DOM tree, producing a document stream array;
Fifth step: while the SAX parser parses the source document, traverse each DOM tree in the DOM tree array once each time an element tag is reached; if the element is needed, extract its data and put it into the document stream corresponding to that DOM tree;
Sixth step: after the SAX parser has finished parsing the source document, merge the document stream array of the first part and the document stream array of the second part in the fragment order from the first step; the document stream finally output is the result document.
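The six steps can be pictured with a much simplified Python sketch. It is an illustration under stated assumptions, not the patented implementation: the `Fragment` class, the `"static"` and `"match"` labels, and the attribute-based templates are all hypothetical, fragments are plain records instead of XSLT nodes or DOM trees, and no XML escaping is done. What the sketch keeps from the steps is the overall shape: numbered fragments, one output stream per fragment, a check of the fragments on every start tag, and a final merge in fragment order.

```python
import io
import xml.sax
from dataclasses import dataclass, field


@dataclass
class Fragment:
    index: int          # position in the style sheet (numbering of step 1)
    kind: str           # "static" or "match" (hypothetical labels, step 2)
    text: str = ""      # literal output of a static fragment
    element: str = ""   # source element name a match fragment looks for
    template: str = ""  # output written for each matching element
    # One output stream per fragment (steps 3 and 4).
    stream: io.StringIO = field(default_factory=io.StringIO)


class FragmentHandler(xml.sax.ContentHandler):
    def __init__(self, fragments):
        super().__init__()
        self.matchers = [f for f in fragments if f.kind == "match"]

    def startElement(self, name, attrs):
        # Step 5: on every start tag, check each matching fragment and
        # write extracted data to that fragment's own stream.
        for frag in self.matchers:
            if frag.element == name:
                frag.stream.write(frag.template.format(**dict(attrs.items())))


def transform(source: bytes, fragments) -> str:
    for frag in fragments:
        if frag.kind == "static":
            frag.stream.write(frag.text)
    xml.sax.parseString(source, FragmentHandler(fragments))
    # Step 6: merge the streams in the original fragment order.
    ordered = sorted(fragments, key=lambda f: f.index)
    return "".join(f.stream.getvalue() for f in ordered)


SOURCE = b'<Map><Point x="1" y="2"/><Label/><Point x="3" y="4"/></Map>'
FRAGMENTS = [
    Fragment(0, "static", text="<svg>"),
    Fragment(1, "match", element="Point",
             template='<circle cx="{x}" cy="{y}" r="1"/>'),
    Fragment(2, "static", text="</svg>"),
]
result = transform(SOURCE, FRAGMENTS)
```

Because every fragment writes to its own stream, data extracted late in the source document still lands at the right position in the output once the streams are merged in order.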
The invention is characterized as follows: the style sheet file is parsed in DOM mode and the source document in SAX mode. The style sheet is divided into locating information and structural information, producing two style sheet trees; while the SAX parser parses the source document, the two trees are traversed once each time an element tag is reached, and data is extracted if the element is needed and ignored otherwise. The conversion algorithm converts between XML documents quickly and can handle the conversion of large volumes of GML spatial information to SVG.
Description of drawings
Fig. 1 compares the time efficiency of the invention with that of two common processors.
Fig. 2 compares the space efficiency of the invention with that of two common processors.
Embodiment
The invention is further described below with reference to the drawings.
The invention parses the style sheet file in DOM mode into a DOM tree held in memory and parses the source document in SAX mode. Each time an element tag is reached, the style sheet tree is traversed once; if the element is needed, its data is extracted, and otherwise it is ignored.
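A minimal Python sketch of this per-tag check follows. It is an assumption-laden illustration, not the patent's code: the style sheet, the source document, and the rule that an element is "needed" when some XSL node has a `select` attribute equal to its tag name are all simplifications (real XSLT `select` values are XPath expressions).

```python
import xml.sax
from xml.dom import minidom

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Hypothetical style sheet and source document, kept tiny on purpose.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <svg><xsl:for-each select="Road">
      <text><xsl:value-of select="name"/></text>
    </xsl:for-each></svg>
  </xsl:template>
</xsl:stylesheet>"""

SOURCE = b"<Roads><Road><name>A1</name><lanes>2</lanes></Road></Roads>"


def wanted_by(node, tag):
    """Walk the style sheet DOM tree; True if an XSL node selects `tag`."""
    if node.nodeType == node.ELEMENT_NODE and node.namespaceURI == XSL_NS:
        if node.getAttribute("select") == tag:
            return True
    return any(wanted_by(child, tag) for child in node.childNodes)


class ExtractingHandler(xml.sax.ContentHandler):
    def __init__(self, stylesheet_dom):
        super().__init__()
        self.stylesheet = stylesheet_dom
        self.capture = False
        self.extracted = []  # [tag name, text] pairs for needed elements

    def startElement(self, name, attrs):
        # One traversal of the style sheet tree per start tag.
        self.capture = wanted_by(self.stylesheet, name)
        if self.capture:
            self.extracted.append([name, ""])

    def characters(self, content):
        if self.capture:
            self.extracted[-1][1] += content

    def endElement(self, name):
        self.capture = False


handler = ExtractingHandler(minidom.parseString(STYLESHEET))
xml.sax.parseString(SOURCE, handler)
```

In this run `Road` and `name` are recorded, while `Roads` and `lanes` are skipped because nothing in the style sheet selects them.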
In general the style sheet file is very small, only a few KB or a few tens of KB, so the DOM tree generated from it is also very small, and scanning an XML file in SAX mode is very efficient. The processor of the invention is therefore faster than a general XSLT processor. In particular, when the source document is very large, the time consumed by a general XSLT processor rises exponentially, whereas for a given style sheet the time consumed by the processor of the invention rises linearly, because the time a SAX scan of an XML file takes grows linearly with the file size. The larger the source document, the faster the efficiency of a general XSLT processor falls, and beyond a certain data volume (for example about 4,000,000) a general XSLT processor cannot process it at all. The processor of the invention has no such problem.
To further improve the efficiency of the XSLT processor, one can start from two aspects: speeding up the SAX parser's traversal of the XML document, and optimizing the style sheet. XML programming generally uses the APIs of software packages that implement the DOM and SAX interfaces, such as the Xerces package from Apache, the DOM4J package from IBM, and JDOM from Sun. General XSLT processors are built as secondary development on top of such products; the widely used Xalan processor, for example, is developed on Xerces. Although these packages differ slightly in their support for the SAX interface and in the execution efficiency of their SAX parsers, the gap is small. Improving the SAX parser's traversal speed is therefore not an effective way to improve the execution efficiency of an XSLT processor.
During conversion, as the SAX parser traverses the source document, the processor of the invention traverses the style sheet tree once each time it reaches an element tag, so the size of the style sheet tree has a considerable effect on execution efficiency. This is especially so when the source document is large and contains very many elements, for example 100,000 or several hundred thousand. (Such documents may be rare in traditional XML applications, where XML data in e-commerce, for instance, has only tens or hundreds of elements, but they are quite common for GML.) If the style sheet tree can be shrunk effectively without affecting its function, the execution efficiency of the invention improves greatly. Optimizing the style sheet is thus an effective way to improve the execution efficiency of the processor. By analyzing the structure of the style sheet, the invention designs a style sheet conversion method.
The style sheet file consists of two parts. The first part is the locating information for the corresponding data in the source document: the nodes formed by tags that the XSL specification defines as having the function of locating data in the source document (such as xsl:for-each, xsl:variable, xsl:copy-of, and xsl:value-of), together with the elements not defined by the XSL specification that are contained in those nodes. The second part is the structural information of the output XML file: all elements that do not meet the requirements of the first part and that belong to the tags of the result document. The two parts play different roles in XSLT processing. The second part is related only to the result document and is independent of the source document; the first part, because it locates data, is closely tied to the source document. The most expensive part of the processor in time and space is parsing the source document in SAX mode and extracting data through the locating function of the style sheet. In this process, the smaller the style sheet tree, the higher the efficiency, and only the first part of the style sheet really takes part; the second part is not involved at all. There is therefore no need to convert the whole style sheet into a style sheet tree; only the first part needs to be converted into a tree that takes part in this process. The invention therefore generates two style sheet trees from the style sheet, and while the SAX parser parses the source document, the two trees are traversed once each time an element tag is reached.
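The two-part division described above can be sketched in Python as follows. This is an illustrative reading of the text, not the patent's code. The `LOCATING` set holds only the four tags the text names as examples and is an assumption as to completeness; other XSL elements (such as xsl:template) are left out of both parts in this sketch.

```python
from xml.dom import minidom

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
# Only the example tags named in the text; assumed, not exhaustive.
LOCATING = {"for-each", "variable", "copy-of", "value-of"}

# Hypothetical style sheet used for illustration.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <svg><g><xsl:for-each select="Road">
      <text><xsl:value-of select="name"/></text>
    </xsl:for-each></g></svg>
  </xsl:template>
</xsl:stylesheet>"""


def split_stylesheet(xsl_text):
    """Return (first part, second part) as lists of tag names."""
    first, second = [], []

    def visit(node, inside_locating):
        for child in node.childNodes:
            if child.nodeType != child.ELEMENT_NODE:
                continue
            is_xsl = child.namespaceURI == XSL_NS
            local = child.tagName.split(":")[-1]
            # A node is in the first part if it is a locating XSL tag
            # or is contained in one.
            locating = inside_locating or (is_xsl and local in LOCATING)
            if locating:
                first.append(child.tagName)
            elif not is_xsl:
                # Literal result elements outside any locating node.
                second.append(child.tagName)
            visit(child, locating)

    visit(minidom.parseString(xsl_text), False)
    return first, second


first_part, second_part = split_stylesheet(STYLESHEET)
```

Here `first_part` is `["xsl:for-each", "text", "xsl:value-of"]` and `second_part` is `["svg", "g"]`: the `text` element is a result tag, but it counts as locating information because it sits inside `xsl:for-each`.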
To test the performance of the invention, we selected 14 groups of GML spatial data between 1 MB and 50 MB and, using the same style sheet on the same machine, ran a comparative experiment with the most commonly used XSLT processors at present (the Xalan-Java processor from Apache and the Saxon processor by Michael H. Kay) and the invention. Table 1 lists the test data and results for the three processors, 14 groups of data in total. In the Method column of Table 1, Xalan denotes the Xalan-Java processor and Saxon denotes the Saxon processor. The Name column gives the layer name. The GML, XSL, and SVG columns give the sizes of the GML, XSL, and SVG files. The TotalTime column gives the time taken to convert the GML document to SVG, in ms. The Memory column gives the memory consumed in converting the GML document to SVG, in MB.
Table 1
ID  Method             Name     GML  XSL    SVG    TotalTime  Memory
1   Xalan              Test1M   1M   19.3K  701K   4034       3169360
2   Xalan              Test2M   2M   19.3K  1.44M  5235       6882808
3   Xalan              Test3M   3M   19.3K  2.11M  6938       11867504
4   Xalan              Test4M   4M   19.3K  2.98M  8550       11999792
5   Xalan              Test5M   5M   19.3K  3.54M  11284      17121784
6   Xalan              Test6M   6M   19.3K  4.21M  13006      23121816
7   Xalan              Test7M   7M   19.3K  5.17M  14768      23522512
8   Xalan              Test8M   8M   19.3K  6.04M  16551      25603880
9   Xalan              Test9M   9M   19.3K  6.70M  18244      41024696
10  Xalan              Test10M  10M  19.3K  7.37M  22244      51024696
11  Xalan              Test20M  20M  19.3K  0      0          0
12  Xalan              Test30M  30M  19.3K  0      0          0
13  Xalan              Test40M  40M  19.3K  0      0          0
14  Xalan              Test50M  50M  19.3K  0      0          0
15  Saxon              Test1M   1M   19.3K  801K   1834       11155336
16  Saxon              Test2M   2M   19.3K  1.54M  2835       11800152
17  Saxon              Test3M   3M   19.3K  2.31M  4638       17440168
18  Saxon              Test4M   4M   19.3K  3.08M  6150       29721600
19  Saxon              Test5M   5M   19.3K  3.84M  8284       30942560
20  Saxon              Test6M   6M   19.3K  4.61M  10006      40650072
21  Saxon              Test7M   7M   19.3K  5.37M  11768      41607208
22  Saxon              Test8M   8M   19.3K  6.14M  13551      61426736
23  Saxon              Test9M   9M   19.3K  6.90M  15244      74339704
24  Saxon              Test10M  10M  19.3K  7.67M  17046      85590960
25  Saxon              Test20M  20M  19.3K  15.3M  34882      113047400
26  Saxon              Test30M  30M  19.3K  22.9M  53128      122587560
27  Saxon              Test40M  40M  19.3K  0      0          0
28  Saxon              Test50M  50M  19.3K  0      0          0
29  The present invention  Test1M   1M   19.3K  471K   1223       2769360
30  The present invention  Test2M   2M   19.3K  925K   1312       5182808
31  The present invention  Test3M   3M   19.3K  1.34M  1843       6304880
32  The present invention  Test4M   4M   19.3K  1.79M  2353       7272208
33  The present invention  Test5M   5M   19.3K  2.23M  2975       8646184
34  The present invention  Test6M   6M   19.3K  2.67M  3635       9006984
35  The present invention  Test7M   7M   19.3K  3.12M  4146       10759824
36  The present invention  Test8M   8M   19.3K  3.56M  4656       11511200
37  The present invention  Test9M   9M   19.3K  4.01M  5248       23315136
38  The present invention  Test10M  10M  19.3K  4.45M  5869       35950400
39  The present invention  Test20M  20M  19.3K  8.89M  12958      46793240
40  The present invention  Test30M  30M  19.3K  13.3M  25436      58735264
41  The present invention  Test40M  40M  19.3K  17.7M  36142      69626592
42  The present invention  Test50M  50M  19.3K  22.2M  50382      80763968
As Table 1 shows, some entries in the SVG column for Xalan and Saxon are zero (Xalan cannot process GML documents of 20M or more, and Saxon cannot process GML documents of 40M or more). A zero means the GML document was too large: the processor ran out of memory and the conversion failed. The processor of the invention has no such problem. Fig. 1 plots the time the three processors took to convert the 14 groups of test data; the vertical axis is time in ms and the horizontal axis is GML document size in MB. Fig. 2 plots the memory the three processors consumed in converting the 14 groups of test data; the vertical axis is memory consumed in MB and the horizontal axis is GML document size in MB. In both figures the lowest line is the result for the processor of the invention. The middle line in Fig. 1 and the top line in Fig. 2 are the results for the Saxon processor; the top line in Fig. 1 and the middle line in Fig. 2 are the results for the Xalan processor. The figures show that the processor of the invention is better than both the Xalan and Saxon processors in time and space efficiency, and that for large documents above 10M the memory consumed by the invention grows essentially linearly.
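As a small worked example of reading Table 1, the following Python snippet compares the TotalTime values of the three processors for the 10M test file (rows 10, 24, and 38). The numbers are copied from the table; the dictionary and function names are illustrative only.

```python
# TotalTime in ms for Test10M, taken from rows 10, 24 and 38 of Table 1.
TOTAL_TIME_MS_10M = {"Xalan": 22244, "Saxon": 17046, "invention": 5869}


def speedup_over(baseline: str, times: dict) -> float:
    """How many times faster the invention is than `baseline`."""
    return times[baseline] / times["invention"]


speedups = {
    name: round(speedup_over(name, TOTAL_TIME_MS_10M), 1)
    for name in ("Xalan", "Saxon")
}
```

For this file the ratios come to roughly 3.8 against Xalan and 2.9 against Saxon.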
Content not described in detail in this specification belongs to the prior art known to those skilled in the art.

Claims (2)

1. A new conversion method for a spatial information publishing style sheet processor, characterized in that: for converting a GML document to SVG, the style sheet file is parsed in DOM mode and the source document is parsed in SAX mode; the style sheet is divided into locating information and structural information, producing two style sheet trees; while the SAX parser parses the source document, the two style sheet trees are traversed once each time an element tag is reached; if the element is needed, its data is extracted, and otherwise it is ignored.
2. The new conversion method for a spatial information publishing style sheet processor according to claim 1, characterized in that the specific steps of the method are:
First step: split the style sheet into fragments according to its definitions and number the fragments in order;
Second step: divide the fragments into two classes by definition: the first part, containing locating information, and the second part, containing the structural information of the output XML file;
Third step: read each fragment belonging to the first part into a document stream, one document stream per fragment, producing a document stream array;
Fourth step: generate a DOM tree for each fragment belonging to the second part, producing a DOM tree array, and allocate a document stream for each DOM tree, producing a document stream array;
Fifth step: while the SAX parser parses the source document, traverse each DOM tree in the DOM tree array once each time an element tag is reached; if the element is needed, extract its data and put it into the document stream corresponding to that DOM tree;
Sixth step: after the SAX parser has finished parsing the source document, merge the document stream array of the first part and the document stream array of the second part in the fragment order from the first step; the document stream finally output is the result document.
CNB2004100611853A 2004-11-25 2004-11-25 Converting method for processor of spatial information issuing forms Expired - Fee Related CN100347704C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2004100611853A CN100347704C (en) 2004-11-25 2004-11-25 Converting method for processor of spatial information issuing forms

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2004100611853A CN100347704C (en) 2004-11-25 2004-11-25 Converting method for processor of spatial information issuing forms

Publications (2)

Publication Number Publication Date
CN1614592A CN1614592A (en) 2005-05-11
CN100347704C true CN100347704C (en) 2007-11-07

Family

ID=34764450

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2004100611853A Expired - Fee Related CN100347704C (en) 2004-11-25 2004-11-25 Converting method for processor of spatial information issuing forms

Country Status (1)

Country Link
CN (1) CN100347704C (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458711B (en) * 2008-12-30 2011-01-05 国家电网公司 Image describing and transforming method and system
CN102075555B (en) * 2009-11-20 2013-05-15 武汉大学 Dynamic spatial information processing service automatic composition method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1537285A (en) * 2001-08-03 2004-10-13 Koninklijke Philips Electronics N.V. Method and system for updating document

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
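Conversion of GML documents to SVG in WebGIS applications. 刘旭军, 关佶红. 计算机应用, Vol. 24, No. 2, 2004 *
Research on GML-based description and application of GIS spatial features. 童小华, 许谷声. 同济大学学报(自然科学版), Vol. 32, No. 6, 2004 *
A GML-based method for querying and integrating Web spatial information. 安杨, 关佶红, 陈俊鹏, 赵波. 计算机工程, Vol. 30, No. 9, 2004 *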

Also Published As

Publication number Publication date
CN1614592A (en) 2005-05-11

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
C17 Cessation of patent right
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20071107

Termination date: 20111125