CN100347704C - Converting method for processor of spatial information issuing forms - Google Patents
- Publication number
- CN100347704C (application number CNB2004100611853A)
- Authority
- CN
- China
- Prior art keywords
- style sheet
- document
- processor
- dom tree
- gml
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Document Processing Apparatus (AREA)
Abstract
The present invention relates to a new conversion method for a style sheet processor used in spatial information publishing. The method comprises the following: the style sheet file is parsed in DOM mode and the GML source document is parsed in SAX mode; the style sheet is divided into locating information and structure information, from which two style sheet trees are generated. While parsing the source document, the SAX parser traverses the two style sheet trees once each time it encounters an element tag; if the element is needed, its data is extracted, and otherwise it is ignored. The method can convert between XML documents quickly and can handle the conversion of large volumes of GML spatial information to SVG for publishing.
Description
Technical field
The invention belongs to the field of spatial information acquisition and processing within information technology, and in particular relates to a new conversion method for a style sheet processor used in spatial information publishing.
Background art
A Geography Markup Language (GML) document is an eXtensible Markup Language (XML) document that contains geographic information. It is used to encode spatial and non-spatial information and can be used to integrate heterogeneous spatial data. GML itself, however, was not designed for display, and displaying geographic data is a very important part of spatial information publishing, so a GML document must be converted to a displayable format before it is published. We chose SVG (Scalable Vector Graphics), the vector format standard recommended by the W3C. SVG is itself based on XML, so converting a GML document to SVG amounts to converting an XML document in one format into an XML document in another. This can be done with an XSLT (XSL Transformations) style sheet: the written style sheet and the GML source document are passed to an XSL processor, which performs the conversion.
As can be seen from professional technical websites on XML, XSL, and XSLT (such as those of the W3C, http://www.w3c.org/Style/XSL/, http://www.w3c.org/DOM/, http://www.w3c.org/XML/, and of IBM, http://www-900.ibm.com/developerWorks/cn/xml/index.shtml), an XML format conversion first parses two input files: the source document and the style sheet file. A traditional XSLT processor converts both files into DOM trees held in memory. An in-memory DOM tree is generally 10 times the size of the raw data or more. This matters little for small XML data. Style sheet files, for example, are small, no more than a few hundred kilobytes, so converting one to a DOM tree costs little memory or time and is not the efficiency bottleneck. A GML source document, however, can be very large, tens or even hundreds of megabytes; converting it to a DOM tree takes too many system resources and easily exhausts memory. Once the CPU time needed to build the DOM tree is added, users find such conversion efficiency hard to accept, and this is the efficiency bottleneck of XML file conversion. When the source document is too large, therefore, the DOM approach is not a good choice.
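The difference between the two parsing modes can be illustrated with a minimal Python sketch. It is not part of the patent; the sample document and the function names are made up for illustration. The SAX handler keeps only a counter while the document streams past, whereas the DOM version must hold the whole tree in memory before it can answer the same question.

```python
import xml.sax
from xml.dom import minidom

# Tiny GML-like sample; the element names are illustrative only.
SAMPLE = (
    b"<FeatureCollection>"
    + b"<featureMember><name>road</name></featureMember>" * 3
    + b"</FeatureCollection>"
)


class ElementCounter(xml.sax.ContentHandler):
    """Counts start tags; the only state kept is one integer."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_with_sax(data: bytes) -> int:
    # Streaming parse: no tree is built for the source document.
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler.count


def count_with_dom(data: bytes) -> int:
    # Tree parse: the whole document is materialised first.
    doc = minidom.parseString(data)
    return len(doc.getElementsByTagName("*"))
```

Both functions return the same count for the sample; the point of the sketch is only that the SAX version's memory use does not depend on the document size.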
Traditionally, large files (tens or hundreds of megabytes) are not processed with XSLT directly. Two other methods are used instead. The first divides the data into blocks, converts each block with XSLT, and finally merges the result blocks. This is comparatively easy to program and widely applicable, but if the data is too large and there are too many blocks, it takes too long, so it is still inefficient for large GML data. The second writes an application on the SAX interface that implements the same function as the XSLT style sheet. It runs many times faster than the first method and is very efficient, but programming against the SAX interface is complex and the result is hard to extend.
The XSLT processors in common use today that are based on the techniques above are mainly Apache's Xalan-Java processor (http://xml.apache.org/index.html#xalan, last accessed October 25, 2004) and Michael H. Kay's Saxon processor (http://www.saxonica.com/, last accessed October 25, 2004). Experimental test data show, however, that the data volume these processors can handle is limited: Xalan cannot process GML documents of 20 MB and above, and Saxon cannot process GML documents of 40 MB and above. When the GML document is too large, the processor runs out of memory during conversion and the conversion fails.
How to design an XSLT processor that handles large documents efficiently has therefore become the key to efficient GML-to-SVG conversion.
Summary of the invention
To address the problems above, the invention provides a new conversion method for a style sheet processor used in spatial information publishing. The XSLT processor does not need to convert the source document into a DOM tree, which avoids the time and space consumed in building the tree and the resulting memory overflow, greatly improves conversion efficiency for large XML data, and is widely applicable.
To solve the problems above, the conversion method of the processor is as follows: for converting a GML document to SVG, the style sheet file is parsed in DOM mode and the source document is parsed in SAX mode. The style sheet is divided into locating information and structure information, and two style sheet trees are generated. While the SAX parser parses the source document, the two style sheet trees are traversed once each time an element tag is encountered; if the element is needed its data is extracted, and otherwise it is ignored.
The specific conversion method is:
First step: the style sheet is split into fragments according to its definitions, and the fragments are numbered in order;
Second step: the fragments are divided by definition into two classes, a first part (containing locating information) and a second part (containing the structure information of the output XML file);
Third step: each fragment belonging to the first part is read into a document stream, one document stream per fragment, producing a document stream array;
Fourth step: each fragment belonging to the second part is turned into a DOM tree, producing a DOM tree array, and each DOM tree is assigned a document stream, producing a document stream array;
Fifth step: while the SAX parser parses the source document, each DOM tree in the DOM tree array is traversed once each time an element tag is encountered; if the element is needed, its data is extracted and put into the document stream corresponding to that DOM tree;
Sixth step: after the SAX parser has finished parsing the source document, the document stream array of the first part and the document stream array of the second part are merged in the fragment order of the first step, and the resulting document stream is output as the result document.
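The six steps can be sketched roughly as follows in Python. This is a simplified illustration under stated assumptions, not the patented implementation: the fragment format (`"literal"` and `"select"` tuples) is hypothetical and stands in for real XSLT, plain tuples replace the DOM trees, and the sketch assumes that literal output markup is copied straight to its stream while fragments that select source elements are the ones consulted during parsing.

```python
import io
import xml.sax

# Hypothetical fragment format (not real XSLT), numbered by list position:
#   ("literal", text)           output markup copied as-is
#   ("select", tag, template)   emit template once per source element <tag>
FRAGMENTS = [
    ("literal", '<svg xmlns="http://www.w3.org/2000/svg">'),
    ("select", "point", '<circle cx="{x}" cy="{y}" r="1"/>'),
    ("literal", "</svg>"),
]


class FragmentHandler(xml.sax.ContentHandler):
    def __init__(self, fragments):
        super().__init__()
        # One output stream per fragment, kept in fragment order.
        self.streams = [io.StringIO() for _ in fragments]
        self.selectors = []
        for index, fragment in enumerate(fragments):
            if fragment[0] == "literal":
                # Pass-through fragment: written to its stream up front.
                self.streams[index].write(fragment[1])
            else:
                # Matching fragment: remembered together with its stream index.
                self.selectors.append((index, fragment[1], fragment[2]))

    def startElement(self, name, attrs):
        # Every start tag is checked against every matching fragment.
        for index, tag, template in self.selectors:
            if name == tag:
                # Attribute values are inserted unescaped (sketch only).
                values = dict(attrs.items())
                self.streams[index].write(template.format(**values))

    def result(self) -> str:
        # Merge the streams in the original fragment order.
        return "".join(stream.getvalue() for stream in self.streams)


def convert(source: bytes, fragments=FRAGMENTS) -> str:
    handler = FragmentHandler(fragments)
    xml.sax.parseString(source, handler)
    return handler.result()
```

Calling `convert(b'<map><point x="1" y="2"/><line/></map>')` would emit one `circle` element between the opening and closing `svg` tags and ignore the `line` element, since no fragment selects it.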
The invention is characterized in that the style sheet file is parsed in DOM mode and the source document is parsed in SAX mode. The style sheet is divided into locating information and structure information, and two style sheet trees are generated. While the SAX parser parses the source document, the two style sheet trees are traversed once each time an element tag is encountered; if the element is needed its data is extracted, and otherwise it is ignored. The conversion algorithm can convert between XML documents quickly and can handle the conversion of large volumes of GML spatial information to SVG.
Description of drawings
Fig. 1 compares the time efficiency of the present invention with that of two common processors.
Fig. 2 compares the space efficiency of the present invention with that of two common processors.
Embodiment
The invention is described further below with reference to the accompanying drawings.
The invention parses the style sheet file in DOM mode into a DOM tree held in memory and parses the source document in SAX mode. Each time an element tag is encountered, the style sheet tree is traversed once; if the element is needed its data is extracted, and otherwise it is ignored.
In general the style sheet file is very small, only a few KB or a few tens of KB, so the DOM tree generated from it is also small, and scanning an XML file in SAX mode is very efficient. The processor of the invention is therefore faster than a general XSLT processor. In particular, when the source document is very large, the time consumed by a general XSLT processor rises exponentially, whereas for a fixed style sheet the time consumed by the processor of the invention rises linearly, because the time SAX takes to scan an XML file grows linearly with the file size. The larger the source document, the faster the efficiency of a general XSLT processor declines, and beyond a certain data volume, such as about 4,000,000, a general XSLT processor cannot process the document at all, while the processor of the invention does not have this problem.
The efficiency of the XSLT processor can be improved further from two directions: speeding up the SAX parser's traversal of the XML document, and optimizing the style sheet. XML programming generally uses the APIs of software packages that implement the DOM and SAX interfaces, such as Apache's Xerces package, DOM4J, and JDOM. General XSLT processors are built as secondary development on such products; the widely used Xalan processor, for example, is developed on top of the Xerces package. These packages differ slightly in their SAX support and in the execution efficiency of their SAX parsers, but the gap is small. Speeding up the SAX parser's traversal of the XML document is therefore not an effective way to improve the execution efficiency of the XSLT processor.
During conversion, while the SAX parser traverses the source document, the processor of the invention traverses the style sheet tree once each time an element tag is encountered, so the size of the style sheet tree has a considerable effect on the processor's execution efficiency. This is especially true when the source document is large, that is, when it has many elements, such as 100,000 or several hundred thousand. (Documents of this size may be rare in traditional XML applications, where the XML data in e-commerce, for example, has only tens or hundreds of elements, but they are quite common for GML.) If the style sheet tree can be shrunk effectively without affecting its function, the execution efficiency of the invention improves greatly. Optimizing the style sheet is an effective way to improve the processor's execution efficiency, and by analysing the structure of the style sheet the invention has designed a style sheet conversion method.
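This dependence can be written as a rough cost model. The symbols are introduced here for illustration and do not appear in the source: $N$ is the number of element tags in the source document, $s$ the number of style sheet tree nodes visited per tag, $c_{\text{scan}}$ the SAX cost per tag, and $c_{\text{node}}$ the cost of visiting one style sheet node.

```latex
T(N, s) \approx N \left( c_{\text{scan}} + s \, c_{\text{node}} \right)
```

For a fixed style sheet the time is linear in $N$, and for a document with several hundred thousand elements any reduction in $s$ is multiplied by $N$, which is why shrinking the style sheet tree is the more promising of the two directions.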
A style sheet file consists of two parts. The first part is the locating information for the corresponding data in the source document, that is, the nodes formed by tags defined in the XSL standard that locate data in the source document (such as xsl:for-each, xsl:variable, xsl:copy-of, and xsl:value-of), together with the elements not defined by the XSL standard that are contained in those nodes. The second part is the structure information of the output XML file: all elements that do not satisfy the conditions of the first part and belong to the markup of the result document. The two parts play different roles in XSLT processing. The second part is related only to the result document and has nothing to do with the source document, while the first part, because it locates data, is closely tied to the source document. The most expensive step of the processor in time and space is parsing the source document in SAX mode and extracting data through the locating function of the style sheet. In this step, the smaller the style sheet tree the higher the efficiency, and only the first part of the style sheet actually takes part; the second part is not involved at all. There is thus no need to convert the whole style sheet into a style sheet tree; only the first part needs to be converted into a tree that takes part in this step. The invention therefore generates two style sheet trees from the style sheet, and while the SAX parser parses the source document, the two trees are traversed once each time an element tag is encountered.
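The two-part split can be sketched with Python's `xml.dom.minidom`. This is a minimal illustration, not the patented algorithm: the sample style sheet, the `LOCATING` set (limited to the four instructions named in the text), and the function names are assumptions made for the example.

```python
from xml.dom import minidom

XSL_NS = "http://www.w3.org/1999/XSL/Transform"
# Locating instructions named in the text; a real processor may need more.
LOCATING = {"for-each", "variable", "copy-of", "value-of"}

# Illustrative style sheet, not taken from the patent.
STYLESHEET = """<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <svg>
      <g id="roads">
        <xsl:for-each select="//road">
          <path><xsl:value-of select="@d"/></path>
        </xsl:for-each>
      </g>
    </svg>
  </xsl:template>
</xsl:stylesheet>"""


def is_locating(node) -> bool:
    return (
        node.nodeType == node.ELEMENT_NODE
        and node.namespaceURI == XSL_NS
        and node.localName in LOCATING
    )


def split_parts(node, locating, structure):
    """Collect top-most locating nodes (their whole subtree belongs to the
    first part) and result-document elements outside any locating node."""
    for child in node.childNodes:
        if child.nodeType != child.ELEMENT_NODE:
            continue
        if is_locating(child):
            locating.append(child)
        else:
            if child.namespaceURI != XSL_NS:
                structure.append(child.tagName)
            split_parts(child, locating, structure)


def classify(stylesheet_text: str):
    doc = minidom.parseString(stylesheet_text)
    locating, structure = [], []
    split_parts(doc, locating, structure)
    return [node.tagName for node in locating], structure
```

For the sample, the `path` element sits inside `xsl:for-each`, so it travels with the first part rather than being listed as output structure.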
To test the performance of the invention, we chose 14 groups of GML spatial data between 1 MB and 50 MB and, using the same style sheet on the same machine, ran a comparison among the most commonly used XSLT processors at present (Apache's Xalan-Java processor and Michael H. Kay's Saxon processor) and the present invention. Table 1 lists the test data and results for the three processors, 14 groups of data for each. In the Method column of Table 1, Xalan denotes the Xalan-Java processor and Saxon the Saxon processor; the Name column gives the layer name; the GML, XSL, and SVG columns give the sizes of the GML, XSL, and SVG files; the TotalTime column gives the time consumed in converting the GML document to SVG, in ms; the Memory column gives the memory consumed in converting the GML document to SVG. The unit is stated as MB, although the magnitudes listed in Table 1 are consistent with bytes.
Table 1
ID | Method | Name | GML | XSL | SVG | TotalTime (ms) | Memory (bytes) |
---|---|---|---|---|---|---|---|
1 | Xalan | Test1M | 1M | 19.3K | 701K | 4034 | 3169360 |
2 | Xalan | Test2M | 2M | 19.3K | 1.44M | 5235 | 6882808 |
3 | Xalan | Test3M | 3M | 19.3K | 2.11M | 6938 | 11867504 |
4 | Xalan | Test4M | 4M | 19.3K | 2.98M | 8550 | 11999792 |
5 | Xalan | Test5M | 5M | 19.3K | 3.54M | 11284 | 17121784 |
6 | Xalan | Test6M | 6M | 19.3K | 4.21M | 13006 | 23121816 |
7 | Xalan | Test7M | 7M | 19.3K | 5.17M | 14768 | 23522512 |
8 | Xalan | Test8M | 8M | 19.3K | 6.04M | 16551 | 25603880 |
9 | Xalan | Test9M | 9M | 19.3K | 6.70M | 18244 | 41024696 |
10 | Xalan | Test10M | 10M | 19.3K | 7.37M | 22244 | 51024696 |
11 | Xalan | Test20M | 20M | 19.3K | 0 | 0 | 0 |
12 | Xalan | Test30M | 30M | 19.3K | 0 | 0 | 0 |
13 | Xalan | Test40M | 40M | 19.3K | 0 | 0 | 0 |
14 | Xalan | Test50M | 50M | 19.3K | 0 | 0 | 0 |
15 | Saxon | Test1M | 1M | 19.3K | 801K | 1834 | 11155336 |
16 | Saxon | Test2M | 2M | 19.3K | 1.54M | 2835 | 11800152 |
17 | Saxon | Test3M | 3M | 19.3K | 2.31M | 4638 | 17440168 |
18 | Saxon | Test4M | 4M | 19.3K | 3.08M | 6150 | 29721600 |
19 | Saxon | Test5M | 5M | 19.3K | 3.84M | 8284 | 30942560 |
20 | Saxon | Test6M | 6M | 19.3K | 4.61M | 10006 | 40650072 |
21 | Saxon | Test7M | 7M | 19.3K | 5.37M | 11768 | 41607208 |
22 | Saxon | Test8M | 8M | 19.3K | 6.14M | 13551 | 61426736 |
23 | Saxon | Test9M | 9M | 19.3K | 6.90M | 15244 | 74339704 |
24 | Saxon | Test10M | 10M | 19.3K | 7.67M | 17046 | 85590960 |
25 | Saxon | Test20M | 20M | 19.3K | 15.3M | 34882 | 113047400 |
26 | Saxon | Test30M | 30M | 19.3K | 22.9M | 53128 | 122587560 |
27 | Saxon | Test40M | 40M | 19.3K | 0 | 0 | 0 |
28 | Saxon | Test50M | 50M | 19.3K | 0 | 0 | 0 |
29 | The present invention | Test1M | 1M | 19.3K | 471K | 1223 | 2769360 |
30 | The present invention | Test2M | 2M | 19.3K | 925K | 1312 | 5182808 |
31 | The present invention | Test3M | 3M | 19.3K | 1.34M | 1843 | 6304880 |
32 | The present invention | Test4M | 4M | 19.3K | 1.79M | 2353 | 7272208 |
33 | The present invention | Test5M | 5M | 19.3K | 2.23M | 2975 | 8646184 |
34 | The present invention | Test6M | 6M | 19.3K | 2.67M | 3635 | 9006984 |
35 | The present invention | Test7M | 7M | 19.3K | 3.12M | 4146 | 10759824 |
36 | The present invention | Test8M | 8M | 19.3K | 3.56M | 4656 | 11511200 |
37 | The present invention | Test9M | 9M | 19.3K | 4.01M | 5248 | 23315136 |
38 | The present invention | Test10M | 10M | 19.3K | 4.45M | 5869 | 35950400 |
39 | The present invention | Test20M | 20M | 19.3K | 8.89M | 12958 | 46793240 |
40 | The present invention | Test30M | 30M | 19.3K | 13.3M | 25436 | 58735264 |
41 | The present invention | Test40M | 40M | 19.3K | 17.7M | 36142 | 69626592 |
42 | The present invention | Test50M | 50M | 19.3K | 22.2M | 50382 | 80763968 |
As Table 1 shows, some entries in the SVG column of the Xalan and Saxon results are zero (Xalan cannot process GML documents of 20M and above, and Saxon cannot process GML documents of 40M and above). A zero means that the GML document was too large, the processor ran out of memory, and the conversion failed. The processor of the invention has no such problem. Fig. 1 plots the time the three processors took to convert the 14 groups of test data; the vertical axis is time in ms and the horizontal axis is GML document size in MB. Fig. 2 plots the memory the three processors consumed in converting the 14 groups of test data; the vertical axis is memory consumed in MB and the horizontal axis is GML document size in MB. In both figures the bottom line is the processor of the invention; the middle line in Fig. 1 and the top line in Fig. 2 are the Saxon processor; the top line in Fig. 1 and the middle line in Fig. 2 are the Xalan processor. The figures show that the processor of the invention is better than both the Xalan and Saxon processors in time and space efficiency, and that for large documents above 10M the memory it consumes grows essentially linearly.
The content not described in detail in this specification belongs to the prior art known to those skilled in the art.
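The time advantage can be made concrete with a little arithmetic on Table 1. The short Python sketch below copies a few TotalTime values from the table and divides them; the dictionary and function names are illustrative, and only sizes that both processors completed are compared.

```python
# TotalTime in ms, copied from Table 1 (key: GML size in MB).
XALAN = {1: 4034, 5: 11284, 10: 22244}
SAXON = {1: 1834, 5: 8284, 10: 17046, 30: 53128}
INVENTION = {1: 1223, 5: 2975, 10: 5869, 30: 25436}


def speedup(baseline: dict, ours: dict) -> dict:
    """Ratio of baseline time to our time for sizes present in both."""
    return {
        size: round(baseline[size] / ours[size], 2)
        for size in baseline
        if size in ours
    }


xalan_ratio = speedup(XALAN, INVENTION)
saxon_ratio = speedup(SAXON, INVENTION)
```

On these rows the invention is roughly 3.3 to 3.8 times faster than Xalan and roughly 1.5 to 2.9 times faster than Saxon.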
Claims (2)
1. A new conversion method for a style sheet processor used in spatial information publishing, characterized in that: for converting a GML document to SVG, the style sheet file is parsed in DOM mode and the source document is parsed in SAX mode; the style sheet is divided into locating information and structure information, and two style sheet trees are generated; while the SAX parser parses the source document, the two style sheet trees are traversed once each time an element tag is encountered; if the element is needed its data is extracted, and otherwise it is ignored.
2. The new conversion method for a style sheet processor used in spatial information publishing according to claim 1, characterized in that the specific steps of the method are:
First step: the style sheet is split into fragments according to its definitions, and the fragments are numbered in order;
Second step: the fragments are divided by definition into two classes, a first part containing locating information and a second part containing the structure information of the output XML file;
Third step: each fragment belonging to the first part is read into a document stream, one document stream per fragment, producing a document stream array;
Fourth step: each fragment belonging to the second part is turned into a DOM tree, producing a DOM tree array, and each DOM tree is assigned a document stream, producing a document stream array;
Fifth step: while the SAX parser parses the source document, each DOM tree in the DOM tree array is traversed once each time an element tag is encountered; if the element is needed, its data is extracted and put into the document stream corresponding to that DOM tree;
Sixth step: after the SAX parser has finished parsing the source document, the document stream array of the first part and the document stream array of the second part are merged in the fragment order of the first step, and the resulting document stream is output as the result document.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100611853A CN100347704C (en) | 2004-11-25 | 2004-11-25 | Converting method for processor of spatial information issuing forms |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CNB2004100611853A CN100347704C (en) | 2004-11-25 | 2004-11-25 | Converting method for processor of spatial information issuing forms |
Publications (2)
Publication Number | Publication Date |
---|---|
CN1614592A CN1614592A (en) | 2005-05-11 |
CN100347704C true CN100347704C (en) | 2007-11-07 |
Family
ID=34764450
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CNB2004100611853A Expired - Fee Related CN100347704C (en) | 2004-11-25 | 2004-11-25 | Converting method for processor of spatial information issuing forms |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN100347704C (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101458711B (en) * | 2008-12-30 | 2011-01-05 | 国家电网公司 | Image describing and transforming method and system |
CN102075555B (en) * | 2009-11-20 | 2013-05-15 | 武汉大学 | Dynamic spatial information processing service automatic composition method |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1537285A (en) * | 2001-08-03 | 2004-10-13 | Koninklijke Philips Electronics N.V. | Method and system for updating document |
Non-Patent Citations (3)
Title |
---|
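Conversion of GML documents to SVG in WebGIS applications. Liu Xujun, Guan Jihong. Computer Applications, Vol. 24, No. 2, 2004 * |
Research on GML-based description and application of GIS spatial features. Tong Xiaohua, Xu Gusheng. Journal of Tongji University (Natural Science), Vol. 32, No. 6, 2004 * |
A GML-based method for Web spatial information query and integration. An Yang, Guan Jihong, Chen Junpeng, Zhao Bo. Computer Engineering, Vol. 30, No. 9, 2004 * |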
Also Published As
Publication number | Publication date |
---|---|
CN1614592A (en) | 2005-05-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
C17 | Cessation of patent right | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20071107 Termination date: 20111125 |