CN112559767A - Method for automatically constructing RDF data based on XML data - Google Patents
- Publication number
- CN112559767A (application number CN202011445817.1A)
- Authority
- CN
- China
- Prior art keywords
- rdf
- elements
- xml
- data
- propval
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/80—Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
- G06F16/84—Mapping; Conversion
Abstract
The invention discloses a method for automatically constructing RDF data from XML data, comprising the following steps. First, semantics are extracted from the different types of XML data: for XML data without a format constraint, elements with the same tag name are aggregated by traversal, the aggregation classes are then organized to obtain the abstract model corresponding to each aggregation class, and an RDF Schema is constructed according to the corresponding mapping rules; for XML data constrained by an XML Schema, the relevant classes and properties are obtained by parsing the XML Schema, and an RDF Schema ontology is constructed from them. Next, repeated data entities among the elements of the XML are screened: the repeated elements are traversed, unique codes are assigned to distinct elements according to the equivalent-element conditions, and repeated elements are given the same code. Finally, mapping rules are constructed for the different aggregation classes, and the RDF triples corresponding to the elements are built. The method converts XML data into RDF data and is broadly applicable.
Description
Technical Field
The invention relates to the technical field of knowledge graphs, and in particular to a method for automatically constructing RDF data based on XML data.
Background
The development of World Wide Web technology has changed the course of human society; it now touches almost every aspect of human life, and each advance in web technology drives social progress. Semantic Web technology, one direction of that advance, has progressed considerably since its inception. It describes the data on the Web in a representation that machines can understand more easily, so that computers can process data more intelligently. RDF (Resource Description Framework) is a data model for describing relationships between objects (resources). Used as a data description model, it gives the data semantics, and semantic data supports logical reasoning in the Semantic Web, making web applications more intelligent. RDF consists of a series of statements, namely "object-attribute-value" triples. RDF is domain independent: a user can use RDF Schema, a vocabulary description language for describing classes and properties, to define the terms used in a particular domain and to describe the hierarchical semantics of those classes and properties.
XML is a document markup language that describes the relationships between data items effectively through user-defined tags and the nesting of those tags. As a standard format suited to describing semi-structured web data, XML has become a main medium for data representation and data exchange in the information field and is used in many domains. Through tag nesting and user-defined tags, XML supports the construction of data at the syntactic level, but the semantics hidden in the data can only be understood by manual analysis, so the intelligent-agent style of data processing envisioned by the Semantic Web cannot be achieved. Data described in XML therefore needs to be converted: the data and the semantics between related data are described with the RDF data model, so that the converted data meets the data construction standard the Semantic Web requires.
The invention starts from the structure and content of XML and extracts the semantics implicit in the data. To unify the structure of XML data in a particular domain, a DTD (Document Type Definition) or an XML Schema definition is generally used to specify the elements and attributes used in an XML document and how the data is organized. Most XML documents, however, have neither a DTD nor an XML Schema. The invention addresses these different types of XML documents and provides a general conversion method; because DTD is gradually being replaced by XML Schema, DTD is not discussed.
For XML documents governed by an XML Schema, the invention obtains the structural information of the document, chiefly the nesting relationships among elements and attributes, by parsing the XML Schema. For XML documents without a structural specification, it obtains those nesting relationships by parsing the XML document directly and classifying and aggregating its elements and attributes. For either type of XML, the nesting relationships obtained are classified and defined, and mapping rules are constructed that convert them into the corresponding RDF domain vocabulary (classes and properties); that is, a conceptual model of the domain, an ontology, is obtained, and RDF Schema is used as the ontology description language in the invention. The ontology is the basis of logical reasoning, and building an ontology that conforms to the semantics contained in the source data is the foundation for constructing RDF from XML. Many existing conversion methods consider only XML with a structural description (i.e. an XML Schema); in some of them the semantics of the constructed domain vocabulary (RDF Schema) are unreasonable, and semantics that help the conversion but are absent from the source data are added artificially. The invention constructs an ontology vocabulary that conforms to the semantics of the source data (the XML documents), based on the structure of the XML and the content of its elements and attributes.
Based on the constructed ontology, namely the RDF Schema, the data in the XML instance is converted into RDF instance data. The RDF describes the relationships between entities and property values: the entities are instances of the classes contained in the RDF Schema, and the relationships are the properties specified there. XML is semi-structured data in which element and attribute tags appear repeatedly, especially in large documents. Analysis of XML data documents shows that, among elements and attributes with the same tag, the embedded content of some recurs within the document while that of others differs. In other data models (such as relational databases and RDF), repeated data is represented and stored only once, and a reference mechanism is used when the data is needed several times in the same document; XML, which expresses data through nesting, has no such reference mechanism. Consequently, if data with the same tag is not identified and screened when XML is converted to RDF, the resulting RDF data may contain redundancy and even contradictions, which affects further processing of it (querying and ontology-based reasoning). To address this, the invention provides an identification and screening mechanism in the RDF construction process, ensuring the validity and completeness of the constructed data.
Disclosure of Invention
Purpose of the invention: the invention provides a method for automatically constructing RDF data based on XML data. The method converts XML data of various types into RDF data and is therefore broadly applicable; by identifying the repeated elements in the XML, it eliminates redundancy and contradiction in the constructed RDF data and produces data better suited to knowledge engineering.
Technical solution: to achieve the above purpose, the invention adopts the following technical solution:
A method for automatically constructing RDF data based on XML data, comprising the following steps:
Step S1: analyzing the tree model of the XML data document; clustering the elements by the name of their start tag, determining the sub-model corresponding to each element of the same class, and integrating the sub-models of all elements of the same class to obtain the abstract model corresponding to that cluster; constructing the vocabulary (RDF Schema) from the abstract models obtained. Specifically:
The elements $E$ of the XML data document are classified into several classes $E_{n_1}, E_{n_2}, \ldots, E_{n_i}, \ldots$ $(n_i \in N_e)$; each element $e_i$ is assigned to the corresponding aggregation class $E_{n_i}$ according to the name of its start tag:
$f_1 : (e \in E) \to E_{n_1}, \ldots, E_{n_i}, \ldots \quad (n_i \in N_e)$
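The grouping in step S1 can be illustrated with a minimal Python sketch. It assumes the standard-library `xml.etree.ElementTree` parser; the sample document and the function name `aggregate_by_tag` are hypothetical and do not come from the patent.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, used only for illustration.
SAMPLE = """<library>
  <book id="b1"><title>RDF Primer</title><author>Ann</author></book>
  <book id="b2"><title>XML Basics</title><author>Ann</author></book>
</library>"""

def aggregate_by_tag(root):
    """Group every element of the tree into an aggregation class keyed by tag name."""
    classes = defaultdict(list)
    for elem in root.iter():  # iter() visits the root and all of its descendants
        classes[elem.tag].append(elem)
    return dict(classes)

root = ET.fromstring(SAMPLE)
classes = aggregate_by_tag(root)
sizes = {tag: len(elems) for tag, elems in classes.items()}
print(sizes)
```

Each key of `classes` plays the role of a tag name $n_i$, and the list stored under it plays the role of the aggregation class $E_{n_i}$.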
Step S2: abstracting and integrating the sub-models $mod_{e_j} \in MOD_{n_i}$ of all elements in $E_{n_i}$ to obtain the abstract model $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}, v_{n_i}\})$ corresponding to the aggregation class, where $n_i$ is the tag name of all elements in the aggregation class $E_{n_i}$, $CN_{n_i}$ is the set of names of the sub-elements, $AN_{n_i}$ is the set of names of the attributes contained in the elements' start tags, and $v_{n_i}$ is a logical variable whose presence indicates that the inline content of the element contains a text value. Specifically:
(1) When $CN_{n_i} = \varnothing$, $AN_{n_i} = \varnothing$ and $v_{n_i}$ is present, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, v_{n_i})$. In this case the sub-model $mod_{e_j}$ corresponding to an element $e_j$ is a simple sub-model, i.e. $mod_{e_j} \in S$, where $S$ denotes the set of simple sub-models. The RDF triples are then constructed as follows:
$f_{pi} : n_i \to pi_{n_i} \quad (n_i \in N_e,\ pi_{n_i} \in PI)$
$\mathrm{Type}(?pi_{n_i}, \mathrm{Property})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_{n_i}, ?\mathrm{DateTypeIRI})$
where $PI$ denotes the set of properties in the RDF vocabulary, DateTypeIRI denotes a built-in data type, and $pi_{n_i}$ denotes the property to which $n_i$ is mapped;
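(2) When $v_{n_i}$ is absent, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}\})$. The RDF triples are then constructed as follows:
$f_{ci} : n_i \to ci_{n_i} \quad (n_i \in N_e,\ ci_{n_i} \in CI)$
$\mathrm{Type}(?ci_{n_i}, \mathrm{Class})$
$f_{pi} : \{cn_1, cn_2, \ldots, cn_j, \ldots\} \to \{pi_1, pi_2, \ldots, pi_j, \ldots\} \quad (cn_j \in CN_{n_i},\ pi_j \in PI,\ j = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_j, \mathrm{Property}) \quad (j = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_j, ?ci_{n_i})$
$f_{ci} : cn_j \to ci_{cn_j} \quad (cn_j \in CN_{n_i},\ ci_{cn_j} \in CI)$
$\mathrm{PropVal}(\mathrm{range}, ?pi_j, ?ci_{cn_j})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_j, ?\mathrm{DateTypeIRI})$
$f_{pi} : \{an_1, an_2, \ldots, an_k, \ldots\} \to \{pi_1, pi_2, \ldots, pi_k, \ldots\} \quad (an_k \in AN_{n_i},\ pi_k \in PI,\ k = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_k, \mathrm{Property}) \quad (k = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_k, ?ci_{n_i})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_k, ?\mathrm{DateTypeIRI})$
where $CI$ denotes the set of classes in the RDF vocabulary. Suppose $cn_j \in CN_{n_i}$; then a property $pi_j$ can be constructed based on $cn_j$.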
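(3) Otherwise, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}, v_{n_i}\})$. Based on $n_i$, $CN_{n_i}$ and $AN_{n_i}$, the class and property rules in the RDF Schema are then constructed respectively as follows:
$f_{ci} : n_i \to ci_{n_i} \quad (n_i \in N_e,\ ci_{n_i} \in CI)$
$\mathrm{Type}(?ci_{n_i}, \mathrm{Class})$
$f_{pi} : \{cn_1, cn_2, \ldots, cn_m, \ldots\} \to \{pi_1, pi_2, \ldots, pi_m, \ldots\} \quad (cn_m \in CN_{n_i},\ pi_m \in PI,\ m = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_m, \mathrm{Property}) \quad (m = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_m, ?ci_{n_i})$
$f_{ci} : cn_q \to ci_{cn_q} \quad (cn_q \in CN_{n_i},\ ci_{cn_q} \in CI)$
$\mathrm{PropVal}(\mathrm{range}, ?pi_m, ?ci_{cn_m})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_m, ?\mathrm{DateTypeIRI})$
$f_{pi} : \{an_1, an_2, \ldots, an_p, \ldots\} \to \{pi_1, pi_2, \ldots, pi_p, \ldots\} \quad (an_p \in AN_{n_i},\ pi_p \in PI,\ p = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_p, \mathrm{Property}) \quad (p = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_p, ?ci_{n_i})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_p, ?\mathrm{DateTypeIRI})$
$\mathrm{Type}(\mathrm{value}, \mathrm{Property})$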
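The three cases of step S2 can be sketched in Python as follows. This is a sketch under stated assumptions, not the patent's implementation: the `ex:` prefix, the `has_` property naming, and the use of `rdfs:Literal` as a stand-in for the built-in datatype are all hypothetical, and the two range rules listed for sub-elements are read here as alternatives (literal range for simple sub-elements, class range otherwise).

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the ex: prefix and has_ naming are illustrative only.
SAMPLE = """<library>
  <book id="b1"><title>RDF Primer</title></book>
  <book id="b2"><title>XML Basics</title><year>2001</year></book>
</library>"""

LITERAL = "rdfs:Literal"  # assumed stand-in for the built-in datatype in the rules

def abstract_models(root):
    """Return {tag: (CN, AN, v)} merged over all elements with that tag."""
    models = {}
    for elem in root.iter():
        cn, an, v = models.setdefault(elem.tag, (set(), set(), False))
        cn.update(child.tag for child in elem)  # sub-element names
        an.update(elem.attrib)                  # attribute names
        # Only text before the first child is checked (mixed content is ignored).
        has_text = bool(elem.text and elem.text.strip())
        models[elem.tag] = (cn, an, v or has_text)
    return models

def is_simple(model):
    cn, an, v = model
    return not cn and not an and v

def schema_triples(models):
    """Emit RDFS-style triples for the three cases of step S2."""
    out = set()
    for tag, (cn, an, v) in models.items():
        if is_simple((cn, an, v)):
            # Case (1): the tag maps to a property with a literal range.
            out.add((f"ex:has_{tag}", "rdf:type", "rdf:Property"))
            out.add((f"ex:has_{tag}", "rdfs:range", LITERAL))
            continue
        # Cases (2) and (3): the tag maps to a class.
        out.add((f"ex:{tag}", "rdf:type", "rdfs:Class"))
        for name in cn:
            prop = f"ex:has_{name}"
            out.add((prop, "rdf:type", "rdf:Property"))
            out.add((prop, "rdfs:domain", f"ex:{tag}"))
            rng = LITERAL if is_simple(models[name]) else f"ex:{name}"
            out.add((prop, "rdfs:range", rng))
        for name in an:
            prop = f"ex:has_{name}"
            out.add((prop, "rdf:type", "rdf:Property"))
            out.add((prop, "rdfs:domain", f"ex:{tag}"))
            out.add((prop, "rdfs:range", LITERAL))
        if v:
            # Case (3): the element's own text is carried by rdf:value.
            out.add(("rdf:value", "rdf:type", "rdf:Property"))
    return out

models = abstract_models(ET.fromstring(SAMPLE))
triples = schema_triples(models)
for t in sorted(triples):
    print(t)
```

A sub-element name and an attribute name that coincide would map to the same property in this sketch; a real vocabulary would need a naming scheme that keeps them apart.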
Step S3: according to the mapping rules corresponding to the abstract models in step S2, constructing the vocabulary (RDF Schema) of the current domain as follows:
$f_{rdfs} : \{E_{n_1}, \ldots, E_{n_i}, \ldots\} \to \text{RDF Schema} \quad (n_i \in N_e)$
where the set of all aggregation classes of the XML document is $XSD = \{E_{n_1}, \ldots, E_{n_i}, \ldots\}$ $(n_i \in N_e)$;
Step S4: identifying the repeated elements in the XML document data. Specifically:
traversing all elements $E$ and attributes $A$ of the XML document and attaching a unique ID to each; adjusting the IDs of the elements and attributes in the current XML document so that equivalent elements and equivalent attributes have the same ID; traversing the XML tree model again by post-order traversal, working from the bottom of the tree toward the root, identifying the equivalent elements and equivalent attributes in the document, and unifying their IDs. The specific conditions are as follows:
$e_m \in CE_{e_i},\ e_n \in CE_{e_j},\ a_m \in EA_{e_i},\ a_n \in EA_{e_j}$
$CL_{e_i} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_m}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_p}, \ldots\}$
$CL_{e_j} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_n}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_q}, \ldots\}$
$e_m \in CE_{e_i},\ e_n \in CE_{e_j},\ a_p \in EA_{e_i},\ a_q \in EA_{e_j},\ e_i \to v_i,\ e_j \to v_j$
$CL_{e_i} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_m}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_p}, \ldots\}$
$CL_{e_j} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_n}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_q}, \ldots\}$
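The bottom-up ID unification of step S4 can be sketched as below. The equivalence test is an assumption made for illustration, since the text does not spell out the conditions: two elements are treated as equivalent when they share the tag name, the attribute name/value pairs, the stripped text, and the same collection of child IDs. Attributes are compared directly instead of receiving IDs of their own. The function name `assign_ids` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first and third <book> are identical.
SAMPLE = """<library>
  <book><title>RDF Primer</title><author>Ann</author></book>
  <book><title>XML Basics</title><author>Ann</author></book>
  <book><title>RDF Primer</title><author>Ann</author></book>
</library>"""

def assign_ids(root):
    """Post-order walk that gives equivalent elements the same ID."""
    ids = {}         # element object -> ID
    signatures = {}  # structural signature -> ID

    def visit(elem):
        # Children are resolved first, so their IDs can be part of the signature.
        child_ids = tuple(sorted(visit(child) for child in elem))
        text = (elem.text or "").strip()
        sig = (elem.tag, tuple(sorted(elem.attrib.items())), text, child_ids)
        # A new signature gets the next free ID; a known one reuses its ID.
        ids[elem] = signatures.setdefault(sig, len(signatures) + 1)
        return ids[elem]

    visit(root)
    return ids

root = ET.fromstring(SAMPLE)
ids = assign_ids(root)
books = root.findall("book")
print([ids[b] for b in books])
```

Sorting the child IDs makes the comparison insensitive to child order, which mirrors the set notation used for $CL_{e_i}$; an order-sensitive variant would simply drop the `sorted` call.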
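Step S5: after the elements in the current XML document have been clustered, mapping the XML document into a sequence of RDF triples based on step S2; traversing the XML tree model whose element and attribute IDs were adjusted in step S4, storing the set of mapped element IDs as OID, and constructing the RDF triple sequence as follows:
$f_r : (n_i, ID_{e_i}) \to r_{e_i} \quad (n_i \in N_e,\ r_{e_i} \in R)$
$t_1 = t_v = (r_{e_i}, \mathrm{rdf{:}value}, v_i)$
$\mathrm{PropVal}(\mathrm{value}, ?r_{e_i}, ?v_i)$
$\{t_1, t_2, \ldots\} = \{t_{e_m} \mid m = 1, 2, \ldots\} \cup \{t_{a_n} \mid n = 1, 2, \ldots\} \quad (pi_{e_m} \to t_{e_m},\ pi_{a_n} \to t_{a_n})$
$f_r : (n_i, ID_{e_i}) \to r_{e_i} \quad (n_i \in N_e,\ r_{e_i} \in R)$
$f_r : (n_m, ID_{e_m}) \to r_{e_m} \quad (n_m \in N_e,\ r_{e_m} \in R)$
$t_{a_n} = (r_{e_i}, pi_{a_n}, v_n)$
$\mathrm{PropVal}(?pi_{a_n}, ?r_{e_i}, ?v_i)$
$f_r : (n_i, ID_{e_i}) \to r_{e_i} \quad (n_i \in N_e,\ r_{e_i} \in R)$
$t_v = (r_{e_i}, \mathrm{rdf{:}value}, v_i)$
$t_v \in \{t_1, t_2, \ldots\}$
$\mathrm{PropVal}(\mathrm{value}, ?r_{e_i}, ?v_i)$.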
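A minimal Python sketch of step S5 follows. It is self-contained and makes several assumptions that are not in the source: resources are named `ex:<tag>_<ID>`, properties are named `ex:has_<name>`, equivalence is decided by the same illustrative signature (tag, attributes, text, children), and the set `oid` records the resources already mapped so that a repeated element is emitted only once.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first and third <book> are identical.
SAMPLE = """<library>
  <book><title>RDF Primer</title><author>Ann</author></book>
  <book><title>XML Basics</title><author>Ann</author></book>
  <book><title>RDF Primer</title><author>Ann</author></book>
</library>"""

def build_triples(root):
    """Map the tree to triples, skipping elements whose resource is already in OID."""
    sig_ids, triples, oid = {}, [], set()

    def visit(elem):
        # Post-order: children first, so their resources feed the signature.
        kids = [(child.tag, visit(child)) for child in elem]
        text = (elem.text or "").strip()
        sig = (elem.tag, tuple(sorted(elem.attrib.items())), text, tuple(sorted(kids)))
        res = f"ex:{elem.tag}_{sig_ids.setdefault(sig, len(sig_ids) + 1)}"
        if res in oid:
            return res  # equivalent element already converted
        oid.add(res)
        for name, value in sorted(elem.attrib.items()):
            triples.append((res, f"ex:has_{name}", value))   # attribute triples
        for tag, child_res in dict.fromkeys(kids):           # drop repeated children
            triples.append((res, f"ex:has_{tag}", child_res))
        if text:
            triples.append((res, "rdf:value", text))          # text value triple
        return res

    visit(root)
    return triples

triples = build_triples(ET.fromstring(SAMPLE))
for t in triples:
    print(t)
```

Because the duplicate `<book>` and the repeated `<author>` resolve to resources that are already in `oid`, they add no further triples, which is the redundancy reduction the step aims at.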
Beneficial effects: the invention has the following advantages:
(1) The invention performs well when converting large-scale XML data, unlike existing schemes, which target only small-scale XML data.
(2) The invention optimizes the mapping rules, addresses the semantic-accuracy problem of existing methods, provides a corresponding conversion scheme for each type of XML data, and offers a more uniform scheme for constructing RDF data from XML.
(3) By identifying repeated data entities in the XML data, the invention greatly reduces the redundancy of the RDF data obtained by the subsequent conversion, unlike existing methods, which generally handle redundancy in the RDF data itself.
Drawings
FIG. 1 is a schematic diagram of an RDF triple data model provided by the present invention;
FIG. 2 is a schematic diagram of XML document tree model parsing provided by the present invention;
FIG. 3 is a schematic diagram of the process for identifying repeated elements in XML document data provided by the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The method for automatically constructing RDF data based on XML data comprises the following three parts:
(1) Extracting semantics with different methods for different types of XML data
As shown in FIG. 2, for XML data without a format constraint, elements with the same tag name are aggregated by traversal, the aggregation classes are then organized to obtain the abstract model corresponding to each aggregation class, and an RDF Schema is constructed according to the corresponding mapping rules. For XML data constrained by an XML Schema, the relevant classes and properties are obtained by parsing the XML Schema, and an RDF Schema ontology is constructed from them.
(2) Screening the XML for duplicate data entities
As shown in FIG. 3, the repeated elements in the XML are traversed, unique codes are assigned to distinct elements according to the equivalent-element conditions, and repeated elements are given the same code. During the later conversion, the element code determines whether an element is converted or skipped, which removes redundancy and contradiction from the RDF data.
(3) Building RDF triples
Based on the analysis of the XML tree model, the elements in the tree are aggregated into aggregation classes, which fall into three abstract models. Mapping rules are established for the different aggregation classes, and the RDF triples for each element in the XML are built according to the mapping rules of the abstract model to which the element belongs, completing the conversion from XML to RDF.
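For the XML Schema-constrained case in part (1), the text does not detail how the schema is parsed. The following Python sketch shows one possible reading under explicit assumptions: only elements with an inline `xs:complexType` containing an `xs:sequence` are handled, such elements are treated as classes, and their direct sub-elements and attributes are treated as properties. Named types, `ref` attributes, and other content models are ignored. The sample schema and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"  # XML Schema namespace in Clark notation

# Hypothetical schema, used only for illustration.
SAMPLE_XSD = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="year" type="xs:gYear"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

def classes_and_properties(xsd_root):
    """Collect class names and (class, property, declared type) tuples."""
    classes, properties = set(), set()
    for elem in xsd_root.iter(XS + "element"):
        ctype = elem.find(XS + "complexType")
        if ctype is None:
            continue  # simple elements surface as properties of their parent
        owner = elem.get("name")
        classes.add(owner)
        # Direct sub-elements declared in the sequence become properties.
        for child in ctype.findall(f"{XS}sequence/{XS}element"):
            properties.add((owner, child.get("name"), child.get("type")))
        # Attributes declared on the complex type become properties as well.
        for attr in ctype.findall(XS + "attribute"):
            properties.add((owner, attr.get("name"), attr.get("type")))
    return classes, properties

classes, properties = classes_and_properties(ET.fromstring(SAMPLE_XSD))
print(classes)
print(sorted(properties))
```

The declared `type` values are kept as written in the schema; mapping them to RDF datatypes would be a separate step.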
The specific steps S1 to S5 are those set out in the Disclosure of Invention above. In the rules of step S2, expressions such as $\mathrm{Type}(?pi_{n_i}, \mathrm{Property})$ and $\mathrm{PropVal}(\mathrm{range}, ?pi_{n_i}, ?\mathrm{DateTypeIRI})$ are axiomatic expressions of RDF triples.
The above describes only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these are also intended to fall within the scope of the invention.
Claims (1)
1. A method for automatically constructing RDF data based on XML data is characterized by comprising the following steps:
Step S1: analyzing the tree model of the XML data document; clustering the elements by the name of their start tag, determining the sub-model corresponding to each element of the same class, and integrating the sub-models of all elements of the same class to obtain the abstract model corresponding to that cluster; constructing the vocabulary (RDF Schema) from the abstract models obtained. Specifically:
The elements $E$ of the XML data document are classified and aggregated into several classes $E_{n_1}, E_{n_2}, \ldots, E_{n_i}, \ldots$ $(n_i \in N_e)$; each element $e_i$ is assigned to the corresponding aggregation class $E_{n_i}$ according to the name of its start tag:
$f_1 : (e \in E) \to E_{n_1}, \ldots, E_{n_i}, \ldots \quad (n_i \in N_e)$
Step S2: abstracting and integrating the sub-models $mod_{e_j} \in MOD_{n_i}$ of all elements to obtain the abstract model $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}, v_{n_i}\})$ corresponding to the aggregation class, where $n_i$ is the tag name of all elements in the aggregation class $E_{n_i}$, $CN_{n_i}$ is the set of names of the sub-elements, $AN_{n_i}$ is the set of names of the attributes contained in the elements' start tags, and $v_{n_i}$ is a logical variable whose presence indicates that the inline content of the element contains a text value. Specifically:
(1) When $CN_{n_i} = \varnothing$, $AN_{n_i} = \varnothing$ and $v_{n_i}$ is present, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, v_{n_i})$. In this case the sub-model $mod_{e_j}$ corresponding to an element $e_j$ is a simple sub-model, i.e. $mod_{e_j} \in S$, where $S$ denotes the set of simple sub-models. The RDF triples are then constructed as follows:
$f_{pi} : n_i \to pi_{n_i} \quad (n_i \in N_e,\ pi_{n_i} \in PI)$
$\mathrm{Type}(?pi_{n_i}, \mathrm{Property})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_{n_i}, ?\mathrm{DateTypeIRI})$
where $PI$ denotes the set of properties in the RDF vocabulary, DateTypeIRI denotes a built-in data type, and $pi_{n_i}$ denotes the property to which $n_i$ is mapped;
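(2) When $v_{n_i}$ is absent, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}\})$. The RDF triples are then constructed as follows:
$f_{ci} : n_i \to ci_{n_i} \quad (n_i \in N_e,\ ci_{n_i} \in CI)$
$\mathrm{Type}(?ci_{n_i}, \mathrm{Class})$
$f_{pi} : \{cn_1, cn_2, \ldots, cn_j, \ldots\} \to \{pi_1, pi_2, \ldots, pi_j, \ldots\} \quad (cn_j \in CN_{n_i},\ pi_j \in PI,\ j = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_j, \mathrm{Property}) \quad (j = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_j, ?ci_{n_i})$
$f_{ci} : cn_j \to ci_{cn_j} \quad (cn_j \in CN_{n_i},\ ci_{cn_j} \in CI)$
$\mathrm{PropVal}(\mathrm{range}, ?pi_j, ?ci_{cn_j})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_j, ?\mathrm{DateTypeIRI})$
$f_{pi} : \{an_1, an_2, \ldots, an_k, \ldots\} \to \{pi_1, pi_2, \ldots, pi_k, \ldots\} \quad (an_k \in AN_{n_i},\ pi_k \in PI,\ k = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_k, \mathrm{Property}) \quad (k = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_k, ?ci_{n_i})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_k, ?\mathrm{DateTypeIRI})$
where $CI$ denotes the set of classes in the RDF vocabulary. Suppose $cn_j \in CN_{n_i}$; then a property $pi_j$ can be constructed based on $cn_j$.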
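(3) Otherwise, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}, v_{n_i}\})$. Based on $n_i$, $CN_{n_i}$ and $AN_{n_i}$, the class and property rules in the RDF Schema are then constructed respectively as follows:
$f_{ci} : n_i \to ci_{n_i} \quad (n_i \in N_e,\ ci_{n_i} \in CI)$
$\mathrm{Type}(?ci_{n_i}, \mathrm{Class})$
$f_{pi} : \{cn_1, cn_2, \ldots, cn_m, \ldots\} \to \{pi_1, pi_2, \ldots, pi_m, \ldots\} \quad (cn_m \in CN_{n_i},\ pi_m \in PI,\ m = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_m, \mathrm{Property}) \quad (m = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_m, ?ci_{n_i})$
$f_{ci} : cn_q \to ci_{cn_q} \quad (cn_q \in CN_{n_i},\ ci_{cn_q} \in CI)$
$\mathrm{PropVal}(\mathrm{range}, ?pi_m, ?ci_{cn_m})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_m, ?\mathrm{DateTypeIRI})$
$f_{pi} : \{an_1, an_2, \ldots, an_p, \ldots\} \to \{pi_1, pi_2, \ldots, pi_p, \ldots\} \quad (an_p \in AN_{n_i},\ pi_p \in PI,\ p = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_p, \mathrm{Property}) \quad (p = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_p, ?ci_{n_i})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_p, ?\mathrm{DateTypeIRI})$
$\mathrm{Type}(\mathrm{value}, \mathrm{Property})$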
Step S3: according to the mapping rules corresponding to the abstract models in step S2, constructing the vocabulary (RDF Schema) of the current domain as follows:
$f_{rdfs} : \{E_{n_1}, \ldots, E_{n_i}, \ldots\} \to \text{RDF Schema} \quad (n_i \in N_e)$
where the set of all aggregation classes of the XML document is $XSD = \{E_{n_1}, \ldots, E_{n_i}, \ldots\}$ $(n_i \in N_e)$;
Step S4: identifying the repeated elements in the XML document data. Specifically:
traversing all elements $E$ and attributes $A$ of the XML document and attaching a unique ID to each; adjusting the IDs of the elements and attributes in the current XML document so that equivalent elements and equivalent attributes have the same ID; traversing the XML tree model again by post-order traversal, working from the bottom of the tree toward the root, identifying the equivalent elements and equivalent attributes in the document, and unifying their IDs. The specific conditions are as follows:
$e_m \in CE_{e_i},\ e_n \in CE_{e_j},\ a_m \in EA_{e_i},\ a_n \in EA_{e_j}$
$CL_{e_i} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_m}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_p}, \ldots\}$
$CL_{e_j} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_n}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_q}, \ldots\}$
$e_m \in CE_{e_i},\ e_n \in CE_{e_j},\ a_p \in EA_{e_i},\ a_q \in EA_{e_j},\ e_i \to v_i,\ e_j \to v_j$
$CL_{e_i} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_m}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_p}, \ldots\}$
$CL_{e_j} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_n}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_q}, \ldots\}$
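Step S5: after the elements in the current XML document have been clustered, mapping the XML document into a sequence of RDF triples based on step S2; traversing the XML tree model whose element and attribute IDs were adjusted in step S4, storing the set of mapped element IDs as OID, and constructing the RDF triple sequence as follows:
$f_r : (n_i, ID_{e_i}) \to r_{e_i} \quad (n_i \in N_e,\ r_{e_i} \in R)$
$t_1 = t_v = (r_{e_i}, \mathrm{rdf{:}value}, v_i)$
$\mathrm{PropVal}(\mathrm{value}, ?r_{e_i}, ?v_i)$
$\{t_1, t_2, \ldots\} = \{t_{e_m} \mid m = 1, 2, \ldots\} \cup \{t_{a_n} \mid n = 1, 2, \ldots\} \quad (pi_{e_m} \to t_{e_m},\ pi_{a_n} \to t_{a_n})$
$f_r : (n_i, ID_{e_i}) \to r_{e_i} \quad (n_i \in N_e,\ r_{e_i} \in R)$
$f_r : (n_m, ID_{e_m}) \to r_{e_m} \quad (n_m \in N_e,\ r_{e_m} \in R)$
$t_{a_n} = (r_{e_i}, pi_{a_n}, v_n)$
$\mathrm{PropVal}(?pi_{a_n}, ?r_{e_i}, ?v_i)$
$f_r : (n_i, ID_{e_i}) \to r_{e_i} \quad (n_i \in N_e,\ r_{e_i} \in R)$
$t_v = (r_{e_i}, \mathrm{rdf{:}value}, v_i)$
$t_v \in \{t_1, t_2, \ldots\}$
$\mathrm{PropVal}(\mathrm{value}, ?r_{e_i}, ?v_i)$.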
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011445817.1A CN112559767A (en) | 2020-12-09 | 2020-12-09 | Method for automatically constructing RDF data based on XML data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011445817.1A CN112559767A (en) | 2020-12-09 | 2020-12-09 | Method for automatically constructing RDF data based on XML data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112559767A (en) | 2021-03-26 |
Family
ID=75061145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011445817.1A (Pending; CN112559767A (en)) | Method for automatically constructing RDF data based on XML data | 2020-12-09 | 2020-12-09 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112559767A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||