CN112559767A - Method for automatically constructing RDF data based on XML data - Google Patents

Info

Publication number
CN112559767A
CN112559767A CN202011445817.1A CN202011445817A CN112559767A CN 112559767 A CN112559767 A CN 112559767A CN 202011445817 A CN202011445817 A CN 202011445817A CN 112559767 A CN112559767 A CN 112559767A
Authority
CN
China
Prior art keywords
rdf, elements, xml, data, propval
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011445817.1A
Other languages
Chinese (zh)
Inventor
刘玉春
马宗民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202011445817.1A
Publication of CN112559767A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/84 Mapping; Conversion

Abstract

The invention discloses a method for automatically constructing RDF data based on XML data, comprising the following steps. First, semantics are extracted from different types of XML data: for XML data without format restrictions, elements with the same tag name are aggregated by traversal, the aggregation classes are then organized to obtain the abstract models corresponding to the different aggregation classes, and an RDF Schema is constructed according to the corresponding mapping rules; for XML data constrained by an XML Schema, the relevant classes and attributes are obtained by parsing the XML Schema, and an RDF Schema ontology is constructed from the obtained classes and attributes. Next, duplicate data entities among the XML elements are screened: the repeated elements in the XML are traversed, unique codes are assigned to different elements according to the equivalent-element criteria, and duplicate elements are given the same code. Finally, corresponding mapping rules are constructed for the different aggregation classes, and the RDF triples corresponding to the elements are constructed. The method converts XML data into RDF data and has high generality.

Description

Method for automatically constructing RDF data based on XML data
Technical Field
The invention relates to the technical field of knowledge graphs, in particular to a method for automatically constructing RDF data based on XML data.
Background
The development of World Wide Web technology has changed the course of human society; the Web now reaches almost every aspect of human life, and each advance in Web technology drives social progress. Semantic Web technology, one direction of this change, has advanced greatly since its birth. It describes data on the Web in a representation that machines can understand more easily, so that computers can process the data more intelligently. RDF (Resource Description Framework) is a data model for describing the relationships between objects (resources). Used as a data description model, it gives semantics to data, and semantic data supports logical reasoning in the Semantic Web, which makes Web applications more intelligent. RDF data consists of a series of statements, namely "object-attribute-value" triples. RDF is domain independent: a user can use RDF Schema to define the terms used in a particular domain, treating it as a vocabulary description language for classes and attributes that can also describe the hierarchical semantics relating those classes and attributes.
XML is a document markup language that can effectively describe the relationships between data through user-defined tags and the nesting of those tags. As a standard format suited to describing semi-structured Web data, XML has become a main medium for data representation and data exchange in the information field and is applied in many domains. Through tag nesting and user-defined tags, XML supports the structuring of data at the syntactic level, but the semantics hidden in the data can only be understood through manual analysis, so the intelligent-agent style of data processing envisioned by the Semantic Web cannot be achieved. Data described in XML therefore needs to be converted: the data and the semantics between related data are described with the RDF data model, so that the converted data meets the data construction standard required by the Semantic Web.
The invention starts from the structure and content of XML and extracts the semantics implicit in the data. To unify the structure of XML data in a specific domain, a DTD (Document Type Definition) or an XML Schema definition is generally used to specify the elements and attributes used in the XML document and the organization of the data. Most XML documents, however, have neither a DTD nor an XML Schema. The invention addresses these different types of XML documents and provides a general conversion method; because DTD is gradually being replaced by XML Schema, DTD is not discussed.
For XML documents based on an XML Schema, the invention obtains the structural information of the XML document, mainly the nesting relationships between elements and attributes, by parsing the XML Schema. For XML documents without a structure specification, the nesting relationships between elements and attributes are obtained by parsing the XML document directly and classifying and aggregating its elements and attributes. For either type of XML, the nesting relationships obtained are classified and defined, and mapping rules that convert them into the corresponding RDF domain vocabulary (classes and attributes) are constructed; that is, a conceptual model of the domain, an ontology, is obtained, and RDF Schema is used as the description language of the ontology in the invention. The ontology is the basis of logical reasoning, and building an ontology that conforms to the semantics contained in the source data is the basis for constructing RDF from XML. Many existing conversion methods consider only XML with a structural description (i.e. an XML Schema); some produce semantically unreasonable results when constructing the domain vocabulary (RDF Schema), and some artificially add semantics that help the conversion process but are not present in the source data. The invention constructs an ontology vocabulary that conforms to the semantics of the source data (the XML documents), based on the structure of the XML and the contents of its elements and attributes.
Based on the constructed ontology, i.e. the RDF Schema, the data in the XML instance is converted into RDF instance data. RDF describes the relationships between entities and attribute values; the entities are instances of the classes contained in the RDF Schema, and the relationships are the attributes specified in the RDF Schema. XML is semi-structured data in which element and attribute tags appear repeatedly, especially in large documents. Analysis of XML data documents shows that, among elements and attributes with the same tag, some have content that recurs in the document while others have distinct content. In other data models (such as relational databases and RDF), recurring data is represented and stored only once, and a reference mechanism is used if the data is needed several times in the same document; XML, which expresses data through nesting, has no such reference mechanism. Therefore, if data with the same tag is not identified and screened during the conversion from XML to RDF, the constructed RDF data may contain redundancy and even contradictions, which affects further processing of that data (querying and ontology-based reasoning). To address this, the invention provides an identification and screening mechanism in the construction of RDF data, ensuring the validity and completeness of the constructed data.
Disclosure of Invention
Purpose of the invention: the invention provides a method for automatically constructing RDF data based on XML data. The method converts XML data into RDF data while handling various types of XML data, and therefore has high generality; by identifying the duplicate elements in the XML, it eliminates redundancy and contradictions in the constructed RDF data and produces data that is better suited to knowledge engineering.
Technical scheme: to achieve this purpose, the invention adopts the following technical scheme:
A method for automatically constructing RDF data based on XML data, comprising the following steps:
Step S1, analyzing the tree model of the XML data document; clustering the elements according to the names of their start tags, determining the sub-model corresponding to each element of the same class, and integrating the sub-models of all elements of the same class to obtain the abstract model corresponding to that aggregation class; constructing the vocabulary (RDF Schema) according to the obtained abstract models; specifically,
for the elements in the XML data document,
Figure BDA0002824525230000031
the elements are classified into several classes $E_{n_1}, E_{n_2}, \ldots, E_{n_i}, \ldots$ ($n_i \in N_e$), each element $e_i$ being assigned to the corresponding aggregation class $E_{n_i}$ according to the name of its start tag:
$f_1 : (e \in E) \to E_{n_1}, \ldots, E_{n_i}, \ldots,\ (n_i \in N_e)$
Step S2, the sub-models $mod_{e_j} \in MOD_{n_i}$ of all elements in $E_{n_i}$ are abstracted and integrated to obtain the abstract model $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}, v_{n_i}\})$ corresponding to the aggregation class; where $n_i$ is the tag name of all elements in the aggregation class $E_{n_i}$, $CN_{n_i}$ is the set of names of the sub-elements, $AN_{n_i}$ is the set of names of the attributes contained in the element's start tag, and $v_{n_i}$ is a logical variable whose presence indicates that the inline content of the element contains a text value; specifically:
(1) when $CN_{n_i} = \varnothing$, $AN_{n_i} = \varnothing$ and $v_{n_i}$ is present, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, v_{n_i})$; in this case,
Figure BDA0002824525230000032
the sub-model $mod_{e_j}$ corresponding to element $e_j$ is a simple sub-model, i.e. $mod_{e_j} \in S$, where $S$ denotes a set of sub-models; the RDF triples are then constructed as follows:
$f_{pi} : n_i \to pi_{n_i},\ (n_i \in N_e,\ pi_{n_i} \in PI)$
$\mathrm{Type}(?pi_{n_i}, \mathrm{Property})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_{n_i}, ?\mathrm{DateTypeIRI})$
where $PI$ denotes the set of attributes in the RDF vocabulary, DateTypeIRI denotes a built-in data type, and $pi_{n_i}$ denotes the attribute to which $n_i$ is mapped;
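(2) when $v_{n_i}$ is absent, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}\})$; the RDF triples are then constructed as follows:
$f_{ci} : n_i \to ci_{n_i},\ (n_i \in N_e,\ ci_{n_i} \in CI)$
$\mathrm{Type}(?ci_{n_i}, \mathrm{Class})$
$f_{pi} : \{cn_1, cn_2, \ldots, cn_j, \ldots\} \to \{pi_1, pi_2, \ldots, pi_j, \ldots\}\ (cn_j \in CN_{n_i},\ pi_j \in PI,\ j = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_j, \mathrm{Property})\ (j = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_j, ?ci_{n_i})$
$f_{ci} : cn_j \to ci_{cn_j}\ (cn_j \in CN_{n_i},\ ci_{cn_j} \in CI)$
$\mathrm{PropVal}(\mathrm{range}, ?pi_j, ?ci_{cn_j})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_j, ?\mathrm{DateTypeIRI})$
$f_{pi} : \{an_1, an_2, \ldots, an_k, \ldots\} \to \{pi_1, pi_2, \ldots, pi_k, \ldots\}\ (an_k \in AN_{n_i},\ pi_k \in PI,\ k = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_k, \mathrm{Property})\ (k = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_k, ?ci_{n_i})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_k, ?\mathrm{DateTypeIRI})$
where $CI$ denotes the set of classes in the RDF vocabulary. Assuming $cn_j \in CN_{n_i}$, an attribute $pi_j$ can be constructed based on $cn_j$.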
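(3) in the remaining case, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}, v_{n_i}\})$; the class and attribute rules of the RDF Schema are then constructed from $n_i$, $CN_{n_i}$ and $AN_{n_i}$ respectively, as follows:
$f_{ci} : n_i \to ci_{n_i},\ (n_i \in N_e,\ ci_{n_i} \in CI)$
$\mathrm{Type}(?ci_{n_i}, \mathrm{Class})$
$f_{pi} : \{cn_1, cn_2, \ldots, cn_m, \ldots\} \to \{pi_1, pi_2, \ldots, pi_m, \ldots\}\ (cn_m \in CN_{n_i},\ pi_m \in PI,\ m = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_m, \mathrm{Property})\ (m = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_m, ?ci_{n_i})$
$f_{ci} : cn_q \to ci_{cn_q}\ (cn_q \in CN_{n_i},\ ci_{cn_q} \in CI)$
$\mathrm{PropVal}(\mathrm{range}, ?pi_m, ?ci_{cn_m})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_m, ?\mathrm{DateTypeIRI})$
$f_{pi} : \{an_1, an_2, \ldots, an_p, \ldots\} \to \{pi_1, pi_2, \ldots, pi_p, \ldots\}\ (an_p \in AN_{n_i},\ pi_p \in PI,\ p = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_p, \mathrm{Property})\ (p = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_p, ?ci_{n_i})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_p, ?\mathrm{DateTypeIRI})$
$\mathrm{Type}(\mathrm{value}, \mathrm{Property})$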
Step S3, according to the mapping rules corresponding to the abstract models in step S2, the vocabulary (RDF Schema) of the current domain is constructed as follows:
$f_{rdfs} : \{E_{n_1}, \ldots, E_{n_i}, \ldots\} \to \text{RDF Schema}\ (n_i \in N_e)$
where the set of all aggregation classes of the XML document is $XSD = \{E_{n_1}, \ldots, E_{n_i}, \ldots\}\ (n_i \in N_e)$;
Step S4, identifying the duplicate elements in the XML document data; specifically,
traversing all elements E and attributes A of the XML document and attaching a unique ID to each; then adjusting the IDs of the elements and attributes in the current XML document so that equivalent elements and equivalent attributes share the same ID: the tree model of the XML is traversed again by post-order traversal, working from the bottom of the tree model up to the root, the equivalent elements and equivalent attributes in the document are identified, and their IDs are unified; the specific steps are as follows:
Figure BDA0002824525230000051
Figure BDA0002824525230000052
$e_m \in CE_{e_i},\ e_n \in CE_{e_j},\ a_m \in EA_{e_i},\ a_n \in EA_{e_j}$
$CL_{e_i} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_m}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_p}, \ldots\}$
$CL_{e_j} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_n}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_q}, \ldots\}$
Figure BDA0002824525230000053
$e_m \in CE_{e_i},\ e_n \in CE_{e_j},\ a_p \in EA_{e_i},\ a_q \in EA_{e_j},\ e_i \to v_i,\ e_j \to v_j$
$CL_{e_i} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_m}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_p}, \ldots\}$
$CL_{e_j} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_n}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_q}, \ldots\}$
Figure BDA0002824525230000054
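Step S5, after the elements in the current XML document have been clustered, the XML document is mapped into an RDF triple sequence based on step S2; the XML tree model is traversed after the ID adjustment of elements and attributes in step S4 has been completed, the set of IDs of the elements already mapped is stored as OID, and the RDF triple sequence is constructed as follows:
$f_r : (n_i, ID_{e_i}) \to r_{e_i},\ (n_i \in N_e,\ r_{e_i} \in R)$
$t_1 = t_v = (r_{e_i}, \mathrm{rdf{:}value}, v_i)$
$\mathrm{PropVal}(\mathrm{value}, ?r_{e_i}, ?v_i)$
$\{t_1, t_2, \ldots\} = \{t_{e_m} \mid m = 1, 2, \ldots\} \cup \{t_{a_n} \mid n = 1, 2, \ldots\}\ (pi_{e_m} \to t_{e_m},\ pi_{a_n} \to t_{a_n})$
$f_r : (n_i, ID_{e_i}) \to r_{e_i},\ (n_i \in N_e,\ r_{e_i} \in R)$
$f_r : (n_m, ID_{e_m}) \to r_{e_m},\ (n_m \in N_e,\ r_{e_m} \in R)$
Figure BDA0002824525230000061
Figure BDA0002824525230000062
$t_{a_n} = (r_{e_i}, pi_{a_n}, v_n)$
$\mathrm{PropVal}(?pi_{a_n}, ?r_{e_i}, ?v_i)$
$f_r : (n_i, ID_{e_i}) \to r_{e_i},\ (n_i \in N_e,\ r_{e_i} \in R)$
$t_v = (r_{e_i}, \mathrm{rdf{:}value}, v_i)$
$t_v \in \{t_1, t_2, \ldots\}$
$\mathrm{PropVal}(\mathrm{value}, ?r_{e_i}, ?v_i)$.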
Beneficial effects: the invention has the following advantages:
(1) The invention performs well when converting large-scale XML data, unlike existing schemes, which target only small-scale XML data.
(2) The invention optimizes the mapping rules, addresses the semantic-accuracy problem of existing methods, provides a conversion scheme for each type of XML data, and thereby offers a more uniform scheme for constructing RDF data from XML.
(3) By identifying duplicate data entities in the XML data, the invention greatly reduces the redundancy of the RDF data obtained by the subsequent conversion; existing methods, by contrast, generally deal with redundancy in the RDF data itself.
Drawings
FIG. 1 is a schematic diagram of an RDF triple data model provided by the present invention;
FIG. 2 is a schematic diagram of XML document tree model parsing provided by the present invention;
FIG. 3 is a schematic diagram of the XML document data duplicate-element identification process provided by the present invention.
Detailed Description
The present invention will be further described with reference to the accompanying drawings.
The method for automatically constructing RDF data based on XML data comprises the following three parts:
(1) extracting semantics using different methods for different types of XML data
As shown in FIG. 2, for XML data without format restrictions, elements with the same tag name are aggregated by traversal, the aggregation classes are then organized to obtain the abstract models corresponding to the different aggregation classes, and an RDF Schema is constructed according to the corresponding mapping rules; for XML data constrained by an XML Schema, the relevant classes and attributes are obtained by parsing the XML Schema, and an ontology (RDF Schema) is constructed from the obtained classes and attributes.
(2) Screening the XML for duplicate data entities
As shown in FIG. 3, the repeated elements in the XML are traversed, unique codes are assigned to different elements according to the equivalent-element criteria, and duplicate elements are given the same code. During the later conversion, the element code is checked to decide whether a repeated element still needs to be converted, which resolves redundancy and contradictions in the RDF data.
(3) Building RDF triples
Based on the analysis of the XML tree model, the elements in the tree are aggregated into aggregation classes, and the aggregation classes are divided into three kinds of abstract model. Corresponding mapping rules are established for the different aggregation classes, and the RDF triples for each element in the XML are constructed according to the mapping rules of the abstract model of the aggregation class to which the element belongs, which completes the conversion from XML to RDF.
The method comprises the following specific steps:
Step S1, analyzing the tree model of the XML data document; clustering the elements according to the names of their start tags, determining the sub-model corresponding to each element of the same class, and integrating the sub-models of all elements of the same class to obtain the abstract model corresponding to that aggregation class; constructing the vocabulary (RDF Schema) according to the obtained abstract models; specifically,
for the elements in the XML data document,
Figure BDA0002824525230000071
the elements are classified into several classes $E_{n_1}, E_{n_2}, \ldots, E_{n_i}, \ldots$ ($n_i \in N_e$), each element $e_i$ being assigned to the corresponding aggregation class $E_{n_i}$ according to the name of its start tag:
$f_1 : (e \in E) \to E_{n_1}, \ldots, E_{n_i}, \ldots,\ (n_i \in N_e)$
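The clustering of step S1 can be illustrated with a minimal Python sketch. It assumes the standard-library `xml.etree.ElementTree` parser; the function name `cluster_by_tag` and the sample document are hypothetical and are not part of the patent text. The sketch only shows the grouping of elements by start-tag name, i.e. the mapping $f_1$.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, used only for illustration.
SAMPLE = """
<library>
  <book id="b1"><title>RDF Primer</title><author>Ann</author></book>
  <book id="b2"><title>XML Basics</title><author>Ann</author></book>
</library>
"""

def cluster_by_tag(root):
    """Group every element of the tree by its start-tag name (f1: e -> E_n)."""
    classes = defaultdict(list)
    for elem in root.iter():  # visits the root and all descendants
        classes[elem.tag].append(elem)
    return dict(classes)

root = ET.fromstring(SAMPLE)
classes = cluster_by_tag(root)
for tag, elems in classes.items():
    print(tag, len(elems))
```

Each key of `classes` plays the role of a tag name $n_i$, and the list stored under it plays the role of the aggregation class $E_{n_i}$.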
Step S2, the sub-models $mod_{e_j} \in MOD_{n_i}$ of all elements in $E_{n_i}$ are abstracted and integrated to obtain the abstract model $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}, v_{n_i}\})$ corresponding to the aggregation class; where $n_i$ is the tag name of all elements in the aggregation class $E_{n_i}$, $CN_{n_i}$ is the set of names of the sub-elements, $AN_{n_i}$ is the set of names of the attributes contained in the element's start tag, and $v_{n_i}$ is a logical variable whose presence indicates that the inline content of the element contains a text value; specifically:
(1) when $CN_{n_i} = \varnothing$, $AN_{n_i} = \varnothing$ and $v_{n_i}$ is present, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, v_{n_i})$; in this case,
Figure BDA0002824525230000081
the sub-model $mod_{e_j}$ corresponding to element $e_j$ is a simple sub-model, i.e. $mod_{e_j} \in S$, where $S$ denotes a set of sub-models; the RDF triples are then constructed as follows:
$f_{pi} : n_i \to pi_{n_i},\ (n_i \in N_e,\ pi_{n_i} \in PI)$
$\mathrm{Type}(?pi_{n_i}, \mathrm{Property})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_{n_i}, ?\mathrm{DateTypeIRI})$
where $PI$ denotes the set of attributes in the RDF vocabulary, DateTypeIRI denotes a built-in data type, $pi_{n_i}$ denotes the attribute to which $n_i$ is mapped, and $\mathrm{Type}(?pi_{n_i}, \mathrm{Property})$ and $\mathrm{PropVal}(\mathrm{range}, ?pi_{n_i}, ?\mathrm{DateTypeIRI})$ are axiom expressions of RDF triples.
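(2) When $v_{n_i}$ is absent, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}\})$; the RDF triples are then constructed as follows:
$f_{ci} : n_i \to ci_{n_i},\ (n_i \in N_e,\ ci_{n_i} \in CI)$
$\mathrm{Type}(?ci_{n_i}, \mathrm{Class})$
$f_{pi} : \{cn_1, cn_2, \ldots, cn_j, \ldots\} \to \{pi_1, pi_2, \ldots, pi_j, \ldots\}\ (cn_j \in CN_{n_i},\ pi_j \in PI,\ j = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_j, \mathrm{Property})\ (j = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_j, ?ci_{n_i})$
$f_{ci} : cn_j \to ci_{cn_j}\ (cn_j \in CN_{n_i},\ ci_{cn_j} \in CI)$
$\mathrm{PropVal}(\mathrm{range}, ?pi_j, ?ci_{cn_j})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_j, ?\mathrm{DateTypeIRI})$
$f_{pi} : \{an_1, an_2, \ldots, an_k, \ldots\} \to \{pi_1, pi_2, \ldots, pi_k, \ldots\}\ (an_k \in AN_{n_i},\ pi_k \in PI,\ k = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_k, \mathrm{Property})\ (k = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_k, ?ci_{n_i})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_k, ?\mathrm{DateTypeIRI})$
where $CI$ denotes the set of classes in the RDF vocabulary. Assuming $cn_j \in CN_{n_i}$, an attribute $pi_j$ can be constructed based on $cn_j$.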
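(3) In the remaining case, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}, v_{n_i}\})$; the class and attribute rules of the RDF Schema are then constructed from $n_i$, $CN_{n_i}$ and $AN_{n_i}$ respectively, as follows:
$f_{ci} : n_i \to ci_{n_i},\ (n_i \in N_e,\ ci_{n_i} \in CI)$
$\mathrm{Type}(?ci_{n_i}, \mathrm{Class})$
$f_{pi} : \{cn_1, cn_2, \ldots, cn_m, \ldots\} \to \{pi_1, pi_2, \ldots, pi_m, \ldots\}\ (cn_m \in CN_{n_i},\ pi_m \in PI,\ m = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_m, \mathrm{Property})\ (m = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_m, ?ci_{n_i})$
$f_{ci} : cn_q \to ci_{cn_q}\ (cn_q \in CN_{n_i},\ ci_{cn_q} \in CI)$
$\mathrm{PropVal}(\mathrm{range}, ?pi_m, ?ci_{cn_m})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_m, ?\mathrm{DateTypeIRI})$
$f_{pi} : \{an_1, an_2, \ldots, an_p, \ldots\} \to \{pi_1, pi_2, \ldots, pi_p, \ldots\}\ (an_p \in AN_{n_i},\ pi_p \in PI,\ p = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_p, \mathrm{Property})\ (p = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_p, ?ci_{n_i})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_p, ?\mathrm{DateTypeIRI})$
$\mathrm{Type}(\mathrm{value}, \mathrm{Property})$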
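The abstract models of step S2 and their three cases can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `has_text`, `abstract_models` and `model_case` are hypothetical, the sample document is invented, and "text value present" is interpreted here as any non-whitespace text directly inside the element.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample document, used only for illustration.
SAMPLE = """
<library>
  <book id="b1"><title>RDF Primer</title></book>
  <note lang="en">see shelf 3</note>
</library>
"""

def has_text(elem):
    """Assumed reading of v: the element directly contains non-blank text."""
    parts = [elem.text] + [child.tail for child in elem]
    return any(p and p.strip() for p in parts)

def abstract_models(root):
    """Merge the sub-models of all elements sharing a tag into (CN, AN, v)."""
    merged = defaultdict(lambda: [set(), set(), False])
    for elem in root.iter():
        model = merged[elem.tag]
        model[0].update(child.tag for child in elem)  # CN: sub-element names
        model[1].update(elem.attrib)                  # AN: attribute names
        model[2] = model[2] or has_text(elem)         # v: text value present
    return {tag: (frozenset(cn), frozenset(an), v)
            for tag, (cn, an, v) in merged.items()}

def model_case(cn, an, v):
    """Return 1, 2 or 3, following cases (1) to (3) above."""
    if v and not cn and not an:
        return 1   # simple sub-model: the tag maps to an attribute only
    if not v:
        return 2   # no text value: the tag maps to a class
    return 3       # text value together with sub-elements or attributes

models = abstract_models(ET.fromstring(SAMPLE))
cases = {tag: model_case(*m) for tag, m in models.items()}
print(cases)
```

In this sample, `title` falls under case (1), `library` and `book` under case (2), and `note`, which has both an attribute and text, under case (3).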
Step S3, according to the mapping rules corresponding to the abstract models in step S2, the vocabulary (RDF Schema) of the current domain is constructed as follows:
$f_{rdfs} : \{E_{n_1}, \ldots, E_{n_i}, \ldots\} \to \text{RDF Schema}\ (n_i \in N_e)$
where the set of all aggregation classes of the XML document is $XSD = \{E_{n_1}, \ldots, E_{n_i}, \ldots\}\ (n_i \in N_e)$;
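As a rough sketch of step S3, the following Python code turns a dictionary of abstract models into RDF Schema triples written as plain tuples. Several points are assumptions made for illustration only: the `ex:` prefix and the naming helpers `prop` and `cls`, the use of `xsd:string` as the built-in data type, and the reading that the two range rules are alternatives (a class when the sub-element is itself mapped to a class, a data type when it is a simple sub-model).

```python
RDF_TYPE, RDFS_CLASS, RDF_PROPERTY = "rdf:type", "rdfs:Class", "rdf:Property"
RDFS_DOMAIN, RDFS_RANGE = "rdfs:domain", "rdfs:range"
DATATYPE = "xsd:string"  # assumed stand-in for the built-in data type

def prop(name):
    return f"ex:{name}"               # hypothetical naming scheme for attributes

def cls(name):
    return f"ex:{name.capitalize()}"  # hypothetical naming scheme for classes

def is_simple(model):
    cn, an, v = model
    return v and not cn and not an    # case (1)

def build_schema(models):
    """Map {tag: (CN, AN, v)} to a set of RDF Schema triples."""
    triples = set()
    for tag, (cn, an, v) in models.items():
        if is_simple((cn, an, v)):
            triples.add((prop(tag), RDF_TYPE, RDF_PROPERTY))
            triples.add((prop(tag), RDFS_RANGE, DATATYPE))
            continue
        triples.add((cls(tag), RDF_TYPE, RDFS_CLASS))   # cases (2) and (3)
        for child in cn:
            rng = DATATYPE if is_simple(models[child]) else cls(child)
            triples.add((prop(child), RDF_TYPE, RDF_PROPERTY))
            triples.add((prop(child), RDFS_DOMAIN, cls(tag)))
            triples.add((prop(child), RDFS_RANGE, rng))
        for attr in an:
            triples.add((prop(attr), RDF_TYPE, RDF_PROPERTY))
            triples.add((prop(attr), RDFS_DOMAIN, cls(tag)))
            triples.add((prop(attr), RDFS_RANGE, DATATYPE))
        if v:                                            # case (3) only
            triples.add(("rdf:value", RDF_TYPE, RDF_PROPERTY))
    return triples

# Hand-written abstract models for a small hypothetical document.
models = {
    "library": ({"book"}, set(), False),
    "book": ({"title"}, {"id"}, False),
    "title": (set(), set(), True),
}
schema = build_schema(models)
for triple in sorted(schema):
    print(triple)
```

A real implementation would more likely emit the triples through an RDF library and use full IRIs; the tuple form is used here only to keep the sketch dependency-free.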
Step S4, identifying the duplicate elements in the XML document data; specifically,
traversing all elements E and attributes A of the XML document and attaching a unique ID to each; then adjusting the IDs of the elements and attributes in the current XML document so that equivalent elements and equivalent attributes share the same ID: the tree model of the XML is traversed again by post-order traversal, working from the bottom of the tree model up to the root, the equivalent elements and equivalent attributes in the document are identified, and their IDs are unified; the specific steps are as follows:
Figure BDA0002824525230000101
Figure BDA0002824525230000102
$e_m \in CE_{e_i},\ e_n \in CE_{e_j},\ a_m \in EA_{e_i},\ a_n \in EA_{e_j}$
$CL_{e_i} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_m}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_p}, \ldots\}$
$CL_{e_j} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_n}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_q}, \ldots\}$
Figure BDA0002824525230000103
$e_m \in CE_{e_i},\ e_n \in CE_{e_j},\ a_p \in EA_{e_i},\ a_q \in EA_{e_j},\ e_i \to v_i,\ e_j \to v_j$
$CL_{e_i} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_m}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_p}, \ldots\}$
$CL_{e_j} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_n}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_q}, \ldots\}$
Figure BDA0002824525230000104
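The ID unification of step S4 can be approximated with a single post-order pass, sketched below in Python. The exact equivalence conditions are given in the figures and are not reproduced here, so the criterion used in the sketch is an assumption: two elements are treated as equivalent when they have the same tag name, the same text value, the same attribute name/value pairs and the same set of child IDs. The sketch also merges the "assign unique IDs" and "adjust IDs" passes into one pass with a signature table, which is a simplification rather than the procedure described above. The function name `assign_ids` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first and third <book> are duplicates.
SAMPLE = """
<library>
  <book><title>RDF Primer</title><author>Ann</author></book>
  <book><title>XML Basics</title><author>Ann</author></book>
  <book><title>RDF Primer</title><author>Ann</author></book>
</library>
"""

def assign_ids(root):
    """Post-order pass in which equivalent elements receive the same ID."""
    signature_to_id = {}
    ids = {}  # element object -> ID

    def visit(elem):
        # Children are handled first, so the tree is processed bottom-up.
        child_ids = frozenset(visit(child) for child in elem)
        text = (elem.text or "").strip()
        signature = (elem.tag, text, frozenset(elem.attrib.items()), child_ids)
        ident = signature_to_id.setdefault(signature, len(signature_to_id) + 1)
        ids[elem] = ident
        return ident

    visit(root)
    return ids

root = ET.fromstring(SAMPLE)
ids = assign_ids(root)
books = root.findall("book")
print([ids[b] for b in books])
```

Because child IDs are compared as a set, the order and multiplicity of children are ignored, in line with the set notation $CL_{e_i}$. The recursion is adequate for small documents; a very deep tree would call for an explicit stack.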
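Step S5, after the elements in the current XML document have been clustered, the XML document is mapped into an RDF triple sequence based on step S2; the XML tree model is traversed after the ID adjustment of elements and attributes in step S4 has been completed, the set of IDs of the elements already mapped is stored as OID, and the RDF triple sequence is constructed as follows:
$f_r : (n_i, ID_{e_i}) \to r_{e_i},\ (n_i \in N_e,\ r_{e_i} \in R)$
$t_1 = t_v = (r_{e_i}, \mathrm{rdf{:}value}, v_i)$
$\mathrm{PropVal}(\mathrm{value}, ?r_{e_i}, ?v_i)$
$\{t_1, t_2, \ldots\} = \{t_{e_m} \mid m = 1, 2, \ldots\} \cup \{t_{a_n} \mid n = 1, 2, \ldots\}\ (pi_{e_m} \to t_{e_m},\ pi_{a_n} \to t_{a_n})$
$f_r : (n_i, ID_{e_i}) \to r_{e_i},\ (n_i \in N_e,\ r_{e_i} \in R)$
$f_r : (n_m, ID_{e_m}) \to r_{e_m},\ (n_m \in N_e,\ r_{e_m} \in R)$
Figure BDA0002824525230000111
Figure BDA0002824525230000112
$t_{a_n} = (r_{e_i}, pi_{a_n}, v_n)$
$\mathrm{PropVal}(?pi_{a_n}, ?r_{e_i}, ?v_i)$
$f_r : (n_i, ID_{e_i}) \to r_{e_i},\ (n_i \in N_e,\ r_{e_i} \in R)$
$t_v = (r_{e_i}, \mathrm{rdf{:}value}, v_i)$
$t_v \in \{t_1, t_2, \ldots\}$
$\mathrm{PropVal}(\mathrm{value}, ?r_{e_i}, ?v_i)$.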
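To show how the unified IDs keep duplicates out of the output, the following Python sketch builds instance triples for a small document. It is a simplified illustration, not the patented procedure: the resource naming `ex:<tag>_<ID>`, the helper names, the equivalence criterion inside `assign_ids`, and the rule that a tag is "simple" when none of its elements has sub-elements or attributes are all assumptions. The `seen` set plays the role of OID.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first and third <book> are duplicates.
SAMPLE = """
<library>
  <book><title>RDF Primer</title><author>Ann</author></book>
  <book><title>XML Basics</title><author>Ann</author></book>
  <book><title>RDF Primer</title><author>Ann</author></book>
</library>
"""

def assign_ids(root):
    """Post-order pass giving equivalent elements the same ID (assumed criterion)."""
    table, ids = {}, {}
    def visit(e):
        kids = frozenset(visit(c) for c in e)
        sig = (e.tag, (e.text or "").strip(), frozenset(e.attrib.items()), kids)
        ids[e] = table.setdefault(sig, len(table) + 1)
        return ids[e]
    visit(root)
    return ids

def simple_tags(root):
    """Tags whose elements never carry sub-elements or attributes."""
    complex_tags = {e.tag for e in root.iter() if len(e) or e.attrib}
    return {e.tag for e in root.iter()} - complex_tags

def build_triples(root):
    ids, simple = assign_ids(root), simple_tags(root)
    seen, triples = set(), []          # seen corresponds to OID

    def res(e):
        return f"ex:{e.tag}_{ids[e]}"  # f_r: (n, ID) -> resource

    def visit(e):
        if ids[e] in seen:             # duplicate entity: already converted
            return
        seen.add(ids[e])
        for name, value in e.attrib.items():
            triples.append((res(e), f"ex:{name}", value))
        text = (e.text or "").strip()
        if text:
            triples.append((res(e), "rdf:value", text))
        for c in e:
            if c.tag in simple:
                triples.append((res(e), f"ex:{c.tag}", (c.text or "").strip()))
            else:
                triples.append((res(e), f"ex:{c.tag}", res(c)))
                visit(c)

    visit(root)
    return list(dict.fromkeys(triples))  # drop repeated triples, keep order

triples = build_triples(ET.fromstring(SAMPLE))
for t in triples:
    print(t)
```

The third `<book>` shares its ID with the first, so it yields no new resource and no repeated triples; only two book resources appear in the result.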
The above describes only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these are also intended to fall within the scope of protection of the invention.

Claims (1)

1. A method for automatically constructing RDF data based on XML data is characterized by comprising the following steps:
Step S1, analyzing the tree model of the XML data document; clustering the elements according to the names of their start tags, determining the sub-model corresponding to each element of the same class, and integrating the sub-models of all elements of the same class to obtain the abstract model corresponding to that aggregation class; constructing the vocabulary (RDF Schema) according to the obtained abstract models; specifically,
for the elements in the XML data document,
Figure FDA0002824525220000011
the elements are classified and aggregated into several classes $E_{n_1}, E_{n_2}, \ldots, E_{n_i}, \ldots$ ($n_i \in N_e$), each element $e_i$ being assigned to the corresponding aggregation class $E_{n_i}$ according to the name of its start tag:
$f_1 : (e \in E) \to E_{n_1}, \ldots, E_{n_i}, \ldots,\ (n_i \in N_e)$
Step S2, the sub-models $mod_{e_j} \in MOD_{n_i}$ of all elements are abstracted and integrated to obtain the abstract model $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}, v_{n_i}\})$ corresponding to the aggregation class; where $n_i$ is the tag name of all elements in the aggregation class $E_{n_i}$, $CN_{n_i}$ is the set of names of the sub-elements, $AN_{n_i}$ is the set of names of the attributes contained in the element's start tag, and $v_{n_i}$ is a logical variable whose presence indicates that the inline content of the element contains a text value; specifically:
(1) when $CN_{n_i} = \varnothing$, $AN_{n_i} = \varnothing$ and $v_{n_i}$ is present, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, v_{n_i})$; in this case,
Figure FDA0002824525220000012
the sub-model $mod_{e_j}$ corresponding to element $e_j$ is a simple sub-model, i.e. $mod_{e_j} \in S$, where $S$ denotes a set of sub-models; the RDF triples are then constructed as follows:
$f_{pi} : n_i \to pi_{n_i},\ (n_i \in N_e,\ pi_{n_i} \in PI)$
$\mathrm{Type}(?pi_{n_i}, \mathrm{Property})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_{n_i}, ?\mathrm{DateTypeIRI})$
where $PI$ denotes the set of attributes in the RDF vocabulary, DateTypeIRI denotes a built-in data type, and $pi_{n_i}$ denotes the attribute to which $n_i$ is mapped;
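(2) when $v_{n_i}$ is absent, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}\})$; the RDF triples are then constructed as follows:
$f_{ci} : n_i \to ci_{n_i},\ (n_i \in N_e,\ ci_{n_i} \in CI)$
$\mathrm{Type}(?ci_{n_i}, \mathrm{Class})$
$f_{pi} : \{cn_1, cn_2, \ldots, cn_j, \ldots\} \to \{pi_1, pi_2, \ldots, pi_j, \ldots\}\ (cn_j \in CN_{n_i},\ pi_j \in PI,\ j = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_j, \mathrm{Property})\ (j = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_j, ?ci_{n_i})$
$f_{ci} : cn_j \to ci_{cn_j}\ (cn_j \in CN_{n_i},\ ci_{cn_j} \in CI)$
$\mathrm{PropVal}(\mathrm{range}, ?pi_j, ?ci_{cn_j})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_j, ?\mathrm{DateTypeIRI})$
$f_{pi} : \{an_1, an_2, \ldots, an_k, \ldots\} \to \{pi_1, pi_2, \ldots, pi_k, \ldots\}\ (an_k \in AN_{n_i},\ pi_k \in PI,\ k = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_k, \mathrm{Property})\ (k = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_k, ?ci_{n_i})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_k, ?\mathrm{DateTypeIRI})$
where $CI$ denotes the set of classes in the RDF vocabulary. Assuming $cn_j \in CN_{n_i}$, an attribute $pi_j$ can be constructed based on $cn_j$.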
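(3) in the remaining case, the abstract model corresponding to the aggregation class $E_{n_i}$ is $smod_{n_i} = (n_i, \{CN_{n_i}, AN_{n_i}, v_{n_i}\})$; the class and attribute rules of the RDF Schema are then constructed from $n_i$, $CN_{n_i}$ and $AN_{n_i}$ respectively, as follows:
$f_{ci} : n_i \to ci_{n_i},\ (n_i \in N_e,\ ci_{n_i} \in CI)$
$\mathrm{Type}(?ci_{n_i}, \mathrm{Class})$
$f_{pi} : \{cn_1, cn_2, \ldots, cn_m, \ldots\} \to \{pi_1, pi_2, \ldots, pi_m, \ldots\}\ (cn_m \in CN_{n_i},\ pi_m \in PI,\ m = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_m, \mathrm{Property})\ (m = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_m, ?ci_{n_i})$
$f_{ci} : cn_q \to ci_{cn_q}\ (cn_q \in CN_{n_i},\ ci_{cn_q} \in CI)$
$\mathrm{PropVal}(\mathrm{range}, ?pi_m, ?ci_{cn_m})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_m, ?\mathrm{DateTypeIRI})$
$f_{pi} : \{an_1, an_2, \ldots, an_p, \ldots\} \to \{pi_1, pi_2, \ldots, pi_p, \ldots\}\ (an_p \in AN_{n_i},\ pi_p \in PI,\ p = 1, 2, \ldots, n)$
$\mathrm{Type}(?pi_p, \mathrm{Property})\ (p = 1, 2, \ldots, n)$
$\mathrm{PropVal}(\mathrm{domain}, ?pi_p, ?ci_{n_i})$
$\mathrm{PropVal}(\mathrm{range}, ?pi_p, ?\mathrm{DateTypeIRI})$
$\mathrm{Type}(\mathrm{value}, \mathrm{Property})$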
Step S3, according to the mapping rules corresponding to the abstract models in step S2, the vocabulary (RDF Schema) of the current domain is constructed as follows:
$f_{rdfs} : \{E_{n_1}, \ldots, E_{n_i}, \ldots\} \to \text{RDF Schema}\ (n_i \in N_e)$
where the set of all aggregation classes of the XML document is $XSD = \{E_{n_1}, \ldots, E_{n_i}, \ldots\}\ (n_i \in N_e)$;
Step S4, identifying the duplicate elements in the XML document data; specifically,
traversing all elements E and attributes A of the XML document and attaching a unique ID to each; then adjusting the IDs of the elements and attributes in the current XML document so that equivalent elements and equivalent attributes share the same ID: the tree model of the XML is traversed again by post-order traversal, working from the bottom of the tree model up to the root, the equivalent elements and equivalent attributes in the document are identified, and their IDs are unified; the specific steps are as follows:
Figure FDA0002824525220000031
Figure FDA0002824525220000032
$e_m \in CE_{e_i},\ e_n \in CE_{e_j},\ a_m \in EA_{e_i},\ a_n \in EA_{e_j}$
$CL_{e_i} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_m}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_p}, \ldots\}$
$CL_{e_j} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_n}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_q}, \ldots\}$
Figure FDA0002824525220000033
$e_m \in CE_{e_i},\ e_n \in CE_{e_j},\ a_p \in EA_{e_i},\ a_q \in EA_{e_j},\ e_i \to v_i,\ e_j \to v_j$
$CL_{e_i} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_m}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_p}, \ldots\}$
$CL_{e_j} = \{ID_{e_1}, ID_{e_2}, \ldots, ID_{e_n}, \ldots, ID_{a_1}, ID_{a_2}, \ldots, ID_{a_q}, \ldots\}$
Figure FDA0002824525220000034
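Step S5, after the elements in the current XML document have been clustered, the XML document is mapped into an RDF triple sequence based on step S2; the XML tree model is traversed after the ID adjustment of elements and attributes in step S4 has been completed, the set of IDs of the elements already mapped is stored as OID, and the RDF triple sequence is constructed as follows:
$f_r : (n_i, ID_{e_i}) \to r_{e_i},\ (n_i \in N_e,\ r_{e_i} \in R)$
$t_1 = t_v = (r_{e_i}, \mathrm{rdf{:}value}, v_i)$
$\mathrm{PropVal}(\mathrm{value}, ?r_{e_i}, ?v_i)$
$\{t_1, t_2, \ldots\} = \{t_{e_m} \mid m = 1, 2, \ldots\} \cup \{t_{a_n} \mid n = 1, 2, \ldots\}\ (pi_{e_m} \to t_{e_m},\ pi_{a_n} \to t_{a_n})$
$f_r : (n_i, ID_{e_i}) \to r_{e_i},\ (n_i \in N_e,\ r_{e_i} \in R)$
$f_r : (n_m, ID_{e_m}) \to r_{e_m},\ (n_m \in N_e,\ r_{e_m} \in R)$
Figure FDA0002824525220000041
Figure FDA0002824525220000042
$t_{a_n} = (r_{e_i}, pi_{a_n}, v_n)$
$\mathrm{PropVal}(?pi_{a_n}, ?r_{e_i}, ?v_i)$
$f_r : (n_i, ID_{e_i}) \to r_{e_i},\ (n_i \in N_e,\ r_{e_i} \in R)$
$t_v = (r_{e_i}, \mathrm{rdf{:}value}, v_i)$
$t_v \in \{t_1, t_2, \ldots\}$
$\mathrm{PropVal}(\mathrm{value}, ?r_{e_i}, ?v_i)$.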
CN202011445817.1A 2020-12-09 2020-12-09 Method for automatically constructing RDF data based on XML data Pending CN112559767A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011445817.1A CN112559767A (en) 2020-12-09 2020-12-09 Method for automatically constructing RDF data based on XML data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011445817.1A CN112559767A (en) 2020-12-09 2020-12-09 Method for automatically constructing RDF data based on XML data

Publications (1)

Publication Number Publication Date
CN112559767A true CN112559767A (en) 2021-03-26

Family

ID=75061145

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011445817.1A Pending CN112559767A (en) 2020-12-09 2020-12-09 Method for automatically constructing RDF data based on XML data

Country Status (1)

Country Link
CN (1) CN112559767A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100306207A1 (en) * 2009-05-27 2010-12-02 Ibm Corporation Method and system for transforming xml data to rdf data
CN102629256A (en) * 2012-02-29 2012-08-08 浙江工商大学 Extensive markup language (XML) data information representation method of agricultural information ontology

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
KLEIN, M: "XML, RDF, and relatives", 《IEEE INTELLIGENT SYSTEMS》, vol. 16, no. 2, pages 26 - 28 *
吕玉连: "面向链接数据的链接分析与可视化系统", 《中国优秀硕士学位论文全文数据库信息科技辑》, no. 4, pages 1 - 55 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination