WO2014069983A2 - A system and method for distributed querying of linked semantic webs

Publication number
WO2014069983A2
Authority
WO
WIPO (PCT)
Prior art keywords
queries
sub
query
index
ontologies
Prior art date
Application number
PCT/MY2013/000177
Other languages
French (fr)
Other versions
WO2014069983A3 (en
Inventor
Weng Onn Kow
Anand Sadanandan ARUN
Lukose Dickson
Original Assignee
Mimos Berhad
Priority date
Filing date
Publication date
Application filed by Mimos Berhad
Publication of WO2014069983A2
Publication of WO2014069983A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/81 Indexing, e.g. XML tags; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/80 Information retrieval; Database structures therefor; File system structures therefor of semi-structured data, e.g. markup language structured data such as SGML, XML or HTML
    • G06F16/83 Querying
    • G06F16/835 Query processing
    • G06F16/8365 Query optimisation

Abstract

A system (100) for distributed querying of linked semantic webs (110) comprising at least one LOD ontologies index (120) comprising LOD ontologies and metadata relating to said LOD ontologies; at least one concept index (130) comprising concepts and corresponding URIs; at least one relation index (140) comprising relations and corresponding URIs; query interface (160) for entering SPARQL queries; and a distributed query engine (150) in communication with said LOD ontologies index (120), concept index (130) and relations index (140) and adapted to receive queries from said query interface (160); characterised in that said distributed query engine (150) is adapted to: parse and rewrite queries received from said query interface (160) and generate a plurality of sub-queries; identify dependencies within said sub-queries and chunk sub-queries based on ontology; execute sub-queries by sending to relevant source ontology; and merge results obtained from execution of said sub-queries.

Description

A SYSTEM AND METHOD FOR DISTRIBUTED QUERYING OF LINKED SEMANTIC WEBS
FIELD OF INVENTION
The present invention relates to a system and method for distributed querying of linked semantic webs.
BACKGROUND ART
In recent years, there has been unprecedented growth in the amount of publicly available semantic information. This information is encoded in RDF form in dozens of ontologies and linked together to form the Linked Open Data (LOD) cloud. However, the ability to treat the entire LOD as a single super World Wide Semantic Web for the purpose of querying and interfacing is still lacking. The only method available to query the ontologies that make up an interconnected public semantic web, such as the LOD cloud, is to use the public SPARQL endpoints provided. According to this method, each ontology is queried separately and the user needs to know at least its basic internal structure before a query can be made.
As noted above, with the currently available methodology each ontology is queried separately, so the results are not aggregated, and the user needs to know intimate details of each ontology in order to make a query. For example, to determine the population of Malaysia, the user needs to know at least: (i) which ontology might have the knowledge; (ii) how to query the ontology, that is, whether it has an endpoint; (iii) what the URL for Malaysia is in the ontology; and (iv) the property that is used to represent population.
As an example, United States Patent No. 5,600,329 describes a database system that provides independence between the query and physical structure of the database tables by captioning each database table with a partial query reflecting the contents of that table. In particular, the partial query is a query that if applied to a larger database of a standard configuration would produce the data of the table. Relevant tables for a particular query may be identified by piecing together the partial queries until the user query is obtained. As described in this patent, the database system may be integrated with an optimizer by comparing each of the identified tables against the others for the amount of overlap their sub-queries have with the user query and the cost of accessing the table and then repeating this process as the tables are joined in various combinations.
Other join processing and grouping techniques have been proposed to minimize the number of remote requests required, and to develop an effective solution for source selection in the absence of pre-processed metadata. In particular, frameworks have been proposed that enable SPARQL query processing on heterogeneous, virtually integrated Linked Data sources.
The present invention, at least in certain embodiments, advantageously provides a way for a user to perform SPARQL queries on a set of linked ontologies without needing to know the names of the ontologies, the location at which they are stored, or how they are internally structured.
The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.
SUMMARY OF INVENTION
The present invention relates to a system and method for distributed querying of linked semantic webs.
One aspect of the present invention provides a system (100) for distributed querying of linked semantic webs (110) comprising at least one LOD ontologies index (120) comprising LOD ontologies and metadata relating to the LOD ontologies; at least one concept index (130) comprising concepts and corresponding URIs; at least one relation index (140) comprising relations and corresponding URIs; at least one query interface (160) for entering SPARQL queries; and a distributed query engine (150) in communication with the LOD ontologies index (120), concept index (130) and relations index (140) and adapted to receive queries from the query interface (160). The distributed query engine (150) is adapted to parse and rewrite queries received from the query interface (160) and generate a plurality of sub-queries; identify dependencies within the sub-queries and chunk sub-queries based on ontology; execute sub-queries by sending them to the relevant source ontology; and merge results obtained from execution of the sub-queries.
In another aspect the invention provides a system (100) wherein the metadata included in the LOD ontologies index comprises one or more of namespace(s), vocabulary used (RDF, OWL, SKOS, etc.), properties and domain ranges information, and SPARQL endpoint.
In a further aspect the invention provides a system wherein the metadata included in the LOD ontologies index is in the form of a database table, knowledge base and/or text file.
In yet another aspect the invention provides a system wherein the concept index and the relation index are incorporated into a single index.
In still another aspect the invention provides a system wherein the concept index includes classes and instances.
In a further aspect the invention provides a method (200) for distributed querying of linked semantic webs (110) comprising receiving an initial query from a user (210); parsing the query (220) and replacing generic terms (230); breaking the queries into sub-queries and chunking the sub-queries (240); executing the sub-queries (250) based on ontology; and merging the results obtained (260) to determine whether an answer is reached (270) and, if so, returning the answer (280) to the user. The method for breaking the queries into sub-queries and chunking the sub-queries (240) further comprises steps of selecting a clause in a query (241); determining the ontology of terms (242) and variables in the query clause (243); determining whether a variable is dependent (244); if the variable is not dependent, identifying any other clauses querying the same ontology (245) and grouping the clauses into a sub-query (246) or, if there are no other clauses querying the same ontology, establishing a new sub-query (247); if the variable is dependent, determining whether the dependent clause queries the same ontology (248) and, if the dependent clause does query the same ontology, grouping it into a sub-query (246) and, if not, sequencing it as a sub-query after the dependent clause (249).
In another aspect the invention provides a method wherein, if an answer is not reached, the steps of the method are repeated, other than the step of receiving the initial query.
In yet another aspect the invention provides a method wherein the step of parsing the query (220) and replacing generic terms (230) comprises checking concepts and/or relations and replacing generic terms with their actual URIs.
In a further aspect the invention provides a method wherein the process of replacing generic terms (230) comprises determining whether a term is generic or not (232); if the term is generic, determining whether the term is a concept or a relation (234), searching a concept index (235) or a relation index (236) accordingly; and replacing the term with its actual URI (237), reiterating the steps until all generic terms are replaced. In still another aspect of the invention there is provided a method comprising repeating the steps of the immediately preceding paragraph until all clauses are included in the chunked sub-queries.
The present invention consists of features and a combination of parts hereinafter fully described and illustrated in the accompanying drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.
BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS
To further clarify various aspects of some embodiments of the present invention, a more particular description of the invention will be rendered by references to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the accompanying drawings in which:
FIG. 1 illustrates the top level architecture of an embodiment of the invention.
FIG. 2 illustrates a flowchart for a querying process according to an embodiment of the invention.
FIG. 3 illustrates a replace generic terms flowchart of an embodiment of the invention.
FIG. 4 illustrates a sub-query chunking flowchart of an embodiment of the invention.
Table 1 shows an example of a concept index (130) with three concepts in it.
Table 2 shows an example of a relations index (140) with three relations in it.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention provides a system and method for distributed querying of linked semantic webs.
Hereinafter, this specification will describe the present invention according to the preferred embodiments. It is to be understood that limiting the description to the preferred embodiments of the invention is merely to facilitate discussion of the present invention, and it is envisioned that modifications may be made without departing from the scope of the appended claims.
Referring to Figure 1, a system (100) for distributed querying of linked semantic webs, particularly the LOD (110), is illustrated. The system (100) includes a number of modules, each of which will be discussed below. The system (100) includes a LOD metadata module (120) that is provided with an index of ontologies that are included within the LOD. The metadata may include, but is not limited to, namespace(s), vocabulary used (RDF, OWL, SKOS, etc.), properties and domain ranges information, and SPARQL endpoint. The metadata may be provided in any suitable form, for example a database table, knowledge base, text file and so on.
A concept index (130) is also provided that includes an index of concepts, such as classes and instances, and their actual uniform resource identifier (URI) details. Each unique concept included in the index can appear in multiple ontologies with different URIs. Table 1 shows an example of a concept index (130) with three concepts in it.
In addition, the system (100) includes a relation index (140) that includes an index of relations and their actual URIs. Again, each of the unique relations can appear in multiple ontologies with different URIs. Table 2 shows an example of a relations index (140) with three relations in it.
In addition to the above-mentioned components, the system (100) includes a distributed query engine (150) which is adapted to receive a query from a user at a query interface (160) and to break the query down into sub-queries. The sub-queries may be executed in parallel or sequentially, depending on the dependencies between them. Once the query is broken down into these sub-queries, the distributed query engine (150) searches the LOD metadata module (120) for all ontologies that may be able to provide answers for each sub-query. This search may, for example, include semantic matching of query terms to the properties and concepts in each of the ontologies. If there are a number of relevant ontologies for a particular sub-query, then that sub-query is sent to all of the matching ontologies. The distributed query engine (150) then merges the answers received for each of the sub-queries and forms a final answer to the query.
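The source-selection and merging behaviour described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the `OntologyMetadata` layout, the `example.org` endpoint addresses, and the use of plain prefix and string matching in place of the semantic matching mentioned in the description are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyMetadata:
    """One entry of the LOD ontologies index (hypothetical layout)."""
    name: str
    namespace: str          # prefix used by the ontology, e.g. "dbp:"
    endpoint: str           # SPARQL endpoint address (placeholder values below)
    properties: set = field(default_factory=set)


# Hypothetical index contents; the endpoints are placeholders, not real services.
INDEX = [
    OntologyMetadata("dbpedia", "dbp:", "http://example.org/dbp/sparql",
                     {"dbpprop:capital", "dbpprop:populationCensus"}),
    OntologyMetadata("geonames", "geonames:", "http://example.org/geonames/sparql",
                     {"geonames:population"}),
]


def select_sources(sub_query_terms, index):
    """Return every ontology whose namespace or properties match a term.

    Assumption: simple string matching stands in for semantic matching.
    """
    matches = []
    for entry in index:
        for term in sub_query_terms:
            if term.startswith(entry.namespace) or term in entry.properties:
                matches.append(entry)
                break  # one matching term is enough for this ontology
    return matches


def merge_answers(answers_per_source):
    """Union the answers from all sources, dropping exact duplicates."""
    merged = []
    for answers in answers_per_source:
        for answer in answers:
            if answer not in merged:
                merged.append(answer)
    return merged
```

A sub-query whose terms match several index entries would be sent to each of the returned endpoints, and the per-endpoint answer lists would then be passed to `merge_answers`.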
A flowchart illustrating the querying process (200) employed by the distributed query engine (150) is provided in Figure 2. Referring to Figure 2, an initial query is received (210) by the distributed query engine (150). The query is then parsed (220) and generic terms replaced (230), as described in more detail below with reference to Figure 3. Briefly, during this process the distributed query engine (150) checks the concept index (130) and relation index (140) and replaces generic terms with their actual URIs. Once this process is completed, the queries are chunked into sub-queries (240), as discussed above and described in more detail below with reference to Figure 4, and the sub-queries executed (250). The results of the sub-queries are merged (260) to determine whether an answer is reached (270) and, if so, the answer is returned (280) to the user. If an answer is not reached, the process may be repeated.
Referring to Figure 3, the process of replacing generic terms (230) is illustrated. In this process, a term enters the process (231) and it is determined whether the term is generic or not (232). If the term is not generic, it is added to the queries (233). If the term is identified as being generic, it is determined whether or not the term is a concept (234). If the term is considered a concept, the concept index is searched (235) and, if not, the relation index is searched (236). Once searching is complete, the term is replaced in the query with its actual URI (237). The process then identifies any further terms requiring consideration (238) or ends to provide a list of queries. As an example, the query "What is the population of Malaysia?" can be written as a generic SPARQL query, such as:
SELECT ?population WHERE
{
Malaysia population ?population
}
The methodology of the invention identifies the generic concept, Malaysia, and searches the concept index (130). By replacing the concept Malaysia with URIs from the concept index (130), a total of three possible queries are formed:
SELECT ?population WHERE
{
dbp:Malaysia population ?population
}
SELECT ?population WHERE
{
geonames:Malaysia population ?population
}
SELECT ?population WHERE
{
geoinfo:Malaysia population ?population
}
Next, the generic term population is identified and the relation index (140) searched for matches. This returns another three URIs. Replacing the term in each of the generated queries above produces a total of nine different queries, some of which include:
SELECT ?population WHERE
{
dbp:Malaysia dbpprop:populationCensus ?population
}
SELECT ?population WHERE
{
geonames:Malaysia geonames:population ?population
}
SELECT ?population WHERE
{
geoinfo:Malaysia geoinfo:populationTotal ?population
}
These are the legitimate SPARQL queries that can be sent to the various SPARQL endpoints in the ontology index provided in the LOD metadata module (120).
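The replacement of generic terms and the resulting three-by-three expansion can be sketched in Python as follows. This is a minimal sketch, not the method of the specification: the dictionary-based indexes, the rule that a term is generic when it is neither a variable nor a prefixed name, and the rule that the middle position of a clause holds a relation are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical index contents mirroring the prefixed names in the example queries.
CONCEPT_INDEX = {
    "Malaysia": ["dbp:Malaysia", "geonames:Malaysia", "geoinfo:Malaysia"],
}
RELATION_INDEX = {
    "population": ["dbpprop:populationCensus", "geonames:population",
                   "geoinfo:populationTotal"],
}


def is_generic(term):
    """Assumption: a generic term is neither a variable nor a prefixed name."""
    return not term.startswith("?") and ":" not in term


def candidates(term, position):
    """Return the possible replacements for one term of a clause."""
    if not is_generic(term):
        return [term]  # non-generic terms are kept as they are (233)
    # Assumption: position 1 (the predicate) holds relations, others hold concepts.
    index = RELATION_INDEX if position == 1 else CONCEPT_INDEX
    return index.get(term, [term])


def expand(clause):
    """Expand a (subject, predicate, object) clause into all URI combinations."""
    options = [candidates(term, pos) for pos, term in enumerate(clause)]
    return [tuple(combo) for combo in product(*options)]


queries = expand(("Malaysia", "population", "?population"))
print(len(queries))  # 3 concept URIs x 3 relation URIs = 9 clauses
```

Using a Cartesian product is one simple way to enumerate every combination; an implementation could instead prune combinations that mix a subject from one ontology with a property from another.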
The methodology of embodiments of the invention may attempt to execute all possible query combinations in parallel. However, in some instances there are dependencies between SPARQL clauses. In such cases, the SPARQL queries must be executed in series. When there are dependencies, the clauses are rearranged and grouped into possible sub-queries. Information from the LOD metadata module (120) is used to determine which parts of the query can be resolved by querying a single ontology and which has to be distributed.
Referring to Figure 4, the chunking process (240) begins with a clause in the query being selected (241). The ontology of the terms is determined (242) and the variables in the query clause are obtained (243). Once obtained, the process determines whether the variable is dependent (244). If the variable is not dependent, any other clauses querying the same ontology are identified (245) and, if there are any, the clauses are grouped into a sub-query (246); if there are none, a new sub-query is established (247). If the variable is determined to be dependent, the process involves determining whether the dependent clause queries the same ontology (248). If the dependent clause does query the same ontology, it is grouped into a sub-query (246) and, if not, it is sequenced as a sub-query after the dependent clause (249). This process may be repeated as necessary until there are no more clauses, thereby providing the chunked sub-queries.
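The chunking steps (241) to (249) can be sketched in Python as follows. This is a hedged illustration only: representing clauses as tuples, taking the predicate prefix as the source ontology, and treating a clause as dependent when its subject is a variable bound by an earlier clause are assumptions; the specification instead relies on the LOD metadata module (120) for the ontology lookup.

```python
def ontology_of(clause):
    """Assumption: the predicate's prefix identifies the source ontology."""
    return clause[1].split(":", 1)[0]


def chunk(clauses):
    """Group (subject, predicate, object) clauses into ordered sub-queries.

    Each sub-query is a dict with keys 'ontology', 'clauses' and 'after',
    where 'after' is the index of the sub-query that must run first, or None.
    """
    sub_queries = []
    produced_by = {}  # variable -> index of the sub-query that binds it
    for clause in clauses:                                   # select a clause (241)
        subject, _, obj = clause
        ontology = ontology_of(clause)                       # (242)
        # A clause is dependent if its subject variable was bound earlier (244).
        parent = produced_by.get(subject) if subject.startswith("?") else None
        if parent is not None and sub_queries[parent]["ontology"] == ontology:
            target = parent                                  # same ontology (248, 246)
        elif parent is not None:
            # Different ontology: sequence after the clause it depends on (249).
            sub_queries.append({"ontology": ontology, "clauses": [], "after": parent})
            target = len(sub_queries) - 1
        else:
            # Independent: look for an independent sub-query on this ontology (245).
            target = next((i for i, sq in enumerate(sub_queries)
                           if sq["ontology"] == ontology and sq["after"] is None), None)
            if target is None:                               # none found (247)
                sub_queries.append({"ontology": ontology, "clauses": [], "after": None})
                target = len(sub_queries) - 1
        sub_queries[target]["clauses"].append(clause)
        if obj.startswith("?"):
            produced_by.setdefault(obj, target)
    return sub_queries
```

Sub-queries whose `after` field is `None` have no prerequisites and could be dispatched in parallel; the others would wait for the sub-query they point to.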
For example, given the input query "What is the capital and population of Malaysia?", the following query may be obtained:
SELECT ?capital ?population WHERE
{
Malaysia capital_city ?capital
Malaysia population ?population
}
After replacement of generic terms (230), the query below is generated:
SELECT ?capital ?population WHERE
{
dbp:Malaysia dbpprop:capital ?capital
geonames:Malaysia geonames:population ?population
}
As the two clauses in the query are independent of each other, they can be executed in parallel, in this case against two different ontologies.
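Parallel execution of independent sub-queries followed by merging of results might look like the sketch below. The dictionary `FAKE_ENDPOINTS` is a hypothetical stand-in for real SPARQL endpoints so that the example runs without network access; its keys, the placeholder population value and the function names are assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for remote SPARQL endpoints: canned bindings keyed by sub-query text.
# The population value is a placeholder, not real data.
FAKE_ENDPOINTS = {
    "SELECT ?capital WHERE { dbp:Malaysia dbpprop:capital ?capital }":
        {"capital": "dbp:Kuala_Lumpur"},
    "SELECT ?population WHERE { geonames:Malaysia geonames:population ?population }":
        {"population": "<population value>"},
}


def run_sub_query(query):
    """Pretend to send one sub-query to its source ontology."""
    return FAKE_ENDPOINTS[query]


def run_in_parallel(sub_queries):
    """Execute independent sub-queries concurrently and merge their bindings."""
    merged = {}
    with ThreadPoolExecutor(max_workers=len(sub_queries)) as pool:
        for bindings in pool.map(run_sub_query, sub_queries):
            merged.update(bindings)
    return merged


answer = run_in_parallel(list(FAKE_ENDPOINTS))
```

Because neither sub-query needs a value produced by the other, the order in which the endpoints respond does not affect the merged result.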
If, on the other hand, the input query is "What is the population of the capital of Malaysia?", the following query may be obtained:
SELECT ?capital ?population WHERE
{
Malaysia capital_city ?capital
?capital population ?population
}
After replacement of generic terms (230), the query below is generated:
SELECT ?capital ?population WHERE
{
dbp:Malaysia dbpprop:capital ?capital
?capital geonames:population ?population
}
In this case, the second clause is dependent on the first. As such, this query must be executed in sequence, first identifying the capital of Malaysia by executing the following sub-query:
SELECT ?capital WHERE
{
dbp:Malaysia dbpprop:capital ?capital
}
The result (dbp:Kuala_Lumpur) is then substituted for the variable and a second sub-query is generated:
SELECT ?population WHERE
{
geonames:Kuala_Lumpur geonames:population ?population
}
This sub-query may be executed to arrive at an answer to the original query.
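The sequential case can be sketched in Python as a loop that substitutes each result into the next clause before issuing it. The endpoint is again a hypothetical in-memory stub, the population value is a placeholder, and the sketch substitutes the returned resource name directly; any mapping of that resource into the namespace of the second ontology is not modelled here.

```python
CLAUSES = [
    ("dbp:Malaysia", "dbpprop:capital", "?capital"),
    ("?capital", "geonames:population", "?population"),
]

# Hypothetical endpoint stub: maps a sub-query string to a single result value.
FAKE_RESULTS = {
    "SELECT ?capital WHERE { dbp:Malaysia dbpprop:capital ?capital }":
        "dbp:Kuala_Lumpur",
    "SELECT ?population WHERE { dbp:Kuala_Lumpur geonames:population ?population }":
        "<population value>",
}


def substitute(clause, bindings):
    """Replace every already-bound variable in a clause with its value."""
    return tuple(bindings.get(term, term) for term in clause)


def run_in_sequence(clauses, endpoint):
    """Run dependent clauses one at a time, feeding each result forward."""
    bindings = {}
    issued = []
    for clause in clauses:
        bound = substitute(clause, bindings)
        variable = bound[2]  # the object position holds the variable being solved
        query = f"SELECT {variable} WHERE {{ {' '.join(bound)} }}"
        issued.append(query)
        bindings[variable] = endpoint(query)
    return bindings, issued


bindings, issued = run_in_sequence(CLAUSES, FAKE_RESULTS.__getitem__)
```

The second sub-query cannot be formed until the first has returned, which is why dependent clauses are sequenced instead of being dispatched in parallel.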
Unless the context requires otherwise or specifically stated to the contrary, integers, steps or elements of the invention recited herein as singular integers, steps or elements clearly encompass both singular and plural forms of the recited integers, steps or elements.
Throughout this specification, unless the context requires otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated step or element or integer or group of steps or elements or integers, but not the exclusion of any other step or element or integer or group of steps, elements or integers. Thus, in the context of this specification, the term "comprising" is used in an inclusive sense and thus should be understood as meaning "including principally, but not necessarily solely".
It will be appreciated that the foregoing description has been given by way of illustrative example of the invention and that all such modifications and variations thereto as would be apparent to persons of skill in the art are deemed to fall within the broad scope and ambit of the invention as herein set forth.

Claims

1. A system (100) for distributed querying of linked semantic webs (110), the system comprising:
at least one LOD ontologies index (120) comprising LOD ontologies and metadata relating to said LOD ontologies;
at least one concept index (130) comprising concepts and corresponding URIs;
at least one relation index (140) comprising relations and corresponding URIs;
at least one query interface (160) for entering SPARQL queries; and
a distributed query engine (150) in communication with said LOD ontologies index (120), concept index (130) and relations index (140) and adapted to receive queries from said query interface (160);
characterised in that said distributed query engine (150) further having means to:
parse and rewrite queries received from said query interface (160) and generate a plurality of sub-queries;
identify dependencies within said sub-queries and chunk sub-queries based on ontology;
execute sub-queries by sending to relevant source ontology; and
merge results obtained from execution of said sub-queries.
2. A system (100) according to claim 1, wherein said metadata included in said LOD ontologies index comprises one or more of namespace(s), vocabulary used (RDF, OWL, SKOS, etc.), properties and domain ranges information, and SPARQL endpoint.
3. A system according to claim 1 or 2, wherein said metadata included in said LOD ontologies index is in the form of a database table, knowledge base and/or text file.
4. A system according to any of claim 1, wherein said concept index and said relation index are incorporated into a single index.
5. A system according to claim 1, wherein said concept index includes classes and instances.
6. A method (200) for distributed querying of linked semantic webs (110), the method comprising steps of:
receiving an initial query from a user (210);
parsing said query (220) and replacing generic terms (230);
breaking said queries into sub-queries and chunking said sub-queries (240);
executing said sub-queries (250) based on ontology; and
merging the results obtained (260) to determine whether an answer is reached (270) and, if so, returning the answer (280) to the user
characterized in that
breaking said queries into sub-queries and chunking said sub-queries (240) further comprises steps of:
selecting a clause in a query (241);
determining the ontology of terms (242) and variables in the query clause (243);
determining whether a variable is dependent (244);
if the variable is not dependent, identifying any other clauses querying the same ontology (245) and grouping said clauses into a sub-query (246) or, if there are no other clauses querying the same ontology, establishing a new sub-query (247);
if the variable is dependent, determining whether the dependent clause queries the same ontology (248) and, if the dependent clause does query the same ontology, grouping it into a sub-query (246) and, if not, sequencing it as a sub-query after the dependent clause (249).
7. A method according to claim 6, wherein merging the results obtained (260) to determine whether an answer is reached (270) further comprises repeating the steps of the method if an answer is not reached, other than said step of receiving said initial query.
8. A method according to claim 6, wherein said step of parsing said query (220) and replacing generic terms (230) further comprises checking concepts and/or relations and replacing generic terms with their actual URIs.
9. A method according to claim 8, wherein said step of replacing generic terms (230) further comprises steps of:
determining whether a term is generic or not (232);
if the term is generic, determining whether or not the term is a concept or relation (234);
searching a concept index (235) or a relation index (236); and
replacing the term with its actual URI (237) and reiterating the steps until all generic terms are replaced.
10. A method according to claim 6, further comprising repeating the steps of claim 6 until all clauses are included in said chunked sub-queries when breaking said queries into sub-queries and chunking said sub-queries (240).
PCT/MY2013/000177 2012-11-01 2013-09-30 A system and method for distributed querying of linked semantic webs WO2014069983A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2012004796A MY164083A (en) 2012-11-01 2012-11-01 A system and method for distributed querying of linked semantic webs
MYPI2012004796 2012-11-01

Publications (2)

Publication Number Publication Date
WO2014069983A2 true WO2014069983A2 (en) 2014-05-08
WO2014069983A3 WO2014069983A3 (en) 2014-12-04

Family

ID=49551726

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2013/000177 WO2014069983A2 (en) 2012-11-01 2013-09-30 A system and method for distributed querying of linked semantic webs

Country Status (2)

Country Link
MY (1) MY164083A (en)
WO (1) WO2014069983A2 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002080026A1 (en) * 2001-03-30 2002-10-10 British Telecommunications Public Limited Company Global database management system integrating heterogeneous data resources
US20040243595A1 (en) * 2001-09-28 2004-12-02 Zhan Cui Database management system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BASTIAN QUILITZ ET AL: "Querying Distributed RDF Data Sources with SPARQL", 3 June 2007 (2007-06-03), THE SEMANTIC WEB: RESEARCH AND APPLICATIONS; [LECTURE NOTES IN COMPUTER SCIENCE], SPRINGER BERLIN HEIDELBERG, BERLIN, HEIDELBERG, PAGE(S) 524 - 538, XP019075716, ISBN: 978-3-540-68233-2 abstract; figure 1 page 2 - page 7 *
Ian C Millard ET AL: "Consuming multiple linked data sources: Challenges and Experiences", , 7 November 2010 (2010-11-07), pages 1-12, XP055128961, Retrieved from the Internet: URL:http://eprints.soton.ac.uk/271681/1/cold2010-paper16-camera-ready.pdf [retrieved on 2014-07-15] *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111339334A (en) * 2020-02-11 2020-06-26 支付宝(杭州)信息技术有限公司 Data query method and system for heterogeneous graph database
CN111339334B (en) * 2020-02-11 2023-04-07 支付宝(杭州)信息技术有限公司 Data query method and system for heterogeneous graph database

Also Published As

Publication number Publication date
WO2014069983A3 (en) 2014-12-04
MY164083A (en) 2017-11-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 13786765

Country of ref document: EP

Kind code of ref document: A2

122 Ep: pct application non-entry in european phase

Ref document number: 13786765

Country of ref document: EP

Kind code of ref document: A2