WO2014069983A2

WO2014069983A2 - A system and method for distributed querying of linked semantic webs

Info

Publication number: WO2014069983A2
Application number: PCT/MY2013/000177
Authority: WO
Inventors: Weng Onn Kow; Anand Sadanandan ARUN; Lukose Dickson
Original assignee: Mimos Berhad
Priority date: 2012-11-01
Filing date: 2013-09-30
Publication date: 2014-05-08
Also published as: WO2014069983A3; MY164083A

Abstract

A system (100) for distributed querying of linked semantic webs (110) comprising at least one LOD ontologies index (120) comprising LOD ontologies and metadata relating to said LOD ontologies; at least one concept index (130) comprising concepts and corresponding URIs; at least one relation index (140) comprising relations and corresponding URIs; query interface (160) for entering SPARQL queries; and a distributed query engine (150) in communication with said LOD ontologies index (120), concept index (130) and relations index (140) and adapted to receive queries from said query interface (160); characterised in that said distributed query engine (150) is adapted to: parse and rewrite queries received from said query interface (160) and generate a plurality of sub-queries; identify dependencies within said sub-queries and chunk sub-queries based on ontology; execute sub-queries by sending to relevant source ontology; and merge results obtained from execution of said sub-queries.

Description

A SYSTEM AND METHOD FOR DISTRIBUTED QUERYING OF LINKED SEMANTIC WEBS

FIELD OF INVENTION

The present invention relates to a system and method for distributed querying of linked semantic webs.

BACKGROUND ART

In recent years, there has been unprecedented growth in the amount of publicly available semantic information. This information is encoded in RDF form in dozens of ontologies and linked together to form the Linked Open Data (LOD) cloud. However, the ability to treat the entire LOD as a single super World Wide Semantic Web for the purpose of querying and interfacing is still lacking. The only method available to query the ontologies that make up an interconnected public semantic web, such as the LOD cloud, is to use the public SPARQL endpoints provided. According to this method, each ontology is queried separately and the user needs to know at least the basic internal structure before a query can be made.

As noted above, in currently available methodology each of the ontologies is queried separately and the results are therefore not aggregated, and the user also needs to know intimate details of each ontology in order to make a query. For example, to determine the population of Malaysia, the user needs to at least know: (i) which ontology might have the knowledge; (ii) how to query the ontology, that is, whether it has an endpoint; (iii) what the URL for Malaysia is in the ontology; and (iv) the property that is used to represent population.

As an example, United States Patent No. 5,600,329 describes a database system that provides independence between the query and physical structure of the database tables by captioning each database table with a partial query reflecting the contents of that table. In particular, the partial query is a query that if applied to a larger database of a standard configuration would produce the data of the table. Relevant tables for a particular query may be identified by piecing together the partial queries until the user query is obtained. As described in this patent, the database system may be integrated with an optimizer by comparing each of the identified tables against the others for the amount of overlap their sub-queries have with the user query and the cost of accessing the table and then repeating this process as the tables are joined in various combinations.

Other join processing and grouping techniques have been proposed to minimize the number of remote requests required, and to develop an effective solution for source selection in the absence of pre-processed metadata. In particular, frameworks have been proposed that enable SPARQL query processing on heterogeneous, virtually integrated Linked Data sources.

The present invention, at least in certain embodiments, advantageously provides a way for a user to perform SPARQL queries on a set of linked ontologies without needing to know the names of the ontologies, the location at which they are stored, or how they are internally structured.

The subject matter claimed herein is not limited to embodiments that solve any disadvantages or that operate only in environments such as those described above. Rather, this background is only provided to illustrate one exemplary technology area where some embodiments described herein may be practiced.

SUMMARY OF INVENTION

One aspect of the present invention provides a system (100) for distributed querying of linked semantic webs (110) comprising at least one LOD ontologies index (120) comprising LOD ontologies and metadata relating to the LOD ontologies; at least one concept index (130) comprising concepts and corresponding URIs; at least one relation index (140) comprising relations and corresponding URIs; at least one query interface (160) for entering SPARQL queries; and a distributed query engine (150) in communication with the LOD ontologies index (120), concept index (130) and relations index (140) and adapted to receive queries from the query interface (160). The distributed query engine (150) is adapted to parse and rewrite queries received from the query interface (160) and generate a plurality of sub-queries; identify dependencies within the sub-queries and chunk sub-queries based on ontology; execute sub-queries by sending to relevant source ontology; and merge results obtained from execution of the sub-queries.

In another aspect the invention provides a system (100) wherein the metadata included in the LOD ontologies index comprises one or more of namespace(s), vocabulary used (RDF, OWL, SKOS, etc.), properties and domain ranges information, and SPARQL endpoint.

In a further aspect the invention provides a system wherein the metadata included in the LOD ontologies index is in the form of a database table, knowledge base and/or text file.

In yet another aspect the invention provides a system wherein the concept index and the relation index are incorporated into a single index.

In still another aspect the invention provides a system wherein the concept index includes classes and instances.

In a further aspect the invention provides a method (200) for distributed querying of linked semantic webs (110) comprising receiving an initial query from a user (210); parsing the query (220) and replacing generic terms (230); breaking the queries into sub-queries and chunking the sub-queries (240); executing the sub-queries (250) based on ontology; and merging the results obtained (260) to determine whether an answer is reached (270) and, if so, returning the answer (280) to the user.

The method for breaking the queries into sub-queries and chunking the sub-queries (240) further comprises steps of selecting a clause in a query (241); determining the ontology of terms (242) and variables in the query clause (243); determining whether a variable is dependent (244); if the variable is not dependent, identifying any other clauses querying the same ontology (245) and grouping the clauses into a sub-query (246) or, if there are no other clauses querying the same ontology, establishing a new sub-query (247); if the variable is dependent, determining whether the dependent clause queries the same ontology (248) and, if the dependent clause does query the same ontology, grouping it into a sub-query (246) and, if not, sequencing it as a sub-query after the dependent clause (249).

In another aspect the invention provides a method wherein, if an answer is not reached, the steps of the method are repeated, other than the step of receiving the initial query. In yet another aspect the invention provides a method wherein the step of parsing the query (220) and replacing generic terms (230) comprises checking concepts and/or relations and replacing generic terms with their actual URIs.

In a further aspect the invention provides a method wherein the process of replacing generic terms (230) comprises determining whether a term is generic or not (232); if the term is generic, determining whether or not the term is a concept or relation (234), searching a concept index (235) or a relation index (236); and replacing the term with its actual URI (237), reiterating the steps until all generic terms are replaced.

In still another aspect of the invention there is provided a method comprising repeating the steps of the immediately preceding paragraph until all clauses are included in the chunked sub-queries.

The present invention consists of features and a combination of parts hereinafter fully described and illustrated in the accompanying drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.

BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS

To further clarify various aspects of some embodiments of the present invention, a more particular description of the invention will be rendered by references to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the accompanying drawings in which:

FIG. 1 illustrates the top level architecture of an embodiment of the invention.

FIG. 2 illustrates a flowchart for a querying process according to an embodiment of the invention.

FIG. 3 illustrates a replace generic terms flowchart of an embodiment of the invention.

FIG. 4 illustrates a sub-query chunking flowchart of an embodiment of the invention.

Table 1 shows an example of a concept index (130) with three concepts in it.

Table 2 shows an example of a relations index (140) with three relations in it.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

The present invention provides a system and method for distributed querying of linked semantic webs.

Hereinafter, this specification will describe the present invention according to the preferred embodiments. It is to be understood that limiting the description to the preferred embodiments of the invention is merely to facilitate discussion of the present invention, and it is envisioned that modifications may be made without departing from the scope of the appended claims.

Referring to Figure 1, a system (100) for distributed querying of linked semantic webs, particularly the LOD (110), is illustrated. The system (100) includes a number of modules, each of which will be discussed below.

The system (100) includes a LOD metadata module (120) that is provided with an index of ontologies that are included within the LOD. The metadata may include, but is not limited to, namespace(s), vocabulary used (RDF, OWL, SKOS, etc.), properties and domain ranges information, and SPARQL endpoint. The metadata may be provided in any suitable form, for example a database table, knowledge base, text file and so on.
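By way of illustration only, one entry of the LOD metadata module (120) could be held in a small record such as the one sketched below. The field names, the placeholder namespaces and endpoints under example.org, and the helper `ontologies_with_property` are all assumptions introduced for this sketch; the specification does not prescribe a particular layout.

```python
from dataclasses import dataclass


@dataclass
class OntologyMetadata:
    """One entry of the LOD ontologies index (120); field names are hypothetical."""
    name: str
    namespaces: list[str]
    vocabularies: list[str]                  # e.g. RDF, OWL, SKOS
    properties: dict[str, tuple[str, str]]   # property -> (domain, range)
    sparql_endpoint: str


# Placeholder entries: the namespaces and endpoints are not real addresses.
LOD_INDEX = [
    OntologyMetadata(
        name="dbp",
        namespaces=["http://example.org/dbp/", "http://example.org/dbpprop/"],
        vocabularies=["RDF", "OWL"],
        properties={
            "dbpprop:populationCensus": ("Country", "integer"),
            "dbpprop:capital": ("Country", "City"),
        },
        sparql_endpoint="http://example.org/dbp/sparql",
    ),
    OntologyMetadata(
        name="geonames",
        namespaces=["http://example.org/geonames/"],
        vocabularies=["RDF"],
        properties={"geonames:population": ("Feature", "integer")},
        sparql_endpoint="http://example.org/geonames/sparql",
    ),
]


def ontologies_with_property(index, prop):
    """Return the index entries whose recorded properties include `prop`."""
    return [entry for entry in index if prop in entry.properties]
```

The same information could equally be kept in a database table or a text file, as the description notes; a record type is used here only because it is compact.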

A concept index (130) is also provided that includes an index of concepts, such as classes and instances, and their actual uniform resource identifier (URI) details. Each unique concept included in the index can appear in multiple ontologies with different URIs. Table 1 shows an example of a concept index (130) with three concepts in it.

In addition, the system (100) includes a relation index (140) that includes an index of relations and their actual URIs. Again, each of the unique relations can appear in multiple ontologies with different URIs. Table 2 shows an example of a relations index (140) with three relations in it.

In addition to the above-mentioned components, the system (100) includes a distributed query engine (150) which is adapted to receive a query from a user at a query interface (160) and to break the query down into sub-queries. The sub-queries may be parallel or sequential, based on dependencies of the sub-queries. Once the query is broken down into these sub-queries, the distributed query engine (150) searches the LOD metadata module (120) for all ontologies that may be able to provide answers for each sub-query. This search may, for example, include semantic matching of query terms to the properties and concepts in each of the ontologies. If there are a number of relevant ontologies for a particular sub-query, then that sub-query is sent to all of the matching ontologies. The distributed query engine (150) then merges the answers received for each of the sub-queries and forms a final answer to the query.
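The selection of matching ontologies and the merging of their answers might be sketched as follows. This is a simplified illustration: exact string matching stands in for the semantic matching mentioned above, and the function names, the `ontology_terms` mapping and the duplicate-dropping merge rule are assumptions of the sketch.

```python
def matching_ontologies(terms, ontology_terms):
    """Return the names of ontologies whose known terms include every query term.

    Exact string matching is used here in place of semantic matching.
    """
    wanted = set(terms)
    return [name for name, known in ontology_terms.items() if wanted <= set(known)]


def merge_answers(answers_per_ontology):
    """Combine answer rows from several ontologies.

    Rows keep their first-seen order and duplicates are dropped.
    """
    merged = []
    for rows in answers_per_ontology:
        for row in rows:
            if row not in merged:
                merged.append(row)
    return merged
```

A sub-query whose terms are known to several ontologies is returned for all of them by `matching_ontologies`, which mirrors the behaviour of sending the sub-query to every matching ontology.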

A flowchart illustrating the querying process (200) employed by the distributed query engine (150) is provided in Figure 2. Referring to Figure 2, an initial query is received (210) by the distributed query engine (150). The query is then parsed (220) and generic terms replaced (230), as described in more detail below with reference to Figure 3. Briefly, during this process the distributed query engine (150) checks the concept index (130) and relation index (140) and replaces generic terms with their actual URIs. Once this process is completed, the queries are chunked into sub-queries (240), as discussed above and described in more detail below with reference to Figure 4, and the sub-queries executed (250). The results of the sub-queries are merged (260) to determine whether an answer is reached (270) and, if so, the answer is returned (280) to the user. If an answer is not reached, the process may be repeated.
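The control flow of the querying process (200) can be outlined as a short loop. In the sketch below the individual stages are passed in as functions, since their details are described separately; the name `run_query`, the stage parameters and the `max_rounds` limit on repetition are assumptions made for illustration and do not appear in the flowchart.

```python
def run_query(query, replace_terms, chunk, execute, merge, max_rounds=3):
    """Outline of steps 220 to 280 for a query already received (210).

    Each stage is supplied by the caller. The loop repeats every step other
    than receiving the query, up to `max_rounds` times (an assumed limit).
    """
    for _ in range(max_rounds):
        rewritten = replace_terms(query)               # steps 220 and 230
        sub_queries = chunk(rewritten)                 # step 240
        results = [execute(sq) for sq in sub_queries]  # step 250
        answer = merge(results)                        # step 260
        if answer:                                     # step 270
            return answer                              # step 280
    return None  # no answer reached within the allowed rounds
```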

Referring to Figure 3, the process of replacing generic terms (230) is illustrated. In this process, a term enters the process (231) and it is determined whether the term is generic or not (232). If the term is not generic, it is added to the queries (233). If the term is identified as being generic, it is determined whether or not the term is a concept (234). If the term is considered a concept, the concept index is searched (235) and, if not, the relation index is searched (236). Once searching is complete, the term is replaced in the query with its actual URI (237). The process then identifies any further terms requiring consideration (238) or ends to provide a list of queries.

As an example, the query "What is the population of Malaysia?" can be written as a generic SPARQL query, such as:

SELECT ?population WHERE
{
    Malaysia population ?population
}

The methodology of the invention identifies the generic concept, Malaysia, and searches the concept index (130). By replacing the concept Malaysia with URIs from the concept index (130), a total of three possible queries are formed:

SELECT ?population WHERE
{
    dbp:Malaysia population ?population
}

SELECT ?population WHERE
{
    geonames:Malaysia population ?population
}

SELECT ?population WHERE
{
    geoinfo:Malaysia population ?population
}

Next, the generic term population is identified and the relation index (140) is searched for matches. This returns another three URIs. Substituting these into each of the queries generated above produces a total of nine different queries, some of which include:

SELECT ?population WHERE
{
    dbp:Malaysia dbpprop:populationCensus ?population
}

SELECT ?population WHERE
{
    geonames:Malaysia geonames:population ?population
}

SELECT ?population WHERE
{
    geoinfo:Malaysia geoinfo:populationTotal ?population
}

These are the legitimate SPARQL queries that can be sent to the various SPARQL endpoints in the ontology index provided in the LOD metadata module (120).
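The expansion of one generic clause into every combination of concept and relation URIs can be sketched as a Cartesian product. The index contents below follow the Malaysia example; the rule for recognising a generic term (no leading "?" and no prefix), the use of the predicate position to choose between the two indexes, and the decision to leave unknown terms unchanged are assumptions of this sketch.

```python
from itertools import product

# Index contents follow the example in the text.
CONCEPT_INDEX = {
    "Malaysia": ["dbp:Malaysia", "geonames:Malaysia", "geoinfo:Malaysia"],
}
RELATION_INDEX = {
    "population": [
        "dbpprop:populationCensus",
        "geonames:population",
        "geoinfo:populationTotal",
    ],
}


def is_generic(term):
    """Assumed rule: a term is generic if it is neither a variable nor prefixed."""
    return not term.startswith("?") and ":" not in term


def candidates(term, position):
    """Return the URIs that may replace `term` (position 1 is the predicate)."""
    if not is_generic(term):
        return [term]                                           # step 233
    index = RELATION_INDEX if position == 1 else CONCEPT_INDEX  # step 234
    return index.get(term, [term])                              # steps 235 to 237


def expand_clause(clause):
    """Return every rewritten form of one (subject, predicate, object) clause."""
    options = [candidates(term, i) for i, term in enumerate(clause)]
    return list(product(*options))
```

Calling `expand_clause(("Malaysia", "population", "?population"))` yields nine clauses, one for each pairing of the three concept URIs with the three relation URIs.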

The methodology of embodiments of the invention may attempt to execute all possible query combinations in parallel. However, in some instances there are dependencies between SPARQL clauses. In such cases, the SPARQL queries must be executed in series. When there are dependencies, the clauses are rearranged and grouped into possible sub-queries. Information from the LOD metadata module (120) is used to determine which parts of the query can be resolved by querying a single ontology and which have to be distributed.

Referring to Figure 4, the chunking process (240) involves a clause in the query being selected (241). The ontology of terms is determined (242) and variables in the query clause obtained (243). Once obtained, the process determines whether the variable is dependent (244). If the variable is not dependent, any other clauses querying the same ontology are identified (245); if such clauses exist, they are grouped into a sub-query (246) and, if not, a new sub-query is established (247). If the variable is determined to be dependent, the process involves determining whether the dependent clause queries the same ontology (248). If the dependent clause does query the same ontology, it is grouped into a sub-query (246) and, if not, it is sequenced as a sub-query after the dependent clause (249). This process may be repeated as necessary until there are no more clauses to provide the chunked sub-queries.
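One possible reading of the chunking flow is sketched below. Clauses are (subject, predicate, object) tuples; a clause is treated as dependent when its subject is a variable bound by an earlier clause. The prefix-to-ontology table, the use of the predicate prefix to determine the ontology, and the assumption that clauses arrive with producers listed before consumers are all introduced for this sketch.

```python
# Hypothetical mapping from prefix to ontology name.
PREFIX_TO_ONTOLOGY = {
    "dbp": "dbpedia",
    "dbpprop": "dbpedia",
    "geonames": "geonames",
    "geoinfo": "geoinfo",
}


def ontology_of(clause):
    """Ontology queried by a clause, taken from the predicate's prefix."""
    prefix = clause[1].split(":", 1)[0]
    return PREFIX_TO_ONTOLOGY[prefix]


def chunk(clauses):
    """Group clauses into ordered stages of sub-queries.

    Returns a list of stages. Each stage maps an ontology name to the clauses
    of one sub-query; sub-queries within a stage are independent of each other
    and a later stage depends on results of an earlier one.
    """
    bound_by = {}  # variable -> (stage index, ontology) of the binding clause
    stages = []
    for clause in clauses:                                  # step 241
        subject, _, obj = clause                            # step 243
        onto = ontology_of(clause)                          # step 242
        if subject in bound_by:                             # step 244: dependent
            src_stage, src_onto = bound_by[subject]
            # step 248: same ontology joins the sub-query (246),
            # otherwise it is sequenced afterwards (249)
            stage = src_stage if onto == src_onto else src_stage + 1
        else:
            stage = 0                                       # steps 245 to 247
        while len(stages) <= stage:
            stages.append({})
        stages[stage].setdefault(onto, []).append(clause)
        if obj.startswith("?"):
            bound_by[obj] = (stage, onto)
    return stages
```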

For example, given the input query "What is the capital and population of Malaysia?", the following query may be obtained:

SELECT ?capital ?population WHERE
{
    Malaysia capital_city ?capital
    Malaysia population ?population
}

After replacement of generic terms (230), the query below is generated:

SELECT ?capital ?population WHERE
{
    dbp:Malaysia dbpprop:capital ?capital
    geonames:Malaysia geonames:population ?population
}

As the two clauses in the query are independent of each other, they can be executed in parallel and, in this case, are sent to two different ontologies.
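Independent sub-queries of this kind could be dispatched concurrently, for instance with a thread pool as sketched below. The specification does not state how parallel execution is achieved, so the use of threads, the function name `execute_parallel` and the caller-supplied `send` function (which would contact the relevant SPARQL endpoint) are assumptions of the sketch.

```python
from concurrent.futures import ThreadPoolExecutor


def execute_parallel(sub_queries, send):
    """Run independent sub-queries concurrently.

    `sub_queries` maps an ontology name to the query text for that ontology.
    `send(ontology, query)` is supplied by the caller and returns result rows.
    """
    workers = max(1, len(sub_queries))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {
            onto: pool.submit(send, onto, query)
            for onto, query in sub_queries.items()
        }
        # Collect each result under the ontology it came from.
        return {onto: future.result() for onto, future in futures.items()}
```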

If, on the other hand, the input query is "What is the population of the capital of Malaysia?", the following query may be obtained:

SELECT ?capital ?population WHERE
{
    Malaysia capital_city ?capital
    ?capital population ?population
}

After replacement of generic terms (230), the query below is generated:

SELECT ?capital ?population WHERE
{
    dbp:Malaysia dbpprop:capital ?capital
    ?capital geonames:population ?population
}

In this case, the second clause is dependent on the first. As such, this query must be executed in sequence, first identifying the capital of Malaysia by executing the following sub-query:

SELECT ?capital WHERE
{
    dbp:Malaysia dbpprop:capital ?capital
}

The result (dbp:Kuala_Lumpur) is then substituted for the variable and a second sub-query is generated:

SELECT ?population WHERE
{
    geonames:Kuala_Lumpur geonames:population ?population
}

This sub-query may be executed to arrive at an answer to the original query.
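Sequential execution with substitution of intermediate results might be sketched as follows. The functions `substitute` and `execute_sequential` and the caller-supplied `send` function are hypothetical. The example above uses a geonames URI for the capital in the second sub-query; translating a URI from one ontology into its equivalent in another is not shown here, and the sketch simply substitutes the value returned.

```python
def substitute(clause, bindings):
    """Replace any bound variable in a clause with its value."""
    return tuple(bindings.get(term, term) for term in clause)


def execute_sequential(clauses, send):
    """Execute dependent clauses in order, feeding each result into the next.

    `send(clause)` is supplied by the caller and returns the value found for
    the clause's object variable, or None when there is no answer.
    """
    bindings = {}
    for clause in clauses:
        ready = substitute(clause, bindings)
        value = send(ready)
        if value is None:
            return None              # the chain cannot be completed
        bindings[clause[2]] = value  # bind the object variable
    return bindings
```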

Unless the context requires otherwise or it is specifically stated to the contrary, integers, steps or elements of the invention recited herein as singular integers, steps or elements clearly encompass both singular and plural forms of the recited integers, steps or elements.

Throughout this specification, unless the context requires otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated step or element or integer or group of steps or elements or integers, but not the exclusion of any other step or element or integer or group of steps, elements or integers. Thus, in the context of this specification, the term "comprising" is used in an inclusive sense and thus should be understood as meaning "including principally, but not necessarily solely".

It will be appreciated that the foregoing description has been given by way of illustrative example of the invention and that all such modifications and variations thereto as would be apparent to persons of skill in the art are deemed to fall within the broad scope and ambit of the invention as herein set forth.

Claims

1. A system (100) for distributed querying of linked semantic webs (110), the system comprising:

at least one LOD ontologies index (120) comprising LOD ontologies and metadata relating to said LOD ontologies;

at least one concept index (130) comprising concepts and corresponding URIs;

at least one relation index (140) comprising relations and corresponding URIs;

at least one query interface (160) for entering SPARQL queries; and

a distributed query engine (150) in communication with said LOD ontologies index (120), concept index (130) and relations index (140) and adapted to receive queries from said query interface (160);

characterised in that said distributed query engine (150) further has means to:

parse and rewrite queries received from said query interface (160) and generate a plurality of sub-queries;

identify dependencies within said sub-queries and chunk sub-queries based on ontology;

execute sub-queries by sending to relevant source ontology; and

merge results obtained from execution of said sub-queries.

2. A system (100) according to claim 1, wherein said metadata included in said LOD ontologies index comprises one or more of namespace(s), vocabulary used (RDF, OWL, SKOS, etc.), properties and domain ranges information, and SPARQL endpoint.

3. A system according to claim 1 or 2, wherein said metadata included in said LOD ontologies index is in the form of a database table, knowledge base and/or text file.

4. A system according to claim 1, wherein said concept index and said relation index are incorporated into a single index.

5. A system according to claim 1, wherein said concept index includes classes and instances.

6. A method (200) for distributed querying of linked semantic webs (110), the method comprising steps of:

receiving an initial query from a user (210);

parsing said query (220) and replacing generic terms (230);

breaking said queries into sub-queries and chunking said sub-queries (240);

executing said sub-queries (250) based on ontology; and

merging the results obtained (260) to determine whether an answer is reached (270) and, if so, returning the answer (280) to the user;

characterized in that

breaking said queries into sub-queries and chunking said sub-queries (240) further comprises steps of:

selecting a clause in a query (241);

determining the ontology of terms (242) and variables in the query clause (243);

determining whether a variable is dependent (244);

if the variable is not dependent, identifying any other clauses querying the same ontology (245) and grouping said clauses into a sub-query (246) or, if there are no other clauses querying the same ontology, establishing a new sub-query (247);

if the variable is dependent, determining whether the dependent clause queries the same ontology (248) and, if the dependent clause does query the same ontology, grouping it into a sub-query (246) and, if not, sequencing it as a sub-query after the dependent clause (249).

7. A method according to claim 6, wherein merging the results obtained (260) to determine whether an answer is reached (270) further comprises repeating the steps of the method, other than said step of receiving said initial query, if an answer is not reached.

8. A method according to claim 6, wherein said step of parsing said query (220) and replacing generic terms (230) further comprises checking concepts and/or relations and replacing generic terms with their actual URIs.

9. A method according to claim 8, wherein said step of replacing generic terms (230) further comprises steps of:

determining whether a term is generic or not (232);

if the term is generic, determining whether or not the term is a concept or relation (234);

searching a concept index (235) or a relation index (236); and

replacing the term with its actual URI (237) and reiterating the steps until all generic terms are replaced.

10. A method according to claim 6, further comprising repeating the steps of claim 6 until all clauses are included in said chunked sub-queries when breaking said queries into sub-queries and chunking said sub-queries (240).