EP2419840A1

EP2419840A1 - Method and device for generating an rdf database for an rdf database query and a search method and a search device for the rdf database query

Info

Publication number: EP2419840A1
Application number: EP10712921A
Authority: EP
Inventors: Mario DÖLLER; Gero BÄSE; Florian Markus Stegmaier
Original assignee: Siemens AG
Current assignee: Siemens AG
Priority date: 2009-04-15
Filing date: 2010-03-23
Publication date: 2012-02-22
Also published as: US9213738B2; US20120041974A1; CN102395968A; WO2010118931A1; KR20120022957A; DE102009017082A1; KR101662561B1; CN102395968B

Abstract

The invention relates to a method and a device for generating a database. To this end, information values are created by means of nodes and directed edges describing dependencies between two nodes or information values, respectively, in the form of a directed graph for the database query. By using a path distance describing a number of directed edges between a selected node and a target tuple, consisting of two nodes connected to a directed edge and the associated directed edge, a reduction of a complexity of a database query and thus an acceleration of the database query can be achieved. The invention further comprises a search method and a search device for querying the described database. The invention can be used, for example, for monitoring systems or in medical databases. Furthermore, the invention can be used for databases which are dynamically extended, for example by new events in the monitoring system.

Description


METHOD AND DEVICE FOR GENERATING AN RDF DATABASE FOR AN RDF DATABASE QUERY, AND A SEARCH METHOD AND A SEARCH DEVICE FOR THE RDF DATABASE QUERY

The invention relates to a method and a device for generating a database for a database query. Furthermore, the invention relates to a search method and a search device for querying a database.

Today, information values such as flight data or stock market prices are made available to users via databases. There is a variety of database languages with which a database can be described and queried in a structured manner. One representative for describing databases with semantic relations is RDF/OWL (RDF - Resource Description Framework, OWL - Web Ontology Language) [1, 2]. Here, information values are described in the form of nodes, and two nodes together with one directed edge between them are referred to as an RDF triple. The two nodes represent a subject and an object, and the directed edge represents a predicate. The predicate generally defines a semantic relation between the subject and the object. This is explained in more detail using an example from the figures. Here, node B represents the subject "person", node A represents the object "Mario", and the directed edge a represents the predicate "has name". Thus, the RDF triple "BaA" reads "Person has name Mario". A concatenation of such RDF triples leads to a graph structure that forms a directed graph, see, for example, FIG. 1. Triples are outlined in bold in the figure.
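For illustration only, the triple "BaA" described above might be represented as follows. This is a minimal sketch; the names Triple and read_triple are assumptions made for this example and do not appear in the application.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style triple: two nodes joined by one directed edge."""
    subject: str    # node at which the directed edge originates (node B)
    predicate: str  # the directed edge itself (edge a)
    obj: str        # node at which the directed edge terminates (node A)


def read_triple(t: Triple) -> str:
    """Read a triple as a sentence: subject, predicate, object."""
    return f"{t.subject} {t.predicate} {t.obj}"


# The example triple "BaA" from the text.
baa = Triple(subject="Person", predicate="has name", obj="Mario")
print(read_triple(baa))
```

Chaining several such triples, where the object of one triple is the subject of the next, yields the directed graph mentioned in the text.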

To query an information value of a database according to RDF/OWL, the query language SPARQL (SPARQL Protocol and RDF Query Language) [3] can be used. For this purpose, starting from a predeterminable node, that is, a specific information value, one or more RDF triples of the database are searched for; see, for example, the RDF triples (BaA, HjJ, IkK) framed in bold in Figure 1. To query these RDF triples, i.e. in a semantic query, the sought subgraph is formed by specifying all RDF triples involved. Therefore, to specify the above three RDF triples, chains of RDF triples including the RDF triples framed in FIG. 2 must be specified. As a consequence, such a semantic database query is complex and time-consuming, since extensive knowledge of the structure of the database is already needed when the database query is formulated.

It is therefore an object of the invention to specify a method and a device for generating a database for a database query, as well as a search method and a search device for querying a database, which enable a reduction in the complexity of the database query.

This object is solved by the independent claims. Further developments of the invention can be found in the dependent claims.

The invention relates to a method for generating a database for providing information values by means of nodes, and dependencies of the information values by means of directed edges, for a database query, wherein the database is formed as a directed graph by the nodes and the directed edges, and wherein the following steps are performed:

a) reading a description rule that indicates the assignment of two information values with the associated dependency;

b) creating the respective node for the respective information value and the respective directed edge for the respective dependency;

c) generating the directed graph starting from a predeterminable node of the nodes on the basis of the description rule, wherein in each case two of the nodes and the directed edge connecting them are identified as a triple;

d) determining at least one path from the predeterminable node to a triple to be determined in the database query;

e) generating a respective path distance of the respective path, wherein the respective path distance indicates the number of directed edges from the predeterminable node to the triple to be determined, and wherein the respective path distance can be evaluated in the database query starting from the predeterminable node.

The method achieves a reduction of the computing power in the search for specific triples, since a specification of the path distance achieves a reduction of the nodes to be examined.
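Steps a) to e) might be sketched as follows. This is a non-authoritative illustration under several assumptions: the description rule is taken to be a plain list of (subject, edge, object) assignments, the function name generate_database is hypothetical, and a breadth-first search is used, which yields the shortest path distance where several paths exist. The example rule uses the node and edge letters of the exemplary embodiment.

```python
from collections import deque


def generate_database(description_rule, start_node):
    """Return a mapping triple -> path distance from start_node.

    description_rule: list of (subject, edge, object) assignments (assumed format).
    The path distance counts the directed edges from start_node to the
    subject node of the triple.
    """
    # Steps a) and b): read the rule, create nodes and directed edges.
    edges = {}
    for subject, edge, obj in description_rule:
        edges.setdefault(subject, []).append((edge, obj))
        edges.setdefault(obj, [])

    # Steps c) and d): walk the directed graph from the start node.
    hops = {start_node: 0}
    queue = deque([start_node])
    while queue:
        node = queue.popleft()
        for _, nxt in edges.get(node, []):
            if nxt not in hops:          # first visit = shortest distance
                hops[nxt] = hops[node] + 1
                queue.append(nxt)

    # Step e): attach the path distance to every reachable triple.
    return {(s, e, o): hops[s] for s, e, o in description_rule if s in hops}


rule = [("B", "a", "A"), ("B", "b", "C"), ("C", "c", "D"), ("C", "d", "E"),
        ("D", "e", "F"), ("E", "f", "F"), ("F", "g", "G")]
db = generate_database(rule, "B")
print(db[("F", "g", "G")])
```

With the path distance stored next to each triple, a later query only has to compare numbers instead of following a fully specified chain of triples.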

Furthermore, a device for generating a database for providing information values by means of nodes, and dependencies of the information values by means of directed edges, for a database query is part of the invention, wherein the database is formed as a directed graph by the nodes and the directed edges, and wherein the device comprises the following means:

a) first means for reading a description rule that indicates the assignment of two information values with the associated dependency;

b) second means for creating the respective node for the respective information value and the respective directed edge for the respective dependency;

c) third means for generating the directed graph starting from a predeterminable node of the nodes on the basis of the description rule, wherein in each case two of the nodes and the directed edge connecting them are identified as a triple;

d) fourth means for determining at least one path from the predeterminable node to a triple to be determined in the database query;

e) fifth means for generating a respective path distance of the respective path, wherein the respective path distance indicates the number of directed edges from the predeterminable node to the triple to be determined, and wherein the respective path distance can be evaluated in the database query starting from the predeterminable node.

With the aid of the device, the method for generating a database can be implemented.

Furthermore, part of the invention is a search method for determining an information value in a database, wherein the database can be generated according to the method for generating a database, in which the following steps are performed:

generating a database query by means of a search pattern comprising an indication of the predeterminable node, the path distance and the triple to be determined;

searching the database to determine the database query, wherein the path distance specified in the search pattern is taken into account in the search;

providing at least one of the information values of at least one of the triples that satisfies the specification of the search pattern.

The search method achieves a reduction of the computing power in the search for specific triples in the database, since a specification of the path distance achieves a reduction of the nodes to be examined.

In a refinement of the search method, searching the database to determine the database query takes into account those triples whose path distance to the node specified in the search pattern is at most the path distance specified in the search pattern. As a result, a further reduction of the computing power for performing the database query can be achieved, since the number of triples to be considered in the search is further reduced.

In addition, the computing power required for performing the database query can be further reduced if, when searching the database to determine the database query, those triples are taken into account whose path distance to the node specified in the search pattern is exactly the path distance specified in the search pattern.

In an advantageous development of the search method, a value of zero for the path distance in the search pattern is processed such that this path distance is set to at least the maximum path distance occurring in the database. This allows a full search of all nodes and triples of the database regardless of the maximum occurring path distance.
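The three variants above (at most, exactly, and the value zero) might be sketched as follows. The sketch assumes that the database is available as a mapping from triples to path distances precomputed for one predeterminable node; the names DB and search, the mode argument, and the use of None as a wildcard are assumptions made for this illustration.

```python
# Assumed example data: triple -> path distance from the node "person".
DB = {
    ("person", "has name", "name"): 0,
    ("person", "has position", "position"): 0,
    ("position", "has temporal position", "time"): 1,
    ("position", "has local position", "local"): 1,
    ("time", "has located", "place"): 2,
    ("local", "has local region", "place"): 2,
    ("place", "has name", "name"): 3,
}


def search(db, distance, target, mode="exact"):
    """Return the triples matching target within the given path distance.

    target: (subject, edge, object); None acts as a wildcard (assumption).
    mode:   "exact" keeps triples at exactly the distance,
            "max" keeps triples at up to the distance.
    A distance of zero is replaced by the maximum distance in the
    database, so that every triple is considered.
    """
    if mode not in ("exact", "max"):
        raise ValueError("mode must be 'exact' or 'max'")
    if distance == 0:
        distance, mode = max(db.values()), "max"

    def in_range(d):
        return d == distance if mode == "exact" else d <= distance

    def matches(triple):
        return all(want is None or want == got
                   for want, got in zip(target, triple))

    return [t for t, d in db.items() if in_range(d) and matches(t)]


print(search(DB, 3, ("place", "has name", "name")))
print(search(DB, 3, (None, "has name", None), mode="max"))
```

In the exact variant only triples at one distance are compared against the target, which is where the reduction in examined triples comes from.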

Finally, part of the invention is a search device for determining an information value in a database, wherein the database is generated by the device for generating a database, comprising the following units:

a first unit for generating a database query by means of a search pattern comprising an indication of the predeterminable node, the path distance and the triple to be determined;

a second unit for searching the database to determine the database query, wherein the path distance specified in the search pattern is taken into account in the search;

a third unit for providing at least one of the information values of at least one of the triples that satisfies the specification of the search pattern.

With the help of the search device, the search method can be implemented. Further advantages, including those of the developments of the search device, can be taken from the corresponding features of the search method.

In a further development of the search device, the second unit is further configured such that, when searching the database to determine the database query, those triples are taken into account whose path distance to the node specified in the search pattern is at most the path distance specified in the search pattern.

Additionally or alternatively, the second unit may be further configured to take into account, when searching the database to determine the database query, those triples whose path distance to the node specified in the search pattern is exactly the path distance specified in the search pattern.

In an advantageous development of the search device, the second unit is further configured such that a value of zero for the path distance in the search pattern can be processed such that this path distance is set to at least the maximum path distance occurring in the database.

The invention and its developments are explained in more detail with reference to figures.

The figures show:

FIG. 1 a structure of a database with edges and nodes according to the RDF/OWL standard (prior art);

FIG. 2 a number of RDF triples of the database that must be considered when retrieving three RDF triples (prior art);

FIG. 3 a structure of a database according to an exemplary embodiment of the invention;

FIG. 4 a flowchart for creating the database according to FIG. 3;

FIG. 5 a flowchart for querying the database.

Elements having the same function and effect are given the same reference numerals in the figures.

FIGS. 3 and 4 show an exemplary embodiment of the invention. As part of a monitoring application in a building, a database is to be created for a semantic annotation of temporal and spatial allocations of a person to one or more rooms. In this case, FIG. 3 shows nodes and directed edges which can each receive specific information values or dependencies. In FIG. 3, the following reference symbols are used:

A: node for the information value IA = name

B: node for the information value IB = person

C: node for the information value IC = position

D: node for the information value ID = time

E: node for the information value IE = local

F: node for the information value IF = place

G: node for the information value IG = name

a: directed edge for the dependency aa = "has name"

b: directed edge for the dependency bb = "has position"

c: directed edge for the dependency cc = "has temporal position"

d: directed edge for the dependency dd = "has local position"

e: directed edge for the dependency ee = "has located"

f: directed edge for the dependency ff = "has local region"

g: directed edge for the dependency gg = "has name"

In a step STA, the method is started with the steps S1 to S5 in order to create a database DB.

In step S1, a description rule DEF is read in, which indicates the assignment of two information values IA, IB with the associated dependency aa. The description rule DEF can be present on a sheet of paper or as an electronic file and can represent the respective assignment, for example, by means of the description language XML (eXtensible Markup Language). In this example, the description rule describes the assignments as illustrated pictorially in the figure.

In step S2, the respective nodes A, ..., G are formed for the respective information values IA, ..., IG, and the respective directed edges a, ..., g are formed for the dependencies aa, ..., gg. The edges are directed because the node at which a directed edge originates corresponds, for example, to a subject, and the node at which the directed edge terminates corresponds to an object, the directed edge representing a semantic relation between the two nodes. Two nodes connected by a directed edge, together with that directed edge, are called a triple TA, TF.

In a next step S3, a directed graph TR is then formed from the nodes and directed edges, starting from a predeterminable node AA = B, on the basis of the description rule. The predeterminable node AA comes from the set of nodes and is a starting point for a database query. In a concrete form, i.e. in an instance, the exemplary embodiment may read as follows:

- Person = first person "has name" Name = Werner

- Person = first person "has position" Position = 15

- Position = 15 "has temporal position" Time = 12:05

- Position = 15 "has local position" Local = 48° 8' N, 11° 34' E (N = north latitude, E = east longitude)

- Local = 48° 8' N, 11° 34' E "has local region" Place = first room section

- Time = 12:05 "has located" Place = first room section

- Place = first room section "has name" Name = entrance area

Here, large rooms are monitored as several separately monitored areas, i.e. regions. Each time a person enters a region, an instance can be created in the database. In general, at least one concrete instance exists in the database DB.

In a next step S4, a path PF1 is formed from the predeterminable node AA to the triple TF to be determined in the database query. In the present exemplary embodiment, there are the following paths PF1, PF2 from the predeterminable node AA to the node F of the triple TF:

PF1 = AA-b-C-c-D-e-F

PF2 = AA-b-C-d-E-f-F

The length of a path, i.e. its path distance, is given by the number of directed edges in the respective path. In the present example, the path distance DIS of the path PF1 is DIS1 = 3 and that of the path PF2 is DIS2 = 3.
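The two paths and their path distances can be reproduced with a short sketch. The edge table below is an assumed encoding of the nodes and directed edges of the exemplary embodiment, with the predeterminable node AA written as B; the function name all_paths is hypothetical.

```python
# Assumed encoding of the exemplary embodiment: node -> [(edge, next node)].
EDGES = {
    "B": [("a", "A"), ("b", "C")],
    "C": [("c", "D"), ("d", "E")],
    "D": [("e", "F")],
    "E": [("f", "F")],
    "F": [("g", "G")],
}


def all_paths(edges, node, goal, path=()):
    """List every path from node to goal as (node, edge, node, ...)."""
    path = path + (node,)
    if node == goal:
        return [path]
    found = []
    for edge, nxt in edges.get(node, []):
        if nxt not in path[::2]:  # nodes sit at even positions; skip cycles
            found += all_paths(edges, nxt, goal, path + (edge,))
    return found


for p in all_paths(EDGES, "B", "F"):
    # Each path alternates nodes and edges, so the edge count is len // 2.
    print("-".join(p), "distance", len(p) // 2)
```

Both paths contain three directed edges, which matches DIS1 = DIS2 = 3.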

In an optional extension, in step S4, if there are several paths, the shortest path can be determined to be used subsequently. However, in the present example, both paths are the same length.

In a subsequent step S5, the path distance DIS is added to the triple TF. The flow chart according to FIG. 4 is ended in step END.

The invention also relates to a search method for determining an information value in the database DB. According to FIG. 5, this query is started in state STA.

In a step S6, a database query is generated by means of a search pattern QY. Instead of a predefinable path, as in the prior art with the query language SPARQL, the search pattern QY specifies the predeterminable node or the associated information value, the distance to be considered in the search, and the triple to be determined, for example:

QY = "(person) [3] (place "has name" name)"

This means, as shown in step S7, that starting from the information value (person), the triple (place "has name" name) is searched for with a path distance of 3. The search thus takes into account only those triples that have a path distance of three. In general, searching the database takes the path distance into account. Furthermore, the search method may determine the path that is least complex for the search.

As a result of the search, at least one information value of the triples determined by the search is output in step S8. Further information values of the determined triple and/or the dependency can also be displayed. In addition, specific values for at least one of the elements of the triple can be queried in the search pattern. The search pattern QY then reads, for example:

QY = "(person) [3] (place "has name" name = "entrance area")"

In this case, the database is searched for those places whose name has the specific value "entrance area".
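How a search pattern of this shape could be split into its parts is sketched below. The application does not define a formal grammar for QY, so the regular expression, the class SearchPattern and the function parse_pattern are assumptions made purely for illustration.

```python
import re
from typing import NamedTuple, Optional


class SearchPattern(NamedTuple):
    start: str            # predeterminable node, e.g. "person"
    distance: int         # path distance given in square brackets
    subject: str          # subject of the triple to be determined
    predicate: str        # dependency, written in double quotes
    obj: str              # object of the triple to be determined
    value: Optional[str]  # optional specific value of the object


# Assumed syntax: (start) [distance] (subject "predicate" object [= "value"])
_QY = re.compile(
    r'\((\w+)\)\s*\[(\d+)\]\s*'
    r'\((\w+)\s+"([^"]+)"\s+(\w+)(?:\s*=\s*"([^"]*)")?\s*\)'
)


def parse_pattern(text: str) -> SearchPattern:
    """Split a search pattern string into its components."""
    match = _QY.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a valid search pattern: {text!r}")
    start, distance, subject, predicate, obj, value = match.groups()
    return SearchPattern(start, int(distance), subject, predicate, obj, value)


plain = parse_pattern('(person) [3] (place "has name" name)')
valued = parse_pattern('(person) [3] (place "has name" name = "entrance area")')
print(plain)
print(valued)
```

The pattern carries only the start node, the distance and the target triple; no intermediate nodes or edges have to be named.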

Furthermore, during the search of the database, those triples can be taken into account whose distance from the predeterminable node AA specified in the search pattern is exactly, or at most, the distance DIST indicated in the search pattern. Furthermore, by specifying a value of zero for the path distance in the search pattern, DIST = 0, the search method can be informed that all nodes of the database are to be searched.

The flow chart according to FIG. 5 is ended in step END.

In the prior art, the triplets of the individual types can be stored by means of tables. In the evaluation of search patterns, that is to say of specifiable paths, involved tables must be linked together. The efficiency of the search depends essentially on the size of the tables and the respective selectivity.

In contrast, in one implementation of the present invention, a search may be limited to the path that requires few computational steps. This can then be the path that is the shortest, that is, has the smallest path distance. In this case, fewer triples have to be processed in order to get from the predeterminable node to the triple TF to be determined. Furthermore, an advantage can result from the fact that in the processing of the triples by means of tables that path is selected which has the smallest possible tables. In the present example according to FIG. 3 for example, a table for the node D and another table for the node E are created. The table for node D contains a large number of entries and the table for node E contains only a small number of entries. Therefore, when using tables to implement the database, it is expedient to choose path PF2, which passes over node E and does not include node D. This reduces computational complexity.
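The choice between the paths PF1 and PF2 on the basis of table sizes might be sketched as follows. The cost model (the sum of the table sizes of all nodes on a path), the concrete table sizes and the function name cheapest_path are assumptions made for this illustration; the application only states that the table for node D is large and the table for node E is small.

```python
# Paths as alternating (node, edge, node, ...) tuples; AA is written as B.
PF1 = ("B", "b", "C", "c", "D", "e", "F")
PF2 = ("B", "b", "C", "d", "E", "f", "F")

# Assumed number of entries per node table (illustrative values only).
TABLE_SIZES = {"B": 100, "C": 100, "D": 10_000, "E": 50, "F": 20}


def cheapest_path(paths, table_sizes):
    """Pick the path whose node tables are smallest in total."""
    def cost(path):
        # Nodes sit at the even positions of the path tuple.
        return sum(table_sizes.get(node, 0) for node in path[::2])
    return min(paths, key=cost)


best = cheapest_path([PF1, PF2], TABLE_SIZES)
print("-".join(best))
```

Because both paths have the same path distance here, the table sizes alone decide, and the path through node E is selected.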

In the case of a query according to the prior art by means of SPARQL, the search pattern, that is to say the predefinable path, must be completely specified. In this case, a selection of paths leading from the predeterminable node to the triple TF to be determined is not possible. Thus, the invention makes it possible for the search pattern to have only the elements which are essential to the search, and the search method based on this search pattern can determine the optimum path for evaluating the search.

A further advantage of the invention can be seen in the fact that a degree of detail in the database query can be set by specifying the distance in the search pattern. The greater the distance from the predeterminable node, the more detailed the degree of information. Thus, with the aid of the invention, it is also possible to specify a quality of the triple TF to be determined in the database query.

In an extension, a value of zero for the path distance may indicate that the search is to be performed without any restriction on the distance. This is advantageous because it makes it possible to consider all triples in the database query regardless of the size of the database, that is, without knowing the maximum path distance occurring in the database.

The method for generating the database can be carried out by the device VOR with the aid of five means M1, M2, M3, M4, M5. Furthermore, the search method for determining an information value in the database can be realized by the search device SVOR by means of the units E1, E2 and E3. These means and/or units may be implemented in hardware, software or a combination of hardware and software. In addition, the means and/or the units can be executed by means of a computer unit.

Literature:

[1] "Resource Description Framework", http://en.wikipedia.org/wiki/Resource_Description_Framework, as of 15.04.2009

[2] "Web Ontology Language", http://en.wikipedia.org/wiki/Web_Ontology_Language, as of 15.04.2009

[3] "SPARQL Protocol and RDF Query Language", http://en.wikipedia.org/wiki/SPARQL, as of 15.04.2009

Claims


1. A method for generating a database (DB) for providing information values (IA, ..., IG) by means of nodes (A, ..., G), and dependencies (aa, ..., gg) of the information values (IA, ..., IG) by means of directed edges (a, ..., g), for a database query, wherein the database (DB) is formed in the form of a directed graph (TRE) by the nodes (A, ..., G) and the directed edges (a, ..., g), in which the following steps are carried out: a) reading a description rule (DEF) which indicates the assignment of two information values (IA, IB) with the associated dependency (aa); b) creating the respective node (A, ..., G) for the respective information value (IA, ..., IG) and the respective directed edge (a, ..., g) for the respective dependency (aa, ..., gg); c) generating the directed graph (TRE) starting from a predeterminable node (AA) of the nodes (A, ..., G) on the basis of the description rule (DEF), wherein in each case two of the nodes ((A, B), (F, G)) and the directed edge (a, g) connecting the respective nodes ((A, B), (F, G)) are identified as triples (TA, TG); d) determining at least one path (PF1, PF2) from the predeterminable node (AA) to a triple (TG) to be determined in the database query; e) generating a respective path distance (DIS1, DIS2) of the respective path (PF1, PF2), wherein the respective path distance (DIS1, DIS2) indicates the number of directed edges (a, ..., g) from the predeterminable node (AA) to the triple (TG) to be determined, and wherein the respective path distance (DIS1, DIS2) can be evaluated in the database query starting from the predeterminable node (AA).

2. A device (VOR) for generating a database (DB) for providing information values (IA, ..., IG) by means of nodes (A, ..., G), and dependencies (aa, ..., gg) of the information values (IA, ..., IG) by means of directed edges (a, ..., g), for a database query, wherein the database (DB) is formed in the form of a directed graph (TRE) by the nodes (A, ..., G) and the directed edges (a, ..., g), in which the device comprises the following means: a) first means (M1) for reading in a description rule (DEF) which indicates the assignment of two information values (IA, IB) with the associated dependency (aa); b) second means (M2) for creating the respective node (A, ..., G) for the respective information value (IA, ..., IG) and the respective directed edge (a, ..., g) for the respective dependency (aa, ..., gg); c) third means (M3) for generating the directed graph (TRE) starting from a predeterminable node (AA) of the nodes (A, ..., G) on the basis of the description rule (DEF), wherein in each case two of the nodes ((A, B), (F, G)) and the directed edge (a, g) connecting the respective nodes ((A, B), (F, G)) are identified as triples (TA, TG); d) fourth means (M4) for determining at least one path (PF1, PF2) from the predeterminable node (AA) to a triple (TG) to be determined in the database query; e) fifth means (M5) for generating a respective path distance (DIS1, DIS2) of the respective path (PF1, PF2), wherein the respective path distance (DIS1, DIS2) indicates the number of directed edges (a, ..., g) from the predeterminable node (AA) to the triple (TG) to be determined, and wherein the respective path distance (DIS1, DIS2) can be evaluated in the database query starting from the predeterminable node (AA).

3. A search method for determining an information value (IF) in a database (DB), wherein the database (DB) can be generated according to claim 1, in which the following steps are performed: generating a database query by means of a search pattern (QY) comprising an indication of the predeterminable node (AA), the path distance (DIST) and the triple (TG) to be determined; searching the database (DB) to determine the database query, wherein the path distance (DIST) specified in the search pattern (QY) is taken into account in the search; providing at least one of the information values (IF) of at least one of the triples (TR) that satisfies the specification of the search pattern (QY).

4. A search method according to claim 3, wherein, when searching the database (DB) to determine the database query, those triples are taken into account whose path distance to the node (AA) specified in the search pattern (QY) is at most the path distance (DIST) specified in the search pattern (QY).

5. A search method according to claim 3 or 4, wherein, when searching the database (DB) to determine the database query, those triples are taken into account whose path distance to the node (AA) specified in the search pattern (QY) is exactly the path distance (DIST) specified in the search pattern (QY).

6. A search method according to one of claims 3 to 5, wherein a value of zero for the path distance (DIST) in the search pattern (QY) is processed such that this path distance (DIST) is set to at least a maximum path distance (DISmax) occurring in the database (DB).

7. A search device (SVOR) for determining an information value (IF) in a database (DB), wherein the database (DB) is generated according to claim 2, comprising the following units:

a first unit (E1) for generating a database query by means of a search pattern (QY) comprising an indication of the predeterminable node (AA), the path distance (DIST) and the triple (TG) to be determined;

a second unit (E2) for searching the database (DB) to determine the database query, wherein the path distance (DIST) specified in the search pattern (QY) is taken into account in the search;

a third unit (E3) for providing at least one of the information values (IF) of at least one of the triples (TR) that satisfies the specification of the search pattern (QY).

8. A search device (SVOR) according to claim 7, wherein the second unit (E2) is further configured, when searching the database (DB) to determine the database query, to take into account those triples whose path distance to the node (AA) specified in the search pattern (QY) is at most the path distance (DIST) specified in the search pattern (QY).

9. A search device (SVOR) according to claim 7 or 8, wherein the second unit (E2) is further configured, when searching the database (DB) to determine the database query, to take into account those triples whose path distance to the node (AA) specified in the search pattern (QY) is exactly the path distance (DIST) specified in the search pattern (QY).

10. A search device (SVOR) according to one of claims 7 to 9, wherein the second unit (E2) is further configured such that a value of zero for the path distance (DIST) in the search pattern (QY) can be processed such that this path distance (DIST) is set to at least a maximum path distance (DISmax) occurring in the database (DB).