KR20130092242A - Inference query processing using hyper cube - Google Patents

Info

Publication number
KR20130092242A
Authority
KR
South Korea
Prior art keywords
class
property
box
inference
bit
Prior art date
Application number
KR1020120013861A
Other languages
Korean (ko)
Other versions
KR101318250B1 (en
Inventor
전종훈
김인성
Original Assignee
(주)프람트테크놀로지
Priority date
Filing date
Publication date
Application filed by (주)프람트테크놀로지
Priority to KR20120013861A
Publication of KR20130092242A
Application granted
Publication of KR101318250B1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2264 Multidimensional index structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)

Abstract

The present invention relates to an inference retrieval method using a hypercube. More particularly, the method uses an ontology with a triple structure composed of a subject, a predicate, and an object. The ontology is stored in two separate areas: a T-Box, which defines the target classes, the properties, and the properties that hold between classes, and an A-Box, which defines instances based on the concepts defined in the T-Box. Inference retrieval over the T-Box comprises: i) assigning n property identifiers to the n properties and generating n property vectors, each composed of n bits corresponding to the respective identifiers; ii) assigning m class identifiers to the classes; iii) expressing the triple structure between the classes and the properties as an m × m bit cube of property vectors; and iv) when an instance search is requested, looking up the instance and the class to which it belongs, querying the property vectors stored in the bit cube to obtain the related classes and properties, and extracting the related results from the A-Box. The inference retrieval method using a hypercube of the present invention enables efficient inference retrieval even over a large T-Box.

Description

 Inference retrieval using hypercube {INFERENCE QUERY PROCESSING USING HYPER CUBE}

The present invention relates to an inference retrieval method using a hypercube. More particularly, the method uses an ontology with a triple structure composed of a subject, a predicate, and an object. The ontology is stored in two separate areas: a T-Box, which defines the target classes, the properties, and the properties that hold between classes, and an A-Box, which defines instances based on the concepts defined in the T-Box. Inference retrieval over the T-Box comprises: i) assigning n property identifiers to the n properties and generating n property vectors, each composed of n bits corresponding to the respective identifiers; ii) assigning m class identifiers to the classes; iii) expressing the triple structure between the classes and the properties as an m × m bit cube of property vectors; and iv) when an instance search is requested, looking up the instance and the class to which it belongs, querying the property vectors stored in the bit cube to obtain the related classes and properties, and extracting the related results from the A-Box.

The Semantic Web has emerged as the next generation of the World Wide Web, allowing computers to understand the meaning of information themselves and to infer new facts from defined facts. The Semantic Web is implemented using ontologies (specifications of a conceptualization), and the standard practice is to define an ontology in XML-based OWL. A frequently cited definition describes an ontology as an explicit, formalized specification of a conceptualization of a domain of interest. An ontology formalizes the concepts of things and events as humans understand them, in a form on which everyone agrees. When a machine understands such an ontology and makes inferences based on appropriate rules and relationships, it can derive results similar to those of human reasoning. Combined with the Web, ontology modeling techniques extend inference that was previously limited to proprietary data to all electronic resources accessible through the Web. Furthermore, with the arrival of the ubiquitous computing era, context-aware, knowledge-based services have become possible.

An ontology is expressed by defining concepts and the properties that exist between them. The recent rise of the Semantic Web and related technologies has increased interest in ontologies, and techniques for efficiently storing and retrieving ontologies have been actively studied for semantic search, which requires a high level of inference. The W3C's standard recommendation is to use RDF/S and OWL. However, triple stores that keep the ontology in its original form, such as those used in editors, inference engines, and memory-based implementations, are limited in their ability to process large ontologies. To solve this problem, various inference engines and query processing algorithms have been proposed that store and use the ontology efficiently with a proven relational database engine. However, these also fail to obtain complete inference results for the five core characteristics of ontology properties.

Storing an ontology in a relational database can reduce the size of the stored data and can guarantee data integrity through the integrity constraints that a relational database provides by default. The key factor in the performance of query processing in a relational database is the index, and its role becomes more important as the amount of data grows. Relational database engines provide various types of indexes matched to the nature of the query and the search conditions. These are optimized for conventional relational database searches and therefore perform well for common SQL queries. However, there is no index suited to the inference queries required by ontology-based semantic search. The lack of such an index is the most important problem when storing a large ontology in a relational database.

The hierarchy between classes can be described by a relationship property, which is expressed as "isA". This inclusion relationship can span several layers. In OWL, inference is based on DL (Description Logic), and inferences can be made according to the hierarchy of the classes and the characteristics of the properties. Storing a large ontology as OWL in XML form raises constraints such as the safety of data storage management, accessibility, and query processing that cannot use indexes. Although schemas for efficient inference can be stored in an RDBMS, most RDBMS-based inference is performed only on the instances of the A-Box, and inference based on the characteristics of OWL properties cannot be performed completely. In addition, when a large number of concepts must be handled in the T-Box, as in UMLS, the cost of query processing increases exponentially because of self-join operations.

Accordingly, the technical problem to be achieved by the present invention is to provide an inference retrieval method using a hypercube composed of bits to enable efficient inference retrieval even in large T-box and A-box.

To achieve the above technical problem, the present invention provides an inference retrieval method using a hypercube. The method uses an ontology with a triple structure composed of a subject, a predicate, and an object, and the ontology is stored in two separate areas: a T-Box, which defines the target classes, the properties, and the properties that hold between classes, and an A-Box, which defines instances based on the concepts defined in the T-Box. Inference retrieval over the T-Box comprises: i) assigning n property identifiers to the n properties and generating n property vectors, each composed of n bits corresponding to the respective identifiers; ii) assigning m class identifiers to the classes; iii) expressing the triple structure between the classes and the properties as an m × m bit cube of property vectors; and iv) when an instance search is requested, looking up the instance and the class to which it belongs, querying the property vectors stored in the bit cube to obtain the related classes and properties, and extracting the related results from the A-Box.

In addition, the present invention provides an inference retrieval method using a hypercube in which the triple structure corresponds to one of five characteristics: transitive, symmetric, inverse, functional, and inverse functional.

In addition, the present invention provides an inference retrieval method using a hypercube in which, for inference retrieval over the A-Box, the inference results for the triple structure, derived from the properties between instances and the characteristics of those properties, are stored in the form of a two-dimensional bit matrix, and, when an instance search is requested, the bit matrix is queried according to the characteristic of the property and the related results are extracted from the A-Box.

In addition, the present invention provides an inference retrieval method using a hypercube in which the triple structure has the transitive, symmetric, inverse, functional, and inverse functional characteristics.

The inference retrieval method using the hypercube of the present invention is superior to the conventional retrieval method in terms of performance even in a large T-box and A-box, and also guarantees the completeness of the result of inference.

FIG. 1 is a diagram illustrating a triple structure consisting of a subject, a predicate, and an object, divided into a T-Box and an A-Box, in an embodiment of the inference retrieval method using the hypercube of the present invention.
FIG. 2 is a diagram explaining the bit cube structure, in which the ontology of the entire T-Box is expressed in three-dimensional space using bits, in the inference retrieval method using the hypercube of the present invention.
FIG. 3 is a representation of a hypercube in which a bit cube and a bit matrix are combined.
FIG. 4 is a flowchart of the inference retrieval method using the hypercube of the present invention.
FIG. 5 is a diagram illustrating the five characteristics that an object property may have in the inference retrieval method using the hypercube of the present invention.
FIG. 6 shows the results of comparing the completeness of query results between conventional inference retrieval methods and the inference retrieval method using the hypercube of the present invention.
FIG. 7 is a graph showing the results of verifying the efficiency of the inference retrieval method using the hypercube of the present invention against conventional inference retrieval engines.

Hereinafter, the present invention will be described in detail with reference to the drawings attached hereto.

The inference retrieval method using the hypercube of the present invention uses an ontology with a triple structure composed of a subject, a predicate, and an object. The ontology is stored in two separate areas: a T-Box, which defines the target classes, the properties, and the properties that hold between classes, and an A-Box, which defines instances based on the concepts defined in the T-Box. Inference retrieval over the T-Box comprises: i) assigning n property identifiers to the n properties and generating n property vectors, each composed of n bits corresponding to the respective identifiers; ii) assigning m class identifiers to the classes; iii) expressing the triple structure between the classes and the properties as an m × m bit cube of property vectors; and iv) when an instance search is requested, looking up the instance and the class to which it belongs, querying the property vectors stored in the bit cube to obtain the related classes and properties, and extracting the related results from the A-Box.

Hereinafter, the present invention will be described in detail through suitable examples.

FIG. 1 is a schematic representation of an example T-Box and A-Box in one embodiment of the inference retrieval method using the hypercube of the present invention. The example in the figure represents part of a university ontology. The T-Box defines the properties for students, professors, and courses, and the subclassOf relationship between person, student, and professor. That is, it defines a property called takesCourse between students and courses and a property called isTaughtBy between courses and professors. The A-Box defines instances based on each class: different student instances are defined from the student class, and different course instances are defined from the course class. In addition, the takesCourse property defined in the T-Box holds between the student instances and the course instances.

In the T-Box, concepts are defined in the form of classes, and properties define the relationships between classes. The A-Box creates instances belonging to each class defined in the T-Box. Ontologies defined in the T-Box and A-Box are described as triples in the form Subject-Predicate-Object, and the triples can be expressed as a directed graph, as shown in FIG. 1.

The W3C has adopted standard recommendations for defining ontologies through the Resource Description Framework (RDF), RDF Schema (RDFS), and the Web Ontology Language (OWL). RDF is a specification developed by the W3C to define resources and the relationships between them. Each resource is identified by its own Uniform Resource Identifier (URI), and RDF can define the instances corresponding to the A-Box. Instances defined using RDF are mapped to classes in a T-Box defined in RDFS or OWL. RDFS defines the classes and properties of the T-Box and the hierarchies among classes and among properties. However, because of the limited expressiveness of RDFS, the W3C recommends defining the T-Box using OWL. OWL is conceptually a superset of RDFS and is fully compatible with it. OWL is more expressive than RDFS and can express a wider range of knowledge. Using OWL, one can express things that cannot be defined in RDFS, such as cardinality constraints on properties or equivalence between classes. OWL provides two kinds of properties. The first is the data type property, which defines a data type value associated with a class. The second is the object property, which defines a relationship between classes. OWL defines five characteristics that an object property can have. By using these five characteristics, it is possible to extract, that is, to infer, new information beyond the explicitly described ontology. In OWL, an object property can be given five characteristics: transitive, symmetric, inverse, functional, and inverse functional.

When a new concept is added in T-Box, it is inferred in five ways according to the characteristics of each property.

Definition 1 (Transitive): If property P is transitive, then P(x, z) holds whenever P(x, y) and P(y, z) hold.
Definition 2 (Symmetric): If property P is symmetric, then P(y, x) holds whenever P(x, y) holds.
Definition 3 (InverseOf): If property P is the inverse of another property P', then P'(y, x) holds whenever P(x, y) holds.
Definition 4 (Functional): If property P is functional and P(x, y) and P(x, z) are defined, then y and z are the same instance.
Definition 5 (Inverse functional): If property P is inverse functional and P(y, x) and P(z, x) are defined, then y and z are the same instance.
In Definitions 1, 2, and 3, x, y, and z are classes or instances; in Definitions 4 and 5, x, y, and z are instances. These definitions are summarized in the accompanying figure.

If the defined ontology contains an object property that has one or more of the five characteristics above, information that was not explicitly given in the initial ontology can be inferred, namely that the property additionally holds between certain classes or instances. Inference must be performed according to the characteristics of the properties in the ontology. For a transitive property, P(x, z) must be inferred from P(x, y) and P(y, z). For a symmetric property, P(y, x) must be inferred from P(x, y). When P and P' are inverses, P'(y, x) is inferred from P(x, y). For a functional property, it must be inferred from P(x, y) and P(x, z) that y and z are the same; for an inverse functional property, it must be inferred from P(y, x) and P(z, x) that y and z are the same.
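The five inference rules above can be sketched in a few lines of Python. This is only an illustrative sketch over plain sets of (property, subject, object) triples, not the bit-based method of the invention; the function names, the `teaches` property, the `GraduateStudent` class, and the choice of which properties are transitive or functional are all assumptions made for the example.

```python
def infer_triples(triples, transitive=(), symmetric=(), inverse_of=None):
    """Apply Definitions 1-3 repeatedly until no new triple (P, x, y) appears."""
    inverse_of = inverse_of or {}
    closed = set(triples)
    while True:
        new = set()
        for p, x, y in closed:
            if p in symmetric:
                new.add((p, y, x))              # P(x, y) => P(y, x)
            if p in inverse_of:
                new.add((inverse_of[p], y, x))  # P(x, y) => P'(y, x)
            if p in transitive:
                for q, y2, z in closed:
                    if q == p and y2 == y:
                        new.add((p, x, z))      # P(x, y), P(y, z) => P(x, z)
        if new <= closed:
            return closed
        closed |= new


def same_instance_pairs(triples, functional=(), inverse_functional=()):
    """Definitions 4-5: return pairs of names that must denote one instance."""
    pairs = set()
    for p, x, y in triples:
        for q, a, b in triples:
            if p != q:
                continue
            if p in functional and x == a and y != b:
                pairs.add(frozenset((y, b)))    # P(x, y), P(x, z) => y = z
            if p in inverse_functional and y == b and x != a:
                pairs.add(frozenset((x, a)))    # P(y, x), P(z, x) => y = z
    return pairs


# Hypothetical facts; only subclassOf and isTaughtBy appear in the source text.
facts = {
    ("subclassOf", "GraduateStudent", "Student"),
    ("subclassOf", "Student", "Person"),
    ("isTaughtBy", "course0", "professor0"),
}
inferred = infer_triples(
    facts, transitive={"subclassOf"}, inverse_of={"isTaughtBy": "teaches"}
)
print(sorted(inferred - facts))
```

The fixed-point loop is the simplest way to show the rules; it is quadratic per pass and is meant for reading, not for large ontologies.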

The hypercube is an index structure composed of multi-dimensional bits for efficient search and inference over an ontology. It combines a three-dimensional bit cube representing the T-Box ontology with two-dimensional bit matrices representing the A-Box ontology. Because the ontology is expressed in this multi-dimensional form, inference can be performed according to the characteristics of the properties. The ontology defined in the T-Box is represented by the bit cube of the hypercube, and the instances of the A-Box created from the T-Box definitions are represented by bit matrices. The whole ontology therefore takes the form of a hypercube in which a bit cube and bit matrices are combined, as shown in FIG. 3. The bit cube is an index structure that stores the T-Box of the ontology together with the results of inference according to the characteristics of the properties defined in the T-Box.

Through the bit cube, the classes defined in the T-Box and the properties between classes are expressed in three-dimensional space using bits. FIG. 2 is a diagram explaining the bit cube structure, in which the ontology of the entire T-Box is expressed in three-dimensional space using bits, in the inference retrieval method using the hypercube of the present invention. As shown in FIGS. 2 and 3, two axes of the bit cube form a bitmap that represents the classes and the relationships between classes, and the third axis is a bit vector that expresses the properties. Both the horizontal and vertical axes of the bitmap represent classes, so when the number of classes is n, the bitmap has n × n cells. The rows and columns of the bitmap define the relationships between classes.

A bit vector is an array of bits that expresses the presence or absence of each property between one class and another. If the number of properties defined in the T-Box is m, each bit vector is an array of m bits. Each of the m properties is assigned an identifier between 0 and m-1. A value of 1 at the position given by a property's identifier indicates that the property relationship exists, and a value of 0 indicates that it does not. Bit vectors exist for (C0, C0), ..., (Cn-1, Cn-1), so when the number of classes is n there are n × n bit vectors, each representing the property relationships that exist between two classes.

To express the property relationships between classes through bit vectors, a unique vector must be created for each property. The unique vector distinguishes one property from all the others. It is created by assigning each of the m properties an identifier between 0 and m-1, setting the bit at the position of the property's identifier in an m-bit array to 1, and setting the remaining bits to 0. As shown in Table 1, if five properties exist, each property is assigned a unique identifier (an integer between 0 and 4) and the bit at the identifier's position is set to 1, so that the property can be identified. Table 1 below summarizes examples of property identifiers and property vectors for each property.

Figure pat00001

In addition, m class identifiers are assigned to the m classes. If m classes exist, each class is assigned an identifier between 0 and m-1, an m × m bit cube is created, and the bit cube index is built by expressing the relationships between classes through property vectors. If a relationship defined in the T-Box is Pk(Ci, Cj), it is expressed by adding the property vector corresponding to Pk to the i-th row and j-th column of the bit cube index. When a property is added between two classes, the added information is reflected by an OR operation between the existing bit vector and the unique vector of the property. For example, if the properties are defined as in Table 1 and the properties between classes C0 and C1 are ISA and hasMember, the bit vector for (C0, C1) is 11000. If a property called hasAdvisor is then added between C0 and C1, the addition is reflected by setting the bit vector of (C0, C1) to the OR of its current value 11000 and the hasAdvisor vector 00100, which gives 11100.
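The property vectors and the OR update described above can be illustrated with a short Python sketch. Table 1 is only available as an image, so the identifier assignment below is an assumption chosen to reproduce the 11000 and 00100 values in the text (identifier 0 is taken to be the leftmost bit); the helper names are hypothetical.

```python
# Assumed identifiers: Table 1 is an image in the source, so this order is a guess
# that matches the bit strings quoted in the text (identifier 0 = leftmost bit).
PROPERTY_IDS = {"ISA": 0, "hasMember": 1, "hasAdvisor": 2,
                "takesCourse": 3, "isTaughtBy": 4}
M = len(PROPERTY_IDS)


def property_vector(name: str) -> int:
    """Return the unique (one-hot) vector of a property as an integer."""
    return 1 << (M - 1 - PROPERTY_IDS[name])


def as_bits(vector: int) -> str:
    """Render a bit vector as an M-character string of 0s and 1s."""
    return format(vector, f"0{M}b")


def add_property(vector: int, name: str) -> int:
    """Reflect a newly added property with a bitwise OR."""
    return vector | property_vector(name)


def has_property(vector: int, name: str) -> bool:
    """Test for a property with a bitwise AND."""
    return bool(vector & property_vector(name))


# Bit vector for the class pair (C0, C1): ISA and hasMember hold.
c0_c1 = property_vector("ISA") | property_vector("hasMember")
before = as_bits(c0_c1)
c0_c1 = add_property(c0_c1, "hasAdvisor")
after = as_bits(c0_c1)
print(before, "->", after)
```

Storing each vector as a single integer keeps both the update (OR) and the membership test (AND) to one machine operation per class pair.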

Table 2 below represents the identifier for the class and the concept defined in the T-Box.

Figure pat00002

If a bit cube is created from the properties and classes defined in Table 1 and Table 2, it is expressed as shown in Table 3. A bitmap defines the classes and the relationships between classes for a single property. The bitmap is an n × n two-dimensional matrix for n classes. The bitmap in FIG. 3 is a cross section of the bit cube along its two class axes. When the number of properties is m, the bit cube is composed of m bitmaps. When n classes exist, each class is assigned an integer identifier from 0 to n-1, and a relationship between classes is defined by the row and column positions corresponding to those identifiers in the bitmap. Pk(Ci, Cj) is expressed by setting the bit in the i-th row and j-th column of the bitmap corresponding to Pk to 1. For example, if four classes exist, as shown in Table 2, each is assigned an identifier between 0 and 3. As shown in Table 3, the two triples ISA(Professor, Person) and ISA(Student, Person) are expressed, as in Figure 9, by setting to 1 the bit in row 1 (the Professor identifier) and column 0 (the Person identifier) of the bitmap, and the bit in row 2 (the Student identifier) and column 0 (the Person identifier). A bitmap is created in the same way for each property, as shown in Table 3.

Figure pat00003
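As a rough illustration of the bitmap for a single property, the following Python sketch sets the two ISA bits described in the text. The identifiers for Person, Professor, and Student follow the row and column numbers given above; the fourth class name, `Course`, is an assumption because Table 2 is only available as an image.

```python
# Person = 0, Professor = 1, Student = 2 follow the text; "Course" = 3 is assumed.
CLASS_IDS = {"Person": 0, "Professor": 1, "Student": 2, "Course": 3}
N = len(CLASS_IDS)


def empty_bitmap(n: int) -> list[list[int]]:
    """Create an n x n bitmap with every bit cleared."""
    return [[0] * n for _ in range(n)]


def set_relation(bitmap: list[list[int]], subject: str, obj: str) -> None:
    """Record P(subject, obj) by setting the bit at [subject row][object column]."""
    bitmap[CLASS_IDS[subject]][CLASS_IDS[obj]] = 1


# Bitmap for the ISA property only; every other property gets its own bitmap.
isa_bitmap = empty_bitmap(N)
set_relation(isa_bitmap, "Professor", "Person")
set_relation(isa_bitmap, "Student", "Person")

for row in isa_bitmap:
    print("".join(str(bit) for bit in row))
```

Stacking one such bitmap per property gives the bit cube; reading the same cell across all bitmaps gives the bit vector for that class pair.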

The bit cube is a structure that expresses the ontology of the entire T-Box in three-dimensional space using bits. It can be viewed either as the set of bitmaps for all properties or as the set of all bit vectors. FIG. 2 shows the structure of the bit cube. A bitmap exists for each property, and a bit vector exists for each pair of classes. Inference based on the characteristics of the properties in the ontology defined in the T-Box is performed using these bitmaps and bit vectors. For a specific class Ci, the Px and Cy satisfying Px(Ci, Cy) are found by querying the row corresponding to Ci in the bit cube. Likewise, the Px and Cy satisfying Px(Cy, Ci) are found by looking at the column corresponding to Ci in the bit cube. Algorithm 1 and Algorithm 2 describe how to look up classes from the rows and columns of the bit cube index for a given class.

Algorithm 1 FindClassesFromRow
-------------------------------------
Input: class C, property P
Output: class list L
-------------------------------------
L ← empty list
for (i = 0; i < bitCube.columnCount; i++) {
  if (bitCube[C][i] ∧ P) is non-zero then
    L ← L + i-th class
  end if
}
return L

Algorithm 2 FindClassesFromColumn
-------------------------------------
Input: class C, property P
Output: class list L
-------------------------------------
L ← empty list
for (i = 0; i < bitCube.columnCount; i++) {
  if (bitCube[i][C] ∧ P) is non-zero then
    L ← L + i-th class
  end if
}
return L
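Algorithms 1 and 2 can be sketched in Python as follows. The sketch assumes the bit cube is held as an n × n list of integers, each integer being the m-bit vector for one class pair; that layout, the function names, and the example identifiers are assumptions for illustration, not the storage format of the invention.

```python
def find_classes_from_row(bit_cube, c, p):
    """Algorithm 1: classes Cy with P(C, Cy), found by scanning row C."""
    return [i for i, vector in enumerate(bit_cube[c]) if vector & p]


def find_classes_from_column(bit_cube, c, p):
    """Algorithm 2: classes Cy with P(Cy, C), found by scanning column C."""
    return [i for i, row in enumerate(bit_cube) if row[c] & p]


# Hypothetical setup: 4 classes (Person=0, Professor=1, Student=2, Course=3)
# and 5 properties, with ISA as the leftmost bit of each 5-bit vector.
ISA = 0b10000
n = 4
bit_cube = [[0] * n for _ in range(n)]
bit_cube[1][0] |= ISA  # ISA(Professor, Person)
bit_cube[2][0] |= ISA  # ISA(Student, Person)

supers_of_professor = find_classes_from_row(bit_cube, 1, ISA)
subs_of_person = find_classes_from_column(bit_cube, 0, ISA)
print(supers_of_professor, subs_of_person)
```

A row scan answers "what does this class relate to", and a column scan answers "what relates to this class"; both touch only n cells rather than the whole cube.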

When a new property relationship is added, inference is performed using the bit cube index as follows. If P(Ci, Cj) is added for a symmetric property P, P(Cj, Ci) is added as well. Then the classes Cx and Cy satisfying P(Ci, Cx) and P(Cj, Cy) are identified from the rows of the bit cube for Ci and Cj, respectively. Once the classes corresponding to Cx and Cy have been obtained, the inferred results P(Cx, Cy) and P(Cy, Cx) are reflected in the bit cube. The algorithm is defined in Algorithm 3.

Algorithm 3 AddTransitiveProperty
--------------------------------------------------------
Input: subject class Cs, object class Co, property P
Output: Inference Bit Cube
--------------------------------------------------------
subject class list S ← empty list
object class list O ← empty list
S ← Cs + FindClassesFromColumn(Cs, P)
O ← Co + FindClassesFromRow(Co, P)
for (i = 0; i < S.length; i++) {
  for (j = 0; j < O.length; j++) {
    Add Inference Bit Cube (Si, P, Oj)
  }
}
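A minimal Python sketch of Algorithm 3 follows, under the assumption that the bit cube is an n × n list of integer bit vectors and that it already holds all previously inferred relationships for P. The class identifiers and the `GraduateStudent` class are hypothetical.

```python
def find_classes_from_row(bit_cube, c, p):
    """Classes Cy with P(C, Cy): scan row C."""
    return [i for i, vector in enumerate(bit_cube[c]) if vector & p]


def find_classes_from_column(bit_cube, c, p):
    """Classes Cy with P(Cy, C): scan column C."""
    return [i for i, row in enumerate(bit_cube) if row[c] & p]


def add_transitive_property(bit_cube, cs, co, p):
    """Algorithm 3: add P(Cs, Co) plus every relationship it implies transitively."""
    subjects = [cs] + find_classes_from_column(bit_cube, cs, p)  # Cs and all x with P(x, Cs)
    objects = [co] + find_classes_from_row(bit_cube, co, p)      # Co and all y with P(Co, y)
    for s in subjects:
        for o in objects:
            bit_cube[s][o] |= p                                  # record P(s, o)


# Hypothetical classes: Person=0, Student=1, GraduateStudent=2; ISA is one bit.
ISA = 0b1
cube = [[0] * 3 for _ in range(3)]
add_transitive_property(cube, 1, 0, ISA)  # ISA(Student, Person)
add_transitive_property(cube, 2, 1, ISA)  # ISA(GraduateStudent, Student)
print(cube)
```

Because each insertion combines everything that already reaches the subject with everything already reachable from the object, the cube stays transitively closed as long as every relationship is added through this function.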

In summary, such an attribute relationship may be arranged as shown in FIG. 1.

When an instance search is requested, to obtain the result of inference through the bit cube index, the class to which the instance belongs must be checked. Therefore, the instance and class corresponding to each search word are stored by using the inverted index. When a search request is made, the class containing the instance is searched through the inverted index and the bit cube index is searched through the searched class. Since the classes and properties related to the class can be obtained through the row of the bit cube index, the search speed can be improved by narrowing the search target among the entire classes and properties. The algorithm for performing the search is defined in Algorithm 6.

Algorithm 6 FindInferenceInstances
--------------------------------------------------------------
Input: SearchKey value
Output: ResultInstance values
--------------------------------------------------------------
Classes and id List Set L<C, id> ← NULL
Property List P ← NULL
Value List V ← NULL
L<C, id> ← find classes and instances in inverted index (value)
for (i = 0; i < L.length; i++) {
  Pi ← find property in Bit Cube (Li)
  V ← V + select values from Li<C>, Pi where id = Li<id>
}
return V
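The lookup in Algorithm 6 can be sketched in Python with in-memory dictionaries standing in for the inverted index and the relational tables. Every data structure, name, and value below is a hypothetical stand-in chosen for the example; the source only specifies the steps (inverted index lookup, bit cube row lookup, value selection).

```python
def properties_of_class(bit_cube, class_id, m):
    """OR the bit vectors in a class's row, then list the property identifiers set."""
    combined = 0
    for vector in bit_cube[class_id]:
        combined |= vector
    return [k for k in range(m) if combined & (1 << (m - 1 - k))]


def find_inference_instances(search_key, inverted_index, bit_cube, m, tables):
    """Sketch of Algorithm 6 over dictionaries instead of relational tables."""
    values = []
    # L<C, id>: classes and instance ids registered for the search key.
    for class_id, instance_id in inverted_index.get(search_key, []):
        # Pi: properties related to the class, read from its bit cube row.
        for prop_id in properties_of_class(bit_cube, class_id, m):
            # "select values from Li<C>, Pi where id = Li<id>"
            values.extend(tables.get((class_id, prop_id), {}).get(instance_id, []))
    return values


# Hypothetical data: classes Student=0, Course=1, Professor=2;
# properties takesCourse=0, isTaughtBy=1 (identifier 0 is the leftmost bit).
M = 2
cube = [[0] * 3 for _ in range(3)]
cube[0][1] = 0b10  # takesCourse(Student, Course)
cube[1][2] = 0b01  # isTaughtBy(Course, Professor)
inverted_index = {"Alice": [(0, 7)]}          # search key -> (class id, instance id)
tables = {(0, 0): {7: ["Databases", "Logic"]}}  # (class, property) -> id -> values

result = find_inference_instances("Alice", inverted_index, cube, M, tables)
print(result)
```

Reading the class's row first restricts the value lookup to the properties that can actually apply, which is the narrowing of the search target that the text describes.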

In most cases, the final result of an inference query must be an instance. However, because the bit cube is an index over the T-Box, that is, over classes, an instance cannot be returned as the result of an inference query using the bit cube alone. An additional index structure is therefore needed for queries that request instances as the inference result. The bit matrix is an index structure that stores inference results based on the properties that hold between instances of the A-Box and on the characteristics of those properties. A bit matrix is a two-dimensional array of bits, and each bit matrix stores the inference results for one property. A bit matrix is generated in the same way as a bitmap in the bit cube, except that each axis of a bit matrix represents instances, whereas each axis of a bitmap represents classes. The instances on the horizontal and vertical axes are all the instances of the classes that correspond to the row and column in the T-Box. This makes it possible to perform inference by accessing rows and columns in the same way. The bit matrix stores, for the instances, the inference results for properties with the transitive, inverse, functional, and inverse functional characteristics. Therefore, in inference retrieval over the A-Box, the inference result is obtained by querying the bit matrix according to the characteristic of the property. Table 4 below is an example of representing the triples of an A-Box using a bit matrix.

Figure pat00004
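A bit matrix for one transitive property can be sketched as follows. Each row is stored as a Python integer used as a bitset, and the inferred relationships are precomputed with Warshall's algorithm; the source does not say how the stored inference results are computed, so that choice, the function names, and the example instances are assumptions.

```python
def build_bit_matrix(n, pairs):
    """One integer per row; bit o of rows[s] is set when P(instance s, instance o)."""
    rows = [0] * n
    for s, o in pairs:
        rows[s] |= 1 << o
    return rows


def transitive_closure(rows):
    """Warshall's algorithm on bitset rows (assumed here, not taken from the source)."""
    rows = list(rows)
    for k in range(len(rows)):
        for i in range(len(rows)):
            if (rows[i] >> k) & 1:   # i reaches k, so i reaches everything k reaches
                rows[i] |= rows[k]
    return rows


def related(rows, s):
    """Query one row: all instances o with P(s, o), explicit or inferred."""
    return [o for o in range(len(rows)) if (rows[s] >> o) & 1]


# Four hypothetical instances 0..3 with explicit facts P(0, 1) and P(1, 2).
explicit = build_bit_matrix(4, [(0, 1), (1, 2)])
closed = transitive_closure(explicit)
print(related(explicit, 0), related(closed, 0))
```

Once the closure is stored, answering an instance query is a single row read, which is the point of keeping inference results in the index rather than computing them at query time.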

FIG. 4 is a flowchart of the inference retrieval method using the hypercube of the present invention. An inference query is assumed to take one or more keywords as input and to include a search term for a class, a property, or an instance. If the search keyword is a property or a class that has properties, the corresponding identifiers are looked up and used to search the bit cube, which yields the T-Box inference result. If the search keyword is an instance, the identifier of the instance is looked up and the bit matrix is searched with it and the identifier of the property, which yields the inference result for the A-Box instances. The bit cube and the bit matrices return only identifiers. Therefore, the inverted index is searched to find the class, property, and instance corresponding to each identifier, and an SQL statement is generated for the tables and tuples that must be accessed in the relational database. Finally, the inference result is obtained by executing the generated SQL statement in the relational database.
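The last step of the flow, turning identifiers returned by the index into an SQL statement, might look roughly like the following. The sketch uses Python's built-in `sqlite3` module with an in-memory database purely for illustration (the experiments in the source use MySQL), and the `Course` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3


def build_query(table, ids):
    """Generate a parameterized SELECT for the tuples named by the index result.

    `table` is assumed to come from a trusted identifier-to-table mapping,
    so it is interpolated directly; the ids are bound as parameters.
    """
    placeholders = ", ".join("?" for _ in ids)
    return f"SELECT id, name FROM {table} WHERE id IN ({placeholders}) ORDER BY id"


# Hypothetical relational storage for one class table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Course (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO Course VALUES (?, ?)",
    [(0, "Databases"), (1, "Logic"), (2, "Networks")],
)

# Identifiers as they might come back from a bit matrix lookup.
inferred_ids = [0, 2]
sql = build_query("Course", inferred_ids)
rows = conn.execute(sql, inferred_ids).fetchall()
print(sql)
print(rows)
conn.close()
```

Keeping the index lookup and the SQL generation separate means the relational engine is only asked for tuples that the hypercube has already identified.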

The efficiency of the inference retrieval method using the hypercube of the present invention was verified by comparison with conventional inference retrieval engines. In the experiment, each item was measured according to the performance evaluation criteria suggested by LUBM (Lehigh University Benchmark) in order to evaluate the performance of inference processing using the hypercube. The items are ontology loading time, storage space size, query result completeness, and query processing time. The specifications of the system used in the experiment are shown in Table 5. The data set provided by LUBM was used for the fairness of the experiment. LUBM provides the T-Box of a university ontology in OWL format and an API (Application Program Interface) that generates A-Box data based on the given T-Box. A-Box data is generated according to the number of universities. For example, the A-Box data set for N universities is denoted LUBM (N, 0). A-Box data sets were created for 1, 10, 20, and 50 universities, and the experiment was performed on each.

Example 1. Data Loading Time

In LUBM, the data loading time, which is the first performance measure of an inference engine, is the time taken to store an ontology defined in RDF/S and OWL in a relational database. Therefore, in this experiment, we measured the time taken to load the data sets provided by LUBM in OWL form into the relational database, and also measured the creation time of the hypercube. The loading time measured in the experiment is shown in Table 5. The data loading time is the time taken to store the ontology from OWL in the relational database using the API of DLDB (http://swat.cse.lehigh.edu/downloads/dldb.html). Although creating the hypercube index takes time, after the initial creation the index is only partially modified for inference, and because the initial creation time is not large it does not affect the performance of the system as a whole. The following table summarizes, for each data set, the data loading time and the index (hypercube) generation time.

Data set        Triples      Instances
LUBM(1, 0)      103,397      17,174
LUBM(10, 0)     1,316,993    207,426
LUBM(20, 0)     2,782,419    437,555
LUBM(50, 0)     6,890,933    1,082,818

The data stored in the OWL files were stored in MySQL, a relational database, using the API provided by DLDB. The inference results were stored in the bit cube for the T-Box and, for specific properties of the A-Box, in bit matrices. Therefore, in this experiment, we measured the size of the OWL files, the size of the storage used in MySQL, and the sizes of the bit cube and the bit matrices.

Data set        OWL size (MB)    Storage size (MB)    Hypercube index size (MB)
LUBM(1, 0)      8.02             30                   11
LUBM(10, 0)     102              278                  97
LUBM(20, 0)     219              570                  186
LUBM(50, 0)     543              1390                 463

DLDB proposes storing the ontology in a hybrid schema structure. The hybrid schema structure is a mixture of the vertical schema structure and the binary schema structure: it uses the triple storage method for each property and creates a table for each class to store data. In addition, hierarchical relationships between subclasses are defined through views, so they are not stored in separate tables. The number of tables can therefore be reduced compared with binary schema storage. When an ontology is stored using the hybrid schema, if the number of classes is Cn, the number of properties is OPn, the number of data-type columns per class is DPn, and the number of triples is Tn, the number of data items actually stored in each table is Tn / ((Cn × DPn) + OPn). A large number of triples can therefore be stored efficiently in a relational database. For example, suppose one million triples are stored using the hybrid schema used in this document, with 10 classes, 10 properties, and 4 data-type columns per class. Then Tn is 1,000,000, Cn is 10, OPn is 10, and DPn is 4, so 1,000,000 / ((10 × 4) + 10) = 20,000 data items are stored in each table, giving a total of 400,000 tuples in 20 tables.
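The worked example above can be checked with a few lines of arithmetic. The sketch below reads the denominator as Cn × DPn + OPn, which is the reading that reproduces the 20,000 and 400,000 figures given in the text; the function name is a hypothetical helper, not something defined by DLDB or the patent.

```python
def items_per_table(tn, cn, opn, dpn):
    """Data items per table in the hybrid schema: Tn / ((Cn * DPn) + OPn)."""
    return tn / (cn * dpn + opn)


tn, cn, opn, dpn = 1_000_000, 10, 10, 4

per_table = items_per_table(tn, cn, opn, dpn)  # 1,000,000 / 50
table_count = cn + opn                         # one table per class and per property
total_tuples = per_table * table_count

print(per_table, table_count, total_tuples)
```

Each class table holds DPn data-type columns, so one tuple in a class table carries several triples; that is why one million triples fit into 400,000 tuples.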

In Table 6, the LUBM(50, 0) data set uses about 543MB of storage when its more than 6 million triples are stored as a plain OWL file. When the LUBM(50, 0) data set is mapped to and stored in the hybrid schema of a relational database, it is converted into about 80 tables, about 2.7 columns, and about 3.35 million tuples in total, and uses 1390MB of storage space. In addition, to make inference query processing more efficient, a separate 463MB of storage is required for the hypercube index proposed in the present invention, so that a total of about 1853MB, that is, about 1.9GB, is required. Six million triples may seem to contain a large amount of information, but from a relational database perspective the same data can be represented as about 80 tables, about 2.7 columns, and about 3.35 million tuples, which is a volume of data routinely encountered in business. An approach that handles inference queries within a reasonable time on a database of this capacity may therefore be required, and its importance will continue to grow. As shown in the table above, the hypercube index is on average only about one third (33%) of the size of the stored data, which is significant if it can be used to provide an efficient inference query processing algorithm that guarantees completeness. The completeness of the query results is described next.
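The storage figures quoted above can be recomputed from the size table. The following is a small sketch that assumes the three size columns are all in MB, as the 543MB, 1390MB, and 463MB figures in the text suggest; the variable names are illustrative only.

```python
# (OWL size, relational storage size, hypercube index size), assumed to be MB.
SIZES_MB = {
    "LUBM(1, 0)": (8.02, 30, 11),
    "LUBM(10, 0)": (102, 278, 97),
    "LUBM(20, 0)": (219, 570, 186),
    "LUBM(50, 0)": (543, 1390, 463),
}

# Index size relative to the relational storage size, per data set.
index_ratio = {
    name: index / storage for name, (_owl, storage, index) in SIZES_MB.items()
}
mean_ratio = sum(index_ratio.values()) / len(index_ratio)

# Total space needed for LUBM(50, 0): relational storage plus the index.
_, storage_50, index_50 = SIZES_MB["LUBM(50, 0)"]
total_50 = storage_50 + index_50

for name, ratio in index_ratio.items():
    print(f"{name}: index is {ratio:.1%} of stored data")
print(f"mean: {mean_ratio:.1%}, total for LUBM(50, 0): {total_50} MB")
```

The ratio here is taken against the relational storage size rather than against storage plus index, which is the reading under which the roughly one-third figure holds.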

LUBM provides 14 queries for evaluating the performance of inference engines. The queries are provided in SPARQL form, together with the expected result for each query. Query completeness was defined as (query results obtained ÷ query results provided by LUBM) × 100, and the measurement results are summarized in FIG. 6. Table 7 summarizes the contents of queries 1, 2, and 3 (LUBM queries 11, 12, and 13), which were used to demonstrate the performance of the hypercube. As shown in FIG. 6, when DLDB is used, a complete result set cannot be obtained for queries 1, 2, and 3. The reason is that the subOrganizationOf property used in queries 1 and 2 is transitive, so that not only the organizations directly related to http://www.University0.edu by subOrganizationOf but also the organizations related to it through other organizations must be derived. DLDB could not obtain this inferred result, which requires repeated searches over the organizations involved, whereas 100% query completeness is achieved when the hypercube index structure proposed in the present invention is used. In addition, the hasAlumnus property in query 3 requires inference over the degreeFrom property, which is its inverse. It can be confirmed that the DLDB method does not obtain the correct answer set in this case either.

Query number, query, and query content:

Query 1
  Query:   (type ResearchGroup ?X)
           (subOrganizationOf ?X http://www.University0.edu)
  Content: Search for organizations whose type is ResearchGroup and which are in the subOrganizationOf relationship with http://www.University0.edu.

Query 2
  Query:   (type Chair ?X)
           (type Department ?Y)
           (worksFor ?X ?Y)
           (subOrganizationOf ?Y http://www.University0.edu)
  Content: Search for the departments in the subOrganizationOf relationship with http://www.University0.edu and the Chairs who work for those departments.

Query 3
  Query:   (type Person ?X)
           (hasAlumnus http://www.University0.edu ?X)
  Content: Search for people who are in the hasAlumnus relationship with http://www.University0.edu.
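To make the two kinds of inference behind these queries concrete, the sketch below stores a property as a bit matrix (one integer bit vector per instance), derives the transitive closure for a property such as subOrganizationOf, derives the inverse for a pair such as degreeFrom and hasAlumnus, and computes completeness as a percentage. It is a minimal illustration, not the patent's procedure: the instance numbering, the function names, and the use of Warshall's algorithm are assumptions, and completeness is computed here on the overlap with the expected answers.

```python
def transitive_closure(rows):
    """rows[i] has bit j set when i --property--> j. Warshall's algorithm."""
    closed = list(rows)
    for k in range(len(closed)):
        for i in range(len(closed)):
            if (closed[i] >> k) & 1:      # i reaches k ...
                closed[i] |= closed[k]    # ... so i reaches whatever k reaches
    return closed


def inverse(rows):
    """Bit matrix of the inverse property: j --inverse--> i when i --> j."""
    inverted = [0] * len(rows)
    for i, bits in enumerate(rows):
        for j in range(len(rows)):
            if (bits >> j) & 1:
                inverted[j] |= 1 << i
    return inverted


def subjects_of(rows, target):
    """Instances i such that i --property--> target."""
    return [i for i, bits in enumerate(rows) if (bits >> target) & 1]


def completeness(returned, expected):
    """Share of the expected answers that were returned, as a percentage."""
    expected = set(expected)
    return 100.0 * len(set(returned) & expected) / len(expected)


# Hypothetical instances: 0 = University0, 1 = Department0, 2 = ResearchGroup0.
sub_organization_of = [0b000, 0b001, 0b010]   # 1 -> 0 and 2 -> 1
closed = transitive_closure(sub_organization_of)

direct = subjects_of(sub_organization_of, 0)  # without inference
inferred = subjects_of(closed, 0)             # with transitive inference

# Hypothetical instances: 0 = University0, 1 and 2 = two graduates.
degree_from = [0b000, 0b001, 0b001]
has_alumnus = inverse(degree_from)

print(direct, inferred, completeness(direct, inferred), has_alumnus)
```

In this toy data the research group is related to the university only through its department, so a lookup without the closure misses it, which mirrors the incomplete answers reported for queries 1 and 2.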

Example 2. Query Processing Response Time

In this experiment, we modified the SQL used by DLDB in order to compare it with the query response time obtained using the index. Since DLDB cannot provide 100% query completeness, its response time was measured after changing the SQL for queries 1, 2, and 3 so that 100% completeness was satisfied. Otherwise, queries that do not return the full results would show a comparatively fast response time. Response time was therefore measured and compared for the SQL provided by DLDB and for the index. The modified SQL uses a self-join to obtain the correct answer set when inference over a transitive property is required, as in queries 1 and 2, and, for inference over inverseOf as in query 3, adds a query on the inverse relation and combines the results with a union operation. The results are summarized in Table 8 and FIG. 7. As can be seen in Table 8 and FIG. 7, the inference retrieval method using the hypercube of the present invention is faster than the conventional inference retrieval method for all three queries from LUBM(10, 0) upward, and the gap in response time tends to widen as the amount of data to be searched grows, which shows that it is very efficient for large-scale inference retrieval.
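The kind of SQL rewriting described above can be illustrated with an in-memory SQLite database. This is a sketch only: the table names, column names, and rows are hypothetical and are not the DLDB schema or the SQL actually used in the experiment. It shows a self-join that follows a transitive property one extra level, and a union that adds the inverse relation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sub_organization_of (subject TEXT, object TEXT);
    CREATE TABLE degree_from (subject TEXT, object TEXT);
    CREATE TABLE has_alumnus (subject TEXT, object TEXT);
    INSERT INTO sub_organization_of VALUES
        ('Group0', 'Dept0'), ('Dept0', 'University0');
    INSERT INTO degree_from VALUES ('Alice', 'University0');
    INSERT INTO has_alumnus VALUES ('University0', 'Bob');
    """
)

# Transitive property: direct rows plus rows reached through one self-join.
TRANSITIVE_SQL = """
    SELECT subject FROM sub_organization_of WHERE object = ?
    UNION
    SELECT a.subject
    FROM sub_organization_of AS a
    JOIN sub_organization_of AS b ON a.object = b.subject
    WHERE b.object = ?
"""

# Inverse property: stated hasAlumnus rows plus degreeFrom rows read backwards.
INVERSE_SQL = """
    SELECT object FROM has_alumnus WHERE subject = ?
    UNION
    SELECT subject FROM degree_from WHERE object = ?
"""

target = "University0"
sub_orgs = sorted(r[0] for r in conn.execute(TRANSITIVE_SQL, (target, target)))
alumni = sorted(r[0] for r in conn.execute(INVERSE_SQL, (target, target)))
print(sub_orgs, alumni)
```

One self-join covers only two levels of the hierarchy; every further level needs another join, which is one reason this style of query becomes more expensive as the data grows.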

Query    LUBM(1, 0)          LUBM(10, 0)         LUBM(20, 0)         LUBM(50, 0)
         DLDB   Hypercube    DLDB   Hypercube    DLDB   Hypercube    DLDB   Hypercube
1        2      3            18     5            33     7            92     13
2        47     110          2560   680          3560   1874         7803   4311
3        62     37           3766   379          5772   1007         29409  3523
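The response times above can be turned into speed-up factors (DLDB time divided by hypercube time) to see where the index helps. The sketch below only re-expresses the numbers in the table; the unit of the measurements is not stated in this section, and the variable names are illustrative.

```python
DATA_SETS = ["LUBM(1, 0)", "LUBM(10, 0)", "LUBM(20, 0)", "LUBM(50, 0)"]

# RESPONSE[query] is a list of (DLDB, hypercube) pairs, one per data set.
RESPONSE = {
    1: [(2, 3), (18, 5), (33, 7), (92, 13)],
    2: [(47, 110), (2560, 680), (3560, 1874), (7803, 4311)],
    3: [(62, 37), (3766, 379), (5772, 1007), (29409, 3523)],
}

# Speed-up above 1.0 means the hypercube index answered faster than DLDB.
speedup = {
    query: [dldb / cube for dldb, cube in pairs]
    for query, pairs in RESPONSE.items()
}

for query, factors in speedup.items():
    cells = ", ".join(
        f"{name}: {factor:.2f}x" for name, factor in zip(DATA_SETS, factors)
    )
    print(f"query {query}: {cells}")
```

Read this way, the index is slower than DLDB for queries 1 and 2 on the smallest data set and faster in every other cell of the table.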

The embodiments of the present invention described above should not be construed as limiting the technical idea of the present invention. The scope of protection of the present invention is limited only by the matters described in the claims, and those skilled in the art will be able to modify the technical idea of the present invention in various forms. Accordingly, such improvements and modifications will fall within the scope of the present invention as long as they are obvious to those skilled in the art.

Claims (4)

An inference retrieval method using a hypercube, the method using an ontology with a triple structure consisting of a subject, an attribute, and an object, the ontology being stored divided into a T-Box area, which defines classes that are subjects or objects and properties that are attributes between one class and another, and an A-Box area, which is defined as instances based on the concepts defined in the T-Box,
wherein the inference retrieval of the T-Box comprises: i) assigning n property identifiers to the n properties and generating n property vectors of n bits, one corresponding to each of the identifiers;
ii) assigning m class identifiers to the classes;
iii) representing the triple structure between the classes and the properties as a bit cube of the property vectors for each of the m classes; and
iv) when an instance search is requested, obtaining the associated class and property by searching the property vectors stored in the bit cube for the instance and the class to which the instance belongs, and extracting the associated result from the A-Box.
The method of claim 1,
wherein the triple structure corresponds to one of five types: transitive, symmetric, inverse, functional, and inverse functional.
The method of claim 1,
wherein the inference retrieval of the A-Box expresses the inference results of the triple structure, according to the characteristics of the properties of the instances, as bits and stores them in the form of a two-dimensional bit matrix, and, when an instance search is requested, queries the bit matrix according to the characteristic and extracts the associated result from the A-Box.
The method of claim 3,
wherein the triple structure has transitive, symmetric, inverse, functional, and inverse functional characteristics.

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR20120013861A KR101318250B1 (en) 2012-02-10 2012-02-10 Inference query processing using hyper cube

Publications (2)

Publication Number Publication Date
KR20130092242A true KR20130092242A (en) 2013-08-20
KR101318250B1 KR101318250B1 (en) 2013-10-15

Family

ID=49217130

Family Applications (1)

Application Number Title Priority Date Filing Date
KR20120013861A KR101318250B1 (en) 2012-02-10 2012-02-10 Inference query processing using hyper cube

Country Status (1)

Country Link
KR (1) KR101318250B1 (en)

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101016110B1 (en) * 2008-06-30 2011-02-17 주식회사 케이티 System and method for extracting ontology instance using ontology property

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105956085A (en) * 2016-04-29 2016-09-21 合网络技术(北京)有限公司 Reverse indexing construction method and apparatus as well as retrieval method and apparatus
CN105956085B (en) * 2016-04-29 2019-08-27 优酷网络技术(北京)有限公司 A kind of construction method and device, search method and device of inverted index
CN110826145A (en) * 2019-09-09 2020-02-21 西安工业大学 Automobile multi-parameter operation condition design method based on heuristic Markov chain evolution

Also Published As

Publication number Publication date
KR101318250B1 (en) 2013-10-15

Legal Events

Date Code Title Description
A201 Request for examination
E902 Notification of reason for refusal
E701 Decision to grant or registration of patent right
GRNT Written decision to grant
FPAY Annual fee payment

Payment date: 20161007

Year of fee payment: 4

FPAY Annual fee payment

Payment date: 20170905

Year of fee payment: 5

FPAY Annual fee payment

Payment date: 20180905

Year of fee payment: 6

FPAY Annual fee payment

Payment date: 20190826

Year of fee payment: 7