WO2020101470A1 - System and method for tree based graph indexing and query processing - Google Patents

System and method for tree based graph indexing and query processing Download PDF

Info

Publication number
WO2020101470A1
WO2020101470A1 PCT/MY2019/050082 MY2019050082W WO2020101470A1 WO 2020101470 A1 WO2020101470 A1 WO 2020101470A1 MY 2019050082 W MY2019050082 W MY 2019050082W WO 2020101470 A1 WO2020101470 A1 WO 2020101470A1
Authority
WO
WIPO (PCT)
Prior art keywords
graph
entities
indexing
query
tree
Prior art date
Application number
PCT/MY2019/050082
Other languages
French (fr)
Inventor
Yasaman EFTEKHARYPOUR
Chuan Hai NGO
Meng Wei CHUA
Hong Hoe ONG
Original Assignee
Mimos Berhad
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Mimos Berhad filed Critical Mimos Berhad
Publication of WO2020101470A1 publication Critical patent/WO2020101470A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures

Definitions

  • the present invention relates to the field of graph indexing and query processing from a graph database. More particularly, the present invention relates to a computing system and method for performing tree-based graph indexing and accelerated query processing by optimal utilization of available resources.
  • the existing prior arts follow the process of reading the graph files, separating graph entities and storing each group in a separate list, removing the duplicates by searching inside the list of entities, giving a unique ID to each entity in the non-redundant list, saving each entity along with its unique ID in a lookup table, reading the graph file again, separating the entities and searching for the unique ID of each entity in the lookup table and creating a new list of graph relations (GNR) in which each node is represented by its unique ID.
  • GNR graph
  • the patent application US 20000064424 A1 discloses a method for creating a bit stream from an indexing tree including a plurality of hierarchical levels, to each of which one or several index nodes are assigned.
  • the index nodes contain index data, which is sorted in the indexing tree according to one or several given criteria.
  • index data of the index nodes is inserted into the bit stream, and the information concerning the position within the bit stream, where the index data of one or several index nodes of the hierarchical level located below the hierarchical level of the respective node is situated, is inserted into the bit stream for an index node.
  • each tuple-part is encoded into a unique part identifier (UPI), and each UPI comprises a tag at a fixed position within the UPI.
  • the tag indicates the datatype of the encoded tuple-part.
  • the content data for the tuple-part is encoded in a code that is configured to reflect the ranking or order of the content data, corresponding to each datatype, relative to other tuples in a set of tuples.
  • for content data that comprises a character string, the code comprises a hashcode, and for content data that comprises or includes a numeric value, the code comprises an immediate value that directly stores the numeric value without encoding.
  • Yet another prior art US 20090024590 A1 discloses a scalable graph database, having a type system created by interaction of users with the graph database and stored in the graph database itself, a namespace model built on said type system, wherein names are resolved against a dataset rather than being pre-declared, a dynamically generated, user contributed, accretive database schema, wherein data entry via means operable by a community of users creates types in said type system that are then instantly available via a query API, said query API further comprising a tree-based object/property query language; wherein graph database queries are informed by said dynamically generated schema; wherein schema building is collaborative and not a separate activity from data entry; and wherein existing relationships in said graph database continue to function as said schema is expanded; and a database store, wherein objects in said database store comprise versioned primitives that are attributed to a graph database contributor.
  • the present invention provides a system for tree-based graph entity indexing and accelerated query executing.
  • the present system includes a memory unit for temporary storage of a number of graph data associated with one or more graph database during creation of at least one tree, one or more processors to perform a number of calculations related to the graph data based on a number of processor executable instructions and one or more graphics processing units, GPU, having a number of graphic processors to perform a variety of simultaneous tasks on the graph data.
  • the simultaneous tasks performed on the graph data by the GPU accelerate parsing, indexing and query executing on the graph data associated with the graph database and assist in the present tree-based graph entity indexing and query processing method.
  • the present system further includes a storage unit to store a number of serialized trees in form of a number of files to be used while performing accelerated query executing.
  • the system further includes a graph processor module executable by the processor, wherein the graph processor module includes a query scheduler unit to schedule the graphic processors in processing a number of graph query requests based on the available resources including the graphic processors and the corresponding query waiting times, a graph load request executing unit to load each of the graph databases, a graph update request executing unit to update each of the graph databases and a graph query executing unit to process the graph query requests.
  • the number of simultaneous tasks performed using the GPU includes automated removal of a number of duplicate graph entities during loading of the plurality of graph data.
  • the graph load request processor performs a number of tasks including creating a number of entity indexing trees of the plurality of graph data associated with the graph database using a GPU-based parser, serializing the number of entity indexing trees into the files and storing the files containing the entity indexing trees of graph data in the storage unit.
  • the query scheduler unit schedules processing of the number of graph query requests based on a number of available resources including the number of graphic processors and the associated query waiting times.
  • the present invention also provides a method for tree-based graph entity indexing and accelerated query executing using the processor executing the number of processor executable instructions and the GPU-based parser. The method includes the steps of verifying presence of a plurality of previous entity indexing trees for a number of graph data associated with at least one graph database, where an absence of the previous entity indexing trees further follows a process of compressing and indexing entities of the graph data associated with the graph database using a tree-based representation of an input relationship data.
  • the present method further includes the steps of creating a number of entity indexing trees of the plurality of graph data associated with the graph database using the GPU-based parser, serializing the entity indexing trees into a number of files and storing the files containing the graph data in a storage unit associated with the present system.
  • the method further follows the steps of loading the files having a number of serialized entity indexing trees and de-serializing the files into the corresponding entity indexing trees.
  • the step of creating the entity indexing trees of the graph data using the GPU-based parser further includes the steps of reading the graph data and separating a number of entities of the graph data in a number of binary files in parallel using a GPU.
  • the binary files include a subject's binary file, an objects binary file and a predicates binary file.
  • the process of creating the entity indexing trees of the graph data using the GPU-based parser further includes the steps of creating a pair of index-based representation files separately for a number of subject entities and a number of object entities using a first entity indexing tree and creating an index-based representation file in parallel for a number of predicate entities using a second entity indexing tree.
  • creating the pair of index-based representation files further includes the steps of creating a first empty radix tree and a pair of index-based representation files for the subject entities and the object entities, initializing a first counter, indexing the subject entities using the first empty radix tree, the first counter and a subject raw file and storing them in the index-based representation file of the subject entities and indexing the plurality of object entities using the first empty radix tree, the first counter and an object raw file and storing them in the index-based representation file of the object entities.
  • creating the index-based representation file further includes the steps of creating a second empty radix tree and one index-based representation file for the predicate entities, initializing a second counter, and indexing the predicate entities using the second empty radix tree, the second counter and a predicate raw file and storing them in the index-based representation file of predicate entities.
  • creating the plurality of entity indexing trees further includes the steps of creating the first entity indexing tree containing the plurality of subject entities in the subjects binary file and the plurality of object entities in the objects binary file and the second entity indexing tree containing the plurality of predicate entities in the predicates binary file.
  • indexing the plurality of subject entities using the first empty radix tree, the first counter and the subject raw file and storing them in the index-based representation file of the subject entities further includes the steps of checking existence of each element from the subject raw file in an input tree, fetching a next element from the subject raw file when a previous element is found in the input tree, adding each element to the input tree in form of a new node and assigning an input counter in form of a leaf value on the new node when the element is absent in the input tree, incrementing a value of the input counter by one and adding the input counter to an input index-based representation file.
  • indexing the plurality of object entities using the first empty radix tree, the first counter and the object raw file and storing them in the index-based representation file of the object entities further includes the steps of checking existence of each element from the object raw file in the input tree, fetching the next element from the object raw file when a previous element is found in the input tree, adding each element to the input tree in form of the new node and assigning the input counter in form of a leaf value on the new node when the element is absent in the input tree; incrementing the value of the input counter by one and adding the input counter to the input index-based representation file.
  • indexing the plurality of predicate entities using the second empty radix tree, the second counter and the predicate raw file and storing them in the index-based representation file of predicate entities further includes the steps of checking existence of each element from the predicate raw file in the input tree, fetching the next element from the predicate raw file when a previous element is found in the input tree, adding each element to the input tree in form of the new node and assigning the input counter in form of a leaf value on the new node when the element is absent in the input tree, incrementing the value of the input counter by one, and adding the input counter to the input index-based representation file.
  • processing of a plurality of graph query requests includes the steps of translating an input query using the first empty radix tree for the subject entities and the object entities and the second empty radix tree for the predicate entities into a plurality of indices, performing a plurality of graph query processing steps when the input query type is a graph algorithm query, where the steps include applying the graph algorithm query processing steps on a sub-graph, using the first filled radix tree for the subject and the object entities and the second filled radix tree for the predicate entities to translate a plurality of graph algorithm query processing results from the indices to a plurality of text-based labels and storing the graph query processing results in a buffer.
  • An input sub-graph creation filter is further applied on the graph to create a sub-graph in a plurality of steps when the input query type is a sub-graph filter, where the steps include applying an index-based filter on a community of the plurality of index-based representation files of the subject, object and predicate binary files to create a list of nodes satisfying the index-based filter and creating a new sub-graph based on a plurality of reported nodes, where the plurality of reported nodes is maintained for a plurality of further input queries.
  • Figure 1 is a block diagram showing a number of components of the present system for tree-based graph entity indexing and accelerated query processing, according to a preferred embodiment of the present invention.
  • Figure 2 is a tree-based graph entity indexing utilizing minimal memory space and with optimal use of the graphics processing unit resources, according to an exemplary embodiment of the present invention.
  • Figure 3a is a flowchart showing the method for tree-based graph entity indexing and accelerated query processing, according to a preferred embodiment of the present invention.
  • Figure 3b is a flowchart showing the steps of creating the entity indexing trees of the graph data using the GPU-based parser, according to a preferred embodiment of the present invention.
  • Figure 3c is a flowchart showing the additional steps of creating the two index-based representation files disclosed in step 406, according to a preferred embodiment of the present invention.
  • Figure 3d is a flowchart showing the additional steps of creating the index-based representation file in parallel for the predicate entities in the second entity indexing tree, according to a preferred embodiment of the present invention.
  • Figure 3e is a flowchart showing the additional steps of indexing the graph entities into the text-based file, according to a preferred embodiment of the present invention.
  • Figure 3f is a flowchart showing the processing steps of the graph query requests, according to a preferred embodiment of the present invention.
  • the present invention relates to a computer-assisted system for tree-based graph entity indexing and accelerated query executing, according to a preferred embodiment and one or more alternate embodiment of the present invention.
  • Modern computing systems store relationship data from complex structured contents such as XML documents, and other geospatial data in form of graphs and form graph databases.
  • each node of the graph represents an entity and an edge between any two nodes represents the relationship between these two nodes.
  • the graph-based representation of the relationship data is then converted to a text-based formatted file and stored for retrieval and processing upon receiving an input query.
  • the present invention proposes an effective and less resource intensive means for the rapid processing of an input query to extract a sub-graph out of the stored graph or for searching something inside the text-based formatted file representing the graph.
  • the present invention makes use of the parallel computing capabilities of the large number of graphic processors in a graphics processing unit (GPU) associated with the serial data processor(s) or the traditional processor(s) or central processing unit(s) (CPU) associated with any computing system.
  • GPU graphics processing unit
  • CPU central processing unit
  • the present invention further discloses a method of indexing the tree-based graph entity into numerical values for using the GPUs as a co-processor to accelerate the input query processing.
  • FIG. 1 is a block diagram showing a number of components of the present system 100 for tree-based graph entity indexing and accelerated query executing, according to a preferred embodiment of the present invention.
  • the present system 100 for tree-based graph entity indexing and accelerated query executing removes the redundant steps while processing the graph file based on the input graph query by doing multiple steps simultaneously and accelerates the process significantly using GPU 106 of the present computer system 100.
  • the final indexed data produced by the present system 100 can be fitted into small memory space compared to the already existing indexing methods.
  • the present system 100 for tree-based graph entity indexing and accelerated query executing includes a memory unit 102 for temporary storage of a number of graph data associated with a graph database during creation of one or more trees from the graph data.
  • the system 100 further includes a processor 104 to perform a number of calculations related to the graph data based on a number of processor executable instructions and at the same time utilizes the GPU 106 having a number of graphic processors to perform a variety of simultaneous tasks such as, parsing, indexing and query executing on the graph data associated with a graph database.
  • a storage unit 108 associated with the processor 104 stores a number of serialized trees in form of a number of files to be used while performing accelerated query executing using the present system 100.
  • the system 100 further includes a graph processor module (110) executable by the processor (104) wherein the graph processor module (110) includes code executable to optimize graph related processes such as graph load, graph update, graph query, query scheduler etc.
  • the processor 104 communicates with the GPU 106 in real-time to perform the variety of simultaneous tasks such as, parsing, indexing and query executing on the graph data.
  • the graph processor module 110 executable by the processor 104 that is in communication with the GPU 106 further includes a query scheduler unit 118.
  • the simultaneous or parallel tasks performed using the GPU 106 include automated removal of a number of duplicate graph entities during loading of the graph data. This saves time as the process of identification and removal of the duplicate graph entities is performed during the loading of the graph data into the memory unit 102.
  • the graph load request processor 112 performs a number of tasks including creating a number of entity indexing trees such as the subject, object trees and the predicate trees of the variety of graph data associated with the graph database using a GPU-based parser.
  • the graph load request processor 112 further performs the serialization of the entity indexing trees into the text-based files and stores the files containing the entity indexing graph data in the storage unit 108 for future reference.
  • the query scheduler unit 118 performs intelligent automated scheduling of the processing tasks of the graph query requests based on the available resources including the graphic processors and the query waiting times.
  • FIG. 2 illustrates a tree-based graph entity indexing 200 utilizing minimal memory space and with optimal use of the graphics processing unit 106 resources, according to an exemplary embodiment of the present invention.
  • the relationship data is stored in a form of graphs and form graph databases.
  • each node represents an entity and the edge between any two nodes represents the relationship between these two nodes.
  • From the graph shown in Figure 2, each unit of the graph data triples is placed as a branch of a radix tree. The leaf value of each triple unit is assigned as the unique numerical ID of that element.
  • the present system 100 stores the relationship graph in a text-based file format and performs accelerated processing of any query to the relationship graph either with the purpose of extracting a sub-graph out of the graph or searching something inside that graph, utilizing the parallel processing of the query steps using the GPU 106 resources.
  • the queries to the relationship graph for extracting a sub-graph out of the graph or searching something inside that graph need the graph entities to be read from the text-based file stored in the storage unit 108 and to be distinguished correctly for providing accurate results.
  • the tree based word and number mapping utilizing the present system 100 also performs automatic duplicate removal of graph entities while loading graph data, thereby accelerating the query processing.
  • the GPU-based parser accelerates the process of separation and grouping of graph entities and the GPU-based graph query processor accelerates the process of executing sub-graph extraction and graph queries.
  • the graph updater updates the already loaded and id mapped graph with new information during each processing stage and the graph queries scheduler schedules the processing of multiple graph queries based on available resources and query waiting time.
  • FIG. 3a is a flowchart 300 showing the method for tree-based graph entity indexing and accelerated query executing, according to a preferred embodiment of the present invention.
  • the present method for tree-based graph entity indexing and accelerated query processing starts with the step of loading the graph data for the first time or verifying presence of one or more previous entity indexing trees for the graph data associated with the selected graph database as in step 302.
  • the process further follows the steps of creating a number of entity indexing trees such as the subject and object entity indexing trees and the predicate entity indexing trees of the graph data associated with the graph database using the GPU-based parser, as in step 304.
  • the entity indexing trees are serialized into a number of files as in step 306. In a preferred embodiment, the entity indexing trees are serialized into a number of text-based files and the files containing the entity indexing trees of graph data are stored in the storage unit 108 of the present system 100, as in step 308.
  • FIG. 3b is a flowchart 400 showing the steps of creating the entity indexing trees of the graph data using the GPU-based parser, according to a preferred embodiment of the present invention.
  • the method includes the steps of reading the graph data as in step 402 and separating a number of entities of the graph data into subject, predicate and object entities in parallel using the GPU 106, as in step 404.
  • the binary files thus created include a subject's binary file, an objects binary file and a predicates binary file.
  • the step of creating the entity indexing trees further includes the steps of creating a first entity indexing tree containing the subject entities in the subject's binary file and the object entities in the objects binary file and a second entity indexing tree containing the predicate entities in the predicates binary file.
  • a pair of index-based representation files is created separately for the subject entities and the object entities in the first entity indexing tree.
  • Another step, as in 408, creates an index-based representation file for the predicate entities in the second entity indexing tree, in parallel with creating the pair of index-based representation files separately for the subject entities and the object entities.
  • FIG. 3c is a flowchart 500 showing the additional steps of creating the two index-based representation files for subject and object entities disclosed in step 406, according to a preferred embodiment of the present invention.
  • the additional steps of creating the two index-based representation files include creating a first empty radix tree and a pair of index-based representation files for the subject entities and the object entities as in step 502, followed by initializing a first counter or a subject-object tree (SOT) counter in step 504, indexing the subject entities using the first empty radix tree, the first counter and the separated subject entities and storing them in the index-based representation file of the subject entities as in step 506 and indexing the object entities using the first empty radix tree, the first counter and the separated object entities and storing them in the index-based representation file of the object entities as in step 508.
  • SOT subject-object tree
  • FIG. 3d is a flowchart 600 showing the additional steps of creating the index-based representation file in parallel for the predicate entities in the second entity indexing tree, according to a preferred embodiment of the present invention.
  • the steps of creating the index-based representation file in parallel for the predicate entities include creating a second empty radix tree and one index-based representation file for the predicate entities as in step 602, followed by initializing a second counter as in step 604 and indexing the predicate entities using the second empty radix tree, the second counter and the separated predicate entities and storing them in the index-based representation file of predicate entities as in step 606.
  • FIG. 3e is a flowchart 700 showing the additional steps of indexing the graph entities into the text-based file, according to a preferred embodiment of the present invention.
  • the process of indexing the subject entities using the first empty radix tree, the first counter and the separated subject entities and storing them in the index-based representation file of the subject entities further includes the steps of checking existence of each element from the separated subject entities in the radix tree as in step 702, fetching a next element from the subject raw file when a previous element is found in the input tree as in step 704, adding each element to the radix tree in form of a new node and assigning the value of the first counter in form of a leaf value on the new node when the element is absent in the input tree as in step 706, incrementing the value of the first counter by one in step 708 and adding the element's leaf value to the input index-based representation file for further storage in the storage unit 108.
  • the above process is repeated during the indexing of the object entities using the first empty radix tree, the first counter and the separated object entities, and storing them in the index-based representation file of the object entities.
  • the above process is also repeated during the indexing of the predicate entities using the second empty radix tree, the second counter and the separated predicate entities, and storing them in the index-based representation file of the separated predicate entities.
  • the process follows the similar steps of checking existence of each element from the separated predicate entities in the second radix tree, fetching the next element from the predicate raw file when a previous element is found in the input tree, adding each element's leaf value in the second radix tree to the index-based representation file when each element is found in the second radix tree, adding each element to the second radix tree in form of a new node and assigning the value of the second counter in form of a leaf value for the new node when the element is absent in the second radix tree, incrementing the value of the second counter by one and adding the element's leaf value to the index-based representation file.
  • the graph query processor unit 116 of the present system 100 executes the graph query requests in a number of steps detailed in the flowchart 800 shown in Figure 3f, according to a preferred embodiment of the present invention.
  • the execution of the graph query requests includes the steps of translating an input query using the first filled radix tree for the subject entities and the object entities and the second filled radix tree for the predicate entities into a number of indices as in step 802. Now the type of query is determined in step 804 and if the input query type is a graph, then the graph query executing steps from 806 to 810 are followed in sequence. When the input query type is a graph, the graph algorithm or processing steps discussed above are applied on a sub-graph as in step 806.
  • the first filled radix tree is utilized for the subject entities and the object entities and the second filled radix tree for the predicate entities to translate the graph query executing results from the indices to the corresponding text-based labels.
  • the graph query processing results are then stored in a buffer for future use-cases like visualization or returning back to the user as in step 810.
  • When the input query type is a sub-graph filter, the graph query executing steps from 812 to 814 are followed in sequence.
  • an index-based filter on a community of the index-based representation files of the subject, object and predicate binary files is applied to create a list of nodes satisfying the index-based filter as in step 812.
  • a new sub-graph is then created in step 814 based on the reported nodes and is retained as such for further queries.
  • the present system 100 reduces the number of graph file reading and parsing times by doing the simultaneous tasks such as separating graph entities, removing redundant graph entities and assigning a unique ID to each graph entity utilizing the GPU 106 and the processor 104 resources together. Furthermore, the present system stores the graph entities of big graphs in less memory space by using the proposed tree-based data structure and accelerates the graph updating process by using the constructed tree-based data storage.

Abstract

A system (100) for tree-based graph indexing and accelerated query executing includes a memory unit (102) for temporary storage of graph data associated with a graph database during creation of a tree, a processor (104) to perform calculations related to the graph data based on a number of processor executable instructions, a graphics processing unit (106) to perform simultaneous tasks on the graph data and a storage unit (108) to store a number of serialized trees in form of a number of text-based files for performing accelerated query processing. The graph processor module (110) includes a query scheduler unit (118) to schedule the graphic processors in processing the graph query requests, a graph load request executing unit (112) to load the graph database, a graph update request executing unit (114) to update the graph database and a graph query executing unit (116) to process the graph query requests.

Description

SYSTEM AND METHOD FOR TREE BASED GRAPH INDEXING AND
QUERY PROCESSING
TECHNICAL FIELD OF THE INVENTION
The present invention relates to the field of graph indexing and query processing from a graph database. More particularly, the present invention relates to a computing system and method for performing tree-based graph indexing and accelerated query processing by optimal utilization of available resources.
BACKGROUND OF THE INVENTION
Modern computing systems store relationship data from complex structured contents such as XML documents, and other geospatial data in form of graphs and form graph databases. Nowadays relationship data are stored in a form of graphs and form graph databases. In these graphs, each node represents an entity and the edge between any two nodes represents the relationship between these two nodes. One of the common scenarios in mapping numerical IDs to graph entities is to first identify nodes in an input graph and, after removing the duplicate nodes, assign unique IDs to every graph entity. The existing prior arts follow the process of reading the graph files, separating graph entities and storing each group in a separate list, removing the duplicates by searching inside the list of entities, giving a unique ID to each entity in the non-redundant list, saving each entity along with its unique ID in a lookup table, reading the graph file again, separating the entities and searching for the unique ID of each entity in the lookup table and creating a new list of graph relations (GNR) in which each node is represented by its unique ID. However, the existing methods for performing the query processing in graph databases stored as a large number of text-based files include some redundant steps, which in turn increase the processing time and resource requirements. The proposed method utilizes the processing capabilities of graphical processing units available in computing systems along with the processors to accelerate the query indexing and processing steps. Furthermore, the proposed method removes the redundant steps during the graph query processing and accelerates the processing speed by performing simultaneous tasks compared to the prior arts.
There are several prior arts discussing the different methods for processing graph data in a computing system, some of which are listed below for reference. The patent application US 20000064424 A1 discloses a method for creating a bit stream from an indexing tree including a plurality of hierarchical levels, to each of which one or several index nodes are assigned. The index nodes contain index data, which is sorted in the indexing tree according to one or several given criteria. Index data of the index nodes is inserted into the bit stream, and the information concerning the position within the bit stream, where the index data of one or several index nodes of the hierarchical level located below the hierarchical level of the respective node is situated, is inserted into the bit stream for an index node.
Another prior art, US 20080243770 A1 discloses a method for creating a graph database, which is arranged to store or process data in the form of graph tuples. In an embodiment, each tuple-part is encoded into a unique part identifier (UPI), and each UPI comprises a tag at a fixed position within the UPI. The tag indicates the datatype of the encoded tuple-part. The content data for the tuple-part is encoded in a code that is configured to reflect the ranking or order of the content data, corresponding to each datatype, relative to other tuples in a set of tuples. For content data that comprises a character-string, the code comprises a hashcode and for content data that comprises or includes a numeric value, the code comprises an immediate value that directly stores the numeric value without encoding.
Yet another prior art US 20090024590 A1 discloses a scalable graph database, having a type system created by interaction of users with the graph database and stored in the graph database itself, a namespace model built on said type system, wherein names are resolved against a dataset rather than being pre-declared, a dynamically generated, user contributed, accretive database schema, wherein data entry via means operable by a community of users creates types in said type system that are then instantly available via a query API, said query API further comprising a tree-based object/property query language; wherein graph database queries are informed by said dynamically generated schema; wherein schema building is collaborative and not a separate activity from data entry; and wherein existing relationships in said graph database continue to function as said schema is expanded; and a database store, wherein objects in said database store comprise versioned primitives that are attributed to a graph database contributor.
None of the above-cited prior arts discloses an effective graph entity indexing and accelerated query processing utilizing processing capabilities of graphical processing units available in computing systems.
SUMMARY OF THE INVENTION
The present invention provides a system for tree-based graph entity indexing and accelerated query executing. The present system includes a memory unit for temporary storage of a number of graph data associated with one or more graph database during creation of at least one tree, one or more processors to perform a number of calculations related to the graph data based on a number of processor executable instructions and one or more graphics processing units, GPU, having a number of graphic processors to perform a variety of simultaneous tasks on the graph data. The simultaneous tasks performed on the graph data by the GPU accelerate parsing, indexing and query executing on the graph data associated with the graph database and assist in the present tree-based graph entity indexing and query processing method. The present system further includes a storage unit to store a number of serialized trees in form of a number of files to be used while performing accelerated query executing. The system further includes a graph processor module executable by the processor, wherein the graph processor module includes a query scheduler unit to schedule the graphic processors in processing a number of graph query requests based on the available resources including the graphic processors and the corresponding query waiting times, a graph load request executing unit to load each of the graph databases, a graph update request executing unit to update each of the graph databases and a graph query executing unit to process the graph query requests. Preferably, the number of simultaneous tasks performed using the GPU includes automated removal of a number of duplicate graph entities during loading of the plurality of graph data. Preferably, the graph load request processor performs a number of tasks including creating a number of entity indexing trees of the plurality of graph data associated with the graph database using a GPU-based parser, serializing the number of entity indexing trees into the files and storing the files containing the entity indexing trees of graph data in the storage unit.
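For orientation only, the following is a minimal Python skeleton of how the units described above might be organised in code. Every class name is a hypothetical stand-in for the query scheduler unit, the graph load request executing unit, the graph update request executing unit and the graph query executing unit; nothing here is taken from the patent's actual implementation.

```python
from dataclasses import dataclass, field


class GraphLoadRequestUnit:
    """Parses graph files, builds the entity indexing trees and serializes them (unit 112)."""


class GraphUpdateRequestUnit:
    """Applies new triples to an already loaded, ID-mapped graph (unit 114)."""


class GraphQueryRequestUnit:
    """Translates label-based queries to index-based ones and back (unit 116)."""


class QuerySchedulerUnit:
    """Orders pending query requests by free graphic processors and waiting time (unit 118)."""


@dataclass
class GraphProcessorModule:
    """Illustrative container mirroring the graph processor module (110)."""
    load_unit: GraphLoadRequestUnit = field(default_factory=GraphLoadRequestUnit)
    update_unit: GraphUpdateRequestUnit = field(default_factory=GraphUpdateRequestUnit)
    query_unit: GraphQueryRequestUnit = field(default_factory=GraphQueryRequestUnit)
    scheduler: QuerySchedulerUnit = field(default_factory=QuerySchedulerUnit)
```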
Preferably, the query scheduler unit schedules processing of the number of graph query requests based on a number of available resources including the number of graphic processors and the associated query waiting times. The present invention also provides a method for tree-based graph entity indexing and accelerated query executing using the processor executing the number of processor executable instructions and the GPU-based parser. The method includes the steps of verifying presence of a plurality of previous entity indexing trees for a number of graph data associated with at least one graph database, where an absence of the previous entity indexing trees further follows a process of compressing and indexing entities of the graph data associated with the graph database using a tree-based representation of an input relationship data. The present method further includes the steps of creating a number of entity indexing trees of the plurality of graph data associated with the graph database using the GPU-based parser, serializing the entity indexing trees into a number of files and storing the files containing the graph data in a storage unit associated with the present system. In case of the presence of the previous entity indexing trees, the method further follows the steps of loading the files having a number of serialized entity indexing trees and de-serializing the files into the corresponding entity indexing trees. Preferably, the step of creating the entity indexing trees of the graph data using the GPU-based parser further includes the steps of reading the graph data and separating a number of entities of the graph data in a number of binary files in parallel using a GPU. The binary files include a subject's binary file, an objects binary file and a predicates binary file. The process of creating the entity indexing trees of the graph data using the GPU-based parser further includes the steps of creating a pair of index-based representation files separately for a number of subject entities and a number of object entities using a first entity indexing tree and creating an index-based representation file in parallel for a number of predicate entities using a second entity indexing tree.
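As an illustration of the separation step, the sketch below splits whitespace-separated triples into subject, predicate and object raw files. The patent performs this separation in parallel on the GPU; the process pool, the triple format and the file names used here are stand-in assumptions.

```python
# Sketch only: a process pool stands in for the GPU parallelism, and the input
# is assumed to be whitespace-separated "subject predicate object" lines.
from multiprocessing import Pool


def split_triple(line: str):
    subject, predicate, obj = line.strip().split(maxsplit=2)
    return subject, predicate, obj


def separate_entities(graph_file: str):
    with open(graph_file, encoding="utf-8") as fh:
        lines = [ln for ln in fh if ln.strip()]
    with Pool() as pool:                      # parallel parse of the raw triples
        triples = pool.map(split_triple, lines)
    # one raw file per entity group, mirroring the subjects/predicates/objects binary files
    for name, column in (("subjects.raw", 0), ("predicates.raw", 1), ("objects.raw", 2)):
        with open(name, "w", encoding="utf-8") as out:
            out.writelines(triple[column] + "\n" for triple in triples)
    return triples
```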
Preferably, creating the pair of index-based representation files further includes the steps of creating a first empty radix tree and a pair of index-based representation files for the subject entities and the object entities, initializing a first counter, indexing the subject entities using the first empty radix tree, the first counter and a subject raw file and storing them in the index-based representation file of the subject entities and indexing the plurality of object entities using the first empty radix tree, the first counter and an object raw file and storing them in the index-based representation file of the object entities.
Preferably, creating the index-based representation file further includes the steps of creating a second empty radix tree and one index-based representation file for the predicate entities, initializing a second counter, and indexing the predicate entities using the second empty radix tree, the second counter and a predicate raw file and storing them in the index-based representation file of predicate entities.
Preferably, creating the plurality of entity indexing trees further includes the steps of creating the first entity indexing tree containing the plurality of subject entities in the subjects binary file and the plurality of object entities in the objects binary file and the second entity indexing tree containing the plurality of predicate entities in the predicates binary file.
Preferably, indexing the plurality of subject entities using the first empty radix tree, the first counter and the subject raw file and storing them in the index-based representation file of the subject entities further includes the steps of checking existence of each element from the subject raw file in an input tree, fetching a next element from the subject raw file when a previous element is found in the input tree, adding each element to the input tree in form of a new node and assigning an input counter in form of a leaf value on the new node when the element is absent in the input tree, incrementing a value of the input counter by one and adding the input counter to an input index-based representation file.
Preferably, indexing the plurality of object entities using the first empty radix tree, the first counter and the object raw file and storing them in the index-based representation file of the object entities further includes the steps of checking existence of each element from the object raw file in the input tree, fetching the next element from the object raw file when a previous element is found in the input tree, adding each element to the input tree in form of the new node and assigning the input counter in form of a leaf value on the new node when the element is absent in the input tree; incrementing the value of the input counter by one and adding the input counter to the input index-based representation file.
Preferably, indexing the plurality of predicate entities using the second empty radix tree, the second counter and the predicate raw file and storing them in the index-based representation file of predicate entities further includes the steps of checking existence of each element from the predicate raw file in the input tree, fetching the next element from the predicate raw file when a previous element is found in the input tree, adding each element to the input tree in form of the new node and assigning the input counter in form of a leaf value on the new node when the element is absent in the input tree, incrementing the value of the input counter by one, and adding the input counter to the input index-based representation file.
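The lookup-or-assign loop described in the three preceding paragraphs can be sketched as follows. A plain character trie and a list-wrapped counter stand in for the radix tree and the first/second counters; this is an assumption made for brevity, not the patent's actual data structure, and the labels are invented for the demonstration.

```python
# Minimal sketch of the check / add-node / assign-leaf / increment loop.
class Trie:
    def __init__(self):
        self.root = {}

    def get_or_assign(self, label: str, counter: list) -> int:
        node = self.root
        for ch in label:
            node = node.setdefault(ch, {})
        if "_leaf" not in node:          # element absent: add new node, counter becomes its leaf value
            node["_leaf"] = counter[0]
            counter[0] += 1              # increment the counter by one
        return node["_leaf"]             # an existing element simply reuses its ID


def index_raw_file(labels, tree: Trie, counter: list):
    """Returns the index-based representation: one numeric ID per raw entry."""
    return [tree.get_or_assign(label, counter) for label in labels]


# subjects and objects share one tree and one counter (the SOT of Figure 3c);
# predicates would use a second tree and counter in the same way
sot_tree, sot_counter = Trie(), [0]
subject_ids = index_raw_file(["alice", "bob", "alice"], sot_tree, sot_counter)
object_ids = index_raw_file(["bob", "carol"], sot_tree, sot_counter)
print(subject_ids, object_ids)   # [0, 1, 0] [1, 2] -- duplicates map to the same ID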
Preferably, processing of a plurality of graph query requests includes the steps of translating an input query using the first empty radix tree for the subject entities and the object entities and the second empty radix tree for the predicate entities into a plurality of indices, performing a plurality of graph query processing steps when the input query type is a graph algorithm query, where the steps include applying the graph algorithm query processing steps on a sub-graph, using the first filled radix tree for the subject and the object entities and the second filled radix tree for the predicate entities to translate a plurality of graph algorithm query processing results from the indices to a plurality of text-based labels and storing the graph query processing results in a buffer. An input sub-graph creation filter is further applied on the graph to create a sub-graph in a plurality of steps when the input query type is a sub-graph filter, where the steps include applying an index-based filter on a community of the plurality of index-based representation files of the subject, object and predicate binary files to create a list of nodes satisfying the index-based filter and creating a new sub-graph based on a plurality of reported nodes, where the plurality of reported nodes is maintained for a plurality of further input queries.
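The round-trip translation described above can be illustrated on a toy index-based graph. In the hedged sketch below, two dictionaries stand in for the filled subject-object and predicate radix trees, and the wildcard query interface, the labels and the triples are assumptions for illustration only.

```python
# Sketch of the query flow: translate a label query to indices, run it on the
# index-based triples, then map the results back to text-based labels.
sot_ids = {"alice": 0, "bob": 1, "carol": 2}          # subject/object labels -> IDs
predicate_ids = {"knows": 0, "likes": 1}              # predicate labels -> IDs
triples = [(0, 0, 1), (1, 1, 2), (0, 1, 2)]           # index-based graph relations


def run_query(subject=None, predicate=None, obj=None):
    # translate the query terms into indices (None acts as a wildcard), as in step 802
    s = sot_ids[subject] if subject else None
    p = predicate_ids[predicate] if predicate else None
    o = sot_ids[obj] if obj else None
    # execute the query on the index-based representation
    hits = [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
    # translate the results back to text-based labels using reverse maps
    rev_sot = {v: k for k, v in sot_ids.items()}
    rev_pred = {v: k for k, v in predicate_ids.items()}
    return [(rev_sot[a], rev_pred[b], rev_sot[c]) for a, b, c in hits]


buffer = run_query(subject="alice")   # results kept in a buffer, as in step 810
print(buffer)   # [('alice', 'knows', 'bob'), ('alice', 'likes', 'carol')]
```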
BRIEF DESCRIPTION OF THE DRAWINGS
These and other features, aspects, and advantages of the present invention will become better understood when the following detailed description is read with reference to the accompanying drawings in which like characters represent like parts throughout the drawings, wherein:
Figure 1 is a block diagram showing a number of components of the present system for tree-based graph entity indexing and accelerated query processing, according to a preferred embodiment of the present invention.
Figure 2 is a tree-based graph entity indexing utilizing minimal memory space and with optimal use of the graphics processing unit resources, according to an exemplary embodiment of the present invention.
Figure 3a is a flowchart showing the method for tree-based graph entity indexing and accelerated query processing, according to a preferred embodiment of the present invention. Figure 3b is a flowchart showing the steps of creating the entity indexing trees of the graph data using the GPU-based parser, according to a preferred embodiment of the present invention. Figure 3c is a flowchart showing the additional steps of creating the two index-based representation files disclosed in step 406, according to a preferred embodiment of the present invention.
Figure 3d is a flowchart showing the additional steps of creating the index-based representation file in parallel for the predicate entities in the second entity indexing tree, according to a preferred embodiment of the present invention.
Figure 3e is a flowchart showing the additional steps of indexing the graph entities into the text-based file, according to a preferred embodiment of the present invention.
Figure 3f is a flowchart showing the processing steps of the graph query requests, according to a preferred embodiment of the present invention.
DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a computer-assisted system for tree-based graph entity indexing and accelerated query executing, according to a preferred embodiment and one or more alternate embodiments of the present invention. Modern computing systems store relationship data from complex structured contents such as XML documents, and other geospatial data in form of graphs and form graph databases. In the graph representation of the relationship data, each node of the graph represents an entity and an edge between any two nodes represents the relationship between these two nodes. The graph-based representation of the relationship data is then converted to a text-based formatted file and stored for retrieval and processing upon receiving an input query. The present invention proposes an effective and less resource intensive means for the rapid processing of an input query to extract a sub-graph out of the stored graph or for searching something inside the text-based formatted file representing the graph. The present invention makes use of the parallel computing capabilities of the large number of graphic processors in a graphics processing unit (GPU) associated with the serial data processor(s) or the traditional processor(s) or central processing unit(s) (CPU) associated with any computing system. The use of the GPU along with the traditional processors inside a computing system to process an input query to extract a sub-graph out of the current graph or to search some other stored data inside the graph, which needs the graph entities to be read from the file and to be distinguished correctly, significantly reduces the query processing time and ensures efficient use of the computing resources in the computing system. The present invention further discloses a method of indexing the tree-based graph entity into numerical values for using the GPUs as a co-processor to accelerate the input query processing.
Referring to Figure 1, which is a block diagram showing a number of components of the present system 100 for tree-based graph entity indexing and accelerated query executing, according to a preferred embodiment of the present invention. The present system 100 for tree-based graph entity indexing and accelerated query executing removes the redundant steps while processing the graph file based on the input graph query by doing multiple steps simultaneously and accelerates the process significantly using the GPU 106 of the present computer system 100. In addition, the final indexed data produced by the present system 100 can be fitted into small memory space compared to the already existing indexing methods. The present system 100 for tree-based graph entity indexing and accelerated query executing includes a memory unit 102 for temporary storage of a number of graph data associated with a graph database during creation of one or more trees from the graph data. The system 100 further includes a processor 104 to perform a number of calculations related to the graph data based on a number of processor executable instructions and at the same time utilizes the GPU 106 having a number of graphic processors to perform a variety of simultaneous tasks such as, parsing, indexing and query executing on the graph data associated with a graph database. A storage unit 108 associated with the processor 104 stores a number of serialized trees in form of a number of files to be used while performing accelerated query executing using the present system 100. The system 100 further includes a graph processor module (110) executable by the processor (104) wherein the graph processor module (110) includes code executable to optimize graph related processes such as graph load, graph update, graph query, query scheduler etc. In a preferred embodiment, the processor 104 communicates with the GPU 106 in real-time to perform the variety of simultaneous tasks such as, parsing, indexing and query executing on the graph data. The graph processor module 110 executable by the processor 104 that is in communication with the GPU 106 further includes a query scheduler unit
(118) to schedule the plurality of graphic processors in processing a plurality of graph query requests based on a plurality of available resources including the plurality of graphic processors and a plurality of query waiting times, a graph load request executing unit 112 to load the graph database into the memory unit 102, a graph update request executing unit 114 to update the graph database in parallel and a graph query executing unit 116 to process the input graph query requests to the present system 100. The simultaneous tasks performed on the plurality of graph data, utilizing the GPU 106, accelerate the parsing, indexing and query executing operations on the graph data associated with the graph database and provide faster search results with the optimal use of the resources. In some instances, the simultaneous or parallel tasks performed using the GPU 106 include automated removal of a number of duplicate graph entities during loading of the graph data. This saves time as the process of identification and removal of the duplicate graph entities is performed during the loading of the graph data into the memory unit 102. In some instances, the graph load request processor 112 performs a number of tasks including creating a number of entity indexing trees such as the subject, object trees and the predicate trees of the variety of graph data associated with the graph database using a GPU-based parser. The graph load request processor 112 further performs the serialization of the entity indexing trees into the text-based files and stores the files containing the entity indexing graph data in the storage unit 108 for future reference. Further, the query scheduler unit 118 performs intelligent automated scheduling of the processing tasks of the graph query requests based on the available resources including the graphic processors and the query waiting times.
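As a rough illustration of the scheduling behaviour attributed to the query scheduler unit 118, the sketch below dispatches the longest-waiting request to the next free graphic processor. The worker count, the request fields and the heap-based queue are assumptions for the sketch, not details from the patent.

```python
# Hedged sketch of waiting-time-based scheduling across a pool of GPU workers.
import heapq
import time


class QueryScheduler:
    def __init__(self, gpu_workers: int = 2):
        self.free_gpus = list(range(gpu_workers))
        self._queue = []          # min-heap keyed by arrival time (oldest = longest waiting)

    def submit(self, request: dict) -> None:
        heapq.heappush(self._queue, (time.monotonic(), id(request), request))

    def dispatch(self):
        """Assign queued requests to free GPUs; returns (gpu_id, request) pairs."""
        assignments = []
        while self._queue and self.free_gpus:
            _, _, request = heapq.heappop(self._queue)
            assignments.append((self.free_gpus.pop(0), request))
        return assignments


scheduler = QueryScheduler(gpu_workers=1)
scheduler.submit({"type": "sub-graph filter", "filter": "age > 30"})
scheduler.submit({"type": "graph algorithm", "name": "pagerank"})
print(scheduler.dispatch())   # only the longest-waiting request gets the single free GPU
```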
Referring to Figure 2, which illustrates a tree-based graph entity indexing 200 utilizing minimal memory space and with optimal use of the graphics processing unit 106 resources, according to an exemplary embodiment of the present invention. The relationship data is stored in a form of graphs and form graph databases. In the graph shown in Figure 2, each node represents an entity and the edge between any two nodes represents the relationship between these two nodes. From the graph shown in Figure 2, each unit of the graph data triples is placed as a branch of a radix tree. The leaf value of each triple unit is assigned as the unique numerical ID of that element. The present system 100 stores the relationship graph in a text-based file format and performs accelerated processing of any query to the relationship graph either with the purpose of extracting a sub-graph out of the graph or searching something inside that graph, utilizing the parallel processing of the query steps using the GPU 106 resources. The queries to the relationship graph for extracting a sub-graph out of the graph or searching something inside that graph need the graph entities to be read from the text-based file stored in the storage unit 108 and to be distinguished correctly for providing accurate results. The tree-based word and number mapping utilizing the present system 100 also performs automatic duplicate removal of graph entities while loading graph data, thereby accelerating the query processing. The GPU-based parser accelerates the process of separation and grouping of graph entities and the GPU-based graph query processor accelerates the process of executing sub-graph extraction and graph queries. Further, the graph updater updates the already loaded and ID-mapped graph with new information during each processing stage and the graph queries scheduler schedules the processing of multiple graph queries based on available resources and query waiting time.
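To make the ID mapping of Figure 2 concrete, the toy example below maps label triples to numeric triples. Two dictionaries are used in place of the subject-object and predicate trees, and their values play the role of the assigned leaf IDs; the labels and the dictionary stand-in are illustrative assumptions.

```python
# Toy end-to-end mapping of label triples to numeric (subject, predicate, object) triples.
label_triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("alice", "likes", "carol"),
]

sot, predicates = {}, {}          # subject-object mapping and predicate mapping


def leaf_id(table: dict, label: str) -> int:
    # duplicates are removed implicitly: an existing label keeps its first ID
    return table.setdefault(label, len(table))


id_triples = [(leaf_id(sot, s), leaf_id(predicates, p), leaf_id(sot, o))
              for s, p, o in label_triples]
print(id_triples)   # [(0, 0, 1), (1, 0, 2), (0, 1, 2)]
```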
Referring to Figure 3a is a flowchart 300 showing the method for tree-based graph entity indexing and accelerated query executing, according to a preferred embodiment of the present invention. The present method for tree-based graph entity indexing and accelerated query processing starts with the step of loading the graph data for the first time or verifying the presence of one or more previous entity indexing trees for the graph data associated with the selected graph database, as in step 302. In case of absence of any previous entity indexing trees, the method follows a process of compressing and indexing the entities of the graph data associated with the selected graph database using a tree-based representation of the input relationship data. If no tree-based entity indexing has been created for the current graph, the process follows the steps of creating a number of entity indexing trees, such as the subject and object entity indexing trees and the predicate entity indexing trees, of the graph data associated with the graph database using the GPU-based parser, as in step 304. Once the entity indexing trees are created from the graph data associated with the graph database, the entity indexing trees are serialized into a number of files, as in step 306. In a preferred embodiment, the entity indexing trees are serialized into a number of text-based files, and the files containing the entity indexing trees of graph data are stored in the storage unit 108 of the present system 100, as in step 308. In some instances, the graph data is not loaded for the first time and the storage unit 108 thus stores one or more previous entity indexing trees in the form of serialized text-based files. In such an instance, the present method loads the serialized files having the entity indexing trees, as in step 310, followed by de-serialization of the files into the respective entity indexing trees, as in step 312.

Referring to Figure 3b is a flowchart 400 showing the steps of creating the entity indexing trees of the graph data using the GPU-based parser, according to a preferred embodiment of the present invention. The method includes the steps of reading the graph data, as in step 402, and separating the entities of the graph data into subject, predicate and object entities in parallel using the GPU 106, as in step 404. The binary files thus created include a subjects binary file, an objects binary file and a predicates binary file. The step of creating the entity indexing trees further includes creating a first entity indexing tree containing the subject entities in the subjects binary file and the object entities in the objects binary file, and a second entity indexing tree containing the predicate entities in the predicates binary file. Now, as in step 406, a pair of index-based representation files is created separately for the subject entities and the object entities in the first entity indexing tree. Another step, as in 408, creates an index-based representation file for the predicate entities in the second entity indexing tree, in parallel with creating the pair of index-based representation files for the subject entities and the object entities.
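The load decision of flowchart 300 (Figure 3a) can be summarized in a short sketch: if serialized trees already exist they are de-serialized, otherwise the trees are built from the graph file and then serialized for the next run. The file names, the JSON serialization and the build_entity_indexing_trees() helper are assumptions made for illustration; the patent only states that the trees are serialized into text-based files.

```python
import json
import os

SO_TREE_FILE = "subject_object_tree.json"      # hypothetical file names
PRED_TREE_FILE = "predicate_tree.json"

def build_entity_indexing_trees(graph_file: str):
    """Stand-in for the GPU-based parser of flowchart 400 (steps 402-408)."""
    so_tree, pred_tree = {}, {}
    ...  # read the graph, separate the entities, assign IDs
    return so_tree, pred_tree

def load_graph(graph_file: str):
    if os.path.exists(SO_TREE_FILE) and os.path.exists(PRED_TREE_FILE):
        # Steps 310-312: load the serialized files and de-serialize the trees.
        with open(SO_TREE_FILE) as f:
            so_tree = json.load(f)
        with open(PRED_TREE_FILE) as f:
            pred_tree = json.load(f)
    else:
        # Steps 304-308: create the trees, serialize them and store the files.
        so_tree, pred_tree = build_entity_indexing_trees(graph_file)
        with open(SO_TREE_FILE, "w") as f:
            json.dump(so_tree, f)
        with open(PRED_TREE_FILE, "w") as f:
            json.dump(pred_tree, f)
    return so_tree, pred_tree
```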
Referring to Figure 3c is a flowchart 500 showing the additional steps of creating the two index-based representation files for subject and object entities disclosed in step 406, according to a preferred embodiment of the present invention. The additional steps of creating the two index-based representation files include creating a first empty radix tree and a pair of indexed-based representation files for the subject entities and the object entities, as in step 502, followed by initializing a first counter or a subject-object tree (SOT) counter in step 504, indexing the subject entities using the first empty radix tree, the first counter and the separated subject entities and storing them in the indexed-based representation file of the subject entities, as in step 506, and indexing the object entities using the first empty radix tree, the first counter and the separated object entities and storing them in the indexed-based representation file of the object entities, as in step 508.
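A minimal sketch of this shared subject-object indexing follows: one mapping and one SOT counter serve both entity groups, so a term that appears as both a subject and an object receives a single ID. A Python dictionary again stands in for the radix tree, and the sample data are illustrative only.

```python
def index_entities(entities, tree, counter):
    """Append each entity's ID to an index list, drawing new IDs from the shared counter."""
    index_file = []
    for label in entities:
        if label not in tree:            # element absent: create a new node
            tree[label] = counter[0]     # leave value = current counter value
            counter[0] += 1              # increment the SOT counter
        index_file.append(tree[label])
    return index_file

so_tree = {}                             # first (subject-object) tree, step 502
sot_counter = [0]                        # first counter, step 504; a list so it is shared

subjects = ["Alice", "Bob", "Alice"]
objects  = ["Bob", "Carol", "Carol"]

subject_index = index_entities(subjects, so_tree, sot_counter)   # step 506
object_index  = index_entities(objects,  so_tree, sot_counter)   # step 508

print(subject_index, object_index)       # [0, 1, 0] [1, 2, 2]
```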
Referring to Figure 3d is a flowchart 600 showing the additional steps of creating the index-based representation file for the predicate entities in the second entity indexing tree, performed in parallel, according to a preferred embodiment of the present invention. The steps of creating the index-based representation file for the predicate entities include creating a second empty radix tree and one indexed-based representation file for the predicate entities, as in step 602, followed by initializing a second counter, as in step 604, and indexing the predicate entities using the second empty radix tree, the second counter and the separated predicate entities and storing them in the indexed-based representation file of predicate entities, as in step 606.
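Because the predicate tree and its counter are independent of the subject-object tree, the two indexing jobs can run concurrently, which is one way to read the "in parallel" language of steps 406 and 408. The sketch below uses Python threads purely as a stand-in for the GPU-side parallelism described in the patent; the helper repeats the logic of the previous sketch so the example stays self-contained, and the data are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor

def index_entities(entities, tree, counter):
    out = []
    for label in entities:
        if label not in tree:
            tree[label] = counter[0]
            counter[0] += 1
        out.append(tree[label])
    return out

subjects, objects, predicates = ["Alice", "Bob"], ["Bob", "Carol"], ["knows", "knows"]

def build_so_index():
    tree, counter = {}, [0]              # first tree and first (SOT) counter, steps 502-504
    return index_entities(subjects, tree, counter), index_entities(objects, tree, counter)

def build_predicate_index():
    tree, counter = {}, [0]              # second tree and second counter, steps 602-604
    return index_entities(predicates, tree, counter)

with ThreadPoolExecutor(max_workers=2) as pool:
    so_future = pool.submit(build_so_index)
    pred_future = pool.submit(build_predicate_index)

(subject_index, object_index), predicate_index = so_future.result(), pred_future.result()
print(subject_index, object_index, predicate_index)   # [0, 1] [1, 2] [0, 0]
```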
Referring to Figure 3e is a flowchart 700 showing the additional steps of indexing the graph entities into the text-based file, according to a preferred embodiment of the present invention. The process of indexing the subject entities using the first empty radix tree, the first counter and the separated subject entities, and storing them in the indexed-based representation file of the subject entities, further includes the steps of checking the existence of each element from the separated subject entities in the radix tree, as in step 702, fetching the next element from the subject raw file when the previous element is found in the input tree, as in step 704, adding each element to the radix tree in the form of a new node and assigning the value of the first counter as the leave value of the new node when the element is absent in the input tree, as in step 706, incrementing the value of the first counter by one in step 708, and adding the element's leave value to the input index-based representation file for further storage in the storage unit 108. The above process is repeated during the indexing of the object entities using the first empty radix tree, the first counter and the objects raw file, storing them in the indexed-based representation file of the object entities.
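Closer to the node-level wording of steps 702 to 708, the sketch below stores the tree as explicit nodes, inserts each new entity character by character, and records the counter value as the leave value on the terminal node. A production radix tree would additionally compress single-child chains; this simple character trie keeps the sketch short, and all names are illustrative assumptions.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.leave_value = None              # unique numerical ID, set on first insertion

def index_element(root: TrieNode, word: str, counter: list) -> int:
    node = root
    for ch in word:                          # step 702: walk the tree, checking existence
        node = node.children.setdefault(ch, TrieNode())
    if node.leave_value is None:             # step 706: element absent, assign a new leave value
        node.leave_value = counter[0]
        counter[0] += 1                      # step 708: increment the counter by one
    return node.leave_value                  # appended to the index-based representation file

root, counter, index_file = TrieNode(), [0], []
for word in ["Alice", "Bob", "Alice"]:       # duplicates map back to their existing ID
    index_file.append(index_element(root, word, counter))
print(index_file)                            # [0, 1, 0]
```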
The above process is also repeated during the indexing of the predicate entities using the second empty radix tree, the second counter and the separated predicate entities, storing them in the indexed-based representation file of the separated predicate entities. The process follows the similar steps of checking the existence of each element from the separated predicate entities in the second radix tree, fetching the next element from the predicate raw file when the previous element is found in the input tree, adding each element's leave value in the second radix tree to the index-based representation file when the element is found in the second radix tree, adding each element to the second radix tree in the form of a new node and assigning the value of the second counter as the leave value for the new node when the element is absent in the second radix tree, incrementing the value of the second counter by one, and adding the element's leave value to the index-based representation file.
The graph query executing unit 116 of the present system 100 executes the graph query requests in a number of steps detailed in the flowchart 800 shown in Figure 3f, according to a preferred embodiment of the present invention. The execution of the graph query requests includes the step of translating an input query into a number of indices, using the first filled radix tree for the subject entities and the object entities and the second filled radix tree for the predicate entities, as in step 802. The type of query is then determined in step 804, and if the input query type is a graph algorithm, the graph query executing steps 806 to 810 are followed in sequence. When the input query type is a graph algorithm, the graph algorithm or processing steps discussed above are applied on a sub-graph, as in step 806. Then, as in step 808, the first filled radix tree for the subject entities and the object entities and the second filled radix tree for the predicate entities are utilized to translate the graph query executing results from the indices to the corresponding text-based labels. The graph query processing results are then stored in a buffer for future use-cases such as visualization or returning the results to the user, as in step 810. However, if the input query type is a sub-graph filter, the graph query executing steps 812 to 814 are followed in sequence. When the input query type is a sub-graph filter, an index-based filter on a community of the index-based representation files of the subject, object and predicate binary files is applied to create a list of nodes satisfying the index-based filter, as in step 812. A new sub-graph is then created in step 814 based on the reported nodes and is maintained as such, awaiting further queries.
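For the sub-graph-filter branch, the sketch below translates the query labels to indices through the filled trees, applies an index-based filter to the index-based representation of the triples, and translates the matches back to text labels before they are buffered for the user. The tables and triples reuse the toy data of the earlier sketches and are not values from the patent.

```python
so_tree   = {"Alice": 0, "Bob": 1, "Carol": 2}      # filled subject-object tree
pred_tree = {"knows": 0, "likes": 1}                # filled predicate tree
indexed_triples = [(0, 0, 1), (1, 0, 2), (0, 1, 2)]

so_labels   = {i: w for w, i in so_tree.items()}    # reverse maps for step 808
pred_labels = {i: w for w, i in pred_tree.items()}

def filter_subgraph(subject_label=None, predicate_label=None):
    """Step 802: translate the query to indices; step 812: apply the index-based filter."""
    s_id = so_tree.get(subject_label) if subject_label else None
    p_id = pred_tree.get(predicate_label) if predicate_label else None
    hits = [(s, p, o) for s, p, o in indexed_triples
            if (s_id is None or s == s_id) and (p_id is None or p == p_id)]
    # Translate the matching triples back to text labels before buffering them (step 810).
    return [(so_labels[s], pred_labels[p], so_labels[o]) for s, p, o in hits]

result_buffer = filter_subgraph(subject_label="Alice")
print(result_buffer)   # [('Alice', 'knows', 'Bob'), ('Alice', 'likes', 'Carol')]
```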
Thus, with the above processing steps, the present system 100 reduces the number of graph file reading and parsing passes by performing simultaneous tasks such as separating graph entities, removing redundant graph entities and assigning a unique ID to each graph entity, utilizing the GPU 106 and the processor 104 resources together. Furthermore, the present system stores the graph entities of large graphs in less memory space by using the proposed tree-based data structure and accelerates the graph updating process by using the constructed tree-based data storage.
The foregoing description of the preferred embodiment of the present invention has been presented for the purpose of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teachings. It is intended that the scope of the present invention not be limited by this detailed description, but by the claims and the equivalents to the claims appended hereto.

Claims

1. A system (100) for tree-based graph entity indexing and accelerated query executing comprises:
a memory unit (102) for temporary storage of a plurality of graph data associated with at least one graph database during creation of at least one tree; at least one processor (104) to perform a plurality of calculations related to the plurality of graph data based on a plurality of processor executable instructions; at least one graphics processing unit, GPU, (106) having a plurality of graphic processors to perform a plurality of simultaneous tasks on the plurality of graph data, whereby the plurality of simultaneous tasks on the plurality of graph data, performed using the GPU (106), accelerates parsing, indexing and query executing on the plurality of graph data associated with the graph database;
a storage unit (108) to store a plurality of serialized trees in form of a plurality of files to be used while performing accelerated query executing; and
a graph processor module (110) executable by the processor (104), the graph processor module (110) includes code executable to optimize graph-related processes; characterized in that the graph processor module (110) further comprises:
a graph load request executing unit (112) to load the at least one graph database;
a graph update request executing unit (114) to update the at least one graph database;
a graph query executing unit (116) to process the graph query requests; and a query scheduler unit (118) to schedule the plurality of graphic processors in processing a plurality of graph query requests based on a plurality of available resources including the plurality of graphic processors and a plurality of query waiting times.
2. The system (100) of claim 1, wherein the graph load request executing unit (112) contains instructions executable by the processor (104) to:
create a plurality of entity indexing trees of the plurality of graph data associated with the graph database using a GPU-based parser;
serialize the plurality of entity indexing trees into the plurality of files; and store the plurality of files containing the entity indexing graph data in the storage unit (108).
3. A method for loading the plurality of graph databases utilizing the graph load request executing unit (112) includes the steps of:
verifying presence of a plurality of previous entity indexing trees for a plurality of graph data associated with at least one graph database,
wherein absence of the at least one previous entity indexing tree further follows a process of compressing and indexing entities of the plurality of graph data associated with the at least one graph database using at least one tree-based representation of an input relationship data, the process further includes the steps of:
creating a plurality of entity indexing trees of the plurality of graph data associated with the graph database using a GPU-based parser;
serializing the plurality of entity indexing trees into a plurality of files; and
storing the plurality of files containing the entity indexing trees of graph data in a storage unit (108);
wherein presence of the previous entity indexing trees of graph data further follows the steps of:
loading the plurality of files having a plurality of serialized entity indexing trees; and
de-serializing the plurality of files into the plurality of entity indexing trees.
4. The method of claim 3, wherein the step of creating the plurality of entity indexing trees of the plurality of graph data using the GPU-based parser further includes the steps of:
reading the plurality of graph data;
separating a plurality of entities of the plurality of graph data in a plurality of subject, predicate and object entities in parallel using a GPU (106);
creating a pair of index-based representation files separately for a plurality of subject entities and a plurality of object entities in a first entity indexing file; and creating an index-based representation file in parallel with creating a pair of index-based representation files separately for a plurality of subject entities and a plurality of object entities, for a plurality of predicate entities in a second entity indexing file.
5. The method of claim 4, wherein creating the pair of index-based representation files for subject and object entities further includes the steps of:
creating a first empty radix tree and a pair of indexed-based representation files for the subject entities and the object entities;
initializing a first counter;
indexing the plurality of subject entities using the first empty radix tree, the first counter and the separated subject entities and store in the indexed-based representation file of the subject entities; and
indexing the plurality of object entities using the first empty radix tree, the first counter and the separated object entities and store in the indexed-based representation file of the object entities.
6. The method of claim 4, wherein creating the index-based representation file for predicate entities further includes the steps of:
creating a second empty radix tree and one indexed-based representation file for the predicate entities;
initializing a second counter; and
indexing the plurality of predicate entities using the second empty radix tree, the second counter and the separated predicate entities and store in the indexed-based representation file of predicate entities.
7. The method of claim 4, wherein indexing the plurality of subject entities using the first empty radix tree, the first counter and the separated subject entities further includes the steps of:
checking existence of each element from the separated subject entities in the radix tree; adding an element's leave value in the radix tree to the index-based representation file when each element is found in the radix tree;
adding each element to the radix tree in form of a new node and assigning the value of the first counter in form of a leave value for the new node when the element is absent in the input tree;
incrementing a value of the first counter by one; and
adding the element's leave value to the input index-based representation file.
8. The method of claim 4, wherein indexing the plurality of object entities using the first empty radix tree, the first counter and the separated object entities further includes the steps of:
checking existence of each element from the separated object entities in the radix tree;
adding the element's leave value in the radix tree to the index-based representation file when each element is found in the radix tree;
adding each element to the radix tree in form of a new node and assigning the value of the first counter in form of a leave value for the new node when the element is absent in the input tree;
incrementing value of the first counter by one; and
adding the element's leave value to the index-based representation file.
9. The method of claim 4, wherein indexing the plurality of predicate entities using the second empty radix tree, the second counter and the separated predicate entities further includes the steps of:
checking existence of each element from the separated predicate entities in the second radix tree;
adding an element’s leave value in the second radix tree to the index-based representation file when each element is found in the second radix tree;
adding each element to the second radix tree in form of a new node and assigning the value of the second counter in form of a leave value for the new node when the element is absent in the second radix tree;
incrementing value of the second counter by one; and
adding the element's leave value to the index-based representation file.
10. The method of claim 4, wherein executing a plurality of graph query requests includes the steps of:
translating an input query using the first filled radix tree for the subject and object entities and the second filled radix tree for the predicate entities into a plurality of indices;
performing a plurality of graph query executing steps, when an input query type is a graph algorithm, the steps include:
applying the graph algorithm query executing steps on a sub-graph; using the first filled radix tree for the subject and the object entities and the second filled radix tree for the predicate entities to translate a plurality of graph algorithm query executing results from the indices to a plurality of text-based labels; and
storing the graph query executing results in a buffer;
applying an input filter on the graph to create a sub-graph in a plurality of steps when the input query type is a sub-graph filter, the steps include:
applying an index-based filter on a community of the plurality of index-based representation files of the subject, object, predicate binary files to create a list of nodes satisfying the index-based filter; and creating a new sub-graph based on a plurality of reported nodes, wherein the plurality of reported nodes is maintained for a plurality of further input queries.
PCT/MY2019/050082 2018-11-14 2019-11-08 System and method for tree based graph indexing and query processing WO2020101470A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2018001922 2018-11-14
MYPI2018001922 2018-11-14

Publications (1)

Publication Number Publication Date
WO2020101470A1 true WO2020101470A1 (en) 2020-05-22

Family

ID=70731880

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2019/050082 WO2020101470A1 (en) 2018-11-14 2019-11-08 System and method for tree based graph indexing and query processing

Country Status (1)

Country Link
WO (1) WO2020101470A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984796A (en) * 2020-07-31 2020-11-24 西安理工大学 Automatic compliance checking method based on standard knowledge graph IFC model

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003107222A1 (en) * 2002-06-13 2003-12-24 Cerisent Corporation Parent-child query indexing for xml databases
US8572136B2 (en) * 2009-08-28 2013-10-29 Beijing Innovation Works Technology Company Limited Method and system for synchronizing a virtual file system at a computing device with a storage device
US20140136520A1 (en) * 2012-11-12 2014-05-15 Software Ag Method and system for processing graph queries
US20140244687A1 (en) * 2013-02-24 2014-08-28 Technion Research & Development Foundation Limited Processing query to graph database
US20180032559A1 (en) * 2016-07-26 2018-02-01 Ebay Inc. Mechanism for efficient storage of graph data

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2003107222A1 (en) * 2002-06-13 2003-12-24 Cerisent Corporation Parent-child query indexing for xml databases
US8572136B2 (en) * 2009-08-28 2013-10-29 Beijing Innovation Works Technology Company Limited Method and system for synchronizing a virtual file system at a computing device with a storage device
US20140136520A1 (en) * 2012-11-12 2014-05-15 Software Ag Method and system for processing graph queries
US20140244687A1 (en) * 2013-02-24 2014-08-28 Technion Research & Development Foundation Limited Processing query to graph database
US20180032559A1 (en) * 2016-07-26 2018-02-01 Ebay Inc. Mechanism for efficient storage of graph data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111984796A (en) * 2020-07-31 2020-11-24 西安理工大学 Automatic compliance checking method based on standard knowledge graph IFC model
CN111984796B (en) * 2020-07-31 2022-11-04 西安理工大学 Automatic compliance inspection method based on standard knowledge graph IFC model

Similar Documents

Publication Publication Date Title
US8782101B1 (en) Transferring data across different database platforms
US8856183B2 (en) Database access using partitioned data areas
US8099725B2 (en) Method and apparatus for generating code for an extract, transform, and load (ETL) data flow
US8782017B2 (en) Representing and manipulating RDF data in a relational database management system
US20170083573A1 (en) Multi-query optimization
US7685106B2 (en) Sharing of full text index entries across application boundaries
US9639542B2 (en) Dynamic mapping of extensible datasets to relational database schemas
CA2997061C (en) Method and system for parallelization of ingestion of large data sets
US20120131022A1 (en) Methods and systems for merging data sets
US11593357B2 (en) Databases and methods of storing, retrieving, and processing data
US9830319B1 (en) Hierarchical data extraction mapping and storage machine
WO2017070247A1 (en) Parallel execution of queries with a recursive clause
US20100235344A1 (en) Mechanism for utilizing partitioning pruning techniques for xml indexes
US11514697B2 (en) Probabilistic text index for semi-structured data in columnar analytics storage formats
JP2008269643A (en) Method of organizing data and of processing query in database system, and database system and software product for executing such method
Fan et al. Handling distributed XML queries over large XML data based on MapReduce framework
US6925463B2 (en) Method and system for query processing by combining indexes of multilevel granularity or composition
WO2020101470A1 (en) System and method for tree based graph indexing and query processing
US20050060302A1 (en) Computer implemented method and according computer program product for storing data sets in and retrieving data sets from a data storage system
JP2007535009A (en) A data structure and management system for a superset of relational databases.
US11144580B1 (en) Columnar storage and processing of unstructured data
EP1503297A1 (en) Computer implemented methods of retrieving hit count data from a data base system and according computer program product
CN115114297A (en) Data lightweight storage and search method and device, electronic equipment and storage medium
US20050055331A1 (en) Computer implemented method for retrieving data from a data storage system and according computer program product and data storage system
CN113282579A (en) Heterogeneous data storage and retrieval method, device, equipment and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19884006

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19884006

Country of ref document: EP

Kind code of ref document: A1