CN108711136B - CPU-GPU (Central processing Unit-graphics processing Unit) collaborative query processing system and method for RDF (resource description framework) graph data - Google Patents


Info

Publication number: CN108711136B
Application number: CN201810408484.1A
Authority: CN (China)
Prior art keywords: mode, data blocks, variable, data, GPU
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN108711136A (en)
Inventors: 袁平鹏, 金海, 王磊
Assignee (current and original): Huazhong University of Science and Technology
Application filed by Huazhong University of Science and Technology
Priority to CN201810408484.1A
Publication of CN108711136A; application granted; publication of CN108711136B

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 — General purpose image data processing
    • G06T1/20 — Processor architectures; Processor configuration, e.g. pipelining

Abstract

A CPU-GPU collaborative query processing system and method for RDF graph data are disclosed. A query statement submitted by a user is parsed to obtain first information comprising triple patterns, common variables, and projection variables. For each common variable, one triple pattern is selected as its representative pattern, and the data corresponding to each representative pattern is partitioned into a plurality of data blocks. The data blocks are read sequentially to generate join tasks between different data blocks, and the data blocks are transferred to GPU video memory in turn. When the CPU detects that all the data blocks a join task depends on have been transferred to GPU video memory, the GPU executes the join task and generates the corresponding intermediate result. Once the join tasks between the different data blocks have all been executed and/or all data blocks have been transferred to GPU video memory, the different intermediate results are fully joined, and the final results are collected in projection-variable order. The invention achieves a high degree of parallelism and a fast query speed.

Description

CPU-GPU (Central processing Unit-graphics processing Unit) collaborative query processing system and method for RDF (resource description framework) graph data
Technical Field
The invention relates to the technical field of graph data processing, in particular to a CPU-GPU collaborative query processing system and method for RDF graph data.
Background
The Resource Description Framework (RDF) has become one of the standard formats for data exchange. It provides a uniform standard for describing various resources on the Web. Formally, RDF data is represented as triples of subject, predicate, and object.
SPARQL is the SQL-like query language recommended by the W3C standard for querying RDF data. Formally, a SPARQL query can generally be written as SELECT ?x ?y ?z WHERE { P1, P2 … Pn }, where ?x ?y ?z are projection variables and P1, P2 … Pn are triple patterns.
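For illustration only (not part of the patented method), a minimal Python sketch of extracting the triple patterns, projection variables, and common variables from a query of this form; the toy parser, names, and query are simplified assumptions rather than a real SPARQL implementation:

```python
import re

def parse_sparql(query):
    """Extract projection variables, triple patterns, and common
    (join) variables from a simple SELECT ... WHERE { ... } query.
    Hypothetical helper: real SPARQL requires a full parser."""
    head, body = query.split("WHERE", 1)
    projection = re.findall(r"\?\w+", head)
    body = body.strip().strip("{}").strip()
    patterns = [tuple(p.split()) for p in body.split(" . ") if p]
    # A variable is "common" if it appears in two or more patterns.
    counts = {}
    for p in patterns:
        for term in set(t for t in p if t.startswith("?")):
            counts[term] = counts.get(term, 0) + 1
    common = [v for v, c in counts.items() if c >= 2]
    return projection, patterns, common

q = "SELECT ?x ?y WHERE { ?x advisor ?y . ?y worksFor ?z . ?x memberOf ?z }"
proj, pats, common = parse_sparql(q)
print(proj)            # ['?x', '?y']
print(sorted(common))  # ['?x', '?y', '?z']
```

Every variable in this example appears in two patterns, so all three are common variables while only ?x and ?y are projected.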
With the advent and development of General-Purpose Graphics Processing Units (GPGPUs), using GPUs in database systems to accelerate query processing has become a new trend. Despite the powerful computing power of GPUs, many challenges remain in using them to improve query-processing performance: (1) data must first be copied to the GPU before a computation task can run there, and since the copy from host memory to video memory passes over the PCIe bus, the transfer time between CPU and GPU is limited by PCIe bandwidth; (2) most current research focuses on using GPUs to accelerate common query-processing operators, with parallelism at the relation level and little overlap between communication time and computation time.
With the continuous growth of RDF data, efficiently querying huge RDF data sets, especially performing complex join queries over billions of triples, has become an urgent problem. Many systems for processing RDF data already exist, such as RDF-3X, HexaStore, BitMat, and gStore. These systems generate an optimized static query execution plan using dynamic programming; the drawback is that execution can only begin after the whole plan has been generated. Some graph-compression-based methods, following the idea of the incidence matrix in graph theory, express the relationships among the six possible combinations of subject, predicate, and object as bitmaps and reduce space through a compression algorithm; but they not only need corresponding indexes to accelerate queries, they also waste much time in frequent decompression and recompression. Some methods adopt vertically partitioned storage, which stores efficiently when a subject has many attribute values, but when the predicates in a query are numerous or unknown, additional indexes must be added to assist predicate lookup and accelerate the query.
Disclosure of Invention
In view of the deficiencies of the prior art, the present invention provides a CPU-GPU collaborative query processing system for RDF graph data, comprising at least one CPU and at least one GPU, wherein the CPU is configured to: parse a query statement submitted by a user to obtain first information, the first information comprising triple patterns, common variables, and projection variables; select one triple pattern as the representative pattern of each common variable, and partition the data corresponding to each representative pattern into a plurality of data blocks; and read the data blocks sequentially to generate join tasks between different data blocks while transferring the data blocks to GPU video memory in turn. When the CPU detects that all the data blocks a join task depends on have been transferred to GPU video memory, the GPU is configured to: execute the join task and generate the corresponding intermediate result; and, once the join tasks between the different data blocks have all been executed and/or all data blocks have been transferred to GPU video memory, fully join the different intermediate results and collect and feed back the final results in projection-variable order.
According to a preferred embodiment, selecting one triple pattern as the representative pattern of each common variable and partitioning the data corresponding to each representative pattern into a plurality of data blocks at least comprises: sorting the N triple patterns by the number and/or selectivity of their common variables; initializing the representative pattern of every common variable to empty, where an empty representative pattern indicates that the corresponding common variable has not yet selected one; reading the k-th triple pattern and, if it has exactly one common variable and that variable has not yet selected a representative pattern, setting the k-th triple pattern as the representative pattern of that common variable, where k is an integer greater than zero and smaller than N; if the k-th triple pattern contains a first common variable and a second common variable and exactly one of them has not selected a representative pattern, setting the k-th triple pattern as the new representative pattern of the common variable that has already selected one; if neither the first nor the second common variable has selected a representative pattern, setting the representative pattern of the less selective of the two to the k-th triple pattern; and, when k is greater than or equal to N, generating a preliminary query plan for each common variable.
According to a preferred embodiment, obtaining the plurality of data blocks further comprises the following step: when k is greater than or equal to N, generating the query plan by, for each common variable, reading the data blocks in descending order of selectivity.
According to a preferred embodiment, generating the join tasks and transferring the data blocks to GPU video memory at least comprises the following steps: setting the data blocks corresponding to each representative pattern; if the representative pattern contains a first common variable and a second common variable, reading the n-th data block and creating join tasks according to the values of its second common variable; or reading the n-th data block, reading the data blocks of a non-representative pattern within the interval of the first common variable, and creating join tasks that depend on the n-th data block and the data blocks of the non-representative pattern, where n is an integer greater than zero and smaller than N; and, when n is greater than or equal to N, transferring the data blocks to GPU video memory.
According to a preferred embodiment, reading the n-th data block and creating join tasks according to the values of its second common variable at least comprises the following steps: determining the number M of data blocks of the representative pattern of the second common variable and building a first array accordingly, where for the j-th data block of that representative pattern the initial value of the first array is zero, indicating that no join task has yet been created between the current data block and the j-th data block; reading the i-th row of the current data block and letting valY denote the value of its second common variable; when valY is found by binary search in the data blocks of the representative pattern of the second common variable, setting to 1 the first-array entries of the H consecutive data blocks that contain it, indicating that a join task keyed on valY is created; and, when the current data block has been read completely, creating the join tasks according to the contents of the first array.
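A rough CPU-side sketch, under assumed data layouts, of the first-array bookkeeping described above: the partner pattern's chunks are assumed sorted and non-overlapping on the join column, so a binary search locates the chunk a value falls into (the H consecutive blocks of the text would arise when a value spans several chunks; this simplified sketch marks a single containing chunk):

```python
from bisect import bisect_right

def mark_join_tasks(current_values, chunk_ranges):
    """For each join value in the current chunk, binary-search the
    partner chunks' [lo, hi] ranges and flag the containing chunk
    in a 0/1 array -- a sketch of the first-array bookkeeping.
    chunk_ranges must be sorted and non-overlapping on lo."""
    flags = [0] * len(chunk_ranges)   # M partner chunks, no task yet
    los = [lo for lo, _ in chunk_ranges]
    for val in current_values:
        # Rightmost chunk whose lo <= val.
        j = bisect_right(los, val) - 1
        if j >= 0 and chunk_ranges[j][0] <= val <= chunk_ranges[j][1]:
            flags[j] = 1              # join task keyed on val
    return flags

# Partner pattern split into 4 chunks sorted on the join column:
ranges = [(0, 9), (10, 19), (20, 29), (30, 39)]
print(mark_join_tasks([3, 25, 25, 41], ranges))  # [1, 0, 1, 0]
```

Once the current chunk is fully read, one join task would be created for each partner chunk whose flag is 1.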
According to a preferred embodiment, executing the join task on the GPU and generating the corresponding intermediate result once all the data blocks the join task depends on have been transferred to GPU video memory at least comprises the following steps: determining the left operand and the right operand of the join task; traversing the set of unfinished tasks depending on the transferred data blocks and, when a task is found whose data blocks have all been transferred to GPU video memory, setting the initial value of a counter a to 1 and launching m GPU threads, with a first bit-vector mark set for the left operand and a second bit-vector mark for the right operand; determining the thread ID, denoted tid, reading the tid-th row of the left operand, and letting key denote the key value participating in the join task; when key is found in the right operand by binary search, setting to 1 the bits of the matching interval in the second bit-vector mark, the matching interval being the interval of results matched by the join task in the data block being operated on; when (a + m) exceeds the length of the left operand, performing a merging operation on the left operand and the right operand respectively, the merging operation packing together the entries whose bit-vector marks are 1, starting from the start address of the left or right operand; and, when the set of unfinished tasks is empty, entering first subsequent processing, namely fully joining the intermediate results and collecting and feeding back the final results in projection-variable order.
According to a preferred embodiment, when all the data blocks the join task depends on have been transferred to GPU video memory, executing the join task on the GPU and generating the corresponding intermediate result further comprises: for each thread, when the key value key is found in the right operand by binary search, setting the tid-th bit of the first bit-vector mark to 1, indicating that the corresponding data belongs to the result of the join task, setting the bits of the matching interval of the second bit-vector mark to 1, and synchronizing all threads.
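A sequential Python simulation, for illustration only, of the bit-vector marking the threads perform: each loop iteration plays the role of one GPU thread tid, binary-searching its key in the sorted right operand and setting the first and second bit-vector marks:

```python
from bisect import bisect_left, bisect_right

def block_join(left_keys, right_keys):
    """CPU simulation of the block-level GPU join: conceptually one
    thread per left row (tid) binary-searches its key in the sorted
    right operand, marking matches in two bit vectors."""
    left_bits = [0] * len(left_keys)
    right_bits = [0] * len(right_keys)
    for tid, key in enumerate(left_keys):      # one "GPU thread" each
        lo = bisect_left(right_keys, key)
        hi = bisect_right(right_keys, key)     # matching interval [lo, hi)
        if lo < hi:
            left_bits[tid] = 1                 # tid-th left row joins
            for j in range(lo, hi):
                right_bits[j] = 1              # matching-interval bits
    return left_bits, right_bits

lb, rb = block_join([5, 7, 9], [5, 5, 8, 9])
print(lb)  # [1, 0, 1]
print(rb)  # [1, 1, 0, 1]
```

On a real GPU each iteration would be an independent thread, which is why the marking needs no coordination beyond the final synchronization.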
According to a preferred embodiment, the merging operation at least comprises the following steps: setting the initial values of a counter b and a counter c to 1; when the b-th bit of the first or second bit-vector mark being read is 1, updating the c-th entry of the array to the value of the b-th entry and incrementing both counter b and counter c by 1 (otherwise incrementing only counter b); and repeating this process until the value of counter b exceeds the length of the first or second bit-vector mark, then entering second subsequent processing, namely judging whether the set of unfinished tasks is empty.
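The merging operation is essentially stream compaction; a sketch under the assumption that the array and its bit-vector mark have equal length:

```python
def compact(values, bits):
    """Stream-compaction sketch of the merging step: keep only the
    entries whose bit-vector flag is 1, packed from the start of
    the array (counters b and c in the text)."""
    c = 0
    for b in range(len(bits)):          # counter b scans the bit vector
        if bits[b] == 1:
            values[c] = values[b]       # counter c writes the packed slot
            c += 1
    return values[:c]

print(compact([5, 7, 9, 11], [1, 0, 1, 1]))  # [5, 9, 11]
```

Applying this to both operands yields the compacted intermediate result that later participates in the full join.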
The invention also provides a CPU-GPU collaborative query processing method for RDF graph data, comprising the following steps: parsing a query statement submitted by a user to obtain first information; selecting representative patterns based on the first information and partitioning the data into a plurality of data blocks; generating join tasks based on the data blocks and transferring the relevant data blocks to the GPU for execution of the join tasks; and, once the join tasks between the different data blocks have all been executed and/or all data blocks have been transferred to GPU video memory, fully joining the different intermediate results and outputting the computed result.
According to a preferred embodiment, the CPU parses the query statement submitted by the user to obtain first information comprising triple patterns, common variables, and projection variables; selects one triple pattern as the representative pattern of each common variable and partitions the data corresponding to each representative pattern into a plurality of data blocks; and reads the data blocks sequentially to generate join tasks between different data blocks while transferring the data blocks to GPU video memory in turn. When the CPU detects that all the data blocks a join task depends on have been transferred to GPU video memory, the GPU executes the join task and generates the corresponding intermediate result. Once the join tasks between the different data blocks have all been executed and/or all data blocks have been transferred to GPU video memory, the different intermediate results are fully joined, and the final results are collected in projection-variable order.
The invention has the following beneficial technical effects:
(1) High degree of parallelism: the invention partitions the data participating in the query into blocks, and all Join tasks operate at block level. The Join tasks are essentially independent of one another, with few synchronization points.
(2) Fast query speed: the invention fully exploits the large number of threads on the GPU and can complete the execution of Join tasks quickly.
(3) High overlap of communication and computation: the invention can execute Join tasks while data is still being transferred to the GPU, hiding communication behind computation.
Drawings
FIG. 1 is a block diagram of a preferred RDF graph data query processing system according to the present invention;
FIG. 2 is a schematic diagram showing interaction of modules of the RDF graph data query processing system in accordance with the present invention;
FIG. 3 is a flow chart of a preferred RDF graph data processing method of the present invention;
FIG. 4 is a flow chart of a preferred data partitioning method of the present invention;
FIG. 5 is a flow chart of a preferred GPU of the present invention for executing Join tasks; and
FIG. 6 is a query graph of the LUBM Q5 query statement.
List of reference numerals
1: CPU; 2: GPU; 3: user interface module; 4: data storage module; 5: algorithm database; 41: first data storage module; 42: second data storage module
Detailed Description
The following detailed description is made with reference to the accompanying drawings.
To facilitate understanding, identical reference numerals have been used, where possible, to designate similar elements that are common to the figures.
As used throughout this application, the word "may" is used in a permissive sense (i.e., meaning having the potential to), rather than the mandatory sense (i.e., meaning must). Similarly, the words "include", "including" and "comprises" mean including, but not limited to.
The phrases "at least one," "one or more," and "and/or" are open-ended expressions that are both conjunctive and disjunctive in operation. For example, each of the expressions "at least one of A, B and C", "at least one of A, B or C", "one or more of A, B and C", "one or more of A, B or C" and "A, B and/or C" means A alone, B alone, C alone, A and B together, A and C together, B and C together, or A, B and C together.
The terms "a" or "an" entity refer to one or more of that entity. As such, the terms "a" (or "an"), "one or more," and "at least one" are used interchangeably herein. It should also be noted that the terms "comprising," "including," and "having" may be used interchangeably.
As used herein, the term "automated" and variations thereof refer to any process or operation that is completed without material manual input when the process or operation is performed. A process or operation can still be automatic even if its performance uses substantial or insubstantial manual input, provided the input is received before the process or operation is performed. Manual input is considered material if it affects how the process or operation will be performed; manual input that merely grants permission to perform the process or operation is not considered material.
Example 1
The CPU-GPU collaborative query processing system for RDF graph data shown in fig. 1 includes a central processor 1, a graphics processor 2, a user interface module 3, a data storage module 4, and an algorithm database module 5.
Preferably, the user interface module 3 comprises an RDF data import interface, a SPARQL query interface, an entity dump interface, and an RDF data dump interface. The RDF data import interface is used to import the RDF graph data to be processed over a wired connection to an external device, which may be, for example, a mobile terminal, a computer, or a cloud server. The SPARQL query interface is used to input query statements; a query statement may be entered by connecting the SPARQL query interface to a device such as a keyboard or a touch screen. The RDF data dump interface is used to back up the RDF data at a specific moment to guard against RDF data errors: RDF data in the collaborative query processing system can be transferred through it to a storage space independent of the system. The entity dump interface is used to dump entities in the system. Preferably, the RDF data import interface, SPARQL query interface, entity dump interface, and RDF data dump interface may share one port; alternatively, they are arranged independently and can be connected separately to different hardware, for example with the RDF data import interface electrically connected to the data storage module 4 and the SPARQL query interface electrically connected to the central processor 1.
Preferably, the CPU is configured to: parse the query statement submitted by the user to obtain first information comprising triple patterns, common variables, and projection variables, where the query statement is preferably a SPARQL query statement; select one triple pattern as the representative pattern of each common variable and partition the data corresponding to each representative pattern into a plurality of data blocks; and read the data blocks sequentially to generate the join tasks between different data blocks, expressed in the form TASK_X -> [CHUNK_L, CHUNK_R], while transferring the data blocks to GPU video memory in turn.
Preferably, the Join task refers to a parallel join processing operation between data blocks, implemented by a parallel join algorithm such as semi-join reduction, sort-merge join, and/or parallel hash join. In the query graph, a semi-join sends the join attribute values to other patterns, and these value lists are used as filter conditions at run time. Specifically, suppose R_i and R_j are the relations matching patterns P_i and P_j, respectively. The semi-join from R_i to R_j on attribute a can be expressed as

R_j ⋉_a R_i = R_j ⋈ Π_a(R_i)

The query processing system projects R_i onto attribute a, written Π_a R_i, and then transfers this projection from the storage medium to the CPU. This process does not send the whole relation R_i, which may contain two or more columns of data, but only the single attribute Π_a R_i, so the transfer overhead is reduced. Also, a projection index may be built on the projection to indicate that the column is indexed. In this way, the indexed columns can be removed from the original data and stored separately, at the same position of each entry relative to the base record. The query processing system then joins the projection with R_i through the index. Preferably, a Join task between different data blocks is expressed as TASK_X -> [CHUNK_L, CHUNK_R], where CHUNK_L denotes the left operand and CHUNK_R the right operand participating in the Join task.
Preferably, a data block is the unit in which RDF data is stored in the system; physically a data block resides in a cache block, and the data is stored across a number of cache blocks in data-block form. The number of data blocks held in one cache block is determined by the capacities of the cache block and of the data blocks: for example, if one cache block can store 100 data blocks and there are 500 data blocks, 5 cache blocks are needed in total. The size of a data block is set dynamically according to parameters including the size of the data set and the size of the memory. Preferably, the size of a data block is 4 KB.
Preferably, when the CPU detects that all the data blocks a Join task depends on have been transferred to GPU video memory, the GPU is configured to: execute the Join task and generate the corresponding intermediate result. Once all Join tasks between different data blocks have been executed and/or all data blocks have been transferred to GPU video memory, the different intermediate results are fully joined, and the final results are collected and fed back in projection-variable order. Preferably, the order of the projection variables is specified by the query statement.
Preferably, the data storage module 4 also stores the related information tables produced when data is imported, such as the graph matrix, statistical indexes, dictionary file, and entity predicate index needed to query triples. The dictionary file is used to build the mapping structure for converting between URI strings and ID numbers. The graph matrix is the compressed data structure into which imported RDF triple data is modeled and stored. The statistical indexes mainly comprise an S table, an O table, an SP table, and an OP table. The entity predicate index records all predicates associated with an entity. Preferably, the data storage module 4 communicates interactively with the central processor 1 and the graphics processor 2 over a wired electrical connection and/or a wireless communication connection. Preferably, the data storage module 4 comprises a first data storage module 41 for storing index files, such as the statistical indexes, block indexes, and entity predicate index, and a second data storage module 42 for storing data files, such as the graph matrix and dictionary file.
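As an illustrative sketch (the names and layout are assumptions, not the patent's actual structures), the dictionary file's URI <-> ID mapping might behave like this:

```python
class Dictionary:
    """Minimal sketch of a dictionary file: a bidirectional
    URI-string <-> integer-ID mapping used when importing triples.
    Hypothetical structure for illustration only."""
    def __init__(self):
        self.uri_to_id = {}
        self.id_to_uri = []

    def encode(self, uri):
        # Assign a fresh ID the first time a URI is seen.
        if uri not in self.uri_to_id:
            self.uri_to_id[uri] = len(self.id_to_uri)
            self.id_to_uri.append(uri)
        return self.uri_to_id[uri]

    def decode(self, ident):
        return self.id_to_uri[ident]

d = Dictionary()
triple = ("<ub:Student1>", "<ub:advisor>", "<ub:Prof2>")
ids = tuple(d.encode(t) for t in triple)
print(ids)                 # (0, 1, 2)
print(d.decode(ids[2]))    # <ub:Prof2>
```

Encoding triples as integer IDs is what lets the data blocks hold fixed-width values that binary search and GPU threads can operate on efficiently.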
Preferably, the algorithm database 5 is used to store data processing algorithms, such as the parallel join algorithm and the RDF data processing algorithms disclosed in examples 2 to 4. The algorithm database 5 can be provided as a separate module electrically connected with the central processor 1 and the graphics processor 2 through ports, or be built into each of them. The central processor 1 and the graphics processor 2 perform data operations by calling the corresponding algorithms in the algorithm database 5. Preferably, the algorithm database 5 is connected to the user interface module 3, through which its algorithms can be updated.
For easy understanding, the following discusses the mutual interaction process between the modules of the CPU-GPU collaborative query processing system of RDF graph data according to the present invention.
As shown in fig. 2, the SPARQL query statement is transmitted to the central processor 1 through the user interface module 3. The query statement is parsed in the central processor 1 to generate a query plan. The central processor 1 calls the algorithm stored in the algorithm database 5 that selects one triple pattern as the representative pattern of each common variable and partitions the data corresponding to each representative pattern into a plurality of data blocks. On this basis, the central processor 1 also calls the algorithm, stored in the algorithm database, that generates Join tasks from the different data blocks and transfers the data blocks to GPU video memory, and passes the Join tasks to the graphics processor 2. The graphics processor 2 invokes the algorithm stored in the algorithm database that executes Join tasks and generates the corresponding intermediate results. Subsequently, the graphics processor 2 calls the parallel join algorithm stored in the algorithm database to fully join the intermediate results, and outputs the final result through the user interface module in projection-variable order. The relationship between the data files and the index files is that when the system needs to read a data block, it first locates the data block through the index file and then reads the data file according to that location; the block-granularity parallel processing model and the parallel join algorithm module therefore interact with the index files, since an index file is accessed before a data block is read.
Example 2
This embodiment is a further supplement to embodiment 1, and repeated content is not described again.
As shown in fig. 4, the method of selecting one triple pattern as the representative pattern of each common variable and partitioning the data corresponding to each representative pattern into a plurality of data blocks specifically comprises the following steps:
S1: sort the triple patterns by the number of common variables and/or selectivity, and store the N sorted triple patterns in an array Pattern[N];
S2: set a counter k = 0, and denote the representative pattern chosen for a common variable by Delegate[var], where an empty Delegate[var] indicates that the common variable var has not yet been assigned one;
S3: read the k-th triple pattern and denote it P;
S4: if P has only one common variable var and Delegate[var] is empty, go to step S5; if P has only one common variable var and Delegate[var] is not empty, go to step S7; if P has two common variables var1, var2 and at least one of Delegate[var1] and Delegate[var2] is empty, go to step S6; if P has two common variables var1, var2 and neither Delegate[var1] nor Delegate[var2] is empty, go to step S7;
S5: set Delegate[var] = P;
S6: if both Delegate[var1] and Delegate[var2] are empty, set Delegate[var3] = P, where var3 is the less selective of var1 and var2; if Delegate[var1] is empty and Delegate[var2] is not, set Delegate[var2] = P; if Delegate[var1] is not empty and Delegate[var2] is, set Delegate[var1] = P;
S7: set k = k + 1;
S8: if k ≥ N, go to step S9; if k < N, go to step S3;
S9: for each common variable, generate the query plan with the strategy of reading the data blocks of the more selective pattern first and then the data blocks of the less selective pattern.
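Steps S1 through S8 above can be sketched in Python as follows; `patterns` is assumed to be already sorted as in S1, the two-variable case follows the branch assignments of S6 as stated, and all names are illustrative rather than the patent's:

```python
def choose_representatives(patterns, common_vars, selectivity):
    """Sketch of steps S2-S8: pick one triple pattern as the
    representative (Delegate) of each common variable. `patterns`
    is assumed pre-sorted per S1; `selectivity` maps each common
    variable to its selection degree. Illustrative names only."""
    delegate = {v: None for v in common_vars}      # Delegate[var] empty
    for p in patterns:                             # k = 0 .. N-1
        vars_in_p = [v for v in p if v in delegate]
        if len(vars_in_p) == 1:                    # S5
            v = vars_in_p[0]
            if delegate[v] is None:
                delegate[v] = p
        elif len(vars_in_p) == 2:                  # S6
            v1, v2 = vars_in_p
            if delegate[v1] is None and delegate[v2] is None:
                # P goes to the less selective of the two variables.
                v3 = v1 if selectivity[v1] <= selectivity[v2] else v2
                delegate[v3] = p
            elif delegate[v1] is None:             # as stated in S6:
                delegate[v2] = p                   # P becomes the new
            elif delegate[v2] is None:             # representative of the
                delegate[v1] = p                   # already-assigned var
    return delegate

pats = [("?x", "advisor", "?y"), ("?x", "type", "Student"), ("?y", "type", "Prof")]
d = choose_representatives(pats, ["?x", "?y"], {"?x": 0.2, "?y": 0.5})
print(d["?x"])  # ('?x', 'advisor', '?y')
print(d["?y"])  # ('?y', 'type', 'Prof')
```

Here the two-variable pattern is claimed by ?x (the less selective variable), and ?y later takes the single-variable pattern that mentions it.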
Preferably, the rule for ordering the triple patterns in step S1 is that triple patterns containing two common variables come first, triple patterns containing one common variable and one non-common variable next, and single-variable triple patterns last. When patterns cannot be ordered by the number of common variables, they are ordered by selectivity: the higher the selectivity, the earlier the triple pattern appears.
Preferably, the selectivity of a triple pattern is the reciprocal of the number of triples matching it in the system; a high selectivity indicates that the intermediate result corresponding to the triple pattern is small. The selectivity of a triple pattern is thus calculated from the proportion of triples that match it, and the selectivity of a single query pattern can be determined directly by locating the entity ID in the statistical-information index. Preferably, when both the subject and the object are known and the predicate is a variable, the selectivity can be estimated as 2, on the assumption that the number of relationships between two given entities is much smaller than for the other pattern types. The calculation of selectivity is illustrated by the query graph of the LUBM Q5 query statement shown in FIG. 6. As shown in FIG. 6, the numbers of triples matching the triple patterns satisfy: #(P2) < #(P1) < #(P4) < #(P3) < #(P5) < #(P6). The initial selectivities of the triple patterns are therefore Sel(P2) > Sel(P1) > Sel(P4) > Sel(P3) > Sel(P5) > Sel(P6), where Sel(P) denotes the selectivity of P. The selectivity of a variable is obtained from the selectivities of the query patterns containing it; preferably, the maximum selectivity among those patterns is taken as the variable's selectivity. For example, P2, P4 and P5 share the same connection variable ?y, so Sel(?y) = Sel(P2). Preferably, the selectivity of the connection variable between two query patterns P1 and P2 is calculated from the product of the corresponding pattern selectivities and a correction factor, by the following formula.
Sel(P1, P2) = factor × Sel(P1) × Sel(P2)
wherein factor denotes a correction factor whose value is the reciprocal of the mean of the numbers of triples matching the query patterns P1 and P2.
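The selectivity rules above can be made concrete with a small sketch. All function names below are hypothetical, and `joinSelectivity` encodes a literal reading of the stated rule: the product of the two pattern selectivities scaled by the correction factor (the reciprocal of the mean of the two match counts).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Sel(P): reciprocal of the number of triples matching pattern P.
double patternSelectivity(long matchCount) {
    return 1.0 / static_cast<double>(matchCount);
}

// Variable selectivity: the maximum selectivity among the patterns that
// share the connection variable (e.g. Sel(?y) for P2, P4, P5 = Sel(P2)).
double variableSelectivity(const std::vector<double>& patternSels) {
    return *std::max_element(patternSels.begin(), patternSels.end());
}

// Connection-variable selectivity between patterns P1 and P2: product of
// the two pattern selectivities times the correction factor, where the
// factor is the reciprocal of the mean of the two match counts.
double joinSelectivity(long count1, long count2) {
    double factor = 2.0 / static_cast<double>(count1 + count2);
    return factor * patternSelectivity(count1) * patternSelectivity(count2);
}
```

For instance, two patterns each matching two triples give factor = 1/2 and Sel = 0.5 × 0.5 × 0.5 = 0.125.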
Example 3
This embodiment is a further supplement to embodiment 1 and embodiment 2, and repeated contents are not described again.
A method for generating Join tasks and transmitting data blocks to a GPU video memory specifically comprises the following steps:
A1: setting a counter n to 0, and denoting the data blocks corresponding to the representative mode by CHUNK[P][N], indicating that the mode P corresponds to N data blocks in total;
A2: in the case where the representative mode has two common variables, denoting them x and y and proceeding to step A3; in the case where the representative mode does not have two common variables, proceeding to step A4;
A3: reading the nth data block CHUNK[P][n] of the mode P, creating the Join tasks according to the values of the common variable y in CHUNK[P][n], and proceeding to step A5;
A4: reading the nth data block CHUNK[P][n] of the mode P, reading the data blocks of the non-representative mode on the common variable x according to the interval [xmin, xmax] of the common variable x in CHUNK[P][n], and creating the Join tasks that depend on CHUNK[P][n] and those data blocks of the non-representative mode; wherein the interval [xmin, xmax] is built from the maximum and minimum values of the common variable x;
A5: setting the counter n = n + 1;
A6: proceeding to step A7 in the case where n is equal to or greater than N, and returning to step A2 in the case where n is less than N;
A7: transmitting the data blocks involved in step A3 or step A4 to the GPU video memory.
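Step A4's pairing of a representative block with the non-representative blocks it overlaps can be sketched as follows, assuming each block is summarized by the [xmin, xmax] interval of its common variable; the structure `ChunkRange` and the function `tasksForChunk` are illustrative names, not the system's actual layout.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical summary of a data block: its id and the [xMin, xMax]
// interval of the common variable x that it covers.
struct ChunkRange {
    int id;
    int xMin, xMax;
};

// Step A4: given the x-interval of the representative block CHUNK[P][n],
// pair it with every non-representative block whose interval intersects
// it; each pair becomes one Join task.
std::vector<std::pair<int, int>> tasksForChunk(
    int chunkId, int xMin, int xMax,
    const std::vector<ChunkRange>& otherChunks) {
    std::vector<std::pair<int, int>> tasks;
    for (const ChunkRange& c : otherChunks)
        if (c.xMax >= xMin && c.xMin <= xMax)  // intervals overlap
            tasks.emplace_back(chunkId, c.id);
    return tasks;
}
```

Only overlapping blocks are paired, so a representative block covering x in [5, 10] skips every non-representative block whose x-range lies entirely outside that interval.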
Preferably, step A3 of the method for generating the Join tasks and transmitting the data blocks to the GPU video memory further comprises the following specific steps:
A3-1: setting a counter i to 0 and an array TaskOption[M], with TaskOption[j] = 0, wherein M is the number of data blocks of Delegate[y]; TaskOption[j] = 0 indicates that no Join task has yet been created between the current data block and the jth data block of Delegate[y];
A3-2: reading the ith row of the current data block, and denoting its values on the common variables x and y by valX and valY, respectively;
A3-3: searching for valY in the data blocks of Delegate[y] by binary search; proceeding to step A3-4 in the case where valY is found, and proceeding to step A3-5 in the case where valY is not found;
A3-4: denoting the data blocks of Delegate[y] that contain valY as the Gth block, the (G+1)th block, …, and the (G+H)th block, and setting TaskOption[R] = 1 for these data blocks, wherein R = G, G+1, …, G+H, to indicate that a Join task with valY as the key value has been created; the array TaskOption is used only to indicate whether the relevant Join task has been created;
A3-5: setting the counter i = i + 1;
A3-6: in the case where the current data block has been read completely, creating the Join tasks, represented in the form TASK_X -> [CHUNK_L, CHUNK_R], according to the content of TaskOption; in the case where the current data block has not been read completely, returning to step A3-2.
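Steps A3-1 to A3-6 can be sketched as below, under the assumption that the data blocks of Delegate[y] are sorted and summarized by their [min, max] y-ranges, so that a binary search locates the contiguous run of blocks G..G+H that may contain a given valY; the function and container names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Sketch of steps A3-1..A3-6. Hypothetical layout: each data block of
// Delegate[y] is summarized by its [min, max] range of y values, and the
// blocks are sorted by their minima.
std::vector<int> createTasksForChunk(
    const std::vector<std::pair<int, int>>& rows,          // (valX, valY) rows
    const std::vector<std::pair<int, int>>& yBlockRanges)  // sorted [min, max]
{
    std::vector<char> taskOption(yBlockRanges.size(), 0);  // A3-1
    for (const auto& row : rows) {                         // A3-2 .. A3-5
        int valY = row.second;
        // A3-3: binary search for the first block whose minimum exceeds valY
        auto it = std::upper_bound(
            yBlockRanges.begin(), yBlockRanges.end(), valY,
            [](int v, const std::pair<int, int>& b) { return v < b.first; });
        // A3-4: walk left over the contiguous blocks G..G+H containing valY
        for (auto j = it; j != yBlockRanges.begin();) {
            --j;
            if (j->second < valY) break;  // block ends before valY: stop
            taskOption[j - yBlockRanges.begin()] = 1;
        }
    }
    // A3-6: emit one Join task per marked block of Delegate[y]
    std::vector<int> tasks;
    for (std::size_t j = 0; j < taskOption.size(); ++j)
        if (taskOption[j]) tasks.push_back(static_cast<int>(j));
    return tasks;
}
```

As in step A3-4, TaskOption only records which pairings are needed; the tasks themselves are materialized once the whole block has been scanned.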
Example 4
This embodiment is a further supplement to embodiment 1, embodiment 2, and embodiment 3, and repeated contents are not repeated.
As shown in fig. 5, a method for executing the Join task by the GPU and generating a corresponding intermediate result at least includes the steps of:
B1: traversing the set of incomplete tasks that depend on the transmitted data blocks, and proceeding to step B2 in the case where a task is found whose required data blocks have all been transmitted to the GPU video memory;
B2: setting a counter a to 0 and starting ThreadNum GPU threads, wherein the flag bit vector of CHUNK_L is BitVec1 and the flag bit vector of CHUNK_R is BitVec2;
B3: denoting each thread's identifier by tid, reading the (a + tid)th row of data of CHUNK_L, and setting the key value participating in the Join task as key;
B4: for each thread, searching for key in CHUNK_R by binary search; setting the matching interval [left, right] and proceeding to step B5 in the case where key is found, and proceeding to step B6 in the case where key is not found;
B5: setting the bits of the [left, right] interval in BitVec2 to 1;
B6: setting the counter a = a + ThreadNum;
B7: proceeding to step B8 in the case where a is equal to or greater than the length of CHUNK_L, and returning to step B3 in the case where a is less than the length of CHUNK_L;
B8: performing the merge operation, also called the compact operation, on CHUNK_L and CHUNK_R separately, wherein the compact operation packs together, starting from the beginning of CHUNK_L or CHUNK_R, the values whose corresponding flag bits are 1;
B9: entering the subsequent processing in the case where the set of incomplete tasks is empty, and returning to step B1 in the case where the set of incomplete tasks is not empty.
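A sequential CPU sketch of steps B2 to B7: a loop over tid stands in for the ThreadNum GPU threads, and `std::equal_range` plays the role of the per-thread binary search that yields the matching interval. All names are illustrative (the real kernel would run on the GPU).

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Steps B2-B7 on the CPU: each "thread" tid handles one row of CHUNK_L,
// binary-searches its key in the sorted CHUNK_R, and marks the matching
// interval in BitVec2 plus its own bit in BitVec1.
void markMatches(const std::vector<int>& chunkL,
                 const std::vector<int>& chunkR,  // must be sorted
                 std::vector<char>& bitVec1,
                 std::vector<char>& bitVec2) {
    bitVec1.assign(chunkL.size(), 0);
    bitVec2.assign(chunkR.size(), 0);
    for (std::size_t tid = 0; tid < chunkL.size(); ++tid) {  // B3
        int key = chunkL[tid];
        // B4: binary search yields the matching interval [left, right)
        auto range = std::equal_range(chunkR.begin(), chunkR.end(), key);
        if (range.first != range.second) {
            bitVec1[tid] = 1;  // this row is part of the join result
            for (auto it = range.first; it != range.second; ++it)  // B5
                bitVec2[it - chunkR.begin()] = 1;
        }
    }
}
```

After this pass, the compact operation of step B8 only has to pack the entries whose flag bits are 1.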
Preferably, for each thread, the matching interval [left, right] is set in the case where key is found in CHUNK_R by binary search; the tid-th bit of BitVec1 is set to 1, the bits of the matching interval [left, right] in BitVec2 are set to 1, and all threads are synchronized.
Preferably, the compact operation comprises the steps of:
B8-1: setting a counter b to 0 and a counter c to 0;
B8-2: reading the bth bit of BitVec1 or BitVec2; proceeding to step B8-3 in the case where its value is 1, and proceeding to step B8-4 in the case where its value is not 1;
B8-3: setting the value of CHUNK[c] to CHUNK[b] and increasing c by 1;
B8-4: setting the counter b = b + 1;
B8-5: entering the subsequent processing in the case where b is equal to or greater than the length of BitVec1 or BitVec2, and returning to step B8-2 in the case where b is less than that length.
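The compact operation of steps B8-1 to B8-5 reduces to a stable in-place pack, sketched below with hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Steps B8-1..B8-5: in-place compact of an array CHUNK, keeping only the
// entries whose flag bit in bitVec is 1 and packing them from the start.
// Returns the number of surviving entries.
std::size_t compactChunk(std::vector<int>& chunk,
                         const std::vector<char>& bitVec) {
    std::size_t c = 0;                               // B8-1
    for (std::size_t b = 0; b < bitVec.size(); ++b)  // B8-2 .. B8-5
        if (bitVec[b]) chunk[c++] = chunk[b];        // B8-3
    return c;
}
```

Because c never overtakes b, the copy is safe in place and preserves the original order of the surviving entries.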
Preferably, the incomplete task set refers to the set of all incomplete Join tasks.
preferably, the starting number of GPU threads preferred by the present system may be determined according to the GPU architecture, for example, 2048 in the nvidia kernel architecture. The bit vector mark is a vector consisting of bits and is used for marking the position of Join task matching in the array to be processed. The key value key is set according to the column where the data participating in the Join task is located. And the matching interval [ left, right ] is a result interval matched by the Join task in the data block to be operated. The bit value bit of the flag bit vector means a bit, and setting it to 1 indicates that the data for which the position is a part of the execution result of the Join task. CHUNK [ c ] represents the c-th data in the array CHUNK.
Example 5
Performance testing and analysis of the query processing system of the present invention are carried out as follows.
The system of the invention is developed on top of the open-source system TripleBit; the only dynamic libraries required are the boost libraries and the multithreading-related dynamic libraries. The development language of the system is C++, the compiler is g++, and the development environment is the Linux operating system. Testing can be performed in the following test environment.
[Table: test environment — hardware configurations, operating systems and software configurations]
The invention adopts the ultra-large-scale LUBM data set containing more than 1 billion triples as the test object. Data sets of various sizes are generated in a standardized manner to assess the scalability of the query system; the generated data sets are all based on one ontology. For example, the LUBM 500M and LUBM 1000M data sets are generated with original sizes of 115.88 GB and 231.95 GB, respectively. For the LUBM 500M data set test, the servers used by RDF3X, TripleBit and the query system TripleParallel of the present invention employ hardware configuration 1, operating system 1 and software configuration 1; for the LUBM 1000M data set test, they employ hardware configuration 2, operating system 2 and software configuration 2. Preferably, among the query statements executed on the LUBM 500M and LUBM 1000M data sets, Q1, Q2, Q5 and Q6 adopt cyclic connections, and Q3, Q4 and Q7 adopt acyclic connections.
Although the present invention has been described in detail, modifications within the spirit and scope of the invention will be apparent to those skilled in the art. Such modifications are also considered to be part of this disclosure. In view of the foregoing discussion, relevant knowledge in the art, and references or information discussed above in connection with the background, all of which are incorporated herein by reference, further description is deemed unnecessary. Further, it should be understood that aspects of the invention and portions of the various embodiments may be combined or interchanged both in whole or in part. Also, those of ordinary skill in the art will appreciate that the foregoing description is by way of example only, and is not intended to limit the invention.
The foregoing discussion of the disclosure has been presented for purposes of illustration and description. It is not intended to be limited to the form disclosed herein. In the foregoing detailed description, for example, various features of the disclosure are grouped together in one or more embodiments, configurations, or aspects for the purpose of streamlining the disclosure. Features of the embodiments, configurations or aspects may be combined in alternative embodiments, configurations or aspects to those discussed above. This method of disclosure is not to be interpreted as reflecting an intention that the disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment, configuration, or aspect. Thus the following claims are hereby incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the disclosure.
Moreover, although the description of the present disclosure has included description of one or more embodiments, configurations, or aspects and certain variations and modifications, other variations, combinations, and modifications are within the scope of the disclosure, e.g., as may be within the skill and knowledge of those in the art, after understanding the present disclosure. It is intended to obtain rights which include alternative embodiments, configurations, or aspects to the extent permitted, including alternate, interchangeable and/or equivalent structures, functions, ranges or steps to those claimed, whether or not such alternate, interchangeable and/or equivalent structures, functions, ranges or steps are disclosed herein, and without intending to publicly dedicate any patentable subject matter.

Claims (7)

1. A CPU-GPU collaborative query processing system for RDF graph data, comprising at least one CPU and at least one GPU, wherein the CPU is configured to:
analyzing a query statement submitted by a user to acquire first information, wherein the first information comprises triple patterns, common variables and projection variables;
selecting one triple pattern as a representative mode for each common variable, and dividing the data corresponding to each representative mode to obtain a plurality of data blocks;
sequentially reading the data blocks to generate connection tasks among different data blocks, and sequentially transmitting the data blocks to a GPU video memory;
in the case where the CPU detects that the data blocks on which the connection task depends are all transferred into the GPU memory, the GPU is configured to:
executing the connection task by the GPU and generating a corresponding intermediate result;
under the condition that the connection tasks among the different data blocks are all executed and/or all the data blocks are transmitted to a GPU video memory, different intermediate results are fully connected, and the final results are collected and fed back in a projection variable sequence mode;
the selecting of one triple pattern as a representative mode for each common variable and the dividing of the data corresponding to each representative mode to obtain a plurality of data blocks comprise at least the following steps:
sorting the N triple patterns based on the number of the common variables and/or the selectivity;
setting the initial representative mode of each common variable to be null, wherein a null representative mode indicates that the corresponding common variable has not yet selected a representative mode;
reading a kth triple pattern, and setting the kth triple pattern as the representative mode of its common variable in the case that the kth triple pattern has only one common variable and that common variable has not selected a representative mode, wherein k is an integer greater than zero and smaller than N;
setting the kth triple pattern as the new representative mode of the common variable that has already selected a representative mode, in the case that a first common variable and a second common variable exist in the kth triple pattern and only one of them has not selected a representative mode; and setting the representative mode of the less selective of the first common variable and the second common variable to be the kth triple pattern, in the case that neither the first common variable nor the second common variable has selected a representative mode;
in the case where k is greater than or equal to N, a preliminary query plan is generated for each common variable.
2. The query processing system of claim 1, wherein said obtaining a number of data blocks further comprises the steps of:
and when k is greater than or equal to N, generating, for each common variable, a query plan by sequentially reading the data blocks in descending order of selectivity.
3. The query processing system of claim 1, wherein the generating of the join task and the transferring of the block of data into the GPU video memory comprises at least the steps of:
setting a data block corresponding to each representative mode;
under the condition that the first common variable and the second common variable exist in the representative mode, reading the nth data block and creating the connection task according to the value of the second common variable; or reading the nth data block, reading the data block of the non-representative mode based on the first common variable according to the interval of the first common variable, and creating a connection task depending on the nth data block and the data block of the non-representative mode, wherein N is an integer greater than zero and less than N;
and transmitting the data block to a GPU video memory under the condition that N is larger than or equal to N.
4. A query processing system as claimed in claim 3, wherein reading the nth data block and creating the join task from the value of its second common variable comprises at least the steps of:
determining the number M of data blocks in a representative mode of a second public variable, and establishing a first array according to the number of the data blocks, wherein for the jth data block of the representative mode of the second public variable, the initial value of the first array is set to be zero to indicate that the connection task is not established between the current data block and the jth data block;
reading ith row data of the current data block, and setting the value of the second public variable as valY;
under the condition that the valY is searched in the data blocks of the representative mode of the second public variable in a binary search mode, selecting H data blocks in the data blocks of the representative mode of the second public variable and setting the value of a first array of the H data blocks to be 1 to indicate that a connection task taking the valY as a key value is established;
and under the condition that the current data block is completely read, creating the connection task according to the content of the first array.
5. The query processing system of claim 4, wherein executing the join task by the GPU and generating corresponding intermediate results when the data blocks on which the join task depends are all transferred into the GPU memory comprises at least the steps of:
determining a left operand and a right operand of the join task;
traversing an incomplete task set depending on the transmitted data blocks, setting the initial value of a counter a to be 1 and starting m GPU threads under the condition that all the data blocks depending on the corresponding tasks are found to be transmitted to a GPU video memory, wherein a first bit vector mark of a left operand and a second bit vector mark of a right operand are set;
determining a thread identifier, denoted tid, reading the tid-th row of data of the left operand, and setting the key value participating in the connection task as key;
setting the bit position value of a matching interval marked by a second bit vector as 1 under the condition that the key value key is found in a right operand according to a binary search mode, wherein the matching interval is a result interval matched by a connection task in a data block to be operated;
in the case that (a + m) is larger than the length of the left operand, respectively performing a merging operation on the left operand and the right operand, wherein the merging operation places the marked bit vectors with the respective corresponding values of 1 together starting from the starting address of the left operand or the right operand;
and entering first subsequent processing when the unfinished task set is empty, wherein the first subsequent processing is to perform full connection on the intermediate result, and collect and feed back the final result according to the projection variable sequence.
6. The query processing system of claim 5, wherein executing the join task by the GPU and generating corresponding intermediate results when the data blocks on which the join task depends are all transferred into the GPU memory further comprises:
and aiming at each thread, setting the bit value of the tid bit of the first bit vector mark as 1 to represent that the data corresponding to the bit belongs to the execution result of the connection task under the condition that the key value key is found in the right operand according to a binary search mode, setting the bit value of the matching interval of the second bit vector mark as 1, and synchronizing all threads.
7. A CPU-GPU collaborative query processing method of RDF graph data is characterized in that the query processing method comprises the following steps:
the CPU analyzes a query statement submitted by a user to acquire first information, wherein the first information comprises triple patterns, common variables and projection variables;
meanwhile, one triple pattern is selected as a representative mode for each common variable, and the data corresponding to each representative mode is divided to obtain a plurality of data blocks;
sequentially reading the data blocks to generate connection tasks among different data blocks, and sequentially transmitting the data blocks to a GPU video memory;
under the condition that the CPU detects that the data blocks on which the connection tasks depend are all transmitted to a GPU video memory, the GPU executes the connection tasks and generates corresponding intermediate results;
under the condition that the connection tasks among the different data blocks are executed and completed and/or all the data blocks are transmitted to a GPU video memory, different intermediate results are fully connected, and the final results are collected and fed back in a projection variable sequence mode;
the selecting of one triple pattern as a representative mode for each common variable and the dividing of the data corresponding to each representative mode to obtain a plurality of data blocks comprise at least the following steps:
sorting the N triple patterns based on the number of the common variables and/or the selectivity;
setting the initial representative mode of each common variable to be null, wherein a null representative mode indicates that the corresponding common variable has not yet selected a representative mode;
reading a kth triple pattern, and setting the kth triple pattern as the representative mode of its common variable in the case that the kth triple pattern has only one common variable and that common variable has not selected a representative mode, wherein k is an integer greater than zero and smaller than N;
setting the kth triple pattern as the new representative mode of the common variable that has already selected a representative mode, in the case that a first common variable and a second common variable exist in the kth triple pattern and only one of them has not selected a representative mode; and setting the representative mode of the less selective of the first common variable and the second common variable to be the kth triple pattern, in the case that neither the first common variable nor the second common variable has selected a representative mode;
in the case where k is greater than or equal to N, a preliminary query plan is generated for each common variable.
CN201810408484.1A 2018-04-28 2018-04-28 CPU-GPU (Central processing Unit-graphics processing Unit) collaborative query processing system and method for RDF (resource description framework) graph data Active CN108711136B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810408484.1A CN108711136B (en) 2018-04-28 2018-04-28 CPU-GPU (Central processing Unit-graphics processing Unit) collaborative query processing system and method for RDF (resource description framework) graph data


Publications (2)

Publication Number Publication Date
CN108711136A CN108711136A (en) 2018-10-26
CN108711136B true CN108711136B (en) 2020-10-30

Family

ID=63868670

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810408484.1A Active CN108711136B (en) 2018-04-28 2018-04-28 CPU-GPU (Central processing Unit-graphics processing Unit) collaborative query processing system and method for RDF (resource description framework) graph data

Country Status (1)

Country Link
CN (1) CN108711136B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020121359A1 (en) * 2018-12-09 2020-06-18 浩平 海外 System, method, and program for increasing efficiency of database queries
CN116108245B (en) * 2023-03-31 2023-06-30 支付宝(杭州)信息技术有限公司 Graph data query method and query engine

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JP4402580B2 (en) * 2004-12-17 2010-01-20 キヤノン株式会社 Image processing system
CN104834754A (en) * 2015-05-29 2015-08-12 武汉大学 SPARQL semantic data query optimization method based on connection cost
CN105955999B (en) * 2016-04-20 2019-04-23 华中科技大学 A kind of ThetaJoin inquiry processing method of extensive RDF graph

Non-Patent Citations (1)

Title
TripleBit: a Fast and Compact System for Large Scale RDF Data; Pingpeng Yuan et al.; Proceedings of the VLDB Endowment; pp. 517-528 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant