CN106599095B

CN106599095B - Branch reduction method based on complete historical record

Info

Publication number: CN106599095B
Application number: CN201611056390.XA
Authority: CN
Inventors: 陈海波; 姚友阳; 陈榕; 臧斌宇
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2016-11-24
Filing date: 2016-11-24
Publication date: 2020-07-14
Anticipated expiration: 2036-11-24
Also published as: CN106599095A

Abstract

The invention provides a branch reduction (pruning) method based on a complete historical record, comprising the following steps. Step 1: a client sends a query request, and a server receives the query request. Step 2: the server parses the query request and decomposes the query statement into small steps for execution. Step 3: the query is executed small step by small step to obtain intermediate query results, and a corresponding branch reduction operation is performed on each intermediate result, where the operation is either a simple branch reduction or a branch reduction based on the complete historical record. Step 4: the result after branch reduction is added, together with all historical results, to a new historical record table, which is passed to the next small-step query so that branch reduction can continue. Compared with the prior art, the method eliminates useless intermediate results as early as possible according to the complete history record, takes the characteristics of a high-performance network (RDMA) into account to reduce communication overhead, and, unlike the traditional single-step branch reduction method, avoids the costly final result merging operation, thereby greatly improving the performance of the query system.

Description

Branch reduction method based on complete historical record

Technical Field

The invention relates to a branch reduction method for graph query, in particular to a branch reduction method based on a complete historical record.

Background

Graph-structured data is increasingly common in large-scale network applications. Large volumes of data exhibit free-form and rich relationships, and strongly connected graph data is widely used across network applications; for example, commercial search engines such as Google and Bing use RDF (Resource Description Framework) to explicitly represent the content of web pages. For network applications that process massive graph data, the execution speed of users' online queries is a critical link. Branch reduction (pruning) of candidate results is one of the important means of reducing latency: an efficient branch reduction method eliminates incorrect results early, reduces communication overhead, and improves the overall performance of the query system.

Remote Direct Memory Access (RDMA) is a high-performance network communication technology that can directly access remote memory addresses, including direct read and write operations. Because RDMA can completely bypass the CPU of the target machine, it exhibits low latency and high throughput, a great advantage over conventional network communication. One significant property of RDMA is that, as long as the transferred data stays below a certain size, the transfer delay remains low and substantially constant, since a small amount of data does not saturate the high network bandwidth.

When a system executes a user's query request, it usually generates many useless intermediate results. If these results are kept until they are finally removed, they waste resources and incur large communication overhead, so existing systems generally adopt some specific branch reduction method to remove them. Existing RDF query systems usually combine single-step branch reduction with a final result merging operation to obtain the final result required by the user. Because each step of this method carries only the result of the previous step, useless results cannot be completely eliminated, which brings extra communication overhead. In addition, because each step still contains useless results, all results must be gathered on one machine for a merging operation at the end of execution, and this process easily becomes the performance bottleneck of the whole system.

Therefore, how to design an efficient branch reduction method that eliminates useless results as early as possible, reduces communication overhead, and avoids the time-consuming final result merging operation as far as possible, thereby improving the overall performance of the distributed query system and accelerating users' queries, has become a technical problem urgently to be solved by those skilled in the art.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a branch reducing method based on a complete historical record, which can make full use of the characteristics of a high-performance network, eliminate intermediate useless results as early as possible, avoid the final time-consuming merging operation and reduce the delay of a user query request.

The invention provides a branch reduction method based on a complete historical record, which comprises the following steps:

step 1: a client sends a query request, and a server receives the query request;

step 2: the server analyzes the query request and decomposes a query statement in the query request into a plurality of steps for execution, wherein each step in the plurality of steps is marked as a small step;

step 3: executing the query process according to the small steps to obtain a query intermediate result, and performing branch reduction operation on the intermediate result to obtain a result after branch reduction;

step 4: adding the result after the branch reduction and all the historical results into a new historical record table, and transmitting the new historical record table to the next small-step query for continuing the branch reduction.

Preferably, the step 1 comprises: the client selects a server to send a query request, the server monitors the query request, initializes the query-related data, clears the history record table and prepares for executing the query process.

Preferably, the step 2 includes: the server analyzes the query request after receiving the query request, wherein the query request comprises a plurality of query statements, and the server decomposes the query statements into a plurality of small steps for execution according to different query statements.

Preferably, the step 3 comprises:

step 3.1: matching data in the data set according to the small-step query statement: if the complete history record has no entries, data matching starts from a constant in the query statement to obtain an intermediate result; if the complete history record has entries, matching starts from the values of a variable that appears both in the query statement and in the history record to obtain an intermediate result;

step 3.2: carrying out different branch reduction operations on the intermediate result obtained by the small step, according to the following conditions:

when the query variable corresponding to the newly added intermediate result does not exist in the complete history, the branch reduction operation is performed according to the constant in the small-step execution, and the intermediate result which does not meet the condition of the constant is removed;

when the query variable corresponding to the newly added intermediate result has a record in the complete history, branch reduction is performed according to whether the variable value in the history matches the variable value in the intermediate result, specifically: for each record in the complete history record corresponding to the newly added intermediate result, judging whether the variable value in the history record is equal to the corresponding variable value in the newly added intermediate result; if not, rejecting the record, otherwise keeping the record;

wherein the query variable means: an unknown quantity in the query statement, whose corresponding value needs to be returned in the query result.
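Steps 3.1 and 3.2 can be illustrated with a minimal sketch. It is not the patented implementation: the `?`-prefix convention for variables, the `execute_step` name, the use of `None` to mark a freshly cleared history, and the linear scan over an in-memory list of triples are all assumptions made for illustration (a real system would use an index rather than a scan).

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Row = Dict[str, str]           # one history entry: query variable -> value


def is_var(term: str) -> bool:
    """Assumed convention: terms starting with '?' are query variables."""
    return term.startswith("?")


def execute_step(history: Optional[List[Row]], pattern: Triple,
                 data: List[Triple]) -> List[Row]:
    """Run one small step and return the extended, pruned history.

    history=None stands for the cleared table before the first small step.
    """
    rows = [{}] if history is None else history
    new_history: List[Row] = []
    for row in rows:
        for triple in data:  # linear scan; an index is assumed in practice
            candidate = dict(row)
            for term, value in zip(pattern, triple):
                if is_var(term):
                    # Variable already recorded: keep the entry only if the
                    # recorded value equals the newly matched value.
                    if candidate.setdefault(term, value) != value:
                        break
                elif term != value:
                    # Constant in the small step: drop results that differ.
                    break
            else:
                new_history.append(candidate)
    return new_history
```

On a query whose last small step closes a cycle (for example `?x knows ?y`, `?y knows ?z`, `?z knows ?x`), the third call finds both `?z` and `?x` already in the history, so every entry that does not close the cycle is dropped at that step rather than in a later merge.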

Preferably, the step 4 comprises:

step 4.1: adding the result after the branch reduction into the complete history table, adding a new column in the complete history table to represent the newly added query variable, and correspondingly increasing or reducing the number of entries in the complete history table;

step 4.2: the complete history list is transmitted to the next small-step query statement along with the query process for next branch reduction, and different transmission operations are executed according to the data related condition of the next small step as follows:

when the data involved in the execution of the next step is at the local machine, the complete history list is passed locally, without involving network transmission;

when the data involved in the execution of the next step is at the remote machine, the complete history list is sent to the remote server following the query request for continued execution.
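The two transmission cases of step 4.2 can be sketched as follows. The text does not say how data is placed on servers or whether the history is split, so the modulo placement rule `owner_of`, the per-server grouping of history rows, and the `run_local` / `send_remote` callbacks are all hypothetical names and assumptions used only to make the local-versus-remote decision concrete.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]  # one history entry: query variable -> numeric ID


def owner_of(vertex_id: int, num_servers: int) -> int:
    """Hypothetical placement rule: vertex IDs are spread by modulo."""
    return vertex_id % num_servers


def forward_history(
    history: List[Row],
    next_var: str,
    local_server: int,
    num_servers: int,
    run_local: Callable[[List[Row]], None],
    send_remote: Callable[[int, List[Row]], None],
) -> None:
    """Hand the history to wherever the next small step's data lives."""
    by_server: Dict[int, List[Row]] = {}
    for row in history:
        target = owner_of(row[next_var], num_servers)
        by_server.setdefault(target, []).append(row)
    for server, rows in by_server.items():
        if server == local_server:
            run_local(rows)            # local hand-off, no network transfer
        else:
            send_remote(server, rows)  # history travels with the sub-request
```

In this sketch each history row keeps all of its columns when it is forwarded, so the receiving server can continue branch reduction against the full record.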

Compared with the prior art, the invention has the following beneficial effects:

1. Compared with the traditional single-step branch reduction method, the branch reduction method based on the complete historical record avoids the final result merging operation and its huge overhead, and therefore shows a greater performance advantage.

2. The invention fully considers the characteristics of a high performance network (RDMA), and utilizes the characteristics to reduce the communication overhead of transmitting the complete history record as much as possible, so that the communication delay can be kept at a lower level, and the high network bandwidth is fully utilized.

3. The branch reducing method based on the complete historical record can be widely applied to a distributed query system, fully schedules limited resources, reduces the waste of the resources, reduces the delay of query requests as much as possible, and improves the performance of the whole query system.

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a flow chart of the present invention using a full history based pruning method.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that it would be obvious to those skilled in the art that various changes and modifications can be made without departing from the spirit of the invention. All such changes and modifications fall within the scope of the present invention.

step 0: a plurality of servers load original data in parallel, distribute the data and perform some initialization operations;

step 1: a server receives a query request sent by a client;

step 2: the server analyzes the query request, and decomposes the query statement into a plurality of small steps (generally not more than 15 small steps) for execution;

step 3: executing the query process according to the query small steps to obtain a query intermediate result, and performing corresponding branch reduction operation on the intermediate result;

step 4: adding the result after the branch reduction and all the historical results into a new historical record table together, and transmitting the result to the next small-step query for continuing the branch reduction.

The step 1 comprises the following steps: the client selects a server with lower load (according to the number of the requests executed by the server) to send the query request, the server monitors the request, initializes the query related data, clears the history record result table and makes corresponding preparation for executing the query.
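The load-based server choice in step 1 could look like the sketch below. The text only says that load is judged by the number of requests a server is executing; how the client learns those counts is not described, so the `active_requests` mapping and the `pick_server` name are assumptions.

```python
from typing import Dict


def pick_server(active_requests: Dict[str, int]) -> str:
    """Return the server currently executing the fewest requests.

    active_requests maps a (hypothetical) server name to its request count;
    on a tie, the first server in the mapping's order wins.
    """
    return min(active_requests, key=active_requests.get)
```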

The step 2 comprises the following steps: the server parses the query request after receiving it. The query request generally consists of a plurality of query statements, and the server decomposes them into a plurality of execution steps. In the RDF data format, a query request generally consists of a plurality of triples, so the execution steps are divided according to the triples.
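As a rough illustration of dividing a query into small steps by triple, the sketch below splits a whitespace-tokenised, SPARQL-like pattern body into one `(subject, predicate, object)` step per triple. The input syntax (terms separated by spaces, triples ended by a standalone `.`) and the `decompose` name are assumptions; the text does not specify a query syntax, and no step reordering is attempted here.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def decompose(body: str) -> List[Triple]:
    """Split a pattern body into small steps, one per triple pattern."""
    steps: List[Triple] = []
    current: List[str] = []

    def flush() -> None:
        if not current:
            return
        if len(current) != 3:
            raise ValueError(f"expected 3 terms, got {current!r}")
        steps.append((current[0], current[1], current[2]))
        current.clear()

    for token in body.split():
        if token == ".":
            flush()          # a standalone '.' closes one triple pattern
        else:
            current.append(token)
    flush()                  # the last pattern may omit the trailing '.'
    return steps
```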

The step 3 comprises the following steps:

step 3.1: performing a matching operation on data in the data set according to the statement of the query small step. If the complete history record has no entries, which is usually the case when the first query small step is executed, data matching simply starts from a constant in the query statement to obtain an intermediate result. If the complete history record has entries, matching starts from the values of a variable in the query statement that also exists in the history record to obtain an intermediate result, where the intermediate result refers to the value at the other end of the triple relative to that variable;

step 3.2: carrying out different branch reduction operations on the intermediate results obtained by the small step according to different conditions:

when the query variable corresponding to the newly added result does not exist in the complete history, a simple branch reduction operation is performed according to the constant in the query step, that is, the branch reduction is performed by judging whether the intermediate result obtained in step 3.1 is equal to the constant, and the result which does not meet the condition of the constant is removed;

when the query variable corresponding to the newly added result has a record in the complete history, branch reduction is performed according to whether the variable value in the history matches the new value, specifically: for each record in the complete history record corresponding to the newly generated result, judging whether the value of the variable in the history record equals the value of the variable in the new result, and rejecting the record if the values are not equal;

wherein, the query variable means: for the unknowns in the query statement, the query system needs to return their corresponding values in the query result.

The step 4 comprises the following steps:

step 4.1: adding the new result after the branch reduction into the complete history table, wherein a column is added to the complete history table to represent the newly added query variable, and the number of entries in the complete history table increases or decreases accordingly (branch reduction can reduce the number of intermediate results), where the number of entries refers to the number of rows in the complete history table;

step 4.2: after adding the new result into the complete history record table, the complete history record is transmitted to the next small-step query statement along with the query process for next branch reduction, and different transmission operations are executed according to the specific data related conditions of the next small step:

when the data involved in the execution of the next step is at the local machine, the complete history need only be simply passed locally, not involving network transmission;

when the data involved in the execution of the next step is at the remote machine, the complete history needs to be sent to the remote server to continue execution following the query sub-request;

More specifically, the transmission of the complete history in this step exploits a significant characteristic of the high-performance network (RDMA): when the size of the transmitted data is small (e.g., less than 2000 bytes), the transmission delay remains low and substantially constant. The invention uses this characteristic to transmit the complete history record, whose data volume is small, achieving higher transmission efficiency and lower transmission delay. The data volume is small because the number of query steps and query variables in an RDF query is usually small and the history values are usually converted into numeric IDs. If the next small step is executed locally, the complete history is simply passed locally, which avoids the communication process altogether.
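A back-of-the-envelope size check makes the argument concrete. The 2000-byte figure is the example threshold quoted in the text; the 8-byte ID width and the function names are assumptions, and the estimate ignores any message header.

```python
SMALL_MESSAGE_BYTES = 2000  # example threshold quoted in the text


def history_bytes(num_rows: int, num_vars: int, id_bytes: int = 8) -> int:
    """Payload size of a history table stored as fixed-width numeric IDs.

    id_bytes=8 is an assumed ID width, not a value given in the text.
    """
    return num_rows * num_vars * id_bytes


def fits_small_message(num_rows: int, num_vars: int, id_bytes: int = 8) -> bool:
    """True if the history stays under the low, flat-latency size range."""
    return history_bytes(num_rows, num_vars, id_bytes) < SMALL_MESSAGE_BYTES
```

Under these assumptions, a history of 40 rows over 3 query variables occupies 40 × 3 × 8 = 960 bytes and stays within the small-message range, while 100 rows over the same variables (2400 bytes) does not.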

The branch reduction method is implemented on top of the complete history record, which is stored in a dynamic table structure referred to as the complete history record table. The table consists of columns and rows: the columns represent the query variables contained in the user's query request, and the rows store the entries of the historical results. The table changes dynamically during the query, that is, its numbers of rows and columns may increase or decrease, because results may be added (for example, when a query step is executed) or pruned (for example, when the query statement contains a loop). Storing the complete history record in a dynamic table structure is concise and convenient, and keeps row and column insertion and deletion efficient.
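One possible shape for such a dynamic table is sketched below. The class name `HistoryTable`, its methods, and the list-of-lists layout are assumptions chosen for brevity; the text does not specify the table's in-memory layout, and column removal is omitted here.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]  # view of one table row: query variable -> numeric ID


class HistoryTable:
    """Columns are query variables; each row is one history entry."""

    def __init__(self) -> None:
        self.columns: List[str] = []
        self.rows: List[List[int]] = []

    def add_column(self, var: str,
                   values_for: Callable[[Row], List[int]]) -> None:
        """Add a variable; values_for(row) lists its matches for that row.

        A row with several matches is duplicated, a row with none is
        dropped, so the row count can grow or shrink.
        """
        base = self.rows if self.columns else [[]]  # seed for first column
        new_rows: List[List[int]] = []
        for row in base:
            bound = dict(zip(self.columns, row))
            for value in values_for(bound):
                new_rows.append(row + [value])
        self.columns.append(var)
        self.rows = new_rows

    def prune(self, keep: Callable[[Row], bool]) -> None:
        """Remove every row for which keep(row) is false."""
        self.rows = [r for r in self.rows
                     if keep(dict(zip(self.columns, r)))]
```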

The invention adopts a branch reduction method based on the complete historical record instead of the traditional single-step branch reduction method, and the main reason is that the traditional branch reduction method always causes larger additional expenditure. The conventional single-step pruning method has the following problems:

(1) High communication overhead: in the single-step branch reduction method, redundant useless intermediate results are retained until the end, so a large number of intermediate results cannot be eliminated, which wastes communication;

(2) A final result merging operation: since single-step branch reduction can judge only according to the result of the previous step, it cannot eliminate all useless results. After the query statement is executed, some results still do not meet the final requirement, so all results must be gathered on one machine for a final result merging operation, which may become the performance bottleneck of the whole system.

Compared with the traditional branch reduction method, the branch reduction method based on the complete historical record has the following advantages that:

(1) the time-consuming final result combination operation in the traditional branch reduction method is effectively avoided, all useless results can be completely eliminated in the execution process by transmitting the complete history record, the result combination is not required to be carried out after the final execution is finished, and all results required by users are contained in the history record;

(2) RDMA communication is used efficiently to reduce the cost of transmitting the complete history record; the RDMA-friendly communication mode makes effective use of the network bandwidth, reduces the transmission delay, and avoids the resource waste of the traditional method.

In summary, the branch reduction method based on the complete history record provided by the invention can eliminate useless results as early as possible, save network bandwidth, fully utilize the characteristic of high-performance network transmission, and keep lower communication delay.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A pruning method based on a complete historical record is characterized by comprising the following steps:

step 4: adding the result after the branch reduction and all the historical results into a new historical record table, and transmitting the new historical record table to the next small-step query for continuing the branch reduction;

the step 3 comprises the following steps:

2. The complete history record based pruning method according to claim 1, characterized in that the step 1 comprises: the client selects a server to send a query request, the server monitors the query request, initializes the query-related data, clears the history record table and prepares for executing the query process.

3. The complete history record based pruning method according to claim 1, characterized in that the step 2 comprises: the server analyzes the query request after receiving the query request, wherein the query request comprises a plurality of query statements, and the server decomposes the query statements into a plurality of small steps for execution according to different query statements.

4. The complete history record based pruning method according to claim 1, characterized in that the step 4 comprises: