CN106599095B - Branch reduction method based on complete historical record - Google Patents

Info

Publication number
CN106599095B
Authority
CN
China
Prior art keywords
query
record
branch
result
complete history
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611056390.XA
Other languages
Chinese (zh)
Other versions
CN106599095A (en)
Inventor
陈海波
姚友阳
陈榕
臧斌宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201611056390.XA priority Critical patent/CN106599095B/en
Publication of CN106599095A publication Critical patent/CN106599095A/en
Application granted granted Critical
Publication of CN106599095B publication Critical patent/CN106599095B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a pruning (branch reduction) method based on a complete history record, comprising the following steps. Step 1: a client sends a query request, and a server receives it. Step 2: the server parses the query request and decomposes the query statement into small steps for execution. Step 3: the query is executed small step by small step to obtain an intermediate result, and the corresponding pruning operation is performed on that result; the pruning operation is either a simple pruning operation or a pruning operation based on the complete history record. Step 4: the pruned result is added, together with all historical results, to a new history record table, which is passed to the next small-step query for further pruning. Compared with the prior art, the method eliminates useless intermediate results as early as possible by consulting the complete history record, takes the characteristics of a high-performance network (RDMA) into account to reduce communication overhead, and, unlike the traditional single-step pruning method, avoids the costly final result-merging operation, thereby greatly improving the performance of the query system.

Description

Branch reduction method based on complete historical record
Technical Field
The invention relates to a pruning method for graph queries, and in particular to a pruning method based on a complete history record.
Background
Graph-structured data is increasingly common in large-scale network applications. Large volumes of data exhibit free-form, rich relationships, and such strongly connected graph data is widely used across network applications; for example, commercial search engines such as Google and Bing use RDF (Resource Description Framework) to represent web page content explicitly. For network applications that process massive graph data, the execution speed of users' online queries is critical. Pruning candidate results is one of the important means of reducing latency: an efficient pruning method eliminates incorrect results early, reduces communication overhead, and improves the overall performance of the query system.
Remote Direct Memory Access (RDMA) is a high-performance network communication technology that accesses remote memory addresses directly, through direct read and write operations. Because RDMA bypasses the CPU of the target machine entirely and needs no assistance from it, it delivers low latency and high throughput and has a large advantage over conventional network communication. One notable property of RDMA is that, for transfers below a certain size, latency stays low and essentially constant, because a small amount of data does not saturate the high network bandwidth.
When a system executes a user's query request, it usually generates many useless intermediate results. Keeping them until they are finally discarded wastes resources and incurs heavy communication overhead, so existing systems generally apply some pruning method to remove them. Existing RDF query systems usually obtain the final result by combining single-step pruning with a final result-merging operation. In this approach each step sees only the result of the previous step, so useless results cannot be eliminated completely, which adds communication overhead. In addition, because every step still carries useless results, all results must be gathered on one machine and merged at the end of execution, and this merge easily becomes the performance bottleneck of the whole system.
Therefore, how to design an efficient pruning method that eliminates useless results as early as possible, reduces communication overhead, and avoids the time-consuming final merge as far as possible, thereby improving the overall performance of the distributed query system and speeding up user queries, has become a technical problem that those skilled in the art urgently need to solve.
Disclosure of Invention
In view of the defects in the prior art, the invention aims to provide a pruning method based on a complete history record that makes full use of the characteristics of a high-performance network, eliminates useless intermediate results as early as possible, avoids the time-consuming final merge, and reduces the latency of user query requests.
The invention provides a pruning method based on a complete history record, comprising the following steps:
step 1: a client sends a query request, and a server receives the query request;
step 2: the server parses the query request and decomposes the query statement in it into a plurality of steps for execution, each of which is referred to as a small step;
step 3: the query is executed small step by small step to obtain an intermediate result, and a pruning operation is performed on the intermediate result to obtain a pruned result;
step 4: the pruned result and all historical results are added to a new history record table, which is passed to the next small-step query for further pruning.
Preferably, step 1 comprises: the client selects a server and sends it a query request; the server listens for the query request, initializes the query-related data, clears the history record table, and prepares to execute the query.
Preferably, step 2 comprises: after receiving the query request, the server parses it; the query request comprises a plurality of query statements, and the server decomposes them into a plurality of small steps for execution according to the individual query statements.
Preferably, the step 3 comprises:
step 3.1: match the data in the data set against the small-step query statement. If the complete history record has no entries, start from the constant in the query statement and match data to obtain an intermediate result; if the complete history record has entries, start from the values of the variables that appear both in the query statement and in the history record and match the data set to obtain an intermediate result;
step 3.2: prune the intermediate result obtained in this small step according to which of the following cases applies:
when the query variable corresponding to the newly added intermediate result does not exist in the complete history record, prune according to the constant in the small step, removing every intermediate result that does not satisfy the constant condition;
when the query variable corresponding to the newly added intermediate result already has a record in the complete history record, prune according to whether the variable value in the history record matches the variable value in the intermediate result. Specifically, for each record in the complete history record that corresponds to the newly added intermediate result, check whether the variable value in the history record equals the corresponding variable value in the newly added intermediate result; if they are not equal, reject the record, otherwise keep it;
wherein a query variable is an unknown in the query statement whose corresponding value must be returned in the query result.
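The two pruning cases of step 3.2 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names, the representation of a history record as a dict from variable name to value, and the per-record candidate sets are all assumptions made for the example.

```python
def prune_by_constant(candidates, constant):
    """Case 1 (hypothetical helper): the variable is not yet in the history.

    Keep only the intermediate results that equal the constant
    named in the small step.
    """
    return [value for value in candidates if value == constant]


def prune_by_history(history, var, candidates_per_record):
    """Case 2 (hypothetical helper): the variable already has a column.

    history: list of dict records, e.g. {"?x": "a", "?y": "b"}.
    candidates_per_record: one set of newly matched values per record.
    A record survives only if the value it already holds for `var`
    is among the values newly matched for that record.
    """
    return [
        record
        for record, candidates in zip(history, candidates_per_record)
        if record[var] in candidates
    ]


history = [{"?x": "a", "?y": "b"}, {"?x": "c", "?y": "d"}]
survivors = prune_by_history(history, "?y", [{"b", "e"}, {"f"}])
print(survivors)
```

In this example the second record is rejected because its recorded value for ?y does not appear among the values matched in the current small step.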
Preferably, the step 4 comprises:
step 4.1: add the pruned result to the complete history record table, adding a new column to the table to represent the newly added query variable; the number of entries in the table increases or decreases accordingly;
step 4.2: the complete history record table travels with the query to the next small-step query statement for the next round of pruning, and the transfer depends on where the data for the next small step resides:
when the data involved in the next small step is on the local machine, the complete history record table is passed locally, with no network transmission;
when the data involved in the next small step is on a remote machine, the complete history record table is sent to the remote server along with the query request, and execution continues there.
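The local-versus-remote hand-off of step 4.2 might look like the sketch below. The text does not say how data is placed on servers or whether a history table is split, so the modulo placement rule `owner_of`, the per-record routing, and the `send` callback are all assumptions chosen only to make the example concrete.

```python
from collections import defaultdict


def owner_of(vertex_id, num_servers):
    # Assumed placement rule: hash-partition vertices by numeric ID.
    return vertex_id % num_servers


def route_history(records, start_var, local_server, num_servers, send):
    """Hand history records to wherever the next small step will run.

    records: list of dict records holding numeric vertex IDs.
    start_var: the variable the next small step starts from.
    send: hypothetical callback send(server, records) standing in
          for the network transfer (RDMA in the text).
    Returns the records that stay on the local machine.
    """
    by_owner = defaultdict(list)
    for record in records:
        by_owner[owner_of(record[start_var], num_servers)].append(record)
    # Local data: pass the records on directly, no network involved.
    local_records = by_owner.pop(local_server, [])
    # Remote data: ship each group to the server that owns it.
    for server, group in by_owner.items():
        send(server, group)
    return local_records


sent = []
stay = route_history(
    [{"?y": 3}, {"?y": 4}, {"?y": 5}],
    start_var="?y",
    local_server=1,
    num_servers=3,
    send=lambda server, group: sent.append((server, group)),
)
print(stay, sent)
```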
Compared with the prior art, the invention has the following beneficial effects:
1. Compared with the traditional single-step pruning method, the pruning method based on the complete history record avoids the costly final result-merging operation and therefore shows a clear performance advantage.
2. The invention takes the characteristics of a high-performance network (RDMA) into account and uses them to minimize the communication overhead of transferring the complete history record, so that communication latency stays low and the high network bandwidth is well utilized.
3. The pruning method based on the complete history record is widely applicable to distributed query systems; it makes full use of limited resources, reduces resource waste, minimizes the latency of query requests, and improves the performance of the whole query system.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the pruning method based on a complete history record according to the present invention.
Detailed Description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit it in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the present invention.
The invention provides a branch reduction method based on a complete historical record, which comprises the following steps:
step 0: a plurality of servers load the original data in parallel, distribute the data, and perform initialization;
step 1: a server receives a query request sent by a client;
step 2: the server parses the query request and decomposes the query statement into a plurality of small steps (generally no more than 15) for execution;
step 3: the query is executed small step by small step to obtain an intermediate result, and the corresponding pruning operation is performed on the intermediate result;
step 4: the pruned result and all historical results are added to a new history record table, which is passed to the next small-step query for further pruning.
Step 1 comprises: the client selects a server with a lower load (measured by the number of requests the server is executing) and sends it the query request; the server listens for the request, initializes the query-related data, clears the history result table, and makes the corresponding preparations for executing the query.
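As a small illustration of the load-based choice in step 1, the client could pick the server that is executing the fewest requests. The function name and the dict of request counts are assumptions; the text does not say how the client learns each server's load.

```python
def pick_server(active_requests):
    """Return the name of the least-loaded server.

    active_requests: dict mapping a server name to the number of
    requests it is currently executing (assumed to be known to the
    client by some means the text does not describe).
    """
    return min(active_requests, key=active_requests.get)


chosen = pick_server({"server-a": 5, "server-b": 2, "server-c": 7})
print(chosen)
```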
Step 2 comprises: after receiving the query request, the server parses it. A query request generally consists of several query statements, and the server decomposes them into several execution steps according to the individual statements. In the RDF data format a query request generally consists of several triples, so the execution steps are divided by triple.
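A sketch of the triple-by-triple decomposition in step 2 is given below. It assumes a deliberately simplified query body in which each triple pattern is three whitespace-separated terms and patterns are separated by " . "; this is not a full SPARQL parser, and the names `TriplePattern`, `decompose` and `is_variable` are hypothetical.

```python
from typing import NamedTuple


class TriplePattern(NamedTuple):
    subject: str
    predicate: str
    obj: str


def is_variable(term: str) -> bool:
    # Variables are written with a leading "?", as in SPARQL.
    return term.startswith("?")


def decompose(where_body: str) -> list:
    """Split a simplified query body into one small step per triple."""
    steps = []
    for chunk in where_body.split(" . "):
        terms = chunk.strip().rstrip(".").split()
        if not terms:
            continue
        if len(terms) != 3:
            raise ValueError(f"not a triple pattern: {chunk!r}")
        steps.append(TriplePattern(*terms))
    return steps


steps = decompose("?x memberOf ?y . ?y subOrganizationOf Univ0 .")
for number, step in enumerate(steps, start=1):
    print(number, step)
```

Each resulting `TriplePattern` corresponds to one small step; the second pattern here contains the constant Univ0, the kind of term a first small step would start from.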
The step 3 comprises the following steps:
step 3.1: match the data in the data set against the statement of the query small step. If the complete history record has no entries, which is usually the case when the first query small step is executed, simply start from the constant in the query statement and match data to obtain an intermediate result. If the complete history record has entries, start from the values of the variables in the query statement that also exist in the history record and match the data set to obtain an intermediate result; here the intermediate result is the value at the other end of the triple relative to that variable;
step 3.2: prune the intermediate results obtained in this small step according to which case applies:
when the query variable corresponding to the newly added result does not exist in the complete history record, perform a simple pruning operation according to the constant in the query step: check whether each intermediate result obtained in step 3.1 equals the constant, and remove the results that do not;
when the query variable corresponding to the newly added result already has a record in the complete history record, prune according to whether the variable value in the history record matches the new value. Specifically, for each record in the complete history record that corresponds to the newly generated result, check whether the value of the variable in the history record equals the value of the variable in the new result, and reject the record if they are not equal;
wherein a query variable is an unknown in the query statement whose corresponding value the query system must return in the query result.
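To see how the history evolves across small steps, the following worked sketch runs a three-step query whose last step closes a loop. It is an illustration under several assumptions: the data set is a plain list of triples scanned linearly (a real system would start from an index lookup on the constant or on the history values), history records are dicts, and `None` marks the freshly cleared history before the first small step so that it can be told apart from a history that pruning has emptied.

```python
def is_var(term):
    return term.startswith("?")


def match(pattern, triple, record):
    """Try to extend one history record with one data triple.

    Returns the extended record, or None if the triple contradicts a
    constant in the pattern or a value already held in the record.
    """
    bound = dict(record)
    for term, value in zip(pattern, triple):
        if is_var(term):
            # New variable: record it. Known variable: values must agree.
            if bound.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return bound


def execute_step(triples, history, pattern):
    """Run one small step and return the new complete history."""
    records = [{}] if history is None else history
    new_history = []
    for record in records:
        for triple in triples:
            extended = match(pattern, triple, record)
            if extended is not None:
                new_history.append(extended)
    return new_history


triples = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
    ("bob", "knows", "dave"),
]
history = None
after_each_step = []
for pattern in [("alice", "knows", "?y"),
                ("?y", "knows", "?z"),
                ("?z", "knows", "alice")]:
    history = execute_step(triples, history, pattern)
    after_each_step.append(len(history))
print(after_each_step, history)
```

The history holds one record after the first step, two after the second, and one after the third: the record with ?z = dave is rejected because dave has no "knows" edge back to alice. The surviving record already is the final answer, so no separate merge is needed in this example.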
The step 4 comprises the following steps:
step 4.1: add the new pruned result to the complete history record table. A column is added to the table to represent the newly added query variable, and the number of entries in the table increases or decreases accordingly (pruning reduces the number of intermediate results); the number of entries is the number of rows in the complete history record table;
step 4.2: after the new result has been added to the complete history record table, the complete history record travels with the query to the next small-step query statement for the next round of pruning, and the transfer depends on where the data for the next small step resides:
when the data involved in the next small step is on the local machine, the complete history record is simply passed locally, with no network transmission;
when the data involved in the next small step is on a remote machine, the complete history record is sent to the remote server along with the query sub-request, and execution continues there;
more specifically, transferring the complete history record in this step exploits a characteristic of the high-performance network (RDMA): when the transferred data is small (for example, less than 2000 bytes), transfer latency stays low and essentially constant. The invention uses this characteristic to transfer the complete history record, whose data volume is small, with high efficiency and low latency. The data volume is small because an RDF query usually has few steps and few query variables and because the history record is usually converted into numeric IDs. If the next small step is executed locally, the complete history record is passed locally and no communication is needed.
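The point about numeric IDs keeping the payload small can be made concrete with a short sketch. The wire layout here (two 32-bit counts followed by one 32-bit ID per cell) and the in-memory string-to-ID dictionary are assumptions for illustration; only the 2000-byte figure comes from the text, where it is given as an example.

```python
import struct

SMALL_TRANSFER_BYTES = 2000  # example threshold quoted in the text


def encode_history(columns, rows, ids):
    """Pack a history table as numeric IDs (assumed layout).

    columns: list of variable names; rows: list of tuples of strings.
    ids: dict mapping each string to a numeric ID, extended in place
         when a new string is seen.
    """
    header = struct.pack("<II", len(rows), len(columns))
    body = b"".join(
        struct.pack("<I", ids.setdefault(value, len(ids)))
        for row in rows
        for value in row
    )
    return header + body


ids = {}
payload = encode_history(
    ["?y", "?z"], [("bob", "carol"), ("bob", "dave")], ids
)
fits = len(payload) < SMALL_TRANSFER_BYTES
print(len(payload), fits)
```

With this layout a two-row, two-column table occupies 24 bytes, so a table with two variables stays under the example threshold up to a few hundred rows.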
The pruning method is built on the complete history record, which is stored in a dynamic table structure referred to as the complete history record table. The table consists of columns and rows: the columns represent the query variables contained in the user's query request, and the rows store the entries of the historical results. The table changes dynamically during the query; that is, its numbers of rows and columns may increase or decrease, because results may be added (for example, when a query step is executed) or pruned (when the query statement forms a loop). Storing the complete history record in a dynamic table structure is simple and convenient, and it keeps row and column insertions and deletions efficient.
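One possible shape for such a dynamic table is sketched below. The class and method names are hypothetical, and the choice of a list of row lists is an assumption; the text only requires that rows and columns can grow and shrink during the query.

```python
class HistoryTable:
    """Minimal dynamic table: one column per query variable."""

    def __init__(self):
        self.columns = []  # query variables, in the order they were added
        self.rows = []     # each row is a list aligned with self.columns

    def seed(self, var, values):
        """First small step: one row per value matched from the constant."""
        self.columns = [var]
        self.rows = [[value] for value in values]

    def add_column(self, var, values_per_row):
        """Add a column for a new variable.

        values_per_row[i] lists the values matched for row i. A row
        with k matches becomes k rows; a row with none disappears.
        """
        self.columns.append(var)
        self.rows = [
            row + [value]
            for row, values in zip(self.rows, values_per_row)
            for value in values
        ]

    def prune(self, keep):
        """Drop every row for which keep(record) is false."""
        self.rows = [
            row for row in self.rows
            if keep(dict(zip(self.columns, row)))
        ]


table = HistoryTable()
table.seed("?y", ["bob"])
table.add_column("?z", [["carol", "dave"]])
rows_before = [list(row) for row in table.rows]
table.prune(lambda record: record["?z"] != "dave")
print(table.columns, rows_before, table.rows)
```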
The invention adopts pruning based on the complete history record rather than the traditional single-step pruning method mainly because the traditional method incurs considerable extra overhead. The traditional single-step pruning method has the following problems:
(1) High communication overhead: single-step pruning carries redundant, useless intermediate results through to the end, so a large number of intermediate results cannot be eliminated and communication is wasted;
(2) A final result-merging operation: single-step pruning can judge only by the result of the previous step and cannot eliminate all useless results, so some results still fail the final requirement after the query statement has executed. All results must then be gathered on one machine for a final merge, which may become the performance bottleneck of the whole system.
Compared with the traditional pruning method, the pruning method based on the complete history record has the following advantages:
(1) It avoids the time-consuming final result-merging operation of the traditional method. Passing the complete history record allows all useless results to be eliminated during execution, so no merge is needed once execution finishes, and the history record contains every result the user requires;
(2) It uses RDMA communication efficiently to reduce the cost of transferring the complete history record. The RDMA-friendly communication pattern uses network bandwidth effectively, lowers transfer latency, and avoids the resource waste of the traditional method.
In summary, the pruning method based on the complete history record provided by the invention eliminates useless results as early as possible, saves network bandwidth, makes full use of the characteristics of high-performance network transmission, and keeps communication latency low.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (4)

1. A pruning method based on a complete history record, characterized by comprising the following steps:
step 1: a client sends a query request, and a server receives the query request;
step 2: the server parses the query request and decomposes the query statement in the query request into a plurality of steps for execution, each of which is referred to as a small step;
step 3: the query is executed small step by small step to obtain an intermediate result, and a pruning operation is performed on the intermediate result to obtain a pruned result;
step 4: the pruned result and all historical results are added to a new history record table, and the new history record table is passed to the next small-step query for further pruning;
wherein step 3 comprises:
step 3.1: matching the data in the data set against the small-step query statement; if the complete history record has no entries, starting from the constant in the query statement and matching data to obtain an intermediate result; if the complete history record has entries, starting from the values of the variables that appear both in the query statement and in the history record and matching the data set to obtain an intermediate result;
step 3.2: pruning the intermediate result obtained in the small step according to which of the following cases applies:
when the query variable corresponding to the newly added intermediate result does not exist in the complete history record, pruning according to the constant in the small step, and removing every intermediate result that does not satisfy the constant condition;
when the query variable corresponding to the newly added intermediate result already has a record in the complete history record, pruning according to whether the variable value in the history record matches the variable value in the intermediate result, specifically: for each record in the complete history record that corresponds to the newly added intermediate result, checking whether the variable value in the history record equals the corresponding variable value in the newly added intermediate result; if they are not equal, rejecting the record, otherwise keeping the record;
wherein a query variable is an unknown in the query statement whose corresponding value must be returned in the query result.
2. The pruning method based on a complete history record according to claim 1, characterized in that step 1 comprises: the client selects a server and sends it a query request; the server listens for the query request, initializes the query-related data, clears the history record table, and prepares to execute the query.
3. The pruning method based on a complete history record according to claim 1, characterized in that step 2 comprises: after receiving the query request, the server parses it, wherein the query request comprises a plurality of query statements, and the server decomposes them into a plurality of small steps for execution according to the individual query statements.
4. The pruning method based on a complete history record according to claim 1, characterized in that step 4 comprises:
step 4.1: adding the pruned result to the complete history record table, and adding a new column to the table to represent the newly added query variable, the number of entries in the table increasing or decreasing accordingly;
step 4.2: passing the complete history record table with the query to the next small-step query statement for the next round of pruning, the transfer depending on where the data for the next small step resides:
when the data involved in the next small step is on the local machine, the complete history record table is passed locally, with no network transmission;
when the data involved in the next small step is on a remote machine, the complete history record table is sent to the remote server along with the query request, and execution continues there.
CN201611056390.XA 2016-11-24 2016-11-24 Branch reduction method based on complete historical record Active CN106599095B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611056390.XA CN106599095B (en) 2016-11-24 2016-11-24 Branch reduction method based on complete historical record

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611056390.XA CN106599095B (en) 2016-11-24 2016-11-24 Branch reduction method based on complete historical record

Publications (2)

Publication Number Publication Date
CN106599095A (en) 2017-04-26
CN106599095B (en) 2020-07-14

Family

ID=58591987

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611056390.XA Active CN106599095B (en) 2016-11-24 2016-11-24 Branch reduction method based on complete historical record

Country Status (1)

Country Link
CN (1) CN106599095B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108491274A (en) * 2018-04-02 2018-09-04 深圳市华傲数据技术有限公司 Optimization method, device, storage medium and the equipment of distributed data management

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254001A (en) * 2011-07-14 2011-11-23 青岛海信网络科技股份有限公司 Efficient data management method and system
CN102546247A (en) * 2011-12-29 2012-07-04 华中科技大学 Massive data continuous analysis system suitable for stream processing
CN103455556A (en) * 2013-08-08 2013-12-18 成都市欧冠信息技术有限责任公司 Intelligent storage unit data clipping process
CN103593435A (en) * 2013-11-12 2014-02-19 河海大学 Approximate treatment system and method for uncertain data PT-TopK query

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8984019B2 (en) * 2012-11-20 2015-03-17 International Business Machines Corporation Scalable summarization of data graphs

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
gStore: Answering SPARQL Queries via Subgraph Matching; Lei Zou et al.; Proceedings of the VLDB Endowment; 2011-05-31; Vol. 4, No. 8; pp. 482-493 *

Also Published As

Publication number Publication date
CN106599095A (en) 2017-04-26

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant