CN109062929B - Query task communication method and system - Google Patents

Query task communication method and system

Info

Publication number
CN109062929B
Authority
CN
China
Legal status
Active
Application number
CN201810596030.1A
Other languages
Chinese (zh)
Other versions
CN109062929A (en)
Inventor
陈榕
陈海波
臧斌宇
管海兵
王思源
Current Assignee
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Application filed by Shanghai Jiaotong University
Priority to CN201810596030.1A
Publication of CN109062929A
Application granted
Publication of CN109062929B

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]

Abstract

The invention provides a query task communication method and system. The method comprises: parsing a query request at the server that receives it, and decomposing the query statement in the query request into a plurality of sub-steps, wherein information about the sub-steps forms part of the metadata of the query task; processing the query request sub-step by sub-step, starting from the first sub-step, to obtain an intermediate query result; and, if the data that the next sub-step depends on resides on a remote server, sending the intermediate query result to the remote server via GPUDirect RDMA and the metadata of the query task via RDMA, whereupon the remote server continues processing the query request according to the received intermediate query result and metadata. The invention reduces the overhead of the whole communication process, avoids contention for network resources, and improves the performance of the whole query system.

Description

Query task communication method and system
Technical Field
The invention relates to the field of communication technology, and in particular to a GPUDirect RDMA-based query task communication method.
Background
In the big data era, data volumes keep growing. For example, the Internet contains billions of web pages, and such huge data sets are usually partitioned and stored across many machines. To find data of interest in a vast data set, the software that provides the query service typically runs in a distributed environment consisting of multiple machines.
With the continuous development of hardware technology, servers equipped with high-performance graphics processing units (GPUs) are increasingly common in data centers. GPUs offer greater computing power and higher memory bandwidth than CPUs, so they are often used as accelerators for computing tasks to complement the CPU. The discrete GPUs widely used in data centers have their own dedicated memory, separate from the system memory (CPU memory) used by the CPU. Therefore, before a computing task can be launched on the GPU, the data it requires must first be copied into GPU memory.
When a query task is processed in a distributed computing environment, intermediate results of the query task often have to be sent between machines. For example, when server A sends an intermediate result of a query task to server B, the intermediate result must first be copied from GPU memory to CPU memory and then sent over the network to the CPU memory of server B, which copies it into its own GPU memory to continue processing the query task. Frequent data copies between GPU and CPU during communication clearly increase the time spent on the query task significantly and can degrade the user experience for latency-sensitive queries.
RDMA: Remote Direct Memory Access.
GPUDirect RDMA, a technology recently developed by NVIDIA, aims to eliminate unnecessary memory copies during communication between GPU servers by sending data directly from the GPU memory of server A to the GPU memory of server B over a high-performance network. This opens new possibilities for inter-server communication in distributed computing environments. How to use this technology to reduce the processing latency of query tasks, however, remains a technical problem to be solved.
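To make the difference between the two transfer paths concrete, the Python sketch below simply counts the GPU-CPU staging copies on each path. It is a bookkeeping model under the assumption that the conventional path stages through CPU memory on both ends, as described above; it calls no real CUDA or RDMA API, and all names in it are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TransferTrace:
    """Ordered list of data movements used to ship one intermediate result."""
    moves: list = field(default_factory=list)

    def host_device_copies(self) -> int:
        # Only GPU<->CPU copies count here; the network hop is listed but not counted.
        return sum(1 for m in self.moves if m in ("gpu->cpu", "cpu->gpu"))


def staged_transfer() -> TransferTrace:
    """Conventional path: server A stages through CPU memory, server B copies back to GPU."""
    return TransferTrace(["gpu->cpu",      # A: GPU memory -> CPU memory
                          "cpu-net-cpu",   # network: A's CPU memory -> B's CPU memory
                          "cpu->gpu"])     # B: CPU memory -> GPU memory


def gpudirect_transfer() -> TransferTrace:
    """GPUDirect RDMA path: the NIC reads A's GPU memory and writes B's GPU memory."""
    return TransferTrace(["gpu-net-gpu"])


if __name__ == "__main__":
    for name, make in (("staged", staged_transfer), ("gpudirect", gpudirect_transfer)):
        trace = make()
        print(f"{name}: {trace.moves} -> {trace.host_device_copies()} host/device copies")
```

The staged path needs two host/device copies per transfer and the GPUDirect RDMA path none; the model says nothing about actual latencies.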
Disclosure of Invention
In view of the deficiencies in the prior art, an object of the invention is to provide a query task communication method and a query task communication system.
The query task communication method provided by the invention comprises the following steps:
a registration step: distributing and loading the data set across the servers in the cluster, and registering GPU memory for GPUDirect RDMA and CPU memory for RDMA on each server;
a query request step: sending a query request to a server in the cluster;
a query parsing step: parsing the query request at the server that receives it, and decomposing the query statement in the query request into a plurality of sub-steps, wherein information about the sub-steps forms part of the metadata of the query task;
a query processing step: processing the query request sub-step by sub-step, starting from the first of the plurality of sub-steps, to obtain an intermediate query result;
if the data that the next sub-step depends on resides on a remote server, sending the intermediate query result to the remote server via GPUDirect RDMA and the metadata of the query task via RDMA, whereupon the remote server continues processing the query request according to the received intermediate query result and metadata.
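The control flow of these steps can be illustrated with a small Python simulation of where each sub-step runs. It assumes, for simplicity, that each sub-step reads one data partition held by exactly one server; `SubStep`, `data_key`, and `run_query` are hypothetical names, and the "migrate" event stands in for the GPUDirect RDMA plus RDMA transfer rather than performing one.

```python
from dataclasses import dataclass


@dataclass
class SubStep:
    name: str
    data_key: str  # hypothetical: identifies the single partition this sub-step reads


def run_query(entry_server: str, substeps: list, owner: dict) -> list:
    """Simulate where each sub-step runs; owner maps data_key -> server holding it.

    Returns a log of (server, event) pairs. A "migrate" event stands for sending the
    intermediate result (GPUDirect RDMA) and the metadata (RDMA) to the next server.
    """
    log = []
    current = entry_server
    for step in substeps:
        target = owner[step.data_key]
        if target != current:
            # Data for the next sub-step is remote: hand the task over instead of fetching data.
            log.append((current, f"migrate to {target}"))
            current = target
        # Placeholder for the GPU matching operation on locally stored data.
        log.append((current, f"execute {step.name}"))
    return log


if __name__ == "__main__":
    owner = {"p1": "S1", "p2": "S2"}
    steps = [SubStep("s1", "p1"), SubStep("s2", "p1"), SubStep("s3", "p2")]
    for server, event in run_query("S1", steps, owner):
        print(server, event)
```

In this example S1 runs s1 and s2 locally, then hands the task to S2, which runs s3. Moving the task to the data, rather than the data to the task, is what makes the intermediate-result transfer the only bulk communication.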
Preferably, the registration step includes:
loading the data set onto the servers in the cluster, performing initialization, and registering GPU memory and CPU memory on each server.
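The registration step can be sketched as each server exposing one GPU buffer and one CPU buffer, each tagged with a key that peers need for one-sided writes. This is a simulation under stated assumptions: the buffer sizes, the `RegisteredBuffer` type, and the out-of-band exchange of descriptors are hypothetical. With libibverbs, the real operation is typically `ibv_reg_mr`, which with GPUDirect RDMA can also register GPU memory when NVIDIA's peer-memory kernel module is loaded.

```python
import itertools
from dataclasses import dataclass

_rkeys = itertools.count(1)  # simulated source of unique remote keys


@dataclass(frozen=True)
class RegisteredBuffer:
    server: str
    kind: str   # "gpu" (used with GPUDirect RDMA) or "cpu" (used with RDMA)
    size: int   # bytes
    rkey: int   # simulated remote key a peer needs to write into this buffer


def register_server(server: str, gpu_bytes: int, cpu_bytes: int) -> dict:
    """Register one GPU RDMA buffer and one CPU RDMA buffer for a single server."""
    return {
        "gpu": RegisteredBuffer(server, "gpu", gpu_bytes, next(_rkeys)),
        "cpu": RegisteredBuffer(server, "cpu", cpu_bytes, next(_rkeys)),
    }


def register_cluster(servers, gpu_bytes: int = 1 << 20, cpu_bytes: int = 1 << 16) -> dict:
    """Build the descriptor table every server would hold after exchanging registrations."""
    return {s: register_server(s, gpu_bytes, cpu_bytes) for s in servers}


if __name__ == "__main__":
    for server, bufs in register_cluster(["S1", "S2"]).items():
        print(server, bufs["gpu"], bufs["cpu"])
```

Giving the intermediate results and the metadata separate registered buffers mirrors the separation of data and control channels described later.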
Preferably, the query request step includes:
after receiving the query request, the server initializes the data related to the query task and clears the intermediate result table in preparation for processing the query task.
Preferably, the query parsing step includes:
the server parses the query request, which contains a plurality of query statements, and decomposes it into a plurality of sub-steps according to the different query statements; before each sub-step is executed, the data that the sub-step depends on is copied from CPU memory to GPU memory, and the processing logic of the sub-step is then executed on the GPU.
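The parsing step and the per-sub-step "copy to GPU, then run on GPU" order can be sketched as follows. The patent does not name a query language, so the toy syntax (statements separated by " . "), the `QueryMetadata` fields, and the event strings are all assumptions; the copies and kernels are logged, not performed.

```python
from dataclasses import dataclass


@dataclass
class QueryMetadata:
    substeps: list        # one entry per query statement, in execution order
    next_index: int = 0   # position of the next sub-step to execute


def parse_query(query: str) -> QueryMetadata:
    """Decompose a query into one sub-step per statement (hypothetical ' . ' separator)."""
    statements = [s.strip() for s in query.split(" . ") if s.strip()]
    if not statements:
        raise ValueError("query contains no statements")
    return QueryMetadata(substeps=statements)


def execute_local(meta: QueryMetadata, count: int) -> list:
    """Run up to `count` sub-steps locally, recording the CPU->GPU copy before each one."""
    events = []
    remaining = len(meta.substeps) - meta.next_index
    for _ in range(min(count, remaining)):
        step = meta.substeps[meta.next_index]
        events.append(f"copy data for '{step}' CPU->GPU")  # stage dependent data into GPU memory
        events.append(f"run '{step}' on GPU")               # the sub-step's matching logic
        meta.next_index += 1
    return events


if __name__ == "__main__":
    meta = parse_query("?x follows ?y . ?y likes ?z")
    print(execute_local(meta, 1), meta.next_index)
```

Because `next_index` lives in the metadata, a remote server that receives this object knows exactly where to resume.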
Preferably, the query processing step includes:
the server processes the query request starting from the first sub-step, using the query condition of each sub-step to perform a matching operation on the data set; the control-flow logic of the query request executes on the CPU, the matching operation on the data set executes on the GPU, and the intermediate query result produced by the matching operation is stored in GPU memory; because the data set is distributed across the whole cluster, after the server that received the query request has executed some of the sub-steps locally, it determines whether the data that the next sub-step depends on is local: if so, it continues with the subsequent sub-steps; if not, it sends the intermediate result to the remote server, which executes the subsequent sub-steps based on the received intermediate result;
sending the intermediate query result to the remote server comprises: invoking a one-sided operation of the RDMA network card, with the start address in GPU memory and the size of the intermediate query result as parameters, to write the intermediate query result into the GPU memory of the remote server, the intermediate query result being the data information of the query task;
after sending the intermediate query result, the server sends the metadata of the query task, which records the subsequent sub-steps of the query request and according to which the remote server executes those sub-steps; the server serializes the metadata, copies the serialized metadata into a CPU memory buffer, and invokes a one-sided operation of the RDMA network card, with the start address of the buffer and the size of the metadata as parameters, to write the metadata into the CPU memory of the remote server.
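The one-sided write of the intermediate query result, parameterized by a start address and a size, can be modeled in Python with byte buffers. This is a simulation: offsets into a `bytearray` stand in for GPU addresses, and `SimulatedFabric` and `send_intermediate_result` are hypothetical names. On real hardware the corresponding call would be posting an RDMA WRITE work request (for example `ibv_post_send` with opcode `IBV_WR_RDMA_WRITE` in libibverbs), which needs the remote address and remote key.

```python
from dataclasses import dataclass


@dataclass
class MemoryRegion:
    """Stand-in for a registered buffer; offsets play the role of addresses (assumption)."""
    buf: bytearray
    rkey: int  # simulated remote key obtained when the region was registered


class SimulatedFabric:
    """Models a one-sided RDMA WRITE: the remote CPU takes no part in the transfer."""

    def __init__(self):
        self._regions = {}

    def expose(self, region: MemoryRegion) -> None:
        self._regions[region.rkey] = region

    def rdma_write(self, src: memoryview, remote_rkey: int, remote_offset: int = 0) -> None:
        dst = self._regions[remote_rkey].buf
        end = remote_offset + len(src)
        if end > len(dst):
            raise ValueError("write would overrun the registered remote region")
        dst[remote_offset:end] = src  # bytes land directly in remote memory


def send_intermediate_result(local_gpu: MemoryRegion, start: int, size: int,
                             fabric: SimulatedFabric, remote_gpu_rkey: int) -> None:
    """Write `size` bytes beginning at `start` in the local GPU buffer to the remote GPU buffer."""
    view = memoryview(local_gpu.buf)[start:start + size]  # no intermediate host copy
    fabric.rdma_write(view, remote_gpu_rkey)


if __name__ == "__main__":
    local = MemoryRegion(bytearray(b"xxRESULTyy"), rkey=1)
    remote = MemoryRegion(bytearray(8), rkey=2)
    fabric = SimulatedFabric()
    fabric.expose(remote)
    send_intermediate_result(local, 2, 6, fabric, remote.rkey)
    print(bytes(remote.buf))
```

The metadata write follows the same pattern against the remote CPU buffer; only the target region differs.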
Preferably, after receiving the intermediate query result, the remote server copies it from the GPU memory in which it was received into another GPU memory region and records the start address of that region;
the remote server then receives the metadata of the query task, copies it from the CPU memory in which it was received into another CPU memory region, and deserializes it to obtain the metadata of the query task; the recorded GPU memory start address is stored in the metadata;
the remote server then executes the control-flow logic of the query task on the CPU according to the metadata, continues with the subsequent sub-steps, copies the data each sub-step depends on from CPU memory to GPU memory, and performs the matching operation on the data set on the GPU based on the previously obtained intermediate result.
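The receiving side can be sketched as a class that drains the two registered buffers. Assumptions: metadata is serialized as UTF-8 JSON, integer keys stand in for GPU addresses, and the sizes are known to the receiver; `Receiver`, `take_result`, and `take_metadata` are hypothetical names.

```python
import json


class Receiver:
    """Simulated receiving server. Integer keys stand in for device addresses (assumption)."""

    def __init__(self, gpu_rdma_bytes: int, cpu_rdma_bytes: int):
        self.gpu_rdma_buf = bytearray(gpu_rdma_bytes)  # registered for GPUDirect RDMA
        self.cpu_rdma_buf = bytearray(cpu_rdma_bytes)  # registered for RDMA
        self.gpu_heap = {}                             # private GPU allocations: addr -> bytes
        self._next_addr = 0x1000

    def take_result(self, size: int) -> int:
        """Copy the result out of the GPU RDMA buffer into private GPU memory; return its address."""
        addr = self._next_addr
        # In the real system this would be a device-to-device copy.
        self.gpu_heap[addr] = bytes(self.gpu_rdma_buf[:size])
        self._next_addr += max(size, 1)  # keep addresses unique even for empty results
        return addr

    def take_metadata(self, size: int, result_addr: int) -> dict:
        """Copy the metadata out of the CPU RDMA buffer, deserialize it, and attach the address."""
        raw = bytes(self.cpu_rdma_buf[:size])
        meta = json.loads(raw.decode("utf-8"))
        meta["gpu_addr"] = result_addr  # later sub-steps read the intermediate result from here
        return meta
```

Copying out of the registered buffers presumably frees them for the next incoming task, so a new one-sided write cannot overwrite data that is still in use.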
The invention provides a query task communication system, which comprises:
a registration module, configured to distribute and load the data set across the servers in the cluster, and to register GPU memory for GPUDirect RDMA and CPU memory for RDMA on each server;
a query request module, configured to send a query request to a server in the cluster;
a query parsing module, configured to parse the query request at the server that receives it and to decompose the query statement in the query request into a plurality of sub-steps, wherein information about the sub-steps forms part of the metadata of the query task;
a query processing module, configured to process the query request sub-step by sub-step, starting from the first of the plurality of sub-steps, to obtain an intermediate query result,
and, if the data that the next sub-step depends on resides on a remote server, to send the intermediate query result to the remote server via GPUDirect RDMA and the metadata of the query task via RDMA, whereupon the remote server continues processing the query request according to the received intermediate query result and metadata.
Preferably, the registration module loads the data set onto the servers in the cluster, performs initialization, and registers GPU memory and CPU memory on each server; the query request module, after the server receives the query request, initializes the data related to the query task and clears the intermediate result table in preparation for processing the query task;
in the query parsing module, the server parses the query request, which contains a plurality of query statements, and decomposes it into a plurality of sub-steps according to the different query statements; before each sub-step is executed, the data that the sub-step depends on is copied from CPU memory to GPU memory, and the processing logic of the sub-step is then executed on the GPU.
Preferably, in the query processing module:
the server processes the query request starting from the first sub-step, using the query condition of each sub-step to perform a matching operation on the data set; the control-flow logic of the query request executes on the CPU, the matching operation on the data set executes on the GPU, and the intermediate query result produced by the matching operation is stored in GPU memory; because the data set is distributed across the whole cluster, after the server that received the query request has executed some of the sub-steps locally, it determines whether the data that the next sub-step depends on is local: if so, it continues with the subsequent sub-steps; if not, it sends the intermediate result to the remote server, which executes the subsequent sub-steps based on the received intermediate result;
sending the intermediate query result to the remote server comprises: invoking a one-sided operation of the RDMA network card, with the start address in GPU memory and the size of the intermediate query result as parameters, to write the intermediate query result into the GPU memory of the remote server, the intermediate query result being the data information of the query task;
after sending the intermediate query result, the server sends the metadata of the query task, which records the subsequent sub-steps of the query request and according to which the remote server executes those sub-steps; the server serializes the metadata, copies the serialized metadata into a CPU memory buffer, and invokes a one-sided operation of the RDMA network card, with the start address of the buffer and the size of the metadata as parameters, to write the metadata into the CPU memory of the remote server.
Preferably, after receiving the intermediate query result, the remote server copies it from the GPU memory in which it was received into another GPU memory region and records the start address of that region;
the remote server then receives the metadata of the query task, copies it from the CPU memory in which it was received into another CPU memory region, and deserializes it to obtain the metadata of the query task; the recorded GPU memory start address is stored in the metadata;
the remote server then executes the control-flow logic of the query task on the CPU according to the metadata, continues with the subsequent sub-steps, copies the data each sub-step depends on from CPU memory to GPU memory, and performs the matching operation on the data set on the GPU based on the previously obtained intermediate result.
Compared with the prior art, the invention has the following beneficial effects:
1. With the GPUDirect RDMA-based query task communication method disclosed by the invention, the intermediate result generated while processing a query task on the GPU can be sent directly from local GPU memory to the GPU memory of the remote server. This reduces the number of data copies between GPU memory and CPU memory during communication and thus the overhead of the whole communication process.
2. The invention decouples the sending of the data information (intermediate query result) and the control information (metadata) of the query task: the data information is sent via GPUDirect RDMA and the control information via RDMA, over separate communication channels, thereby avoiding contention for network resources.
3. The GPUDirect RDMA-based query task communication method is applicable to server clusters equipped with GPUs and network cards that support GPUDirect RDMA. By avoiding redundant data copies during communication, it can reduce the processing latency of query tasks and improve the performance of the whole query system.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a flow chart of the present invention.
Detailed Description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the concept of the invention, all of which fall within the scope of the present invention.
As shown in fig. 1, a query task communication method provided by the present invention includes:
a registration step: distributing and loading the data set across the servers in the cluster, performing initialization, and registering GPU memory for GPUDirect RDMA and CPU memory for RDMA on each server. These two memory regions are referred to as the "GPU RDMA buffer" and the "CPU RDMA buffer", respectively.
A query request step: sending the query request to a server in the cluster;
a query parsing step: parsing the query request at the server that receives it, and decomposing the query statement in the query request into a plurality of sub-steps, wherein information about the sub-steps forms part of the metadata (control information) of the query task;
a query processing step: processing the query request sub-step by sub-step, starting from the first of the plurality of sub-steps, to obtain an intermediate query result;
if the data that the next sub-step depends on resides on a remote server, sending the intermediate query result to the remote server via GPUDirect RDMA and the metadata of the query task via RDMA, whereupon the remote server continues processing the query request according to the received intermediate query result and metadata.
Specifically, in the query request step, after receiving the query request, the server initializes the data related to the query task and clears the intermediate result table in preparation for processing the query task.
In the query parsing step, the server parses the query request, which contains a plurality of query statements, and decomposes it into a plurality of sub-steps according to the different query statements; before each sub-step is executed, the data that the sub-step depends on is copied from CPU memory to GPU memory, and the processing logic of the sub-step is then executed on the GPU.
In the query processing step, the server processes the query request starting from the first sub-step, using the query condition of each sub-step to perform a matching operation on the data set; the control-flow logic of the query request executes on the CPU, the matching operation on the data set executes on the GPU, and the intermediate query result produced by the matching operation is stored in GPU memory; because the data set is distributed across the whole cluster, after the server that received the query request has executed some of the sub-steps locally, it determines whether the data that the next sub-step depends on is local: if so, it continues with the subsequent sub-steps; if not, it sends the intermediate result to the remote server, which executes the subsequent sub-steps based on the received intermediate result;
sending the intermediate query result to the remote server comprises: invoking a one-sided operation of the RDMA network card, with the start address in GPU memory and the size of the intermediate query result as parameters, to write the intermediate query result into the GPU memory of the remote server, the intermediate query result being the data information of the query task;
after sending the intermediate query result, the server sends the metadata of the query task, which records the subsequent sub-steps of the query request and according to which the remote server executes those sub-steps; the server serializes the metadata, copies the serialized metadata into a CPU memory buffer, and invokes a one-sided operation of the RDMA network card, with the start address of the buffer, the size of the metadata, and other values as parameters, to write the metadata into the CPU memory of the remote server. The metadata of the query task includes, but is not limited to: (1) the size of the intermediate query result; (2) the query sub-steps parsed by the server; and (3) a variable for storing a GPU memory address.
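The three metadata fields listed above can be captured in a small record that is serialized into the local CPU RDMA buffer before the one-sided write. The JSON encoding, the field names, and the helper functions are assumptions made for illustration; the patent does not specify a serialization format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class TaskMetadata:
    result_size: int   # (1) size of the intermediate query result, in bytes
    substeps: list     # (2) parsed sub-steps still to be executed
    gpu_addr: int = 0  # (3) filled in by the receiver with the result's GPU address


def serialize(meta: TaskMetadata) -> bytes:
    return json.dumps(asdict(meta)).encode("utf-8")


def deserialize(raw: bytes) -> TaskMetadata:
    return TaskMetadata(**json.loads(raw.decode("utf-8")))


def stage_for_rdma(meta: TaskMetadata, cpu_rdma_buf: bytearray) -> tuple:
    """Serialize into the local CPU RDMA buffer; return (start, size) for the one-sided write."""
    raw = serialize(meta)
    if len(raw) > len(cpu_rdma_buf):
        raise ValueError("metadata larger than CPU RDMA buffer")
    cpu_rdma_buf[:len(raw)] = raw
    return 0, len(raw)


if __name__ == "__main__":
    buf = bytearray(256)
    start, size = stage_for_rdma(TaskMetadata(128, ["?y likes ?z"]), buf)
    print(start, size, deserialize(bytes(buf[start:start + size])))
```

The returned `(start, size)` pair corresponds to the "start address of the buffer" and "size of the metadata" parameters in the text.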
After receiving the intermediate query result, the remote server copies it from the GPU memory in which it was received into another GPU memory region and records the start address of that region;
the remote server then receives the metadata of the query task, copies it from the CPU memory in which it was received into another CPU memory region, and deserializes it to obtain the metadata of the query task; the recorded GPU memory start address is stored in the metadata;
the remote server then executes the control-flow logic of the query task on the CPU according to the metadata, continues with the subsequent sub-steps, copies the data each sub-step depends on from CPU memory to GPU memory, and performs the matching operation on the data set on the GPU based on the previously obtained intermediate result.
On the basis of the above query task communication method, the invention further provides a query task communication system, comprising:
a registration module, configured to distribute and load the data set across the servers in the cluster, and to register GPU memory for GPUDirect RDMA and CPU memory for RDMA on each server. These two memory regions are referred to as the "GPU RDMA buffer" and the "CPU RDMA buffer", respectively.
a query request module, configured to send a query request to a server in the cluster;
a query parsing module, configured to parse the query request at the server that receives it and to decompose the query statement in the query request into a plurality of sub-steps, wherein information about the sub-steps forms part of the metadata of the query task;
a query processing module, configured to process the query request sub-step by sub-step, starting from the first of the plurality of sub-steps, to obtain an intermediate query result,
and, if the data that the next sub-step depends on resides on a remote server, to send the intermediate query result to the remote server via GPUDirect RDMA and the metadata of the query task via RDMA, whereupon the remote server continues processing the query request according to the received intermediate query result and metadata.
Specifically, the query request module, after the server receives the query request, initializes the data related to the query task and clears the intermediate result table in preparation for processing the query task. In the query parsing module, the server parses the query request, which contains a plurality of query statements, and decomposes it into a plurality of sub-steps according to the different query statements; before each sub-step is executed, the data that the sub-step depends on is copied from CPU memory to GPU memory, and the processing logic of the sub-step is then executed on the GPU.
In the query processing module, the server processes the query request starting from the first sub-step, using the query condition of each sub-step to perform a matching operation on the data set; the control-flow logic of the query request executes on the CPU, the matching operation on the data set executes on the GPU, and the intermediate query result produced by the matching operation is stored in GPU memory; because the data set is distributed across the whole cluster, after the server that received the query request has executed some of the sub-steps locally, it determines whether the data that the next sub-step depends on is local: if so, it continues with the subsequent sub-steps; if not, it sends the intermediate result to the remote server, which executes the subsequent sub-steps based on the received intermediate result;
sending the intermediate query result to the remote server comprises: invoking a one-sided operation of the RDMA network card, with the start address in GPU memory and the size of the intermediate query result as parameters, to write the intermediate query result into the GPU memory of the remote server, the intermediate query result being the data information of the query task;
after sending the intermediate query result, the server sends the metadata of the query task, which records the subsequent sub-steps of the query request and according to which the remote server executes those sub-steps; the server serializes the metadata, copies the serialized metadata into a CPU memory buffer, and invokes a one-sided operation of the RDMA network card, with the start address of the buffer, the size of the metadata, and other values as parameters, to write the metadata into the CPU memory of the remote server.
After receiving the intermediate query result, the remote server copies it from the GPU memory in which it was received into another GPU memory region and records the start address of that region;
the remote server then receives the metadata of the query task, copies it from the CPU memory in which it was received into another CPU memory region, and deserializes it to obtain the metadata of the query task; the recorded GPU memory start address is stored in the metadata;
the remote server then executes the control-flow logic of the query task on the CPU according to the metadata, continues with the subsequent sub-steps, copies the data each sub-step depends on from CPU memory to GPU memory, and performs the matching operation on the data set on the GPU based on the previously obtained intermediate result.
More specifically, because the sending end sends the intermediate query result and the metadata separately, the receiving end, after receiving the intermediate result, must go on to receive the metadata of the query task and check whether the intermediate-result size recorded in the metadata matches the size of the intermediate result actually received. This ensures that the integrity of the intermediate result and the metadata has not been compromised during network transmission.
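The size cross-check can be sketched as a function that pairs a received result with its metadata. It assumes the metadata is JSON with a `result_size` field and that the receiver knows how many result bytes arrived; `IntegrityError` and `pair_and_verify` are hypothetical names, and a real implementation might also need to handle the metadata arriving before the result has fully landed.

```python
import json


class IntegrityError(Exception):
    """Raised when the data message and the control message of a query task disagree."""


def pair_and_verify(result: bytes, meta_raw: bytes) -> dict:
    """Receive the result first, then the metadata, and cross-check the recorded size."""
    meta = json.loads(meta_raw.decode("utf-8"))
    expected = meta.get("result_size")
    if expected != len(result):
        raise IntegrityError(f"metadata records {expected} bytes, received {len(result)}")
    return meta


if __name__ == "__main__":
    ok = pair_and_verify(b"x" * 8, json.dumps({"result_size": 8}).encode("utf-8"))
    print("accepted", ok)
```

A size match is a cheap consistency check, not a checksum; it detects truncated or mismatched pairs but not corrupted bytes of the right length.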
The query task communication method provided by the invention is based on a complete history record, which stores the intermediate result produced by every sub-step during query processing. The advantage of the complete history record is that it avoids the final result-merging operation required by the traditional one-step pruning approach. That merge is time-consuming: with one-step pruning, results that do not satisfy the query conditions may still remain after query processing finishes, so all results must finally be gathered on a single server for merging, which can become a performance bottleneck for the whole system.
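A complete history record can be illustrated with a toy matching operation in which every row of the intermediate result table keeps the bindings from all earlier sub-steps, so the final rows are already complete answers. The triple-pattern data model, the variable names, and the functions below are assumptions for illustration only; the patent does not specify the data model or the GPU kernel.

```python
def match_step(table: list, triples: list, pattern: tuple) -> list:
    """Extend each history row with triples matching (subject_var, predicate, object_var).

    Assumes the subject and object variables of a pattern are distinct.
    """
    s_var, pred, o_var = pattern
    out = []
    for row in table:
        for s, p, o in triples:
            if p != pred:
                continue
            # Reject triples that contradict bindings recorded by earlier sub-steps.
            if row.get(s_var, s) != s or row.get(o_var, o) != o:
                continue
            out.append({**row, s_var: s, o_var: o})  # keep the full history of bindings
    return out


def run_full_history(triples: list, patterns: list) -> list:
    table = [{}]  # one empty row: no variables bound yet
    for pattern in patterns:
        table = match_step(table, triples, pattern)
    return table


if __name__ == "__main__":
    data = [("alice", "follows", "bob"), ("bob", "likes", "jazz"), ("carol", "follows", "dave")]
    print(run_full_history(data, [("?x", "follows", "?y"), ("?y", "likes", "?z")]))
```

Because the row for carol and dave is dropped as soon as it fails the second sub-step, no unmatched results survive to the end, which is why no final merge on a single server is needed.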
The invention adopts a GPUDirect RDMA-based communication method instead of the traditional communication method, which has the following problems:
(1) without GPUDirect RDMA support, transferring data in GPU memory between servers requires multiple memory copies, which increases the response time of the query request;
(2) data information and control information are transmitted together, so the control and data streams are coupled and contend for the same network resources, which degrades the performance of the sending end.
Compared with the traditional communication method, the GPUDirect RDMA-based communication method has the following advantages:
1. the intermediate query result in the GPU memory of a server can be sent directly from local GPU memory to the GPU memory of the remote server over a high-performance network (RDMA), which avoids copying data between GPU memory and CPU memory during communication and reduces the overhead of the whole communication process;
2. the sending of the data information (intermediate query result) and the control information (metadata) of the query task is decoupled: the data information uses GPUDirect RDMA and the control information uses RDMA, which avoids contention for network resources.
Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (8)

1. A query task communication method, comprising:
a registration step: distributing the data set across the servers in the cluster and loading it, and registering GPU memory and CPU memory on each server for GPUDirect RDMA and RDMA, respectively;
a query request step: sending a query request to a server in the cluster;
a query parsing step: parsing the query request at the server that received it, and decomposing the query statements in the query request into a plurality of sub-steps, wherein information about the sub-steps forms part of the metadata of the query task;
a query processing step: processing the query request sub-step by sub-step, starting from the first of the plurality of sub-steps, to obtain a query intermediate result;
if the data on which the next sub-step depends resides on a remote server, sending the query intermediate result to the remote server via GPUDirect RDMA and the metadata of the query task via RDMA, whereupon the remote server continues processing the query request according to the received query intermediate result and metadata;
wherein the query processing step comprises:
the server processes the query request starting from the first sub-step, matching the query condition of each sub-step against the data set; the control-flow logic of the query request executes on the CPU, the matching against the data set executes on the GPU, and the query intermediate result produced by the matching is stored in GPU memory; because the data set is distributed across the whole cluster, after the server that received the query request has executed some of the sub-steps locally, it determines whether the data on which the next sub-step depends is local; if so, it continues with the subsequent sub-steps; if not, it sends the intermediate result to the remote server, which executes the subsequent sub-steps based on the received intermediate result;
the server sending the query intermediate result to the remote server comprises: invoking a one-sided operation of the RDMA network card with the start address of the GPU memory and the size of the query intermediate result as parameters, thereby writing the query intermediate result into the GPU memory of the remote server, the query intermediate result constituting the data information of the query task;
after sending the query intermediate result, the server sends the metadata of the query task, which records the subsequent sub-steps of the query request and according to which the remote server executes those sub-steps; the server serializes the metadata, copies the serialized metadata into a buffer in CPU memory, and invokes a one-sided operation of the RDMA network card with the start address of the buffer and the size of the metadata as parameters, thereby writing the metadata into the CPU memory of the remote server.
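The flow of claim 1 can be simulated in a few dozen lines of Python. The sketch below is a hypothetical model, not the patent's implementation. `rdma_write` only copies bytes into a preallocated `bytearray` standing in for a registered memory region; in a real system this would be a one-sided verbs operation, typically an `ibv_post_send` work request with opcode `IBV_WR_RDMA_WRITE`. A set intersection stands in for the GPU matching operation, each sub-step is reduced to the name of the data it depends on, and the length-prefixed JSON metadata format is an assumption.

```python
import json
import struct
from dataclasses import dataclass, field


def rdma_write(remote_region: bytearray, payload: bytes) -> None:
    """Stand-in for a one-sided RDMA WRITE into a registered remote region."""
    if len(payload) > len(remote_region):
        raise ValueError("payload larger than registered region")
    remote_region[: len(payload)] = payload


@dataclass
class Server:
    name: str
    data: dict  # hypothetical shard: data key -> set of row ids
    gpu_region: bytearray = field(default_factory=lambda: bytearray(4096))
    cpu_region: bytearray = field(default_factory=lambda: bytearray(4096))
    send_buf: bytearray = field(default_factory=lambda: bytearray(4096))


def match(result, ids):
    """Matching operation (runs on the GPU in the patent; a set intersection here)."""
    return sorted(ids) if result is None else sorted(set(result) & ids)


def send_task(src: Server, dst: Server, result, remaining) -> None:
    rows = result or []
    # Data information: intermediate result, local GPU memory -> remote GPU memory.
    rdma_write(dst.gpu_region, struct.pack(f"<{len(rows)}q", *rows))
    # Control information: serialize the metadata, stage it in a local CPU
    # buffer, then write it into the remote server's CPU memory.
    meta = json.dumps(
        {"remaining": remaining, "rows": len(rows), "started": result is not None}
    ).encode()
    framed = struct.pack("<I", len(meta)) + meta
    src.send_buf[: len(framed)] = framed
    rdma_write(dst.cpu_region, bytes(src.send_buf[: len(framed)]))


def receive_task(dst: Server):
    (n,) = struct.unpack_from("<I", dst.cpu_region, 0)
    meta = json.loads(bytes(dst.cpu_region[4 : 4 + n]))
    rows = list(struct.unpack_from(f"<{meta['rows']}q", dst.gpu_region, 0))
    return (rows if meta["started"] else None), meta["remaining"]


def process(servers: dict, start: str, substeps: list):
    server, result, steps, hops = servers[start], None, list(substeps), []
    while steps:
        key = steps[0]
        if key in server.data:  # dependency is local: run the sub-step here
            result = match(result, server.data[key])
            steps = steps[1:]
            continue
        # Dependency is remote: ship the intermediate result and the metadata.
        owner = next(s for s in servers.values() if key in s.data)
        send_task(server, owner, result, steps)
        result, steps = receive_task(owner)
        hops.append((server.name, owner.name))
        server = owner
    return result, hops


if __name__ == "__main__":
    cluster = {
        "A": Server("A", {"p": {1, 2, 3, 4}, "r": {3, 9}}),
        "B": Server("B", {"q": {2, 3, 5}}),
    }
    print(process(cluster, "A", ["p", "q", "r"]))  # ([3], [('A', 'B'), ('B', 'A')])
```

Starting on server A, the first sub-step runs locally, the second forces a hop to B, and the third hops back to A. The receive side is deliberately minimal here. Claim 5 describes the additional copies the remote server performs.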
2. The query task communication method according to claim 1, wherein the registration step comprises:
loading the data set on the servers in the cluster, performing initialization, and registering GPU memory and CPU memory on each server, respectively.
3. The query task communication method according to claim 1, wherein the query request step comprises:
after receiving the query request, the server initializes the data related to the query task and empties the intermediate result table in preparation for processing the query task.
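Claims 2 and 3 amount to a per-server setup step and a per-task reset. The sketch below models both under stated assumptions. `Region`, `ServerState`, and `load_cluster` are hypothetical names, and registration is only recorded, not performed. In a real deployment, registration would typically mean calling `ibv_reg_mr` on a host buffer for RDMA and, with GPUDirect RDMA enabled, on a device buffer allocated with `cudaMalloc`.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Region:
    kind: str  # "gpu": GPUDirect RDMA target for data; "cpu": RDMA target for metadata
    size: int  # bytes


@dataclass
class ServerState:
    name: str
    shard: dict  # hypothetical: this server's part of the data set
    regions: dict = field(default_factory=dict)
    intermediate_table: list = field(default_factory=list)
    task_id: Optional[int] = None

    def register(self, gpu_bytes: int, cpu_bytes: int) -> None:
        # Registration step: GPU memory for GPUDirect RDMA, CPU memory for RDMA.
        self.regions["gpu"] = Region("gpu", gpu_bytes)
        self.regions["cpu"] = Region("cpu", cpu_bytes)

    def start_query(self, task_id: int) -> None:
        # Query request step: initialize the task and empty the intermediate result table.
        self.task_id = task_id
        self.intermediate_table.clear()


def load_cluster(shards: dict, gpu_bytes: int, cpu_bytes: int) -> dict:
    """Distribute the data set over the cluster and register memory on every server."""
    cluster = {}
    for name, shard in shards.items():
        server = ServerState(name, shard)
        server.register(gpu_bytes, cpu_bytes)
        cluster[name] = server
    return cluster
```

For example, `load_cluster({"A": {...}, "B": {...}}, 1 << 20, 1 << 16)` yields two servers, each with one GPU region and one CPU region. Calling `start_query` on a server discards whatever an earlier task left in its intermediate result table.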
4. The query task communication method according to claim 1, wherein the query parsing step comprises:
the server parses the query request, which comprises a plurality of query statements, and decomposes it into a plurality of sub-steps to be executed according to the individual query statements; before each sub-step is executed, the data on which the sub-step depends is copied from CPU memory to GPU memory, and the processing logic of the sub-step is then executed on the GPU.
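Claim 4 requires that each sub-step's data be copied from CPU memory to GPU memory before the sub-step runs. The sketch below illustrates that rule. The `;`-separated query format, the function names, and the use of a Python dict as "GPU memory" are all hypothetical. What the sketch shows is the copy-then-execute ordering for every sub-step.

```python
def parse(query: str) -> list:
    """Hypothetical query format: ';'-separated statements, each naming the
    data partition its sub-step depends on."""
    return [s.strip() for s in query.split(";") if s.strip()]


def run_substeps(query: str, host_data: dict):
    device = {}  # stands in for GPU memory
    copies = []  # log of CPU -> GPU copies, one per sub-step
    result = None
    for key in parse(query):
        # Before the sub-step runs, copy the data it depends on to the "GPU".
        device[key] = frozenset(host_data[key])
        copies.append(key)
        # Sub-step logic on the "GPU": keep intermediate rows that match.
        ids = device[key]
        result = sorted(ids) if result is None else [r for r in result if r in ids]
    return result, copies
```

Because the claim copies the dependent data before every sub-step, a statement that appears twice is also copied twice in this sketch. Any caching across sub-steps would be an addition beyond what the claim describes.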
5. The query task communication method according to claim 1, wherein, after receiving the query intermediate result, the remote server copies the intermediate result from GPU memory to another GPU memory buffer and records the start address of that buffer;
the remote server then receives the metadata of the query task, copies it from CPU memory to another CPU memory buffer, and obtains the metadata of the query task by deserialization; the recorded GPU memory start address is stored in the metadata;
the remote server executes the control-flow logic of the query task on the CPU according to the metadata and continues with the subsequent sub-steps, copying the data on which each sub-step depends from CPU memory to GPU memory and performing the matching against the data set on the GPU based on the previously obtained intermediate result.
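The receive side of claim 5 can be sketched in the same hypothetical style. The sketch makes three assumptions. GPU memory is modelled by a bump allocator whose byte offsets act as device addresses. The intermediate result arrives with a 4-byte row-count prefix. The metadata arrives as a length-prefixed JSON document. None of these formats comes from the patent. One plausible reason for copying the result out of the receive region is to free that region for the next incoming task, but the claim does not state a reason.

```python
import json
import struct


class DeviceHeap:
    """Hypothetical stand-in for GPU memory: a bump allocator whose byte
    offsets play the role of device start addresses."""

    def __init__(self, size: int):
        self.mem = bytearray(size)
        self.top = 0

    def copy_in(self, data: bytes) -> int:
        addr = self.top
        if addr + len(data) > len(self.mem):
            raise MemoryError("device heap exhausted")
        self.mem[addr : addr + len(data)] = data
        self.top += len(data)
        return addr


def on_receive(gpu_recv: bytes, cpu_recv: bytes, heap: DeviceHeap) -> dict:
    # Intermediate result frame (hypothetical): u32 row count, then int64 rows.
    (rows,) = struct.unpack_from("<I", gpu_recv, 0)
    payload = bytes(gpu_recv[4 : 4 + 8 * rows])
    # Copy it into another GPU buffer and record that buffer's start address.
    addr = heap.copy_in(payload)
    # Metadata frame (hypothetical): u32 length, then a JSON document.
    (mlen,) = struct.unpack_from("<I", cpu_recv, 0)
    meta_copy = bytearray(cpu_recv[4 : 4 + mlen])  # copy to another CPU buffer
    meta = json.loads(bytes(meta_copy))  # deserialize
    meta["gpu_addr"], meta["rows"] = addr, rows  # store the recorded address
    return meta


def load_result(meta: dict, heap: DeviceHeap) -> list:
    """Read the intermediate result back through the address kept in the metadata."""
    return list(struct.unpack_from(f"<{meta['rows']}q", heap.mem, meta["gpu_addr"]))
```

For example, if the heap already holds 16 bytes, the copied result starts at address 16. That address travels with the deserialized metadata, so later sub-steps on the CPU side can locate the intermediate result without touching the receive region again.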
6. A query task communication system, comprising:
a registration module: distributing the data set across the servers in the cluster and loading it, and registering GPU memory and CPU memory on each server for GPUDirect RDMA and RDMA, respectively;
a query request module: sending a query request to a server in the cluster;
a query parsing module: parsing the query request at the server that received it, and decomposing the query statements in the query request into a plurality of sub-steps, wherein information about the sub-steps forms part of the metadata of the query task;
a query processing module: processing the query request sub-step by sub-step, starting from the first of the plurality of sub-steps, to obtain a query intermediate result;
if the data on which the next sub-step depends resides on a remote server, sending the query intermediate result to the remote server via GPUDirect RDMA and the metadata of the query task via RDMA, whereupon the remote server continues processing the query request according to the received query intermediate result and metadata;
wherein the query processing module is configured such that:
the server processes the query request starting from the first sub-step, matching the query condition of each sub-step against the data set; the control-flow logic of the query request executes on the CPU, the matching against the data set executes on the GPU, and the query intermediate result produced by the matching is stored in GPU memory; because the data set is distributed across the whole cluster, after the server that received the query request has executed some of the sub-steps locally, it determines whether the data on which the next sub-step depends is local; if so, it continues with the subsequent sub-steps; if not, it sends the intermediate result to the remote server, which executes the subsequent sub-steps based on the received intermediate result;
the server sending the query intermediate result to the remote server comprises: invoking a one-sided operation of the RDMA network card with the start address of the GPU memory and the size of the query intermediate result as parameters, thereby writing the query intermediate result into the GPU memory of the remote server, the query intermediate result constituting the data information of the query task;
after sending the query intermediate result, the server sends the metadata of the query task, which records the subsequent sub-steps of the query request and according to which the remote server executes those sub-steps; the server serializes the metadata, copies the serialized metadata into a buffer in CPU memory, and invokes a one-sided operation of the RDMA network card with the start address of the buffer and the size of the metadata as parameters, thereby writing the metadata into the CPU memory of the remote server.
7. The query task communication system according to claim 6, wherein the registration module is configured to load the data set on the servers in the cluster, perform initialization, and register GPU memory and CPU memory on each server, respectively; the query request module is configured to initialize the data related to the query task after the server receives the query request and to empty the intermediate result table in preparation for processing the query task;
the query parsing module is configured such that the server parses the query request, which comprises a plurality of query statements, and decomposes it into a plurality of sub-steps to be executed according to the individual query statements; before each sub-step is executed, the data on which the sub-step depends is copied from CPU memory to GPU memory, and the processing logic of the sub-step is then executed on the GPU.
8. The query task communication system according to claim 6, wherein, after receiving the query intermediate result, the remote server copies the intermediate result from GPU memory to another GPU memory buffer and records the start address of that buffer;
the remote server then receives the metadata of the query task, copies it from CPU memory to another CPU memory buffer, and obtains the metadata of the query task by deserialization; the recorded GPU memory start address is stored in the metadata;
the remote server executes the control-flow logic of the query task on the CPU according to the metadata and continues with the subsequent sub-steps, copying the data on which each sub-step depends from CPU memory to GPU memory and performing the matching against the data set on the GPU based on the previously obtained intermediate result.
CN201810596030.1A 2018-06-11 2018-06-11 Query task communication method and system Active CN109062929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810596030.1A CN109062929B (en) 2018-06-11 2018-06-11 Query task communication method and system

Publications (2)

Publication Number Publication Date
CN109062929A CN109062929A (en) 2018-12-21
CN109062929B true CN109062929B (en) 2020-11-06

Family

ID=64820127

Country Status (1)

Country Link
CN (1) CN109062929B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114741166A (en) * 2022-03-04 2022-07-12 阿里巴巴(中国)有限公司 Distributed task processing method, distributed system and first equipment

Citations (2)

Publication number Priority date Publication date Assignee Title
CN105550085A (en) * 2015-12-10 2016-05-04 浪潮电子信息产业股份有限公司 RDMA (remote direct memory Access) testing method based on GPUDerict
CN208013975U (en) * 2018-04-23 2018-10-26 苏州超集信息科技有限公司 The hardware device of on-line intelligence ability platform

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
JP4414447B2 (en) * 2007-04-25 2010-02-10 株式会社ソニー・コンピュータエンタテインメント Information processing apparatus, information processing system, and information processing method
TW201044185A (en) * 2009-06-09 2010-12-16 Zillians Inc Virtual world simulation systems and methods utilizing parallel coprocessors, and computer program products thereof
US9514507B2 (en) * 2011-11-29 2016-12-06 Citrix Systems, Inc. Methods and systems for maintaining state in a virtual machine when disconnected from graphics hardware
KR101936950B1 (en) * 2016-02-15 2019-01-11 주식회사 맴레이 Computing device, data transfer method between coprocessor and non-volatile memory, and program including the same
CN108268208B (en) * 2016-12-30 2020-01-17 清华大学 RDMA (remote direct memory Access) -based distributed memory file system
CN108762915B (en) * 2018-04-19 2020-11-06 上海交通大学 Method for caching RDF data in GPU memory

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant