Background
Presto is a distributed SQL query engine for big data. All data processing and transmission are performed in memory and over the network, and a computation is completed in one pass, without being divided into stages that produce intermediate temporary results. This avoids unnecessary I/O and latency overhead, so overall query efficiency is nearly 10 times that of Hive.
During computation, Presto must split all data involved in the computation and load it into the memory of each compute node, for example for querying, sorting, and storing intermediate result sets. Because Presto supports parallel execution of multiple jobs, the maximum memory available to a single computation task on each compute node must be set. This value is controlled by the parameter task.max-memory and usually should not exceed 80% of the server's total memory. The execution process is shown in Fig. 1.
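The 80% guideline is simple arithmetic, but it makes the scale of the problem concrete. The following minimal sketch computes the suggested upper bound for task.max-memory for the 64 GB and 128 GB servers mentioned below. The helper name is hypothetical, and the 80% fraction is the rule of thumb stated in the text, not a value read from any Presto configuration API.

```python
def max_task_memory_gb(total_memory_gb: float, fraction: float = 0.8) -> float:
    """Suggested upper bound for task.max-memory, in GB.

    Hypothetical helper: applies the "no more than 80% of total memory"
    guideline from the text; it does not query Presto.
    """
    if total_memory_gb <= 0:
        raise ValueError("total memory must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return total_memory_gb * fraction


if __name__ == "__main__":
    for total in (64, 128):
        bound = max_task_memory_gb(total)
        print(f"{total} GB server -> task.max-memory <= {bound:.1f} GB")
```

Even the 128 GB case caps a single task at roughly 102 GB, about 1% of the roughly 10 TB of data such a server typically stores.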
The following problems are often encountered when using Presto:
1. A single server has limited memory. A standard big data server is usually configured with 64 GB or 128 GB of memory but typically stores about 10 TB of data, far more than its memory capacity.
2. Under concurrent workloads, the volume of data participating in computation is far larger than the server's memory.
3. Directly expanding server memory is costly and is limited by the number of memory slots on the server.
Existing solutions do not address these problems well, so a new approach is needed to meet the requirements of large data volumes.
Detailed Description
The following description of exemplary embodiments of the invention includes various details to aid understanding and should be regarded as merely illustrative. Accordingly, those skilled in the art will recognize that various modifications and changes may be made to the embodiments described herein without departing from the scope and spirit of the present invention.
In general, Presto data processing requires memory large enough to hold the entire volume of data participating in the computation. Otherwise, computation becomes very slow or out-of-memory errors occur, causing computation tasks to fail. Moreover, because Presto depends so heavily on memory size, its concurrency is limited.
To address these shortcomings, we propose adding a temporary tablespace to each Presto compute node, to be used together with memory. The temporary tablespace is mainly used for sort operations and for storing temporary objects such as temporary tables and intermediate sort result sets. Operations that originally run in memory, such as CREATE TABLE, SELECT DISTINCT, ORDER BY, GROUP BY, UNION ALL, MINUS, sort-merge joins, and hash joins, can use the temporary tablespace. The method solves the problem of insufficient memory capacity while improving cluster performance and concurrency. It also requires no additional hardware investment and is simple to operate.
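To illustrate how an in-memory operation such as ORDER BY can fall back to disk space, here is a minimal sketch of the classic external merge sort. It is a generic textbook technique, not the patent's implementation. A memory budget (the hypothetical parameter `max_in_memory`, counted in rows) limits the buffer; each full buffer is sorted and spilled as a run to a directory that stands in for the temporary tablespace, and the runs are then k-way merged. All function names are assumptions.

```python
import heapq
import os
import pickle
import tempfile
from typing import Iterable, Iterator, List


def _write_run(run: List[int], directory: str) -> str:
    """Spill one sorted run to a file in `directory` and return its path."""
    fd, path = tempfile.mkstemp(suffix=".run", dir=directory)
    with os.fdopen(fd, "wb") as f:
        for value in run:
            pickle.dump(value, f)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream the values of a spilled run back in sorted order."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def external_sort(values: Iterable[int], max_in_memory: int, spill_dir: str) -> List[int]:
    """Sort `values` while buffering at most `max_in_memory` rows in memory.

    Full buffers are sorted and spilled to `spill_dir` (standing in for the
    temporary tablespace); the runs are then k-way merged. The result is
    materialized as a list only to keep the example short.
    """
    if max_in_memory <= 0:
        raise ValueError("max_in_memory must be positive")
    buffer: List[int] = []
    run_paths: List[str] = []
    for value in values:
        buffer.append(value)
        if len(buffer) >= max_in_memory:
            buffer.sort()
            run_paths.append(_write_run(buffer, spill_dir))
            buffer = []
    buffer.sort()
    if not run_paths:
        return buffer  # everything fit in memory; no temporary space used
    try:
        streams = [_read_run(p) for p in run_paths] + [iter(buffer)]
        return list(heapq.merge(*streams))
    finally:
        for p in run_paths:
            os.remove(p)  # release the spilled runs after the merge


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(external_sort([5, 3, 9, 1, 7, 2, 8], max_in_memory=3, spill_dir=tmp))
```

With a budget of three rows, the example spills two runs ([3, 5, 9] and [1, 2, 7]) and keeps [8] in memory before merging. Hash-based operations such as GROUP BY or hash joins would instead partition rows into spill files by hash value, which this sketch does not show.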
Fig. 2 shows a flowchart of a method 200 for mixed use of memory and temporary table space in a Presto compute node, according to an embodiment of the invention.
At step 210, the computation data is transmitted to the Presto compute node. At step 220, the memory of each compute node is evaluated. Then, at steps 230 and 240, it is determined whether the free memory of the compute node is larger than the required memory and whether the required memory is smaller than the maximum memory allowed for a single computing task (set by task.max-memory). If step 230 or step 240 yields "no", the memory of the compute node is insufficient, and the process proceeds to step 250 to use the temporary table space. If both step 230 and step 240 yield "yes", the memory of the compute node is sufficient, and the process proceeds to step 280 to continue the computation and then ends. At step 260, which follows step 250, it is determined whether the temporary table space is sufficient. If it is, the process proceeds to step 280 to continue the computation and then ends; if it is not, the process proceeds to step 270, where execution becomes very slow or fails with an error, and then ends.
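The decision flow of method 200 can be sketched as a single function. This is an illustration under stated assumptions, not the patented code. All sizes are plain integers in one unit (for example GB). The text does not define when the temporary table space is "sufficient" at step 260, so the sketch assumes it must hold the overflow, meaning the part of the requirement that exceeds usable memory (the smaller of the free memory and the per-task limit). The names `plan_computation` and `Outcome` are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    IN_MEMORY = "step 280: continue computation in memory"
    TEMP_TABLESPACE = "steps 250/260/280: continue using the temporary table space"
    SLOW_OR_ERROR = "step 270: very slow execution or error"


def plan_computation(required: int, free: int, task_max: int, temp_free: int) -> Outcome:
    """Mirror the decisions of method 200. All sizes share one unit (e.g. GB).

    Assumption: the temporary table space is "sufficient" (step 260) when it
    can hold the part of the requirement that does not fit in usable memory.
    """
    # Steps 230 and 240: memory suffices only if both checks answer "yes".
    if free > required and required < task_max:
        return Outcome.IN_MEMORY
    # Step 250: use the temporary table space for the overflow.
    usable = min(free, task_max)
    overflow = required - usable
    # Step 260: is the temporary table space large enough?
    if overflow <= temp_free:
        return Outcome.TEMP_TABLESPACE
    return Outcome.SLOW_OR_ERROR


if __name__ == "__main__":
    print(plan_computation(required=40, free=100, task_max=80, temp_free=0))
    print(plan_computation(required=200, free=100, task_max=80, temp_free=500))
    print(plan_computation(required=200, free=100, task_max=80, temp_free=50))
```

The three calls exercise the three end states: the first fits in memory, the second overflows by 120 units into a 500-unit temporary table space, and the third overflows by 120 units into a 50-unit temporary table space, which is not enough.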
In one embodiment, the Presto source code may be modified to recognize the temporary tablespace. Specifically, the Presto temporary tablespace has the following characteristics:
The temporary tablespace can be at most 32 TB and cannot exceed the total hard disk capacity of the server.
After a process finishes, the temporary tablespace automatically releases its data. The release only marks the space as free for reuse; the disk space actually occupied is not returned to the system.
The temporary tablespace uses a greedy algorithm: the storage space it occupies only grows and never shrinks.
When the temporary tablespace is created, a background process is automatically started to check its validity; when the temporary tablespace is deleted, this background process is deleted with it.
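The space-accounting characteristics listed above can be modeled with a small class. This is a toy sketch, not Presto code. The class and attribute names are hypothetical, the 32 TB cap is interpreted in binary units as an assumption, and the background validity-check process is not modeled. The key behavior is that releasing space only marks it reusable. New requests consume reusable space first, and the footprint on disk grows only by whatever is still missing.

```python
class TempTablespace:
    """Toy model of temporary tablespace space accounting (names hypothetical).

    - size is capped at min(32 TB, disk capacity)
    - released space is only marked free and reused; occupied space never shrinks
    - the footprint grows only when free space cannot satisfy a request
    """

    MAX_SIZE = 32 * 1024**4  # 32 TB cap from the text (binary units assumed)

    def __init__(self, disk_capacity: int) -> None:
        self.limit = min(self.MAX_SIZE, disk_capacity)
        self.occupied = 0  # bytes the temp files occupy on disk
        self.in_use = 0    # bytes currently holding live intermediate data

    @property
    def reusable(self) -> int:
        """Space that was released earlier and is marked free for reuse."""
        return self.occupied - self.in_use

    def allocate(self, nbytes: int) -> bool:
        """Reserve space; return False if the tablespace is insufficient."""
        growth = max(0, nbytes - self.reusable)  # reuse free space first
        if self.occupied + growth > self.limit:
            return False
        self.occupied += growth  # grow, never shrink
        self.in_use += nbytes
        return True

    def release(self, nbytes: int) -> None:
        """Mark space as free; the disk space itself is not given back."""
        self.in_use -= min(nbytes, self.in_use)


if __name__ == "__main__":
    ts = TempTablespace(disk_capacity=1000)
    ts.allocate(600)
    ts.release(600)
    print(ts.occupied, ts.in_use)  # footprint stays at 600 after release
```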
The temporary tablespace stores the intermediate results of large sort and hash operations. It differs from a permanent tablespace in that it consists of temporary data files rather than permanent data files. Because it stores no permanent objects, it does not need the two extra replicas that a system such as the Hadoop Distributed File System (HDFS) typically keeps.
Creating a temporary tablespace, or adding a temporary data file to one, is fast even when the data file is large. This is because temporary data files are a special kind of data file, namely sparse files: when a temporary tablespace file is created, only the file header and the last block are written, and the remaining space is allocated lazily, when it is actually used.
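The sparse-file behavior described above can be reproduced with ordinary file I/O. In this minimal sketch, only a header and the last block are written, and seeking over the middle leaves it unallocated on file systems that support sparse files (for example ext4 or XFS). The header bytes, the 8 KB block size, and the function names are illustrative assumptions, not Presto's on-disk format. `allocated_bytes` relies on `st_blocks`, which is only available on POSIX systems.

```python
import os
import tempfile


def create_sparse_tempfile(path: str, size: int, block_size: int = 8192,
                           header: bytes = b"PRESTO-TMP-HEADER") -> str:
    """Create a temporary data file by writing only its header and last block.

    The header bytes and block size are illustrative assumptions. On file
    systems with sparse-file support, the untouched middle of the file is not
    allocated on disk until data is actually written there.
    """
    if len(header) > block_size or size < 2 * block_size:
        raise ValueError("size must cover a header block and a last block")
    with open(path, "wb") as f:
        f.write(header)              # file header at offset 0
        f.seek(size - block_size)    # skip the unallocated middle
        f.write(b"\0" * block_size)  # last block fixes the logical size
    return path


def allocated_bytes(path: str) -> int:
    """Bytes actually allocated on disk (POSIX only; st_blocks counts 512-byte units)."""
    return os.stat(path).st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        p = create_sparse_tempfile(os.path.join(tmp, "TMP01.dbf"), 256 * 1024**2)
        print("logical size:", os.path.getsize(p))
        print("allocated on disk:", allocated_bytes(p))
```

On a sparse-capable file system, the allocated size reported is typically a few kilobytes, whereas the logical size is the full 256 MiB. This is why creating or extending such a file is fast regardless of its nominal size.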
In one embodiment, the temporary tablespace is managed. Managing the temporary tablespace includes creating a Presto temporary tablespace, adding a data file, deleting a data file, and modifying the size of a data file. The syntax and an example for each operation are given below.
Creating a Presto temporary tablespace:
Syntax: CREATE TEMPORARY TABLESPACE tablespace_name TEMPFILE
datafile_spec1 [, datafile_spec2] SIZE integer [k] DATANODE ALL
AUTOEXTEND OFF;
Example:
CREATE TEMPORARY TABLESPACE PRESTO-TMP TEMPFILE
'/u01/presto/predata/TMP01.dbf' SIZE 8G DATANODE ALL AUTOEXTEND
OFF;
Adding a data file:
Syntax: ALTER TABLESPACE tablespace_name ADD TEMPFILE
datafile_spec1 [, datafile_spec2] SIZE integer [k] DATANODE ALL;
Example:
ALTER TABLESPACE PRESTO-TMP ADD TEMPFILE
'/u01/presto/predata/TMP02.dbf' SIZE 8G DATANODE ALL;
Deleting a data file:
Syntax: ALTER TABLESPACE tablespace_name DROP TEMPFILE
datafile_spec1 [, datafile_spec2] DATANODE ALL;
Example:
ALTER TABLESPACE PRESTO-TMP DROP TEMPFILE
'/u01/presto/predata/TMP02.dbf' DATANODE ALL;
Increasing the size of a data file:
Syntax: ALTER PRESTO TEMPFILE datafile_spec1 RESIZE integer [k]
DATANODE ALL;
Example:
ALTER PRESTO TEMPFILE '/u01/presto/predata/TMP02.dbf'
RESIZE 16G DATANODE ALL;
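The four management statements above imply some simple bookkeeping: which data files belong to which temporary tablespace, how large each file is, and whether the total stays within the 32 TB limit. The following minimal sketch models that bookkeeping. Everything in it is an assumption for illustration only. `TempTablespaceCatalog` and `parse_size` are hypothetical names, the K/M/G/T suffixes are read as binary units, and no real Presto API is involved.

```python
import re
from typing import Dict

_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
MAX_TABLESPACE_BYTES = 32 * 1024**4  # 32 TB limit from the text (binary units assumed)


def parse_size(text: str) -> int:
    """Convert a SIZE/RESIZE literal such as '8G' into bytes."""
    m = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if m is None:
        raise ValueError(f"invalid size: {text!r}")
    return int(m.group(1)) * _UNITS[m.group(2)]


class TempTablespaceCatalog:
    """Hypothetical bookkeeping behind CREATE/ADD/DROP/RESIZE of temp files."""

    def __init__(self) -> None:
        self.spaces: Dict[str, Dict[str, int]] = {}  # tablespace -> {path: bytes}

    @staticmethod
    def _check_limit(files: Dict[str, int]) -> None:
        if sum(files.values()) > MAX_TABLESPACE_BYTES:
            raise ValueError("temporary tablespace would exceed 32 TB")

    def create(self, name: str, path: str, size: str) -> None:
        if name in self.spaces:
            raise ValueError(f"tablespace {name} already exists")
        files = {path: parse_size(size)}
        self._check_limit(files)
        self.spaces[name] = files

    def add_tempfile(self, name: str, path: str, size: str) -> None:
        files = dict(self.spaces[name])
        if path in files:
            raise ValueError(f"{path} already belongs to {name}")
        files[path] = parse_size(size)
        self._check_limit(files)
        self.spaces[name] = files

    def drop_tempfile(self, name: str, path: str) -> None:
        del self.spaces[name][path]

    def resize_tempfile(self, path: str, size: str) -> None:
        # The RESIZE syntax names only the file, so search every tablespace.
        for name, files in self.spaces.items():
            if path in files:
                updated = dict(files)
                updated[path] = parse_size(size)
                self._check_limit(updated)
                self.spaces[name] = updated
                return
        raise KeyError(path)

    def total_size(self, name: str) -> int:
        return sum(self.spaces[name].values())


if __name__ == "__main__":
    cat = TempTablespaceCatalog()
    cat.create("PRESTO-TMP", "/u01/presto/predata/TMP01.dbf", "8G")
    cat.add_tempfile("PRESTO-TMP", "/u01/presto/predata/TMP02.dbf", "8G")
    cat.resize_tempfile("/u01/presto/predata/TMP02.dbf", "16G")
    print(cat.total_size("PRESTO-TMP") // 1024**3, "GB")  # 24 GB
    cat.drop_tempfile("PRESTO-TMP", "/u01/presto/predata/TMP02.dbf")
    print(cat.total_size("PRESTO-TMP") // 1024**3, "GB")  # 8 GB
```

The usage block replays the example statements, with the resize placed before the drop so that the resized file still exists.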
Fig. 3 shows a flowchart of a method 300 for mixed use of memory and temporary table space in a Presto compute node, according to an embodiment of the invention.
The method 300 includes: step 310, transmitting the computation data to a Presto compute node; and step 320, using the temporary table space if the required memory exceeds the free memory of the Presto compute node or exceeds the maximum memory that a single computing task is allowed to use on that node.
Fig. 4 shows an apparatus 400 for mixed use of memory and temporary table space in Presto according to an embodiment of the present invention. The apparatus includes: a transmission module 410 configured to transmit the computation data to a Presto compute node; and a temporary tablespace module 420 configured to use the temporary tablespace if the required memory exceeds the free memory of the Presto compute node or exceeds the maximum memory that a single computing task is allowed to use on that node.
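The module structure of apparatus 400 can be sketched as two small classes composed by a third, with each class mapped to a numbered element of Fig. 4. This is an illustrative sketch only. The class names, the `ComputeNode` data class, and the string results are hypothetical, and "transmitting" is reduced to recording a data identifier on the node. The condition in module 420 is the one stated for step 320: strictly exceeding either the free memory or the per-task maximum.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ComputeNode:
    """Hypothetical view of one Presto compute node (sizes in GB)."""
    free_memory: float
    task_max_memory: float
    received: List[str] = field(default_factory=list)


class TransmissionModule:
    """Module 410: transmits computation data to a compute node."""

    def transmit(self, node: ComputeNode, data_id: str) -> None:
        node.received.append(data_id)  # stand-in for sending the data


class TemporaryTablespaceModule:
    """Module 420: decides whether the temporary tablespace must be used."""

    def should_use_temp(self, node: ComputeNode, required_memory: float) -> bool:
        return (required_memory > node.free_memory
                or required_memory > node.task_max_memory)


class MixedMemoryApparatus:
    """Apparatus 400: composes modules 410 and 420."""

    def __init__(self) -> None:
        self.transmission = TransmissionModule()
        self.temp_tablespace = TemporaryTablespaceModule()

    def run(self, node: ComputeNode, data_id: str, required_memory: float) -> str:
        self.transmission.transmit(node, data_id)  # step 310
        if self.temp_tablespace.should_use_temp(node, required_memory):  # step 320
            return "temporary tablespace"
        return "memory"


if __name__ == "__main__":
    node = ComputeNode(free_memory=100, task_max_memory=80)
    apparatus = MixedMemoryApparatus()
    print(apparatus.run(node, "q1", required_memory=50))
    print(apparatus.run(node, "q2", required_memory=90))
```

Keeping the decision in its own module mirrors the split in Fig. 4: the transmission path stays unchanged, while the policy for falling back to the temporary tablespace can be adjusted independently.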
It is to be noted that the foregoing is only illustrative of the preferred embodiments and principles of the present invention. Those skilled in the art will appreciate that the present invention is not limited to the specific embodiments described herein. Numerous obvious variations, adaptations and substitutions will occur to those skilled in the art without departing from the scope of the invention. The scope of the invention is defined by the appended claims.