CN105550284A - Method and device for mixed use of memory and temporary table space at Presto computational node - Google Patents


Publication number
CN105550284A
Authority
CN
China
Prior art keywords
temporary table
table space
presto
memory
computing node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510917282.6A
Other languages
Chinese (zh)
Other versions
CN105550284B (en)
Inventor
戴东东
吕信
郭李明
袁安峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Jianfu Chain Management Co.,Ltd.
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2015-12-10
Filing date
2015-12-10
Publication date
2016-05-04
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201510917282.6A
Publication of CN105550284A
Application granted
Publication of CN105550284B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1847 File system types specifically adapted to static storage, e.g. adapted to flash memory or SSD
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

The invention provides a method for the mixed use of a memory and a temporary table space at a Presto computational node. The method comprises the following steps: transmitting computational data to the Presto computational node; and if the required memory exceeds the free memory of the Presto computational node or exceeds the maximum memory allowed to be used by a single computational task on the Presto computational node, using the temporary table space. According to the method, a temporary table space function is added in a Presto application; and by using the method for the mixed use of the memory and the temporary table space, the problem of insufficient cluster memory is effectively solved; and the method is simple and easy to use, so that the cluster query performance and concurrency can be greatly enhanced. The invention furthermore provides a corresponding device.

Description

Method and device for mixed use of memory and temporary table space at a Presto computing node
Technical field
The present invention relates to computer technology, and in particular to a method and device for mixed use of memory and temporary table space at a Presto computing node in big data applications.
Background
Presto is a distributed SQL query engine for big data. All of its data processing and transfer are based on memory and the network. Computation runs in a single pass, with no separate stages and no intermediate temporary stage, which avoids unnecessary I/O and latency overhead. As a result, its overall query efficiency is nearly 10 times higher than that of Hive.
During computation, Presto needs to split all of the data participating in the computation and load it into the memory of each computing node in order to complete operations such as querying, sorting, and storing intermediate result sets. Presto supports parallel execution of multiple jobs, so a maximum amount of memory that a single computing task may use on each computing node server must be set. This value is controlled by the parameter task.max-memory, whose maximum normally does not exceed 80% of the server's total memory. The implementation is shown in Fig. 1.
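As a rough illustration of the 80% guideline above, the following Python sketch computes an upper bound for task.max-memory from a server's total memory. The function name and the use of GB as the unit are assumptions made for illustration; they are not part of Presto.

```python
def max_task_memory_gb(total_memory_gb: float, fraction: float = 0.8) -> float:
    """Upper bound for task.max-memory under the 80% guideline.

    The function name and the GB unit are illustrative assumptions.
    """
    if total_memory_gb <= 0:
        raise ValueError("total memory must be positive")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return total_memory_gb * fraction


# Two common server memory sizes, used here only as sample inputs.
for total in (64, 128):
    bound = max_task_memory_gb(total)
    print(f"{total} GB server: task.max-memory <= {bound:.1f} GB")
```

Under this guideline, a single task on a 64 GB server can use only about 51 GB, which is far below the data volumes a cluster typically has to process.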
The following problems are often encountered when using Presto:
1. The memory of a single server is small. A standard big data server is usually configured with 64 GB or 128 GB of memory, while the stored data volume is usually around 10 TB, which is far larger than the memory.
2. Under concurrent operations, the volume of data participating in the computation is much larger than the server's memory.
3. Directly expanding server memory is costly and is limited by the number of memory slots in the server.
The prior art offers no good solution to these problems, so a new technical solution is needed to meet the demands of large data volumes.
Summary of the invention
In view of this, the present invention proposes a method for mixed use of memory and temporary table space in Presto, comprising: transferring computation data to a Presto computing node; and, if the required memory exceeds the free memory of the Presto computing node or exceeds the maximum memory that a single computing task is allowed to use on that Presto computing node, using temporary table space.
The present invention also proposes a device for mixed use of memory and temporary table space in Presto, comprising: a transport module configured to transfer computation data to a Presto computing node; and a temporary table space module configured to use temporary table space if the required memory exceeds the free memory of the Presto computing node or exceeds the maximum memory that a single computing task is allowed to use on that Presto computing node.
The present invention adds a temporary table space function to Presto and uses memory and temporary table space together. This effectively solves the problem of insufficient cluster memory, is simple and easy to use, and can greatly improve cluster query performance and concurrency.
Brief description of the drawings
Fig. 1 shows a flowchart of a Presto computing node that uses only memory, as in the prior art.
Fig. 2 shows a flowchart of a method for mixed use of memory and temporary table space at a Presto computing node according to an embodiment of the present invention.
Fig. 3 shows a flowchart of a method for mixed use of memory and temporary table space at a Presto computing node according to an embodiment of the present invention.
Fig. 4 shows a device for mixed use of memory and temporary table space in Presto according to an embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention are explained below, including various details of the embodiments to aid understanding; they should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will appreciate that various modifications and changes can be made to the embodiments described herein without departing from the scope and spirit of the invention.
In general, Presto data processing requires that memory be large enough to hold the total volume of data participating in the computation. Otherwise, the computation becomes very slow or a memory error occurs, causing the computing task to fail. Excessive reliance on memory size also limits the concurrency of Presto.
To address these shortcomings of the prior art, we propose adding a temporary table space to each Presto computing node and using it together with memory. The temporary table space is mainly used for sort operations and for storing temporary objects such as temporary tables and intermediate sort result sets. Operations that originally ran in memory, such as CREATE TABLE, SELECT DISTINCT, ORDER BY, GROUP BY, UNION ALL, MINUS, SORT-MERGE JOIN, and HASH JOIN, can use the temporary table space. This solves the problem of insufficient memory and also improves cluster performance and concurrency. In addition, the method requires no extra hardware investment and is relatively simple to operate.
Fig. 2 shows a flowchart of a method 200 for mixed use of memory and temporary table space at a Presto computing node according to an embodiment of the present invention.
In step 210, computation data is transferred to the Presto computing node. In step 220, the memory of each computing node is calculated. Steps 230 and 240 then determine, respectively, whether the free memory of the computing node is greater than the required memory and whether the required memory is less than task.max-memory (the maximum memory that a single computing task may use on each computing node server). If either step 230 or step 240 yields "No", the memory of the computing node is insufficient, and the method proceeds to step 250 to use the temporary table space. If both step 230 and step 240 yield "Yes", the computing node has enough memory, and the method proceeds to step 280 to continue the computation and then ends. Step 260, which follows step 250, determines whether the temporary table space is sufficient. If it is, the method proceeds to step 280 to continue the computation and then ends. If it is not, the method proceeds to step 270, where execution is very slow or fails, and then ends.
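The decision steps of this flow can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: all names are hypothetical, sizes are in bytes, and it assumes that "sufficient" temporary table space in step 260 means at least as much free space as the required memory.

```python
from enum import Enum


class Outcome(Enum):
    IN_MEMORY = "continue in memory (step 280)"
    TEMP_TABLESPACE = "continue with temporary table space (steps 250, 280)"
    SLOW_OR_ERROR = "very slow or fails (step 270)"


def choose_storage(required: int, free_memory: int,
                   task_max_memory: int, temp_space_free: int) -> Outcome:
    """Follow decision steps 230 to 270 for one computing task.

    Parameter names are illustrative assumptions; all sizes are in bytes.
    """
    enough_free = free_memory > required            # step 230
    within_task_limit = required < task_max_memory  # step 240
    if enough_free and within_task_limit:
        return Outcome.IN_MEMORY                    # step 280
    # Step 250: memory is insufficient, so fall back to temporary table space.
    if temp_space_free >= required:                 # step 260 (assumed criterion)
        return Outcome.TEMP_TABLESPACE
    return Outcome.SLOW_OR_ERROR                    # step 270


GB = 1024 ** 3
print(choose_storage(60 * GB, 40 * GB, 50 * GB, 100 * GB).value)
```

Note that a task can be sent to the temporary table space even when the node has plenty of free memory, because exceeding the per-task limit alone is enough to fail the check in step 240.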
In one embodiment, the Presto source code can be modified to recognize the temporary table space. Specifically, the Presto temporary table space has the following characteristics:
The maximum size of the temporary table space is 32 TB, and it does not exceed the total capacity of the server's hard disks.
After processing completes, the temporary table space releases its data automatically. Release here only means that the space is marked as free and can be reused; the disk space actually occupied is not truly freed.
The temporary table space uses a greedy algorithm: the storage space it occupies only grows and never shrinks.
When a temporary table space is created, a background process is started automatically to check the validity of the temporary table space. When the temporary table space is deleted, the background process is deleted as well.
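The release and growth behavior listed above can be modeled with a small Python sketch. It is a toy model under stated assumptions: the class and method names are hypothetical, space is counted in abstract blocks, and the background validity check is not modeled.

```python
class TempTablespace:
    """Toy model: released space is only marked free, and disk usage never shrinks.

    Class and method names are hypothetical; sizes are in abstract blocks.
    """

    def __init__(self, max_blocks: int):
        self.max_blocks = max_blocks  # upper bound, e.g. limited by disk capacity
        self.allocated = 0            # blocks taken from disk; never decreases
        self.free = 0                 # allocated blocks currently marked free

    def acquire(self, blocks: int) -> None:
        if blocks < 0:
            raise ValueError("blocks must be non-negative")
        reused = min(blocks, self.free)  # reuse blocks marked free first
        grow = blocks - reused           # then grow the space on disk
        if self.allocated + grow > self.max_blocks:
            raise MemoryError("temporary table space exhausted")
        self.free -= reused
        self.allocated += grow

    def release(self, blocks: int) -> None:
        in_use = self.allocated - self.free
        if blocks < 0 or blocks > in_use:
            raise ValueError("cannot release more blocks than are in use")
        self.free += blocks              # marked free only; disk usage unchanged


ts = TempTablespace(max_blocks=100)
ts.acquire(30)
ts.release(30)
ts.acquire(20)  # served entirely from blocks marked free
print(ts.allocated, ts.free)
```

In this model the `allocated` counter is the only thing that touches disk capacity, which is why a later, smaller request does not cause any further growth.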
The temporary table space stores the intermediate results of large-scale sort operations and hash operations. It differs from a permanent table space in that it consists of temporary data files rather than permanent data files. The temporary table space cannot store permanent objects, so it does not need two additional replicas (unlike, for example, the Hadoop Distributed File System (HDFS), which keeps two replicas).
When a temporary table space is created or a temporary data file is added to it, the operation is quite fast even if the temporary data file is very large. This is because a temporary data file is a special kind of data file, a sparse file: when the file is created, only the file header and the last block are written, and space allocation is deferred. This is why creating a temporary table space or adding a data file to it is so fast.
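The header-plus-last-block idea can be illustrated with a short Python sketch. The header content, the 8 KB block size, and the function name are assumptions for illustration; whether the untouched middle of the file actually occupies no disk space depends on the file system.

```python
import os
import tempfile


def create_sparse_tempfile(path: str, size_bytes: int, block_size: int = 8192) -> None:
    """Create a file of the given logical size by writing only two blocks.

    The header content and block size are illustrative assumptions.
    """
    if size_bytes < 2 * block_size:
        raise ValueError("size must hold at least a header block and a last block")
    with open(path, "wb") as f:
        f.write(b"TEMPFILE".ljust(block_size, b"\0"))  # file header block
        f.seek(size_bytes - block_size)                # skip the middle without writing
        f.write(b"\0" * block_size)                    # last block fixes the file size


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "TMP01.dbf")
    create_sparse_tempfile(path, 64 * 1024 * 1024)
    print(os.path.getsize(path))  # logical size in bytes
```

Only two blocks are written regardless of the requested size, so the creation time does not grow with the size of the file on file systems that support sparse files.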
In one embodiment, the temporary table space is managed. Specifically, managing the temporary table space includes creating a Presto temporary table space, adding a data file, deleting a data file, and changing the size of a data file. The syntax and an example of each are given below.
Create a Presto temporary table space:
Syntax: CREATE TEMPORARY TABLESPACE tablespace_name TEMPFILE
datefile_spec1 [, datefile_spec2] SIZE integer [k] DATANODE ALL
AUTOEXTEND OFF;
Example:
CREATE TEMPORARY TABLESPACE PRESTO-TMP TEMPFILE
'/u01/presto/predata/TMP01.dbf' SIZE 8G DATANODE ALL AUTOEXTEND
OFF;
Add a data file:
Syntax: ALTER TABLESPACE tablespace_name ADD TEMPFILE
datefile_spec1 [, datefile_spec2] SIZE integer [k] DATANODE ALL;
Example:
ALTER TABLESPACE PRESTO-TMP ADD TEMPFILE
'/u01/presto/predata/TMP02.dbf' SIZE 8G DATANODE ALL;
Delete a data file:
Syntax: ALTER TABLESPACE tablespace_name DROP TEMPFILE
datefile_spec1 [, datefile_spec2] DATANODE ALL;
Example:
ALTER TABLESPACE PRESTO-TMP DROP TEMPFILE
'/u01/presto/predata/TMP02.dbf' DATANODE ALL;
Increase the size of a data file:
Syntax: ALTER PRESTO TEMPFILE datefile_spec1 RESIZE integer [k]
DATANODE ALL;
Example:
ALTER PRESTO TEMPFILE '/u01/presto/predata/TMP02.dbf'
RESIZE 16G DATANODE ALL;
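For scripting, the four management statements above could be generated by small helper functions, as in the following Python sketch. The function names are hypothetical, the statements follow the syntax proposed in this document rather than any statement supported by stock Presto, and the helpers do no escaping, so they are only suitable for trusted input.

```python
from typing import Sequence


def _quote_files(datafiles: Sequence[str]) -> str:
    """Quote and join one or more data file paths."""
    if not datafiles:
        raise ValueError("at least one data file is required")
    return ", ".join(f"'{path}'" for path in datafiles)


def create_temp_tablespace(name: str, datafiles: Sequence[str], size: str) -> str:
    return (f"CREATE TEMPORARY TABLESPACE {name} TEMPFILE {_quote_files(datafiles)} "
            f"SIZE {size} DATANODE ALL AUTOEXTEND OFF;")


def add_tempfile(name: str, datafiles: Sequence[str], size: str) -> str:
    return (f"ALTER TABLESPACE {name} ADD TEMPFILE {_quote_files(datafiles)} "
            f"SIZE {size} DATANODE ALL;")


def drop_tempfile(name: str, datafiles: Sequence[str]) -> str:
    return f"ALTER TABLESPACE {name} DROP TEMPFILE {_quote_files(datafiles)} DATANODE ALL;"


def resize_tempfile(datafile: str, size: str) -> str:
    return f"ALTER PRESTO TEMPFILE '{datafile}' RESIZE {size} DATANODE ALL;"


print(create_temp_tablespace("PRESTO-TMP", ["/u01/presto/predata/TMP01.dbf"], "8G"))
print(resize_tempfile("/u01/presto/predata/TMP02.dbf", "16G"))
```

Each helper returns a single-line statement, which is equivalent to the multi-line examples above apart from line breaks.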
Fig. 3 shows a flowchart of a method 300 for mixed use of memory and temporary table space at a Presto computing node according to an embodiment of the present invention.
Method 300 comprises: step 310, transferring computation data to a Presto computing node; and step 320, using temporary table space if the required memory exceeds the free memory of the Presto computing node or exceeds the maximum memory that a single computing task is allowed to use on that Presto computing node.
Fig. 4 shows a device 400 for mixed use of memory and temporary table space in Presto according to an embodiment of the present invention, comprising: a transport module 410 configured to transfer computation data to a Presto computing node; and a temporary table space module 420 configured to use temporary table space if the required memory exceeds the free memory of the Presto computing node or exceeds the maximum memory that a single computing task is allowed to use on that Presto computing node.
It should be noted that the above are only preferred embodiments and principles of the present invention. Those skilled in the art will understand that the invention is not limited to the specific embodiments described here. Those skilled in the art can make various obvious changes, adjustments, and substitutions without departing from the scope of protection of the invention. The scope of the invention is defined by the claims.

Claims (8)

1. A method for mixed use of memory and temporary table space in Presto, comprising:
transferring computation data to a Presto computing node; and
if the required memory exceeds the free memory of the Presto computing node or exceeds the maximum memory that a single computing task is allowed to use on that Presto computing node, using temporary table space.
2. The method according to claim 1, wherein the temporary table space is used for sort operations, hash operations, storing temporary tables, or storing intermediate sort result sets.
3. The method according to claim 1, wherein using the temporary table space comprises creating a Presto temporary table space, adding a data file, deleting a data file, and changing the size of a data file.
4. The method according to claim 3, wherein a background process is started automatically when the temporary table space is created in order to check the validity of the temporary table space, and the corresponding background process is deleted when the temporary table space is deleted.
5. A device for mixed use of memory and temporary table space in Presto, comprising:
a transport module configured to transfer computation data to a Presto computing node; and
a temporary table space module configured to use temporary table space if the required memory exceeds the free memory of the Presto computing node or exceeds the maximum memory that a single computing task is allowed to use on that Presto computing node.
6. The device according to claim 5, wherein the temporary table space is used for sort operations, hash operations, storing temporary tables, or storing intermediate sort result sets.
7. The device according to claim 5, wherein the temporary table space module is further configured to create a Presto temporary table space, add a data file, delete a data file, and change the size of a data file.
8. The device according to claim 7, wherein the temporary table space module is further configured to start a background process automatically when the temporary table space is created in order to check the validity of the temporary table space, and to delete the corresponding background process when the temporary table space is deleted.
CN201510917282.6A 2015-12-10 2015-12-10 Method and device for mixed use of memory and temporary table space in Presto computing node Active CN105550284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510917282.6A CN105550284B (en) 2015-12-10 2015-12-10 Method and device for mixed use of memory and temporary table space in Presto computing node

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510917282.6A CN105550284B (en) 2015-12-10 2015-12-10 Method and device for mixed use of memory and temporary table space in Presto computing node

Publications (2)

Publication Number Publication Date
CN105550284A (en) 2016-05-04
CN105550284B (en) 2020-03-27

Family

ID=55829473

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510917282.6A Active CN105550284B (en) 2015-12-10 2015-12-10 Method and device for mixed use of memory and temporary table space in Presto computing node

Country Status (1)

Country Link
CN (1) CN105550284B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109408580A (en) * 2018-10-31 2019-03-01 北京百分点信息科技有限公司 A kind of SQL compilation device and method across data source

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104111958A (en) * 2013-04-22 2014-10-22 中国移动通信集团山东有限公司 Data query method and device
CN104965861A (en) * 2015-06-03 2015-10-07 上海新炬网络信息技术有限公司 Monitoring device for data access

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
LEX LIAN: "Presto:Facebook的分布式SQL查询引擎", 《伯乐在线》 *
RENFENGJUN: "临时表空间(Temporary Tablespace)相关", 《CSDN博客》 *
胡欣杰: "《Oracle 9i数据库管理员指南 v9.0.1》", 30 April 2002 *
许宁: "临时表在数据库中的应用", 《沈阳化工学院学报》 *

Also Published As

Publication number Publication date
CN105550284B (en) 2020-03-27

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201113

Address after: No.8-6, Putou South Road, Haicang District, Xiamen City, Fujian Province

Patentee after: Xiamen xinjianfu e-commerce Co., Ltd

Address before: 100080 Beijing city Haidian District xingshikou Road No. 65 building 11C Creative Park West West west Shan East 1-4 layer 1-4 layer

Patentee before: BEIJING JINGDONG SHANGKE INFORMATION TECHNOLOGY Co.,Ltd.

Patentee before: BEIJING JINGDONG CENTURY TRADING Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20210508

Address after: 361000 No.6, 8 Putou South Road, Haicang District, Xiamen City, Fujian Province

Patentee after: Xiamen Jianfu Chain Management Co.,Ltd.

Address before: No.8-6, Putou South Road, Haicang District, Xiamen City, Fujian Province 361022

Patentee before: Xiamen xinjianfu e-commerce Co., Ltd