CN116226095A - Storage-compute separation system of a shared-nothing architecture database - Google Patents

Storage-compute separation system of a shared-nothing architecture database

Info

Publication number
CN116226095A
Authority
CN
China
Prior art keywords
module
persistent data
query
node
execution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310508386.6A
Other languages
Chinese (zh)
Inventor
江大白
胡增
汪刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Applied Technology Co Ltd
Original Assignee
China Applied Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Applied Technology Co Ltd filed Critical China Applied Technology Co Ltd
Priority to CN202310508386.6A
Publication of CN116226095A
Legal status: Pending (current)

Classifications

    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/182 Distributed file systems
    • G06F16/24569 Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G06F9/5083 Techniques for rebalancing the load in a distributed system
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a storage-compute separation system for a shared-nothing architecture database, comprising a cloud service control module, an elastic computing module, an elastic local temporary storage module, a persistent data storage module and a query execution module. The cloud service control module orchestrates the centralized services of end-to-end query execution; the elastic computing module accesses computing resources in the cloud service system through the virtual warehouse abstraction; the elastic local temporary storage module builds a distributed temporary storage system that serves intermediate data; the persistent data storage module stores persistent data; and the query execution module generates the required execution tasks through the cloud server and schedules them on the nodes of the virtual warehouse. The invention caches frequently read data in non-persistent local storage to reduce network traffic and improve data locality.

Description

Storage-compute separation system of a shared-nothing architecture database
Technical Field
The invention relates to the technical field of resource scheduling, and in particular to a storage-compute separation system for a shared-nothing architecture database.
Background
Conventional database systems are designed to handle repeated queries over data with predictable volumes and arrival rates, e.g., data from within an organization: transaction systems, enterprise resource planning applications, customer relationship management applications, and the like. Today, more and more data comes from uncontrolled external sources (e.g., application logs, social media, web applications, and mobile systems), resulting in ad hoc, time-varying, and unpredictable query workloads. For such workloads, a shared-nothing architecture results in high cost, inflexibility, low performance, and inefficiency, hampering production applications and cluster deployments.
The shared-nothing architecture has been the basis of traditional query execution engines and database systems, providing the underlying data storage services for today's cloud service platforms; over the past few years, shared-nothing databases have evolved to serve thousands of clients, executing millions of queries per day over petabytes of data. In such an architecture, persistent data (e.g., customer data stored in table form) is partitioned across a set of compute nodes, each of which is responsible only for its local data. This shared-nothing architecture enables the query execution engine to scale well, providing cross-job isolation and good data locality, and delivering high performance for a variety of workloads. However, these benefits come at the cost of several major drawbacks:
1. Hardware does not match the workload: the shared-nothing architecture makes it difficult to strike a perfect balance between the CPU, memory, storage and bandwidth resources provided by the compute nodes and the resources required by the workload. For example, a node configuration that is well suited for bandwidth-intensive, compute-light bulk loading may be ill suited for compute-intensive, bandwidth-light complex queries. However, many customers wish to run mixed workloads without having to set up a separate cluster for each query type. As a result, resources must often be over-provisioned to meet performance goals, which leads to low average resource utilization and higher costs.
2. Lack of elasticity: even if the hardware resources on the compute nodes match workload demands, the static parallelism and data partitioning inherent in the (inelastic) shared-nothing architecture limit its ability to adapt to data skew and time-varying workloads. For example, customer queries exhibit extremely skewed intermediate data sizes, varying by more than five orders of magnitude, while CPU requirements vary by as much as an order of magnitude within the same hour. Furthermore, the shared-nothing architecture offers no effective elasticity: the usual approach of adding or removing nodes to scale resources requires large amounts of data to be redistributed, which not only increases network bandwidth demands but also causes significant performance degradation, especially while the cluster must continue serving queries during the redistribution.
No effective solution to these problems in the related art has yet been proposed.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a storage-compute separation system for a shared-nothing architecture database, which aims to overcome the above technical problems in the prior art.
To this end, the invention adopts the following specific technical scheme:
the storage-compute separation system comprises a cloud service control module, an elastic computing module, an elastic local temporary storage module, a persistent data storage module and a query execution module;
the cloud service control module is used for orchestrating the centralized services of end-to-end query execution;
the elastic computing module is used for accessing computing resources in the cloud service system through the virtual warehouse abstraction, and for serving clients from a pre-warmed node pool;
the elastic local temporary storage module is used for constructing a distributed temporary storage system and serving intermediate data from it;
the persistent data storage module is used for storing persistent data;
and the query execution module is used for generating the required execution tasks through the cloud server and scheduling them on the nodes of the virtual warehouse.
Further, the centralized services include access control, query optimization and planning, scheduling, transaction management, and concurrency control.
Further, the persistent data is customer data and is stored in the database in the form of tables;
the intermediate data is generated by a query operator.
Further, the elastic computing module comprises a resource access module and a node pool preheating module;
the resource access module is used for enabling a user to access computing resources in the cloud service system through the virtual warehouse abstraction;
the node pool preheating module is configured to serve clients from the pre-warmed node pool and to provide compute elasticity on a fine-grained time scale.
Further, the persistent data storage module comprises a file dividing module, an attribute compression module and a stored file query execution module;
the file dividing module is used for horizontally dividing the storage table into storage files;
the attribute compression module is used for grouping and compressing the values of the individual attributes or columns in each storage file;
the stored file query execution module is used for reading the file header of a storage file and using the per-column offsets recorded in the header to read only the columns required for query execution.
Further, generating the required execution tasks through the cloud server and scheduling them on the nodes of the virtual warehouse comprises the following steps:
the client submits the query text to the cloud server;
the cloud server receives the query text and performs query analysis, query planning and optimization to generate the required execution tasks;
the execution tasks are scheduled on the nodes of the virtual warehouse, performing read and write operations against the distributed temporary storage system and the persistent data storage module;
the cloud server tracks query progress in real time and monitors the nodes using collected performance counters;
after a node failure is detected, the queries on the nodes of the virtual warehouse are rescheduled and the query results are obtained;
and the query results are returned to the virtual warehouse and from the virtual warehouse to the client.
Further, the query execution module comprises a locality-aware task scheduling module and an unbalanced partition balancing strategy module;
the locality-aware task scheduling module is used for co-locating execution tasks with persistent data files and caching the persistent data files in the distributed temporary storage system, using a locality-aware scheduling mechanism;
the unbalanced partition balancing strategy module is used for optimizing the nodes and improving load balancing.
Further, the locality-aware task scheduling module comprises a persistent data file distribution module and an execution task scheduling module;
the persistent data file distribution module is used for assigning persistent data files to compute nodes by consistent hashing of their file names;
the execution task scheduling module is used for scheduling a task that operates on a persistent data file to the node to which that file hashes.
Further, the unbalanced partition balancing strategy module comprises a node allocation module and an optimal point acquisition module;
the node allocation module is used for letting a node take over a task from another node when the task does not meet its expected completion time, reading the persistent data files required by the task from the persistent data storage module;
the optimal point acquisition module is used for selecting, via the scheduler, optimal nodes between two extremes and scheduling individual execution tasks onto them.
Further, the two extremes are co-locating execution tasks with the cached persistent data files and placing all execution tasks on a single node.
The beneficial effects of the invention are as follows:
1. The invention decouples computation from persistent storage to achieve elasticity, disaggregates compute and storage through a distributed temporary storage system, and caches frequently read data in non-persistent local storage to reduce network traffic and improve data locality.
2. The invention serves clients from a pre-warmed node pool, achieving compute elasticity on a fine-grained time scale, so that per-hour cloud pricing becomes cost-effective and the CPU is used efficiently: remote storage is used efficiently without excessive CPU consumption.
3. The invention implements the centralized services of end-to-end query execution through the cloud service side, covering access control, query optimization and planning, scheduling, transaction management, concurrency control and the like; the cloud service side is designed and implemented as a multi-tenant, long-lived service with enough replication to achieve high availability and scalability, so that the failure of a single service node does not cause loss of state or availability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings can be derived from them by a person skilled in the art without inventive effort.
FIG. 1 is a functional block diagram of a storage-compute separation system for a shared-nothing architecture database according to an embodiment of the present invention;
FIG. 2 is an overall architecture diagram of a storage-compute separation system for a shared-nothing architecture database according to an embodiment of the invention.
In the figure:
1. cloud service control module; 2. elastic computing module; 3. elastic local temporary storage module; 4. persistent data storage module; 5. query execution module.
Detailed Description
For the purpose of further illustrating the various embodiments, the present invention provides the accompanying drawings, which form part of this disclosure. They serve mainly to illustrate the embodiments and, together with the description, to explain the principles of their operation; with reference to them, a person skilled in the art will recognize other possible embodiments and advantages of the present invention.
According to an embodiment of the invention, a storage-compute separation system for a shared-nothing architecture database is provided.
The present invention is further described below with reference to the accompanying drawings and specific embodiments. As shown in FIG. 1, the storage-compute separation system for a shared-nothing architecture database according to an embodiment of the present invention comprises a cloud service control module 1, an elastic computing module 2, an elastic local temporary storage module 3, a persistent data storage module 4, and a query execution module 5.
Specifically, in a cloud service environment, coupling storage and computation causes considerable resource waste and other overhead and reduces system performance and processing speed. Separating the storage unit from the compute unit enhances the scalability of the system: a user can independently add database servers to increase processing capacity, or add storage servers to expand database capacity. It also enhances fault tolerance: under the separated architecture, single points of failure at any stage can be avoided through redundant configuration, strengthening the continuous service capability of the database system.
The cloud service control module 1 is used for orchestrating the centralized services of end-to-end query execution.
Wherein the centralized services include access control, query optimization and planning, scheduling, transaction management, and concurrency control.
Specifically, as shown in fig. 2, centralized control is performed by the cloud service: all users interact with a centralized layer named the Cloud Service (CS) layer to submit queries, and this layer is responsible for access control, query optimization and planning, scheduling, transaction management, concurrency control, and so on. The cloud service is designed and implemented as a multi-tenant, long-lived service with enough replication to achieve high availability and scalability. Thus, the failure of a single service node does not cause loss of state or availability, although some queries may fail and be transparently re-executed.
The elastic computing module 2 is configured to access computing resources in the cloud service system through the virtual warehouse abstraction, and to serve clients from the pre-warmed node pool.
The elastic computing module 2 comprises a resource access module and a node pool preheating module.
The resource access module is used for enabling the user to access the computing resources in the cloud service system through the abstraction of the virtual warehouse.
Specifically, a Virtual Warehouse (VW) abstracts the allocation of physical resources into a set of virtual machines running on the cloud service system. The node pool preheating module is configured to serve clients from the pre-warmed node pool and to provide compute elasticity on a fine-grained time scale.
Specifically, a user accesses computing resources in the cloud service system through the Virtual Warehouse (VW) abstraction. Each virtual warehouse is essentially a set of AWS EC2 instances on which customer queries are executed in a distributed fashion; customers pay for compute time based on the virtual warehouse size, and each virtual warehouse can be elastically scaled according to the customer's requirements. To support elasticity on a fine-grained time scale (e.g., tens of seconds), a pre-warmed pool of EC2 instances is maintained, which works as follows:
after a request is received, EC2 instances only need to be added to or removed from the virtual warehouse (in the case of an addition, most requests can be satisfied directly from the pre-warmed instance pool, avoiding the start-up of a new EC2 instance and reducing response time). Each virtual warehouse may run multiple concurrent queries; in fact, many clients run multiple virtual warehouses (e.g., one for data ingestion and one for executing OLAP queries).
Specifically, AWS EC2 is a cloud service provided by Amazon, and an EC2 instance is a machine configuration model within that service; after a request is received, instances only need to be added to or removed from the virtual warehouse, with most additions served directly from the warm instance pool to avoid instance start-up time. Compute elasticity is thus achieved on a fine-grained time scale.
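The warm-pool mechanism can be illustrated with a minimal sketch, assuming the hypothetical names WarmNodePool and _start_instance in place of real EC2 provisioning calls; it shows only the fast path (hand out a pre-warmed node) versus the slow path (boot a fresh instance), not an actual implementation.

```python
# Minimal sketch of a pre-warmed node pool; _start_instance() is a
# hypothetical stand-in for booting a real cloud instance.
import collections
import uuid


class WarmNodePool:
    """Keeps pre-started nodes so a virtual warehouse can grow in seconds."""

    def __init__(self, target_warm: int = 4):
        self.target_warm = target_warm
        self.warm = collections.deque(self._start_instance() for _ in range(target_warm))

    def _start_instance(self) -> str:
        # Hypothetical slow path: in reality this would boot a fresh instance.
        return f"node-{uuid.uuid4().hex[:8]}"

    def acquire(self) -> str:
        # Fast path: hand out a pre-warmed node; slow path: boot a new one.
        node = self.warm.popleft() if self.warm else self._start_instance()
        self._replenish()
        return node

    def release(self, node: str) -> None:
        # A node removed from a warehouse returns to the warm pool.
        self.warm.append(node)

    def _replenish(self) -> None:
        # Keep the pool topped up so later requests also hit the fast path.
        while len(self.warm) < self.target_warm:
            self.warm.append(self._start_instance())


pool = WarmNodePool()
warehouse = [pool.acquire() for _ in range(3)]  # scale up within seconds
pool.release(warehouse.pop())                   # scale down returns a node
print(warehouse, len(pool.warm))
```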
The elastic local temporary storage module 3 is used for constructing a distributed temporary storage system and serving intermediate data from it.
Specifically, intermediate data has performance requirements different from those of persistent data, which existing persistent data stores do not meet (e.g., S3 does not provide the low latency and high throughput required for intermediate data to keep compute nodes from blocking). A distributed temporary storage system is therefore built to meet the needs of intermediate data; it is co-located with the database's compute nodes and is designed to expand automatically as nodes are added or removed. As nodes come and go, the distributed temporary storage system does not need to re-partition or reshuffle data, which relaxes a core limitation of the shared-nothing architecture. Each virtual warehouse runs an independent distributed temporary storage system used only by queries running on that particular virtual warehouse.
The persistent data storage module 4 is used for storing persistent data.
The persistent data storage module comprises a file dividing module, an attribute compression module and a stored file query execution module.
The file dividing module is used for horizontally dividing the storage table into storage files.
The attribute compression module is used for grouping and compressing the values of the individual attributes or columns in the storage file.
Specifically, the values of the individual attributes or columns in a storage file are grouped together and compressed according to user-defined rules.
The stored file query execution module is used for reading the file header of a storage file and using the per-column offsets recorded in the header to read only the columns required for query execution.
Specifically, all persistent data is stored in a remote, disaggregated persistent data store. S3 is a cloud service provided by Amazon for storing customer data; despite its relatively low latency and throughput performance, persistent data is kept in S3 because of its elasticity, high availability and durability. S3 stores immutable files: a file can only be overwritten in full, and even append operations are not allowed; however, S3 supports read requests for parts of a file. To store a table in S3, it is horizontally partitioned into large immutable files, which correspond to blocks in a conventional database system. Within each file, the values of each individual attribute or column are grouped together and compressed, and each file has a header that stores the offset of every column in the file, so that S3's partial-read capability can be used to read only the columns required for query execution. In particular, all virtual warehouses belonging to the same customer can access the same shared tables through the remote persistent store, eliminating the need to physically copy data from one virtual warehouse to another.
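This file layout can be illustrated with a minimal sketch, assuming a local file in place of an S3 object, seek()-based reads in place of S3 ranged GET requests, and a JSON-plus-zlib encoding that is purely illustrative: the header maps each column to the offset and length of its compressed block, so only the requested columns are fetched.

```python
# Minimal sketch of a columnar file whose header stores per-column offsets.
import io
import json
import struct
import zlib


def write_columnar_file(path: str, columns: dict[str, list]) -> None:
    blocks = {name: zlib.compress(json.dumps(vals).encode()) for name, vals in columns.items()}
    header, offset, body = {}, 0, io.BytesIO()
    for name, blob in blocks.items():
        header[name] = (offset, len(blob))      # where each column block lives
        body.write(blob)
        offset += len(blob)
    header_bytes = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header_bytes)))  # 4-byte header length
        f.write(header_bytes)
        f.write(body.getvalue())


def read_column(path: str, column: str) -> list:
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<I", f.read(4))
        header = json.loads(f.read(header_len))
        offset, length = header[column]
        f.seek(4 + header_len + offset)         # the stand-in "ranged read"
        return json.loads(zlib.decompress(f.read(length)))


write_columnar_file("part-0001.dat", {"id": [1, 2, 3], "city": ["Hefei", "Wuhu", "Bengbu"]})
print(read_column("part-0001.dat", "city"))     # reads only the 'city' block
```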
The query execution module 5 is configured to generate the required execution tasks through the cloud server and to schedule them on the nodes of the virtual warehouse.
Generating the required execution tasks through the cloud server and scheduling them on the nodes of the virtual warehouse comprises the following steps (a minimal sketch of this flow follows the steps):
the client submits the query text to the cloud server;
the cloud server receives the query text and performs query analysis, query planning and optimization to generate the required execution tasks;
the execution tasks are scheduled on the nodes of the virtual warehouse, performing read and write operations against the distributed temporary storage system and the persistent data storage module 4;
the cloud server tracks query progress in real time and monitors the nodes using collected performance counters. Specifically, the cloud server is a server program deployed in the cloud that provides Internet-based services such as data storage; it continuously tracks the progress of each query, collects performance counters, and reschedules queries on the compute nodes of the virtual warehouse after detecting a node failure;
after a node failure is detected, the queries on the nodes of the virtual warehouse are rescheduled and the query results are obtained;
and the query results are returned to the virtual warehouse and from the virtual warehouse to the client.
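A minimal sketch of this flow, with plan_query() and run_task() as hypothetical stand-ins for the cloud server's planner and the warehouse nodes' executors; the simulated one-off failure of node 1 exercises the rescheduling path.

```python
# Minimal sketch of the submit -> plan -> schedule -> monitor -> return flow.

def plan_query(sql: str) -> list[str]:
    # Stand-in for query analysis, planning and optimization:
    # pretend the optimized plan is four parallel scan tasks.
    return [f"task{i}<{sql[:24]}>" for i in range(4)]


failed_once: set[int] = set()


def run_task(task: str, node: int) -> str:
    # Stand-in executor: node 1 fails the first time it is used.
    if node == 1 and node not in failed_once:
        failed_once.add(node)
        raise RuntimeError(f"node {node} failed")
    return f"{task}@node{node}"


def execute(sql: str, nodes: list[int]) -> list[str]:
    results = []
    for i, task in enumerate(plan_query(sql)):
        node = nodes[i % len(nodes)]
        try:
            results.append(run_task(task, node))
        except RuntimeError:
            # Failure surfaces via the collected performance counters;
            # reschedule the task on a surviving node of the warehouse.
            survivor = next(n for n in nodes if n != node)
            results.append(run_task(task, survivor))
    return results


# Results flow back through the virtual warehouse to the client.
print(execute("SELECT city, COUNT(*) FROM t GROUP BY city", nodes=[0, 1, 2]))
```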
The query execution module 5 comprises a locality-aware task scheduling module and an unbalanced partition balancing strategy module.
Specifically, to make full use of the distributed temporary storage system, each task is co-located with its persistent data files: files are assigned to compute nodes by consistent hashing over their file names, using a locality-aware scheduling mechanism (recall that the files may be cached in the temporary storage system). Thus, for a fixed virtual warehouse size, each persistent data file is cached on a particular node, and tasks that operate on a persistent data file are scheduled to the node to which that file consistently hashes.
In particular, the result of this scheduling scheme is that query parallelism is tightly coupled with the consistent hashing of files across nodes: a query scheduled for cache locality can be spread across all the nodes in the virtual warehouse.
Consider, for example, a customer with one million persistent data files running a virtual warehouse with 10 nodes, and two queries, the first reading 100 files and the second 100,000 files. Both queries are likely to run on all 10 nodes, because the files are consistently hashed across all 10 nodes. The volume of persistent data read and written is almost independent of the number of nodes in the virtual warehouse, whereas the intermediate data exchanged over the network grows as more nodes are used.
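A minimal sketch of this placement rule, assuming illustrative node names and a 64-virtual-node ring; each file name consistently hashes to one node, so queries of any size spread across the whole warehouse.

```python
# Minimal sketch of locality-aware placement via consistent hashing of file names.
import bisect
import hashlib


def _h(key: str) -> int:
    # Stable hash (md5) so placement survives process restarts.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, nodes: list[str], vnodes: int = 64):
        # Each node appears vnodes times on the ring to smooth the split.
        self._ring = sorted((_h(f"{n}#{i}"), n) for n in nodes for i in range(vnodes))
        self._keys = [k for k, _ in self._ring]

    def node_for(self, filename: str) -> str:
        # First ring position at or after the file's hash (wrapping around).
        idx = bisect.bisect(self._keys, _h(filename)) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing([f"node{i}" for i in range(10)])
# Whether a query touches 100 files or 100000, every file hashes to a fixed
# node, so both queries spread across all 10 nodes of the warehouse.
placement = [ring.node_for(f"file-{i:07d}") for i in range(100)]
print({n: placement.count(n) for n in sorted(set(placement))})
```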
The locality-aware task scheduling module is used for co-locating execution tasks with persistent data files and caching the persistent data files in the distributed temporary storage system, using a locality-aware scheduling mechanism.
The unbalanced partition balancing strategy module is used for optimizing the nodes and improving load balancing.
The locality-aware task scheduling module comprises a persistent data file distribution module and an execution task scheduling module.
The persistent data file distribution module is used for assigning persistent data files to compute nodes by consistent hashing of their file names.
The execution task scheduling module is used for scheduling a task that operates on a persistent data file to the node to which that file hashes.
The unbalanced partition balancing strategy module comprises a node distribution module and an optimal point acquisition module.
The node allocation module is configured to let a node take over a task from another node when the task does not meet its expected completion time, reading the persistent data files required by the task from the persistent data storage module 4.
The optimal point acquisition module is used for selecting, via the scheduler, optimal nodes between the two extremes and scheduling individual execution tasks onto them.
The two extremes are co-locating execution tasks with the cached persistent data files and placing all execution tasks on a single node.
Specifically, consistent hashing can produce unbalanced partitions. To avoid node overload and improve load balancing, an unbalanced-partition balancing strategy is used: a simple optimization that lets a node take over a task from another node if the task's expected completion time (the sum of execution time and waiting time) on the new node is lower. When this happens, the persistent data files needed by the task are read from the remote persistent data store rather than from the node on which the task was originally scheduled, which avoids adding load to that already overloaded node (task stealing only occurs when the original node is overloaded).
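The stealing rule reduces to a comparison of expected completion times; the sketch below uses an illustrative FIFO queueing estimate and timings, not the patent's actual estimator.

```python
# Minimal sketch of the task-stealing rule: steal when the expected
# completion time (execution time plus waiting time) is lower on the idle
# node, even after paying to read the file from remote persistent storage.

def expected_completion(queue_len: int, exec_time: float, remote_read_penalty: float = 0.0) -> float:
    wait = queue_len * exec_time                 # crude FIFO wait estimate
    return wait + exec_time + remote_read_penalty


def should_steal(loaded_queue: int, idle_queue: int, exec_time: float, remote_penalty: float) -> bool:
    stay = expected_completion(loaded_queue, exec_time)
    steal = expected_completion(idle_queue, exec_time, remote_penalty)
    return steal < stay


# Overloaded node (12 queued tasks) vs an idle one: even a 2.0s penalty for
# reading the file remotely is worth it; with a short queue it is not.
print(should_steal(loaded_queue=12, idle_queue=0, exec_time=1.0, remote_penalty=2.0))  # True
print(should_steal(loaded_queue=1, idle_queue=0, exec_time=1.0, remote_penalty=2.0))   # False
```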
The scheduler could place tasks on nodes using either of two extreme options. One is to co-locate tasks with the cached persistent data, which may schedule every query across all the nodes in the virtual warehouse; this policy minimizes the network traffic for reading persistent data but increases the network traffic for exchanging intermediate data. The other extreme is to place all tasks on a single node, which avoids network transfers for intermediate data exchange but increases the network traffic for reading persistent data. Neither extreme is the right choice for all queries, so a co-designed query scheduler selects the right set of nodes at an optimal point between the two extremes and then schedules the individual tasks onto those nodes.
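A minimal sketch of such a cost trade-off; the quadratic per-pair exchange overhead and the constants are illustrative assumptions rather than the patent's cost model, but they show an interior optimum between the two extremes.

```python
# Minimal sketch of choosing how many nodes to use between the two extremes:
# remote persistent reads shrink as more cache-local nodes join, while
# intermediate-exchange overhead grows with the number of node pairs.

def network_cost(n: int, total_nodes: int, persistent_gb: float, pair_overhead_gb: float) -> float:
    remote_reads = persistent_gb * (1 - n / total_nodes)  # files cached on unused nodes
    exchange = pair_overhead_gb * n * (n - 1)             # per-pair shuffle overhead
    return remote_reads + exchange


costs = {n: network_cost(n, total_nodes=10, persistent_gb=100.0, pair_overhead_gb=1.0)
         for n in range(1, 11)}
best = min(costs, key=costs.get)
print(best, costs[best])  # an interior optimum (5 nodes here), not either extreme
```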
The persistent data is customer data stored in the database in the form of tables; the intermediate data is generated by query operators.
Specifically, the persistent data is customer data, stored in the database in the form of tables. Over time each table can be read by many queries, even simultaneously, so these tables live for a long time and require strong durability and availability guarantees.
The intermediate data is generated by query operators (e.g., joins) and is typically consumed only by the nodes participating in the query, so its useful lifetime is relatively short. Furthermore, to keep nodes from blocking during intermediate data access, low-latency, high-throughput access is prioritized over strong durability; if a failure occurs during the (short) lifecycle of the intermediate data, the failed part of the query can simply be re-run.
The metadata comprises the mappings from object catalogs or database tables into persistent storage, such as the corresponding files, statistics, transaction logs and locks.
In summary, by means of the above technical solution, the invention decouples computation from persistent storage to achieve elasticity, disaggregates compute and storage through a distributed temporary storage system, and caches frequently read data in non-persistent local storage to reduce network traffic and improve data locality. The invention serves clients from a pre-warmed node pool, achieving compute elasticity on a fine-grained time scale, so that per-hour cloud pricing becomes cost-effective and the CPU is used efficiently: remote storage is used efficiently without excessive CPU consumption. The invention implements the centralized services of end-to-end query execution through the cloud service side, covering access control, query optimization and planning, scheduling, transaction management, concurrency control and the like; the cloud service side is designed and implemented as a multi-tenant, long-lived service with enough replication to achieve high availability and scalability, so that the failure of a single service node does not cause loss of state or availability.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims (10)

1. A storage-compute separation system for a shared-nothing architecture database, characterized by comprising a cloud service control module, an elastic computing module, an elastic local temporary storage module, a persistent data storage module and a query execution module;
the cloud service control module is used for orchestrating the centralized services of end-to-end query execution;
the elastic computing module is used for accessing computing resources in the cloud service system through the virtual warehouse abstraction, and for serving clients from a pre-warmed node pool;
the elastic local temporary storage module is used for constructing a distributed temporary storage system and serving intermediate data from it;
the persistent data storage module is used for storing persistent data;
and the query execution module is used for generating the required execution tasks through the cloud server and scheduling them on the nodes of the virtual warehouse.
2. The system of claim 1, wherein the centralized services include access control, query optimization and planning, scheduling, transaction management, and concurrency control.
3. The storage-compute separation system for a shared-nothing architecture database according to claim 2, wherein the persistent data is customer data stored in the database in the form of tables;
the intermediate data is generated by a query operator.
4. The storage-compute separation system for a shared-nothing architecture database according to claim 3, wherein the elastic computing module comprises a resource access module and a node pool preheating module;
the resource access module is used for enabling a user to access computing resources in the cloud service system through the virtual warehouse abstraction;
the node pool preheating module is configured to serve clients from the pre-warmed node pool and to provide compute elasticity on a fine-grained time scale.
5. The system of claim 4, wherein the persistent data storage module comprises a file partitioning module, an attribute compression module, and a stored file query execution module;
the file dividing module is used for horizontally dividing the storage table into storage files;
the attribute compression module is used for grouping and compressing the values of the individual attributes or columns in the storage file;
the stored file query execution module is used for reading the file header of a storage file and using the per-column offsets recorded in the header to read only the columns required for query execution.
6. The storage-compute separation system for a shared-nothing architecture database according to claim 5, wherein generating the required execution tasks through the cloud server and scheduling them on the nodes of the virtual warehouse comprises the following steps:
the client submits the query text to the cloud server;
the cloud server receives the query text and executes query analysis, query planning and optimization operations to generate required execution tasks;
scheduling the execution task on a node of the virtual warehouse, and executing read-write operation on the distributed temporary storage system and the persistent data storage module;
the cloud server tracks query progress in real time and monitors the nodes using collected performance counters;
after a node failure is detected, the queries on the nodes of the virtual warehouse are rescheduled and the query results are obtained;
and the query results are returned to the virtual warehouse and from the virtual warehouse to the client.
7. The system of claim 6, wherein the query execution module comprises a locality-aware task scheduling module and an unbalanced partition balancing strategy module;
the locality-aware task scheduling module is used for co-locating execution tasks with persistent data files and caching the persistent data files in the distributed temporary storage system, using a locality-aware scheduling mechanism;
the unbalanced partition balancing strategy module is used for optimizing the nodes and improving load balancing.
8. The system of claim 7, wherein the locality-aware task scheduling module comprises a persistent data file distribution module and an execution task scheduling module;
the persistent data file distribution module is used for assigning persistent data files to compute nodes by consistent hashing of their file names;
the execution task scheduling module is used for scheduling a task that operates on a persistent data file to the node to which that file hashes.
9. The system of claim 8, wherein the unbalanced partition balancing strategy module comprises a node allocation module and an optimal point acquisition module;
the node allocation module is used for letting a node take over a task from another node when the task does not meet its expected completion time, reading the persistent data files required by the task from the persistent data storage module;
the optimal point acquisition module is used for selecting, via the scheduler, optimal nodes between two extremes and scheduling individual execution tasks onto them.
10. The system of claim 9, wherein the two extremes are co-locating execution tasks with the cached persistent data files and placing all execution tasks on a single node.
CN202310508386.6A 2023-05-08 2023-05-08 Storage-compute separation system of a shared-nothing architecture database Pending CN116226095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310508386.6A CN116226095A (en) Storage-compute separation system of a shared-nothing architecture database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310508386.6A CN116226095A (en) Storage-compute separation system of a shared-nothing architecture database

Publications (1)

Publication Number Publication Date
CN116226095A 2023-06-06

Family

ID=86584706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310508386.6A Pending CN116226095A (en) 2023-05-08 2023-05-08 Memory calculation separation system of shared-architecture-free database

Country Status (1)

Country Link
CN (1) CN116226095A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150234896A1 (en) * 2014-02-19 2015-08-20 Snowflake Computing Inc. Adaptive distribution method for hash operations
CN109075994A (en) * 2016-04-28 2018-12-21 斯诺弗雷克计算公司 More depot complexes
CN109923533A (en) * 2016-11-10 2019-06-21 华为技术有限公司 It will calculate and separate with storage to improve elasticity in the database
US20200026695A1 (en) * 2018-07-17 2020-01-23 Snowflake Inc. Incremental Clustering Of Database Tables
CN113261000A (en) * 2019-11-27 2021-08-13 斯诺弗雷克公司 Dynamic shared data object masking


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ACEVOLVE: "Snowflake and Delta Lake: a comparative analysis of two new data warehouses" [Snowflake、Delta Lake 两大新型数仓对比分析], Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/350958074> *
数据一哥: "Snowflake & Delta Lake: a comparative analysis of two new data warehouses" [Snowflake & Delta Lake两大新型数仓对比分析], pages 1-14, Retrieved from the Internet <URL:https://blog.csdn.net/Arvinzr/article/details/121143514> *

Similar Documents

Publication Publication Date Title
Vuppalapati et al. Building an elastic query engine on disaggregated storage
CN112534396B Journal tables in a database system
US20230385262A1 (en) System And Method For Large-Scale Data Processing Using An Application-Independent Framework
KR102441299B1 (en) Batch data collection into database system
Xue et al. Seraph: an efficient, low-cost system for concurrent graph processing
Fernandez et al. Liquid: Unifying Nearline and Offline Big Data Integration.
CN102495857B (en) Load balancing method for distributed database
Deka A survey of cloud database systems
CN107329982A Big data parallel computing method and system based on distributed columnar storage
US20060277155A1 (en) Virtual solution architecture for computer data systems
US9774676B2 (en) Storing and moving data in a distributed storage system
JP2005196602A (en) System configuration changing method in unshared type database management system
Esteves et al. Quality-of-service for consistency of data geo-replication in cloud computing
US11698886B2 (en) Cluster instance balancing of a database system across zones
US9697220B2 (en) System and method for supporting elastic data metadata compression in a distributed data grid
US11080207B2 (en) Caching framework for big-data engines in the cloud
Chandra et al. A study on cloud database
EP3818453A1 (en) System for optimizing storage replication in a distributed data analysis system using historical data access patterns
US11609910B1 (en) Automatically refreshing materialized views according to performance benefit
Poess et al. Large scale data warehouses on grid: Oracle database 10 g and HP proliant servers
CN116226095A (en) Storage-compute separation system of a shared-nothing architecture database
Schall et al. Energy and Performance: Can a Wimpy-Node Cluster Challenge a Brawny Server?
Babu et al. Dynamic colocation algorithm for Hadoop
Louis Rodríguez et al. Workload management for dynamic partitioning schemes in replicated databases
US11966368B2 (en) Cluster balancing for zones of a database system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination