CN113849478A - Cloud native big data analysis engine - Google Patents

Cloud native big data analysis engine

Info

Publication number
CN113849478A
Authority
CN
China
Prior art keywords
storage
cloud
big data
analysis engine
data analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111018815.9A
Other languages
Chinese (zh)
Inventor
张颖峰
颜文泽
张旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Moment Intelligence Force Shanghai Information Technology Co ltd
Original Assignee
Moment Intelligence Force Shanghai Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Moment Intelligence Force Shanghai Information Technology Co ltd
Priority to CN202111018815.9A
Publication of CN113849478A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/172 Caching, prefetching or hoarding of files
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • G06F16/20 Information retrieval of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a cloud-native big data analysis engine, relating in particular to the technical field of big data analysis. The cloud-native big data analysis engine comprises the following: the OLAP database is divided into different partitions according to the distribution of the data, each partition is referred to as a Shard, and each Shard stores its data files in S3 object storage. The invention provides the design and implementation of a novel OLAP analytical database, which is suitable for enterprises deploying on public cloud infrastructure, offers an inexpensive and high-speed OLAP analytical database solution, and is one of the important components of an enterprise's cloud-native infrastructure. At the same time, the invention is not only suitable for public cloud deployment but can also provide a fast, inexpensive, high-performance OLAP analysis engine in a privatized machine room, so that enterprise applications adopting the invention as a basic component can easily switch between different public clouds and private clouds.

Description

Cloud native big data analysis engine
Technical Field
The invention relates to the technical field of big data analysis, in particular to a cloud native big data analysis engine.
Background
OLAP analytical databases are one of the cornerstones of public cloud services: they enable analysts to quickly, consistently and interactively gain insight into data from multiple perspectives, so as to achieve a deep understanding of the data. Public cloud OLAP databases fall into the following categories: open-source OLAP databases moved onto the public cloud and offered as DBaaS (database as a service); and OLAP databases that public cloud providers develop themselves specifically for cloud users. When enterprises deploy applications on the public cloud, they can choose an OLAP database in two different ways: one is to directly adopt the two kinds of hosted OLAP database services of the public cloud; the other is to purchase public cloud infrastructure, including compute and storage, and then deploy and maintain an open-source OLAP database themselves.
The prior art has the following defects. In the process of migrating to a cloud-native architecture, enterprises face various difficult choices, not only because of the complexity and limitations of OLAP technology itself, but also because the resources that can be purchased on a public cloud differ from those of a privatized deployment: the basic compute resources available on a public cloud are virtual machines and physical hosts; the latter are expensive, the former usually provide only small local storage, and the big-data virtual machine types of public clouds usually mount large local storage on a single machine, which makes it difficult to exploit more parallel computing resources. Therefore, whichever open-source OLAP hosted on the public cloud is selected, whether a row-store, column-store or index-based scheme, none is both inexpensive and high-performance: their storage is not cost-effective and their performance is not optimal.
Disclosure of Invention
Therefore, the invention provides a cloud-native big data analysis engine. Through the design and implementation of a novel OLAP analytical database, it offers an inexpensive and high-speed OLAP analytical database solution that is suitable for enterprises deploying on public cloud infrastructure and is one of the important components of an enterprise's cloud-native infrastructure. At the same time, the invention is not only suitable for public cloud deployment but can also provide a fast, inexpensive, high-performance OLAP analysis engine in a privatized machine room, so that enterprise applications adopting the invention as a basic component can easily switch between different public clouds and private clouds, thereby solving the problems of high cost and low performance that arise in the prior art when an open-source OLAP hosted on the public cloud is selected.
In order to achieve the above object, the embodiments of the present invention provide the following technical solution: a cloud-native big data analysis engine, comprising:
S1, dividing the OLAP database into different partitions according to the distribution of the data, each partition being called a Shard, wherein each Shard stores its data files in S3 object storage;
S2, for each inserted record: storing the record in row-store format and saving it to S3 object storage; generating a corresponding column store for each column of the record and saving it to S3; and assigning an auto-incrementing integer ID to each record and then building a corresponding inverted index for most columns of the record using a Bitmap technique (Bit-Sliced Index, BSI);
S3, storing the row store and the Bitmap inverted index at the bottom layer through a common Key Value interface: the open-source embedded Key Value engine Pebble provides the Key Value interface, and its bottom layer consists of SST files that are compacted accordingly at different levels; a separate batch interface is implemented for the column store, with a simple block index built in memory that records the maximum and minimum values of every fixed-size block of a column, in an effort to avoid invalid IO when scanning;
S4, providing a global WAL log service shared by all Shards, wherein when data is inserted it is first inserted into the global WAL and then inserted into the different Shards respectively;
S5, providing full and complete SQL support: the SQL execution layer is constructed based on relational algebra rather than plans, and dispatch is decided directly from the attributes: if attribute a has an inverted index, an index plan is executed; otherwise a column-store plan is executed, scanning the full table and filtering.
Further, the BSI index comprises a group of bitmaps: the selected column values are converted into binary representations and vertically sliced into the bitmaps, and no inverted index is built for high-cardinality string columns.
Further, all Shard-internal storage, including the Pebble storage behind the Key Value interface and the proprietary column storage, has the WAL log function turned off.
Furthermore, each Shard mounts a file cache on block storage, with different caching mechanisms for the Pebble storage that the row store and Bitmap inverted index depend on and for the proprietary column-store format; for the Pebble storage, cached objects are managed in units of Pebble's underlying SST files, and specifically the SST files at Level 0 and Level 1 are kept in the file cache preferentially before being flushed to S3 object storage; for the column store, a conventional LRU cache mechanism is used with column-store file units as the caching unit.
Further, the global WAL log service is implemented with the open-source Pulsar, whose bottom layer comprises block storage and matching S3 object storage.
Further, each Shard by default starts only one instance; when an additional instance is launched, the working sequence is as follows: the new virtual machine loads the Shard's object files from the shared S3 object store and continues to consume data from the global WAL log service, ensuring consistency of subsequent data.
Furthermore, the cloud-native big data analysis engine supports Schema Free data import: arbitrary JSON data is accepted as input, the basic JSON types are interpreted as the basic SQL types String, Number and Boolean, and JSON null is treated as SQL NULL.
Further, the cloud-native big data analysis engine serves both public cloud and private cloud environments.
The invention has the following advantages:
1. The invention is an OLAP analytical database that relies on public cloud S3 object storage. To balance cost and high performance, it stores data in multiple formats, including row store, column store and inverted index, so that different types of query can be served by different formats; the respective advantages of row store, column store and index are fully exploited to serve different query types, the selection cost for the user is reduced as much as possible, and high-performance query service is provided to the user;
2. The invention solves the problem that ordinary inverted indexes struggle to serve full SQL capability: by still using an inverted index it provides highly concurrent SQL filtering and SQL aggregation for numeric types including floating-point numbers, and it makes the inverted index work on S3 object storage;
3. The invention provides a Schema Free insertion means, on top of public cloud object storage, to cope with changeable business scenarios. With Schema Free data storage capability, users do not need to define database fields and types in advance when inserting data, which greatly simplifies storage operations and removes the need to modify the database schema for complicated and changeable business scenarios;
4. The invention not only provides deployment capability on public cloud object storage but also provides the same query and analysis service in a privatized environment. It relies on an encapsulation of the existing Multi-Raft mechanism, so that the Multi-Raft mechanism can run in an embedded, single-library mode, can manage a conventional Key Value engine and the customized column store at the same time, and can uniformly encapsulate the management of local storage and of S3 object storage on the basis of the Multi-Raft mechanism.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It should be apparent that the drawings in the following description are merely exemplary, and other drawings can be derived from them by those of ordinary skill in the art without inventive effort.
The structures, ratios, sizes and the like shown in this specification are only used to match the content disclosed in the specification so that it can be understood and read by those skilled in the art; they are not used to limit the conditions under which the present invention can be implemented and thus carry no substantive technical significance. Any structural modification, change of ratio relationship or adjustment of size that does not affect the effects and objectives achievable by the present invention shall still fall within the scope that the technical content disclosed by the present invention can cover.
FIG. 1 is a basic framework provided by the present invention;
FIG. 2 is a binary diagram of BSI indexing according to the present invention;
FIG. 3 is a schematic diagram of a cloud environment provided by the present invention.
Detailed Description
The present invention is described in terms of particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely some, not all, embodiments of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Referring to FIGS. 1-3 of the specification, the cloud-native big data analysis engine provided by the invention comprises the following.
The OLAP database is partitioned into different partitions according to the distribution of the data, and each partition is called a Shard. Each Shard stores its data files in the S3 object store. Meanwhile, to overcome the insufficient throughput of the S3 file system, each Shard mounts a block storage volume as a cache for the files stored in S3.
For each inserted record, the invention proceeds as follows (a minimal sketch of this path is given after the list):
1. The record is stored in row format and saved to S3 object storage.
2. A corresponding column-store entry is generated for each column of the record and saved to S3.
3. Each record is assigned an auto-incrementing integer ID, and then, for most columns of the record, a corresponding inverted index is generated. Specifically, a Bitmap technique is used to build the inverted index, and the Bitmap index is also stored in the S3 object store.
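To make the three-way fan-out concrete, the following is a minimal sketch in Go of the insert path just described. It is an illustration only, not the patented implementation: the type names, the plain in-memory maps and the slice-based posting lists are all assumptions; the actual engine stores the row format, column format and Roaring/BSI bitmaps in S3 rather than in memory.

package ingest

type Record map[string]any // one inserted row, column name -> value

type Shard struct {
	nextID   uint32                      // auto-incrementing record ID
	rowStore map[uint32]Record           // row format, later flushed to S3
	colStore map[string][]any            // per-column append buffers
	inverted map[string]map[any][]uint32 // column -> value -> posting list (bitmaps in the real design)
}

func NewShard() *Shard {
	return &Shard{
		rowStore: map[uint32]Record{},
		colStore: map[string][]any{},
		inverted: map[string]map[any][]uint32{},
	}
}

// Insert applies the three steps from the description to a single record.
// It assumes scalar column values (comparable map keys).
func (s *Shard) Insert(r Record) uint32 {
	id := s.nextID
	s.nextID++

	s.rowStore[id] = r // 1. row-format copy

	for col, v := range r {
		s.colStore[col] = append(s.colStore[col], v) // 2. column store

		// 3. inverted index posting for this column value
		if s.inverted[col] == nil {
			s.inverted[col] = map[any][]uint32{}
		}
		s.inverted[col][v] = append(s.inverted[col][v], id)
	}
	return id
}

In the real design each of these three structures is flushed to S3 object storage, with the bitmap postings compressed in Roaring format as described later.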
Therefore, in the invention, any piece of data exists in three formats: row store, column store and Bitmap inverted index. Although the data is expanded by at least a factor of 3 compared with the original record, it is kept in S3 object storage, whose cost is less than 1/10 of local disk or block storage, so the overall cost is not increased but is in fact much lower than that of other OLAP databases. In SQL query planning, as long as the queried column has a Bitmap inverted index, the SQL execution is completed through the Bitmap inverted index.
Not every column can have an inverted index. A conventional inverted index can only be built for enumerable string columns, as in Elasticsearch; for numeric types, the latter uses a less efficient BKD tree, which performs worse for query filtering and cannot support aggregation queries over numeric columns. The invention uses the BSI (Bit-Sliced Index) technique to build the Bitmap inverted index. A BSI index contains a group of bitmaps: the selected column values are converted into a binary representation and sliced vertically into bitmaps, as shown in FIG. 2. In the example of FIG. 2, only 15 bitmaps are needed to represent values from 0 to 30000, and row numbers can conveniently be set in those bitmaps, so a Bitmap inverted index can also be built for integer-typed fields. For floating-point fields the invention likewise uses the BSI technique, only handling the exponent and mantissa separately. The BSI index removes the drawback that inverted indexes struggle to support aggregation queries and high-speed range filtering on numeric fields. However, for high-cardinality string columns there is no advantage in building a Bitmap index; a typical example is an MD5-hashed user ID string. The invention may choose not to build an inverted index in such scenarios.
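As an illustration of how a bit-sliced index supports aggregation directly over bitmaps, here is a small Go sketch assuming the RoaringBitmap Go library (github.com/RoaringBitmap/roaring); the 15-slice width mirrors the 0 to 30000 example of FIG. 2, and the type and method names are assumptions rather than the engine's actual API.

package bsi

import "github.com/RoaringBitmap/roaring"

const bits = 15 // 15 slices cover values 0..32767, matching the 0..30000 example

type BSI struct {
	exists *roaring.Bitmap       // rows that have a value at all
	slices [bits]*roaring.Bitmap // slices[i] holds rows whose value has bit i set
}

func New() *BSI {
	b := &BSI{exists: roaring.New()}
	for i := range b.slices {
		b.slices[i] = roaring.New()
	}
	return b
}

// Set records that the given row has the given non-negative value.
func (b *BSI) Set(row uint32, value uint32) {
	b.exists.Add(row)
	for i := 0; i < bits; i++ {
		if value&(1<<uint(i)) != 0 {
			b.slices[i].Add(row)
		}
	}
}

// Sum aggregates the column over the rows selected by filter, without touching
// the row or column store: sum = sum_i 2^i * |slice_i AND filter|.
func (b *BSI) Sum(filter *roaring.Bitmap) uint64 {
	var sum uint64
	for i := 0; i < bits; i++ {
		sum += (uint64(1) << uint(i)) * roaring.And(b.slices[i], filter).GetCardinality()
	}
	return sum
}

A SUM over a filtered set of rows thus never touches the row or column store; it only intersects bitmaps, which is what makes BSI attractive for numeric aggregation and range filtering.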
In the invention, the row store and the column store can each be disabled through configuration. For an OLAP database the row store is only rarely needed, namely for random point queries, which are typical of OLTP databases; users can therefore disable the row store as required and save further resources. The column store is likewise not used in most cases: it is generally only consulted when the column involved in a query has no inverted index, or when an analytical query on the column contains a LIKE-style predicate and requires a large-range scan.
In the invention, the row store and the Bitmap inverted index are stored at the bottom layer through a common Key Value interface. For the inverted index, the Key is a specific value of a column and the Value is the list of row numbers in which that value appears. Because the Bitmap is a sparse data structure, the invention compresses the Bitmap inverted index in the Roaring Bitmap format, reducing IO overhead. To avoid the overhead of continuously updating the inverted index, the index is first built in memory and written out to storage once the in-memory buffer is full. The invention uses the open-source embedded Key Value engine Pebble to provide the Key Value interface. The bottom layer of Pebble consists of SST files that are compacted accordingly at different levels; its working mechanism is similar to that of the popular RocksDB, it does not need WAL logs or multi-row transaction guarantees here, and its throughput is several times that of RocksDB.
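The following Go sketch shows, under assumptions, how one Roaring-compressed posting could be written through an embedded Pebble store with its own WAL disabled, as described above. It uses the public cockroachdb/pebble and RoaringBitmap/roaring libraries; the key layout ("idx/<column>/<value>"), directory name and function names are illustrative, not taken from the patent.

package kvstore

import (
	"bytes"
	"log"

	"github.com/RoaringBitmap/roaring"
	"github.com/cockroachdb/pebble"
)

func writePosting(dir, column, value string, rows *roaring.Bitmap) error {
	db, err := pebble.Open(dir, &pebble.Options{
		DisableWAL: true, // per the design: durability comes from the global WAL, not Pebble's
	})
	if err != nil {
		return err
	}
	defer db.Close()

	var buf bytes.Buffer
	if _, err := rows.WriteTo(&buf); err != nil { // Roaring-compressed posting list
		return err
	}

	key := []byte("idx/" + column + "/" + value) // assumed key layout: column value -> row bitmap
	return db.Set(key, buf.Bytes(), pebble.NoSync)
}

func writeExample() {
	rows := roaring.BitmapOf(3, 17, 42) // rows where, say, city = "Shanghai"
	if err := writePosting("shard-0001", "city", "Shanghai", rows); err != nil {
		log.Fatal(err)
	}
}

In practice the Pebble handle would be opened once per Shard and reused; the SST files it produces are what the file cache and S3 upload described below operate on.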
In the invention, the column store does not rely on the Key Value interface, which would greatly reduce the throughput of scanning column-store records; instead, the invention implements a separate batch interface for the column store, characterized in that the data of each column of each record is stored directly, without any serialization overhead. A simple block index is built in memory: the maximum and minimum values of every fixed-size block of a column are recorded, in an effort to avoid invalid IO when scanning. The columnar storage format of the invention mirrors the single-machine storage engine format of the open-source OLAP engine ClickHouse, since the latter is currently the fastest open-source OLAP database.
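A minimal Go sketch of the in-memory block index described above, for an integer column: the block size, type names and the [lo, hi] range form are assumptions, but the min/max-per-block skipping logic is the technique the text names.

package colstore

const blockSize = 8192 // rows per block (assumed)

type blockStats struct{ min, max int64 }

type BlockIndex struct{ blocks []blockStats }

// Build computes min/max for each fixed-size block of the column.
func Build(column []int64) *BlockIndex {
	idx := &BlockIndex{}
	for start := 0; start < len(column); start += blockSize {
		end := start + blockSize
		if end > len(column) {
			end = len(column)
		}
		st := blockStats{min: column[start], max: column[start]}
		for _, v := range column[start:end] {
			if v < st.min {
				st.min = v
			}
			if v > st.max {
				st.max = v
			}
		}
		idx.blocks = append(idx.blocks, st)
	}
	return idx
}

// BlocksToScan returns the block numbers that may contain values in [lo, hi];
// all other blocks can be skipped, avoiding the invalid IO mentioned above.
func (idx *BlockIndex) BlocksToScan(lo, hi int64) []int {
	var hits []int
	for i, st := range idx.blocks {
		if st.max >= lo && st.min <= hi {
			hits = append(hits, i)
		}
	}
	return hits
}

A range predicate such as v BETWEEN 100 AND 200 then only reads the blocks returned by BlocksToScan, skipping the rest of the column file.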
In the invention, each Shard mounts a file cache on block storage, with different caching mechanisms for the Pebble storage that the row store and Bitmap inverted index depend on and for the proprietary column-store format. For the Pebble storage, cached objects are managed in units of Pebble's underlying SST files; specifically, the SST files at Level 0 and Level 1 are kept in the file cache preferentially and then flushed to S3 object storage. For the column store, a conventional LRU caching mechanism is used, with column-store file units as the caching unit.
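For the column-store side of this cache, a conventional LRU keyed by column-store file name is enough. The sketch below is a generic Go LRU (the capacity, field names and local-path payload are assumptions) and deliberately omits the Pebble side, which pins Level 0/1 SST files rather than evicting by recency.

package filecache

import "container/list"

type entry struct {
	name string
	path string // local path on the mounted block storage
}

type LRU struct {
	capacity int
	order    *list.List // front = most recently used
	items    map[string]*list.Element
}

func NewLRU(capacity int) *LRU {
	return &LRU{capacity: capacity, order: list.New(), items: map[string]*list.Element{}}
}

// Get returns the cached local path of a column file, refreshing its recency.
func (c *LRU) Get(name string) (string, bool) {
	if el, ok := c.items[name]; ok {
		c.order.MoveToFront(el)
		return el.Value.(*entry).path, true
	}
	return "", false // caller fetches the file from S3 and then calls Put
}

// Put records a freshly downloaded column file, evicting the least recently used one.
func (c *LRU) Put(name, path string) {
	if el, ok := c.items[name]; ok {
		c.order.MoveToFront(el)
		el.Value.(*entry).path = path
		return
	}
	c.items[name] = c.order.PushFront(&entry{name: name, path: path})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).name)
	}
}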
In the invention, all Shard-internal storage, including the Pebble storage behind the Key Value interface and the proprietary column storage, has the WAL log function turned off, in order to reduce IO overhead. The invention instead provides a global WAL log service that is shared by all Shards. When data is inserted, it is first inserted into the global WAL and then inserted into the different Shards respectively. Therefore, even if the virtual machine hosting a Shard fails, another virtual machine can quickly be rebuilt, and the rebuilt data can be recovered from the global WAL.
The global WAL log service is implemented with the open-source Pulsar, whose bottom layer comprises block storage and matching S3 object storage; Pulsar is responsible for migrating cold data in the WAL to S3. The popular Kafka was not selected because it is less friendly to cloud-native deployment and lacks the ability to automatically tier hot and cold data.
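A sketch of the write ordering this implies, assuming the Apache Pulsar Go client (github.com/apache/pulsar-client-go): the record is appended to the shared global WAL topic first and only then applied to the Shards. The service URL, topic name and applyToShards callback are illustrative assumptions; a real implementation would also keep the client and producer long-lived rather than per call.

package wal

import (
	"context"

	"github.com/apache/pulsar-client-go/pulsar"
)

func AppendThenApply(record []byte, applyToShards func([]byte) error) error {
	client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://pulsar:6650"})
	if err != nil {
		return err
	}
	defer client.Close()

	producer, err := client.CreateProducer(pulsar.ProducerOptions{Topic: "global-wal"})
	if err != nil {
		return err
	}
	defer producer.Close()

	// 1. Durably append to the global WAL (Pulsar tiers cold segments to S3).
	if _, err := producer.Send(context.Background(), &pulsar.ProducerMessage{Payload: record}); err != nil {
		return err
	}
	// 2. Only after the WAL append succeeds, apply the record to the Shards.
	return applyToShards(record)
}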
In the invention, each Shard by default starts only one instance; however, in some cases additional replica instances may need to be launched for some Shards, in the following sequence: the new virtual machine loads the Shard's object files from the shared S3 object store and then continues to consume data from the global WAL log service, ensuring consistency of subsequent data.
A typical scenario for launching an additional replica instance is when the throughput of the system reaches its limit and must be increased by adding replicas. To make this process as fast as possible, the invention uses Kubernetes containers for orchestration: a Shard instance of the invention actually runs in a container, not necessarily in a virtual machine, and the invention dynamically requests container resources from the Kubernetes container cloud according to the query load.
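The replica start-up sequence (restore the Shard's files from S3, then resume the global WAL) could look roughly like the Go sketch below. The S3 restore is abstracted as a callback to avoid guessing at the engine's storage layout, and the Pulsar topic, subscription name and subscription type are assumptions.

package replica

import (
	"context"

	"github.com/apache/pulsar-client-go/pulsar"
)

func StartReplica(ctx context.Context, shard string,
	restoreFromS3 func(shard string) error, apply func([]byte) error) error {

	// 1. Load the Shard's row/column/index files from the shared S3 object store.
	if err := restoreFromS3(shard); err != nil {
		return err
	}

	// 2. Resume consuming the global WAL from where this Shard's subscription left off.
	client, err := pulsar.NewClient(pulsar.ClientOptions{URL: "pulsar://pulsar:6650"})
	if err != nil {
		return err
	}
	defer client.Close()

	consumer, err := client.Subscribe(pulsar.ConsumerOptions{
		Topic:            "global-wal",
		SubscriptionName: "shard-" + shard, // durable cursor per Shard
		Type:             pulsar.Failover,
	})
	if err != nil {
		return err
	}
	defer consumer.Close()

	for {
		msg, err := consumer.Receive(ctx)
		if err != nil {
			return err // ctx cancelled or connection lost
		}
		if err := apply(msg.Payload()); err != nil {
			return err
		}
		consumer.Ack(msg)
	}
}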
The present invention provides full and complete SQL support. The SQL execution layer is constructed based on relational algebra rather than on query plans.
The invention decides directly according to the attributes: if attribute a has an inverted index, an index plan is executed; otherwise a column-store plan is executed, scanning the full table and filtering.
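A toy Go illustration of that dispatch rule for a single equality predicate, assuming the RoaringBitmap library; the Shard layout with an in-memory inverted map and a string column is a simplification of the real on-S3 structures, and the names are not from the patent.

package planner

import "github.com/RoaringBitmap/roaring"

type Shard struct {
	inverted map[string]map[string]*roaring.Bitmap // column -> value -> row bitmap
	colStore map[string][]string                   // column -> values indexed by row ID
}

// RowsEqual returns the row IDs where column == value.
func (s *Shard) RowsEqual(column, value string) *roaring.Bitmap {
	if postings, ok := s.inverted[column]; ok { // index plan
		if bm, ok := postings[value]; ok {
			return bm.Clone()
		}
		return roaring.New()
	}
	// Column-store plan: full scan of the column with filtering.
	result := roaring.New()
	for rowID, v := range s.colStore[column] {
		if v == value {
			result.Add(uint32(rowID))
		}
	}
	return result
}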
The invention can be configured to support the import of Schema Free data, in which case arbitrary JSON data is accepted as input: the basic JSON types are interpreted as the basic SQL types String, Number and Boolean, and JSON null is treated as SQL NULL. A JSON Object is treated as an SQL sub-relation, and an Array remains an array, although the type of each element of the array may differ. In the invention, the types of the individual values of a particular attribute may differ, but each value has its own well-defined type; in addition, a special sub-relation type is supported. For example, if for attribute a the two values {"a": "b"} and {"c": 1, "a": "c"} are accepted, then attribute a corresponds to a sub-relation whose attributes are a and c and which contains 2 tuples.
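The JSON-to-SQL type mapping can be illustrated with Go's encoding/json, as below; the SQLType names and the ExampleInfer helper are assumptions, but the mapping itself (string/number/boolean/null to String/Number/Boolean/NULL, object to sub-relation, array to array) follows the text.

package schemafree

import "encoding/json"

type SQLType string

const (
	SQLString      SQLType = "String"
	SQLNumber      SQLType = "Number"
	SQLBoolean     SQLType = "Boolean"
	SQLNull        SQLType = "NULL"
	SQLSubRelation SQLType = "SubRelation"
	SQLArray       SQLType = "Array"
)

// InferType maps a decoded JSON value to the SQL type it is stored as.
func InferType(v any) SQLType {
	switch v.(type) {
	case string:
		return SQLString
	case float64: // encoding/json decodes every JSON number as float64
		return SQLNumber
	case bool:
		return SQLBoolean
	case nil:
		return SQLNull
	case map[string]any:
		return SQLSubRelation
	case []any:
		return SQLArray
	default:
		return SQLNull
	}
}

// ExampleInfer decodes one JSON document and infers a type per top-level attribute,
// e.g. {"a":"b"} and {"c":1,"a":"c"} both type attribute a as String individually.
func ExampleInfer(doc []byte) (map[string]SQLType, error) {
	var m map[string]any
	if err := json.Unmarshal(doc, &m); err != nil {
		return nil, err
	}
	types := map[string]SQLType{}
	for k, v := range m {
		types[k] = InferType(v)
	}
	return types, nil
}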
Although the invention is mainly aimed at public cloud environments, it can also be deployed and used in private cloud environments, in one of two ways. If the private cloud can provide an S3-compatible object storage interface, for example via the Ceph distributed file system, the invention can be deployed directly. If the private cloud provides only physical machines, the invention relies on its own storage abstraction to manage the underlying storage, defined as shown in FIG. 3. FIG. 3 shows a distributed storage library implemented on the Multi-Raft, multi-group, strongly consistent protocol; some newer databases, such as TiDB, adopt a similar mechanism to provide underlying storage. The Multi-Raft mechanism is abstracted, separated from the underlying storage engine and decoupled from the upper-layer application, so that it becomes an embedded distributed storage management tool rather than an independently running process. Thanks to the compute-storage separation architecture of the invention, all of the aforementioned designs can, in a privatized environment, run on top of this Multi-Raft framework. The Multi-Raft framework is responsible for managing the Pebble Key Value storage engine as well as the proprietary column storage engine, and provides a multi-replica mechanism and automatic load balancing for the storage engines.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (8)

1. A cloud-native big data analysis engine, characterized in that it comprises:
S1, dividing the OLAP database into different partitions according to the distribution of the data, each partition being called a Shard, wherein each Shard stores its data files in S3 object storage;
S2, for each inserted record: storing the record in row-store format and saving it to S3 object storage; generating a corresponding column store for each column of the record and saving it to S3; and assigning an auto-incrementing integer ID to each record and then building a corresponding inverted index for most columns of the record using a Bitmap technique (Bit-Sliced Index, BSI);
S3, storing the row store and the Bitmap inverted index at the bottom layer through a common Key Value interface: the open-source embedded Key Value engine Pebble provides the Key Value interface, and its bottom layer consists of SST files that are compacted accordingly at different levels; a separate batch interface is implemented for the column store, with a simple block index built in memory that records the maximum and minimum values of every fixed-size block of a column, in an effort to avoid invalid IO when scanning;
S4, providing a global WAL log service shared by all Shards, wherein when data is inserted it is first inserted into the global WAL and then inserted into the different Shards respectively;
S5, providing full and complete SQL support: the SQL execution layer is constructed based on relational algebra rather than plans, and dispatch is decided directly from the attributes: if attribute a has an inverted index, an index plan is executed; otherwise a column-store plan is executed, scanning the full table and filtering.
2. The cloud-native big data analysis engine of claim 1, wherein: the BSI index comprises a group of bitmaps: the selected column values are converted into binary representations and vertically sliced into the bitmaps, and no inverted index is built for high-cardinality string columns.
3. The cloud-native big data analysis engine of claim 1, wherein: all Shard-internal storage, including the Pebble storage behind the Key Value interface and the proprietary column storage, has the WAL log function turned off.
4. The cloud-native big data analysis engine of claim 1, wherein: each Shard mounts a file cache on block storage, with different caching mechanisms for the Pebble storage that the row store and Bitmap inverted index depend on and for the proprietary column-store format; for the Pebble storage, cached objects are managed in units of Pebble's underlying SST files, and specifically the SST files at Level 0 and Level 1 are kept in the file cache preferentially before being flushed to S3 object storage; for the column store, a conventional LRU cache mechanism is used with column-store file units as the caching unit.
5. The cloud-native big data analysis engine of claim 1, wherein: the global WAL log service is implemented with the open-source Pulsar, whose bottom layer comprises block storage and matching S3 object storage.
6. The cloud-native big data analysis engine of claim 1, wherein: each Shard by default starts only one instance; when an additional instance is launched, the working sequence is as follows: the new virtual machine loads the Shard's object files from the shared S3 object store and continues to consume data from the global WAL log service, ensuring consistency of subsequent data.
7. The cloud-native big data analysis engine of claim 1, wherein: the cloud-native big data analysis engine supports Schema Free data import: arbitrary JSON data is accepted as input, the basic JSON types are interpreted as the basic SQL types String, Number and Boolean, and JSON null is treated as SQL NULL.
8. The cloud-native big data analysis engine of claim 1, wherein: the cloud-native big data analysis engine serves both public cloud and private cloud environments.
CN202111018815.9A 2021-09-01 2021-09-01 Cloud native big data analysis engine Pending CN113849478A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111018815.9A CN113849478A (en) 2021-09-01 2021-09-01 Cloud native big data analysis engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111018815.9A CN113849478A (en) 2021-09-01 2021-09-01 Cloud native big data analysis engine

Publications (1)

Publication Number Publication Date
CN113849478A true CN113849478A (en) 2021-12-28

Family

ID=78976628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111018815.9A Pending CN113849478A (en) 2021-09-01 2021-09-01 Cloud native big data analysis engine

Country Status (1)

Country Link
CN (1) CN113849478A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114579175A (en) * 2022-03-02 2022-06-03 北京百度网讯科技有限公司 Method and device for converting public cloud operator into privatization, electronic equipment and medium
CN114579604A (en) * 2022-03-15 2022-06-03 北京梦诚科技有限公司 Database transaction implementation method and system of application layer
CN114579604B (en) * 2022-03-15 2022-09-20 北京梦诚科技有限公司 Database transaction implementation method and system of application layer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination