WO2024114409A1 - Data processing method and data processing system - Google Patents

Data processing method and data processing system (数据处理方法和数据处理系统)

Info

Publication number
WO2024114409A1
Authority
WO
WIPO (PCT)
Prior art keywords
computing
group
data
computing group
memory table
Prior art date
Application number
PCT/CN2023/132294
Other languages
English (en)
French (fr)
Inventor
王奇
贾扬清
姜伟华
蒋光然
周彪
朱展延
杨源秦
Original Assignee
杭州阿里云飞天信息技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 杭州阿里云飞天信息技术有限公司
Publication of WO2024114409A1


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 — Information retrieval of structured data, e.g. relational data
    • G06F 16/22 — Indexing; Data structures therefor; Storage structures
    • G06F 16/2219 — Large Object storage; Management thereof
    • G06F 16/221 — Column-oriented storage; Management thereof
    • G06F 16/2282 — Tablespace storage structures; Management thereof
    • G06F 16/28 — Databases characterised by their database models, e.g. relational or object models
    • G06F 16/283 — Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Definitions

  • the present disclosure relates to a data processing method and system, and in particular to the reading, writing, analysis and processing of massive data.
  • a technical problem to be solved by the present disclosure is to provide a data processing method and a data processing system, which can conveniently and flexibly implement isolation of computing resources when computing resources for various scenarios or tasks share data.
  • a data processing system including: multiple computing groups, computing resources of each computing group are isolated from each other, wherein a computing group with a write function maintains a first memory table, the computing group with a write function is configured to write data to be written into a data storage device into the first memory table, and write the data in the first memory table into a physical table in the data storage device corresponding to the first memory table; and each computing group also respectively maintains at least one second memory table, each second memory table respectively corresponds to a first memory table in other computing groups with a write function, and the computing group is configured to synchronize the second memory table with the corresponding first memory table.
  • the data processing system may also include: a gateway for allocating task requests to computing groups corresponding to the task requests; and/or a metadata storage device for managing metadata of physical tables in the data storage device and providing metadata services for multiple computing groups, wherein the multiple computing groups share metadata; and/or a data storage device for storing physical tables.
  • the metadata storage is also used to manage computing group configuration information.
  • the system also includes a data engine controller, configured to perform at least one of the following operations in response to a user instruction or the number of task requests: creating a computing group and storing its configuration information in the metadata storage; enabling a new computing group based on the computing group configuration information in the metadata storage to perform the corresponding data processing; suspending a computing group so that it no longer performs data processing; destroying a computing group and releasing the computing resources allocated to it; adjusting the computing resources allocated to a computing group; adjusting the computing resources allocated to each computing subgroup in a computing group; and adding or removing computing subgroups in a computing group, wherein the computing resources of the computing subgroups are isolated from each other and each subgroup maintains a first memory table and/or a second memory table.
  • the physical table stores data in a row-column coexistence manner, and the row data and column data corresponding to the same data are written into the same physical table through one write task to ensure the atomicity of the row data write operation and the column data write operation corresponding to the same data.
  • the computing group includes at least one of the following: a computing group for offline writing; a computing group for real-time writing; a computing group for providing data query services; and a computing group for providing data analysis services.
  • a computing group with a write function maintains multiple first memory tables corresponding to multiple physical tables on the data storage device; each computing group maintains a corresponding number of second memory tables to correspond to the multiple first memory tables in other computing groups with a write function.
  • in response to a computing group becoming unavailable, if another computing group performing the same task exists, the task request directed to the unavailable computing group is redirected to that computing group, or a new computing group is enabled, based on the computing group configuration information of the unavailable computing group in the metadata storage, to execute the corresponding task request.
  • in response to a computing subgroup in a computing group becoming unavailable, a new computing subgroup is created in the computing group and/or the task request directed to the unavailable computing subgroup is switched to another computing subgroup in the computing group.
  • in response to a computing node in a computing group becoming unavailable, the task request directed to the unavailable computing node is switched to other computing nodes in the computing group, and the first memory table and/or second memory table originally on the unavailable computing node are rebuilt on the other computing nodes.
  • a data processing method includes: providing multiple computing groups whose computing resources are isolated from each other; a computing group with a write function maintaining a first memory table; the computing group with a write function writing the data to be written into the data storage into the first memory table, and writing the data in the first memory table into a physical table in the data storage corresponding to the first memory table; each computing group further maintaining at least one second memory table, each second memory table corresponding to another computing group with a write function; and synchronizing the second memory table with the first memory table in its corresponding computing group.
  • a computing device comprising: a processor; and a memory on which executable codes are stored, and when the executable codes are executed by the processor, the processor executes the method described in the first aspect above.
  • a computer program product comprising an executable code, and when the executable code is executed by a processor of an electronic device, the processor is caused to execute the method as described in the first aspect above.
  • a non-transitory machine-readable storage medium on which executable code is stored; when the executable code is executed by a processor of an electronic device, the processor is caused to execute the method described in the first aspect above.
  • the data processing system of the present disclosure can conveniently and flexibly implement isolation of computing resources when computing groups used for various scenarios or tasks share data.
  • FIG1 schematically shows the architecture of a data processing system according to the present disclosure.
  • FIG. 2 schematically shows a flow chart of a data processing method according to the present disclosure.
  • FIG3 shows a schematic diagram of the structure of a computing device that can be used to implement the above data processing method according to an embodiment of the present invention.
  • the present disclosure proposes a solution for isolating computing resources, which decomposes computing resources into different computing groups.
  • a computing group can also be called a "virtual warehouse”.
  • Data and metadata can be shared between computing groups, and the physical files on the data storage can be fully reused.
  • a computing group can be composed of, for example, a group of computing nodes in a data warehouse system architecture. Users can flexibly customize the size of each computing group as needed. This provides users with core capabilities such as resource isolation, elasticity, multi-active computing, and high availability.
  • Users can apply for multiple computing groups, and each computing group shares the same data. Users can expand the number and configuration of computing groups as needed, but only need to store and operate one copy of the data.
  • FIG1 schematically shows the architecture of a data processing system according to the present disclosure.
  • the data processing system architecture can be used as a data warehouse architecture.
  • the system architecture can support real-time writing and real-time analysis and processing, and therefore can also be referred to as a "real-time data warehouse”.
  • the system architecture can set up multiple computing groups according to user needs, applications or configurations, allocate computing resources to these computing groups, and isolate the computing resources of each computing group from each other.
  • the computing resources of a computing group may be divided into multiple computing nodes, or “workers”.
  • the computing resources of a computing group can also be measured based on the number of CPUs (or “cores”) allocated to it, such as 4 cores, 16 cores, 32 cores, etc.
  • a computing group may include multiple container groups (pods), and a container group may be used as a computing node or a worker.
  • a container group may be set to have 16 cores as a basic computing resource group, and the number of container groups in each computing group may be set as needed.
  • the computing resources in the computing group can be further divided into multiple computing subgroups, or called “clusters", and the computing resources of each computing subgroup are isolated from each other.
  • multiple container groups in a computing group may be divided into multiple computing subgroups.
  • each computing subgroup in the computing group includes multiple container groups.
  • computing subgroups can have resources equivalent to those of general computing groups, such as computing nodes (containers), CPUs, memory, etc., and computing resources are isolated from each other, just like computing groups.
  • Computing subgroups belonging to the same computing group can have the same configuration and process task requests of the same type or scenario.
  • the system can expand or reduce the number of computing subgroups in a computing group as needed to adapt to changes in task request traffic.
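  • For illustration only (this is not part of the disclosure), the resource hierarchy described above — computing groups divided into computing subgroups ("clusters"), each made of container groups (pods) acting as workers with a fixed number of cores — might be modeled as in the following minimal Python sketch; all class and field names are hypothetical assumptions:

        from dataclasses import dataclass, field
        from typing import List

        CORES_PER_POD = 16  # assumption: one pod is a 16-core basic resource unit

        @dataclass
        class Worker:
            """A container group (pod) acting as one compute node."""
            worker_id: str
            cores: int = CORES_PER_POD

        @dataclass
        class Cluster:
            """A computing subgroup; its resources are isolated from other subgroups."""
            cluster_id: str
            workers: List[Worker] = field(default_factory=list)

            @property
            def total_cores(self) -> int:
                return sum(w.cores for w in self.workers)

        @dataclass
        class ComputingGroup:
            """A computing group ('virtual warehouse'); isolated from other groups."""
            name: str
            task_type: str   # e.g. "offline-write", "realtime-write", "serving", "analysis"
            clusters: List[Cluster] = field(default_factory=list)

            @property
            def total_cores(self) -> int:
                return sum(c.total_cores for c in self.clusters)

        # Example: a group with two identically configured subgroups of 2 pods each (64 cores total).
        group = ComputingGroup(
            name="analysis-group-1",
            task_type="analysis",
            clusters=[
                Cluster("cluster-a", [Worker("w1"), Worker("w2")]),
                Cluster("cluster-b", [Worker("w3"), Worker("w4")]),
            ],
        )
        assert group.total_cores == 64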
  • computing groups can be elastically expanded independently, for example, computing resources can be elastically allocated and computing groups can be created on demand.
  • the configuration of the computing group can be set according to the user's needs and/or task request traffic.
  • User queries or write tasks with different loads can be run on different computing groups, and the computing resources between the computing groups are independent of each other (that is, isolated).
  • the types of tasks performed by each computing group can also be flexibly configured by the user. By configuring different computing groups to perform different types of tasks, or tasks involving different scenarios or different applications, computing resource isolation between tasks of different types, scenarios or applications can be achieved.
  • some computing groups may be dedicated to performing write operations and have the function of writing data to a data storage device, such as the write computing group 1 for performing offline write tasks and the write computing group 2 for performing real-time write tasks shown in FIG. 1 .
  • some computing groups may perform read operations and have the function of reading data from a data storage device, such as the service computing group shown in FIG. 1 .
  • some computing groups may be used to perform data analysis operations, and may have the function of reading data from a data storage device and, in some cases, also the function of writing data such as analysis results to the data storage device, such as analysis computing group 1 and analysis computing group 2 shown in FIG. 1.
  • write-write isolation can be achieved, such as isolation (separation) between different real-time write tasks, isolation (separation) between different offline write tasks, and isolation (separation) between real-time write tasks and offline write tasks.
  • read-write isolation can be achieved, such as read-write isolation between write tasks such as real-time writing and offline writing and read tasks such as service analysis, ad hoc analysis, and online analysis.
  • read-read isolation can be achieved, such as read-read isolation between different read tasks such as service analysis, ad hoc analysis, and online analysis.
  • application scenario isolation can be achieved.
  • a user's instance may be used by multiple work departments, and multiple work departments may perform data analysis and processing in different ways, with different logics and different focuses.
  • for example, the finance department may process data from the perspective of financial analysis
  • the marketing department may analyze and process from the perspective of product sales. Therefore, the data processing tasks of multiple work departments can be isolated according to different computing groups, and different work departments correspond to different computing groups, so as to achieve complete resource isolation between work departments and avoid related impacts between application scenarios or work fields.
  • each work department can further perform write-write isolation, read-write isolation, read-read isolation, etc., thereby achieving a very flexible variety of resource isolation methods.
  • a user can correspond to an enterprise, and the departments within the enterprise can share enterprise data.
  • a data processing system architecture instance created by the user can also be used by multiple enterprises that can share data.
  • different computing groups can be assigned to each enterprise to isolate the computing resources that perform data processing tasks for each enterprise.
  • the first memory table may also be referred to as a "master table” or a “leader table”.
  • each computing group also maintains at least one second memory table, each second memory table corresponds to a first memory table in other computing groups with write function, and each computing group is configured to synchronize the second memory table it maintains with the first memory table corresponding to the second memory table.
  • the second memory table may also be referred to as a "slave table” or a “follower table”.
  • a computing group with a write function may maintain multiple first memory tables, corresponding to multiple physical tables on the data storage.
  • Multiple tables involved in a computing group may be referred to as a "table group", the multiple memory tables may be referred to as a "memory table group", and the corresponding multiple physical tables may be referred to as a "physical table group".
  • a plurality of first memory tables maintained by a computing group may be referred to as a "first memory table group”.
  • a computing group may include 4 computing units, maintaining a table group including 16 first memory tables, and these first memory tables may be evenly distributed on each computing unit, that is, each computing unit may maintain 4 first memory tables. Assume that on the data storage device, the computing group corresponds to 2 shards. The 16 physical tables corresponding to the 16 first memory tables may be evenly distributed in the 2 shards, for example, 8 physical tables in each shard.
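  • As a purely illustrative sketch of the even distribution in the example above (16 first memory tables spread over 4 computing units, and the 16 corresponding physical tables spread over 2 shards), a simple round-robin assignment could be used; the helper name below is hypothetical:

        def round_robin(items, buckets):
            """Distribute items evenly over buckets in round-robin order."""
            assignment = {b: [] for b in buckets}
            for i, item in enumerate(items):
                assignment[buckets[i % len(buckets)]].append(item)
            return assignment

        tables = [f"t{i}" for i in range(16)]
        by_unit = round_robin(tables, ["unit1", "unit2", "unit3", "unit4"])
        by_shard = round_robin(tables, ["shard1", "shard2"])

        assert all(len(v) == 4 for v in by_unit.values())   # 4 first memory tables per computing unit
        assert all(len(v) == 8 for v in by_shard.values())  # 8 physical tables per shard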
  • each computing group maintains a corresponding number of second memory tables, which may be called “second memory table groups", for example, to correspond to the multiple first memory tables (ie, first memory table groups) in other computing groups having write functions.
  • each computing sub-group may also maintain the first memory table and/or the second memory table respectively, like the computing group.
  • in general, the concept of a "table group" includes the corresponding memory table group and physical table group.
  • first memory table group in the computing group corresponds to the physical table group on the data storage device, and the computing group writes the data in the first memory table group to the corresponding physical table group.
  • the second memory table group in the computing group corresponds to the physical table group by associating/synchronizing with the first memory table group.
  • the write computing group 1 for offline writing maintains the first memory table group “leader tg-a”, and each of the other computing groups maintains the corresponding second memory table group “follower tg-a”.
  • when write computing group 1 needs to write data, it writes the data into the first memory table group "leader tg-a", and writes the data in the first memory table group "leader tg-a" into the corresponding physical table group in data shard 1, the shard that corresponds to write computing group 1 in the data storage.
  • as the first memory table group "leader tg-a" changes, the corresponding second memory table groups "follower tg-a" in the other computing groups can quickly synchronize with it so as to hold the same data.
  • the synchronization latency can be made sufficiently low; currently, 99% of data changes can be synchronized within 5 ms.
  • write computing group 2 maintains the first memory table group "leader tg-b”
  • analysis computing group 1 maintains the first memory table group “leader tg-c”
  • analysis computing group 2 maintains the first memory table group "leader tg-d”.
  • Each computing group maintains the corresponding second memory table groups "follower tg-b", "follower tg-c", and "follower tg-d".
  • memory data can be shared between different computing groups quickly and efficiently while computing resources are isolated, thereby achieving sharing of all data.
  • each computing group (or computing subgroup) with a write function does not maintain a second memory table corresponding to the first memory table maintained by itself.
  • a second memory table corresponding to the first memory table maintained by itself may be maintained in the computing group.
  • the corresponding first memory table (such as "leader tg-a") and the second memory table (such as "follower tg-a") in the same computing group (or the same computing subgroup) may be arranged on different computing nodes (such as workers) of the computing group (or computing subgroup).
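  • Purely as an illustrative sketch of the leader/follower idea (the disclosure does not specify an implementation), a first memory table could apply writes locally and append them to a change log that the corresponding second memory tables replay; the class names and the log mechanism below are assumptions:

        from typing import Any, Dict, List, Tuple

        class FirstMemoryTable:
            """'Leader' table maintained by a computing group with a write function."""
            def __init__(self) -> None:
                self.rows: Dict[str, Any] = {}
                self.log: List[Tuple[int, str, Any]] = []  # (sequence number, key, value)
                self.seq = 0

            def put(self, key: str, value: Any) -> None:
                self.seq += 1
                self.rows[key] = value
                self.log.append((self.seq, key, value))

            def flush_to_physical_table(self, physical_table: Dict[str, Any]) -> None:
                """Persist the in-memory data to the corresponding physical table in the data storage."""
                physical_table.update(self.rows)

        class SecondMemoryTable:
            """'Follower' table kept in other computing groups; synchronizes from a leader."""
            def __init__(self, leader: FirstMemoryTable) -> None:
                self.leader = leader
                self.rows: Dict[str, Any] = {}
                self.applied_seq = 0

            def sync(self) -> None:
                """Replay log entries not yet applied (in practice pushed with millisecond-level latency)."""
                for seq, key, value in self.leader.log:
                    if seq > self.applied_seq:
                        self.rows[key] = value
                        self.applied_seq = seq

        # Example: a write group maintains the leader; a read-only analysis group holds a follower.
        leader_tg_a = FirstMemoryTable()
        follower_tg_a = SecondMemoryTable(leader_tg_a)
        leader_tg_a.put("order:1", {"amount": 42})
        follower_tg_a.sync()
        assert follower_tg_a.rows == leader_tg_a.rows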
  • the data processing system architecture may further include a gateway, a metadata storage, and a data storage.
  • the gateway distributes the task request to the computing group corresponding to the task request.
  • the gateway may assign the task request to the corresponding computing group according to the type of the task request, the scenario or application targeted by the task request, the computing group pointing information contained in the task request, and the like.
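  • For illustration, a gateway might dispatch a task request using explicit group-pointing information when it is present and otherwise fall back to the request's type or scenario; the routing table and field names below are hypothetical assumptions, not part of the disclosure:

        from typing import Optional

        # Hypothetical routing table: task type / scenario -> computing group name.
        ROUTES = {
            "offline-write": "write-group-1",
            "realtime-write": "write-group-2",
            "serving-query": "service-group",
            "analysis": "analysis-group-1",
        }

        def route_task(task_type: str, target_group: Optional[str] = None) -> str:
            """Return the computing group a task request should be sent to."""
            if target_group:                  # request explicitly points at a computing group
                return target_group
            try:
                return ROUTES[task_type]      # otherwise route by task type / scenario
            except KeyError:
                raise ValueError(f"no computing group configured for task type {task_type!r}")

        assert route_task("analysis") == "analysis-group-1"
        assert route_task("analysis", target_group="analysis-group-2") == "analysis-group-2"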
  • the data storage, that is, the physical storage system, may be, for example, a centralized or distributed storage system for storing the physical tables.
  • the data processing system disclosed in the present invention completely separates computing and storage in terms of technical architecture.
  • computing systems and storage systems can be built based on the cloud, making full use of cloud characteristics, which can have great advantages in terms of performance, concurrency, and ease of use, thus reflecting the characteristics of cloud-native architecture to a large extent.
  • the data on the data storage can be divided into multiple partitions (shards) for storage, such as "Shard 1", “Shard 2", “Shard M”, “Shard N” in Figure 1.
  • a shard corresponds to a computing group.
  • a computing group can correspond to one or more shards.
  • a computing group with write function writes data to its corresponding shard.
  • the metadata storage manages the metadata of the physical tables in the data storage and provides metadata services for multiple computing groups, which share metadata.
  • each computing group can obtain metadata information of the data stored in each shard on the data storage, and thus read the corresponding data.
  • each computing group can share the data stored on the data storage.
  • the metadata storage can also be used to manage computing group configuration information to facilitate flexible creation, adjustment, and destruction of computing groups.
  • the data processing system may further include a data engine controller for managing the computing group.
  • the data engine controller can flexibly perform various management operations on the computing group in response to user instructions or the number (traffic) of task requests.
  • the data engine controller can create a computing group in response to a user's instruction and store the computing group configuration information in the metadata storage.
  • the user can configure each computing group according to application or scenario requirements through the data engine controller.
  • the data engine controller can enable a new computing group to perform corresponding data processing based on the computing group configuration information in the metadata storage.
  • the data engine controller can suspend it so that it no longer performs data processing.
  • the data engine controller can destroy it and release the computing resources allocated to the computing group for use by other computing groups.
  • the data engine controller can also assist users in elastically scaling computing resources to adapt to changes in user needs or task request traffic.
  • Compute groups can be created, destroyed, or reconfigured on demand at any time. Creating or destroying a compute group does not affect the user data stored in the data storage (data warehouse).
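  • A hedged sketch of the controller operations described above (create / enable / suspend / destroy a computing group, with the group configuration kept in the metadata storage); all names are illustrative assumptions and not taken from the disclosure:

        class DataEngineController:
            """Illustrative controller managing computing groups via config in a metadata store."""
            def __init__(self, metadata_store: dict):
                self.metadata_store = metadata_store        # group name -> configuration
                self.active = {}                            # group name -> running group state

            def create_group(self, name: str, config: dict) -> None:
                self.metadata_store[name] = config          # persist configuration only

            def enable_group(self, name: str) -> None:
                config = self.metadata_store[name]          # start a group from stored configuration
                self.active[name] = {"config": config, "status": "running"}

            def suspend_group(self, name: str) -> None:
                self.active[name]["status"] = "suspended"   # keeps config; stops data processing

            def destroy_group(self, name: str) -> None:
                self.active.pop(name, None)                 # release the group's computing resources
                # user data in the data storage is untouched; only compute is released

        controller = DataEngineController(metadata_store={})
        controller.create_group("analysis-group-1", {"clusters": 2, "cores_per_cluster": 32})
        controller.enable_group("analysis-group-1")
        controller.suspend_group("analysis-group-1")
        controller.destroy_group("analysis-group-1")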
  • for example, in response to a user instruction or the number of task requests, the computing resources allocated to a computing group or computing subgroup (cluster), such as the number of CPUs, can be adjusted through the data engine controller.
  • a computing group can also automatically and elastically scale its computing subgroups (clusters) according to load dynamics.
  • the computing group can further include computing subgroups. It is also possible to increase or decrease computing subgroups (Cluster) in the computing group, for example, through the data engine controller, to achieve the expansion and contraction of computing resources in units of computing subgroups (Cluster).
  • Increasing or decreasing the number of resources allocated to a computing group or computing subgroup can be called “vertical elasticity”, “vertical scaling”, or “vertical expansion and contraction”.
  • Increasing or decreasing the number of computing subgroups in a computing group can be called “horizontal elasticity”, “horizontal scaling”, or “horizontal expansion and contraction”.
  • Vertical elasticity and horizontal elasticity have similar effects but each has its own advantages and disadvantages; in different scenarios, the appropriate approach can be chosen flexibly according to the needs of the application or scenario.
  • the user may directly issue instructions to adjust computing resources, such as instructions to adjust computing resources allocated to a computing group or computing subgroup, and instructions to increase or decrease computing subgroups in a computing group.
  • users can also flexibly set dynamic adjustment strategies.
  • a first traffic threshold may be set, and when the number of task requests directed to a computing group is higher than the first traffic threshold, the computing resources of the computing group are increased, or the computing resources of at least one computing subgroup in the computing group are increased, or a computing subgroup is added to the computing group.
  • Options may be provided to the user to select a specific method for expanding computing resources.
  • a setting module may also be provided so that the user can specify which expansion method to use under which circumstances, so that computing resources are expanded in the corresponding manner in different situations.
  • a second traffic threshold may also be set; when the number of task requests directed to a computing group is lower than the second traffic threshold, the computing resources of the computing group are reduced, or the computing resources of at least one computing subgroup in the computing group are reduced, or computing subgroups are removed from the computing group.
  • Options may be provided to users to select a specific method for reducing computing resources.
  • a setting module may also be provided so that users can specify which reduction method to use under which circumstances, so that computing resources are reduced in the corresponding manner in different situations.
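  • The two-threshold policy described above might look like the following minimal sketch, where the choice between vertical scaling (adjusting cores) and horizontal scaling (adding or removing subgroups) is a user setting; the threshold values, function name, and action strings are all assumptions:

        def scaling_decision(request_rate: float,
                             high_threshold: float,
                             low_threshold: float,
                             mode: str = "horizontal") -> str:
            """Return a scaling action for one computing group based on its task-request traffic.

            mode: "vertical"   -> adjust cores of the group / a subgroup
                  "horizontal" -> add or remove computing subgroups
            """
            if request_rate > high_threshold:
                return "add_subgroup" if mode == "horizontal" else "add_cores"
            if request_rate < low_threshold:
                return "remove_subgroup" if mode == "horizontal" else "remove_cores"
            return "no_change"

        assert scaling_decision(1200, high_threshold=1000, low_threshold=200) == "add_subgroup"
        assert scaling_decision(150, high_threshold=1000, low_threshold=200, mode="vertical") == "remove_cores"
        assert scaling_decision(500, high_threshold=1000, low_threshold=200) == "no_change"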
  • the elasticity of computing groups can bring users more cost-effective services. At almost the same price, users can enjoy faster performance.
  • if a workload takes 15 hours to execute on 4 computing nodes, it may take only 2 hours on 30 computing nodes.
  • although the prices of these two modes are similar, the user experience is fundamentally different: at the same cost, faster performance means a better user experience.
  • the elasticity of the computing group provides users with a good experience. Users can dynamically configure the computing group to complete computing tasks faster without any additional expenses.
  • the data processing system (data warehouse) disclosed in the present disclosure has a natural compute-storage separation architecture, so it can be highly scalable in both computing and storage, providing dual elasticity.
  • Row-based data and column-based data have their own advantages and disadvantages and can be applied to different application scenarios. For example, it is more convenient to use row-based data when performing point query processing. However, it is often more convenient to use column-based data when performing data analysis.
  • a row storage writing task and a column storage writing task are used to perform data row storage and data column storage respectively.
  • the life cycles of these two tasks are managed separately. When one task is completed, it is not certain whether the other task has also been completed. The synchronization and simultaneous validity of row data and column data cannot be guaranteed.
  • an innovative row-column coexistence method is adopted.
  • the physical table can store data in a row-column coexistence manner.
  • Row-column coexistence means that the computing group writes the row data and column data corresponding to the same data into the same physical table through a write task.
  • the write task ends only after the row data and column data have been written. In this way, the atomicity of the row data write operation and the column data write operation corresponding to the same data can be guaranteed, that is, they take effect at the same time.
  • when a user issues a query, the data processing system can intelligently generate an optimized plan: for a point query, a plan that uses the row-store data is generated, and for an analysis query, a plan that uses the column-store data is generated.
  • for queries that mix point lookups and analysis, the system can also intelligently determine whether some query tasks would be better served by the point-query path, and then generate a plan that queries both the row-store data and the column-store data at the same time. This eliminates the need to query two separate tables, truly achieving integrated analysis and serving of a single copy of data.
  • there is another way to store row data and column data at the same time, which can be called "row-column mixed storage": formally, only one copy of the data is stored, a certain amount of data is defined as a data block, each column within a data block is stored column-wise, and the different columns of the same data block are stored contiguously. Figuratively speaking, it is as if the data block defines a table with multiple rows and columns.
  • this implementation method is not as good as the aforementioned row-column coexistence method in terms of performance.
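  • To illustrate the atomicity point only (the disclosure does not describe a concrete mechanism), one common way to make a row-format and a column-format copy take effect together is to stage both inside a single write task and publish them with one commit step at the end; everything below is a hypothetical sketch, not the patented method:

        class PhysicalTable:
            """Illustrative physical table holding both a row-format and a column-format copy."""
            def __init__(self):
                self.committed = {"rows": None, "columns": None}
                self._staged = None

            def write_task(self, records):
                """One write task: stage row data and column data, then commit both atomically."""
                row_data = list(records)                      # row layout: list of records
                col_data = {k: [r[k] for r in records]        # column layout: per-column arrays
                            for k in records[0]}              # assumes a non-empty batch
                self._staged = {"rows": row_data, "columns": col_data}
                # Single commit point: both representations become visible together,
                # so readers never see one without the other.
                self.committed = self._staged
                self._staged = None

        table = PhysicalTable()
        table.write_task([{"id": 1, "amount": 10}, {"id": 2, "amount": 20}])
        assert table.committed["columns"]["amount"] == [10, 20]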
  • in the data storage system (data warehouse) of the present disclosure, unavailability or failures at the node, computing subgroup, and computing group levels can be tolerated; further, combined with disaster recovery instances of the data warehouse system, cluster (AZ)/region-level unavailability can also be tolerated, providing users with an even greater degree of high availability.
  • in response to a computing group becoming unavailable, if another computing group executing the same task exists, the task request directed to the unavailable computing group can be redirected to that computing group.
  • or, for example when no other computing group currently executes the same task, a new computing group can be enabled based on the computing group configuration information of the unavailable computing group in the metadata storage to execute the corresponding task request.
  • in response to a computing subgroup becoming unavailable, the task request directed to the unavailable computing subgroup can be switched to another computing subgroup in the computing group; if no other computing subgroup is available in the computing group, or the other computing subgroups are busy with their assigned tasks, a new computing subgroup can be created in the computing group.
  • the task request directed to the unavailable computing node can be switched to other computing nodes in the computing group, and the original first memory table and/or second memory table on the unavailable computing node can be rebuilt on the other computing nodes.
  • the query can be quickly switched to other normal nodes, which can greatly reduce the impact of node failures on user queries, thereby achieving high query availability within the computing group.
  • if the computing resources in a computing group or computing subgroup become insufficient due to the unavailability of computing nodes, the computing resources (such as computing nodes or CPUs) allocated to the computing group or computing subgroup may be dynamically adjusted, as described above.
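  • A minimal illustrative sketch of the node-level failover behaviour described above: requests for an unavailable node are redirected to healthy nodes in the same computing group, and the memory tables the failed node held are rebuilt there; the data structures and function name are assumptions:

        def fail_over(group: dict, failed_node: str) -> dict:
            """Redistribute the memory tables of a failed node across the remaining nodes of its group.

            group: {"nodes": {node_name: [memory table names]}}
            """
            survivors = [n for n in group["nodes"] if n != failed_node]
            if not survivors:
                raise RuntimeError("no healthy node left in the computing group")
            orphaned = group["nodes"].pop(failed_node, [])
            for i, table in enumerate(orphaned):
                # Rebuild each first/second memory table on a healthy node (round-robin);
                # new task requests are routed to these nodes instead of the failed one.
                group["nodes"][survivors[i % len(survivors)]].append(table)
            return group

        group = {"nodes": {"w1": ["leader tg-a:0"], "w2": ["leader tg-a:1"], "w3": ["follower tg-b:0"]}}
        fail_over(group, "w2")
        assert "w2" not in group["nodes"]
        assert sum(len(t) for t in group["nodes"].values()) == 3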
  • the following describes a data processing method that can be executed by the above data processing system according to the present disclosure with reference to FIG. 2 .
  • FIG. 2 schematically shows a flow chart of a data processing method according to the present disclosure.
  • step S210 multiple computing groups are provided, and computing resources of each computing group are isolated from each other.
  • step S220 computing groups with a write function among the plurality of computing groups respectively maintain a first memory table.
  • step S230 the computing group with a write function writes the data to be written into the data storage into the first memory table.
  • step S240 the computing group with a write function writes the data in the first memory table into a physical table corresponding to the first memory table in the data storage.
  • data can be stored in a physical table in a row-column coexistence manner, and row data and column data corresponding to the same data can be written into the same physical table through a write task to ensure the atomicity of the row data write operation and column data write operation corresponding to the same data.
  • step S250 each computing group also maintains at least one second memory table, and each second memory table corresponds to another computing group having a write function.
  • step S260 the second memory table is synchronized with the first memory table in its corresponding computing group.
  • various operations may be performed on the computing group, for example, in response to a user's instruction or the number of task requests.
  • a compute group may be created and the compute group configuration information may be stored in a metadata store.
  • a new computing group may also be enabled based on the computing group configuration information in the metadata storage to perform corresponding data processing.
  • computing groups can be managed flexibly.
  • the computing resources allocated to each computing subgroup in the computing group may be adjusted.
  • computing resources of each computing subgroup are isolated from each other, and the first memory table and/or the second memory table are maintained respectively.
  • in response to a computing group becoming unavailable, if another computing group performing the same task exists, the task request directed to the unavailable computing group can be redirected to that computing group.
  • a new computing group can be enabled based on the computing group configuration information of the unavailable computing group in the metadata storage to perform the corresponding task request.
  • in response to a computing subgroup becoming unavailable, the task request directed to the unavailable computing subgroup is switched to another computing subgroup in the computing group; if no other computing subgroup is available in the computing group, or the other computing subgroups are busy with their assigned tasks, a new computing subgroup can be created in the computing group.
  • the task request directed to the unavailable computing node can be switched to other computing nodes in the computing group, and the original first memory table and/or second memory table on the unavailable computing node can be rebuilt on the other computing nodes.
  • the query can be quickly switched to other normal nodes, which can greatly reduce the impact of node failures on user queries, thereby achieving high query availability within the computing group.
  • if the computing resources in a computing group or computing subgroup become insufficient due to the unavailability of computing nodes, the computing resources (such as computing nodes or CPUs) allocated to the computing group or computing subgroup may be dynamically adjusted, as described above.
  • the present disclosure provides a new data processing system architecture, which can also be called a "real-time data warehouse architecture" in some embodiments, which truly realizes the integration of analysis services for a piece of data for users.
  • it is also possible to completely isolate the resources of many scenarios such as real-time scenarios, analysis scenarios, service scenarios, and offline processing scenarios, so as to achieve high-throughput writing and flexible query without interfering with each other.
  • as the service query QPS grows, query jitter can be significantly reduced, effectively solving the problem of system load conflicts across different scenarios, greatly reducing potentially uncontrollable risks, and achieving high service availability.
  • the data processing system of the disclosed embodiments can provide users with high ease of use, high operability, and high reliability. It not only offers flexibility (ready to use), high availability, resource isolation, high elasticity, and strong scalability, but also supports transactions, standard SQL syntax, and semi-structured and unstructured data, addressing as far as possible the many pain points users face in data analysis, so that "data value" is no longer out of reach but comes ever closer to reality, sounding the call for data value.
  • FIG3 shows a schematic diagram of the structure of a computing device that can be used to implement the above data processing method according to an embodiment of the present invention.
  • computing device 300 includes memory 310 and processor 320 .
  • Processor 320 may be a multi-core processor or may include multiple processors.
  • processor 320 may include a general-purpose main processor and one or more special coprocessors, such as a graphics processing unit (GPU), a digital signal processor (DSP), etc.
  • processor 320 may be implemented using a customized circuit, such as an application-specific integrated circuit (ASIC) or a field programmable gate array (FPGA).
  • the memory 310 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage devices. Among them, ROM can store static data or instructions required by the processor 320 or other modules of the computer.
  • the permanent storage device may be a readable and writable storage device.
  • the permanent storage device may be a non-volatile storage device that does not lose the stored instructions and data even after the computer is powered off.
  • in some embodiments, a large-capacity storage device (such as a magnetic or optical disk, or flash memory) is used as the permanent storage device.
  • the permanent storage device may be a removable storage device (such as a floppy disk, optical drive).
  • the system memory may be a readable and writable storage device or a volatile readable and writable storage device, such as a dynamic random access memory.
  • the system memory may store some or all instructions and data required by the processor at run time.
  • the memory 310 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory), and disks and/or optical disks may also be used.
  • the memory 310 may include a removable storage device that can be read and/or written, such as a laser disc (CD), a read-only digital versatile disc (e.g., DVD-ROM, double-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card, etc.), a magnetic floppy disk, etc.
  • the computer-readable storage medium does not contain carrier waves and transient electronic signals transmitted wirelessly or wired.
  • the memory 310 stores executable codes, and when the executable codes are processed by the processor 320 , the processor 320 can execute the data processing method mentioned above.
  • the method according to the present invention may also be implemented as a computer program or a computer program product, which includes computer program code instructions for executing the above steps defined in the above method of the present invention.
  • each square box in the flow chart or block diagram can represent a part of a module, a program segment or a code, and the part of the module, the program segment or the code contains one or more executable instructions for realizing the specified logical function.
  • the functions marked in the square box can also occur in a sequence different from that marked in the accompanying drawings. For example, two continuous square boxes can actually be executed substantially in parallel, and they can sometimes be executed in the opposite order, depending on the functions involved.
  • each square box in the block diagram and/or the flow chart, and the combination of the square boxes in the block diagram and/or the flow chart can be implemented with a dedicated hardware-based system that performs the specified function or operation, or can be implemented with a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure relates to a data processing method and a data processing system. The data processing system includes multiple computing groups, and the computing resources of each computing group are isolated from each other. A computing group with a write function maintains a first memory table, and is configured to write data to be written into the data storage into the first memory table, and to write the data in the first memory table into the physical table in the data storage that corresponds to the first memory table. Each computing group also maintains at least one second memory table, and each second memory table corresponds to a first memory table in another computing group with a write function. The computing group is configured to synchronize the second memory table with its corresponding first memory table. In this way, the data processing system of the present disclosure can conveniently and flexibly achieve isolation of computing resources while the computing groups used for various scenarios or tasks share data.

Description

数据处理方法和数据处理系统
本申请要求于2022年11月30日提交中国专利局、申请号为202211513575.4、申请名称为“数据处理方法和数据处理系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本公开涉及一种数据处理方法和系统,特别涉及海量数据的读写及分析处理。
背景技术
随着互联网领域的持续快速发展,对海量数据的写入、读取、分析等处理的要求越来越高。
已提出一站式实时数据仓库引擎,支持海量数据实时写入、实时更新、实时分析,支持标准SQL(结构化查询语言),支持PB(1PB=1024TB=250字节)级数据多维在线分析(OLAP)与即席分析(Ad Hoc),支持高并发低延迟的在线数据服务,提供离线/在线一体化全栈数仓解决方案。
然而,用户在使用数据仓库引擎同时服务于实时场景、分析场景、服务场景、离线加工场景等多个场景时,遇到了不同场景的系统计算资源负载冲突的问题,进而影响用户服务的高可用性。
因此,仍然需要一种改进的数据处理方案以处理上述技术问题。
发明内容
本公开要解决的一个技术问题是提供一种数据处理方法和数据处理系统,其能够在用于各种场景或任务的计算资源共享数据的情况下,方便而又灵活地实现计算资源的隔离。
根据本公开的第一个方面,提供了一种数据处理系统,包括:多个计算组,各计算组的计算资源相互隔离,其中,具有写入功能的计算组维护第一内存表,具有写入功能的计算组被配置为将要写入数据存储器的数据写入第一内存表,并将第一内存表中的数据写入数据存储器中与第一内存表对应的物理表;并且各计算组还分别维护至少一个第二内存表,各第二内存表分别对应于具有写入功能的其它计算组中的第一内存表,并且计算组被配置为使第二内存表与其所对应的第一内存表同步。
可选地,该数据处理系统还可以包括:网关,用于将任务请求分配到与任务请求对应的计算组;以及/或者元数据存储器,用于管理数据存储器中的物理表的元数据,并为多个计算组提供元数据服务,多个计算组共享元数据;以及/或者数据存储器,用于存储物理表。
可选地,元数据存储器还用于管理计算组配置信息,该系统还包括数据引擎控制器, 用于响应于用户的指令或任务请求的数量,执行下述至少一项操作:创建计算组,并将计算组配置信息存储在元数据存储器中;基于元数据存储器中的计算组配置信息启用新的计算组以执行相应的数据处理;挂起计算组,使其不再执行数据处理;销毁计算组,释放分配给该计算组的计算资源;调整分配给计算组的计算资源;调整分配给计算组中各计算子组的计算资源;以及在计算组中增加或减少计算子组,其中,各计算子组的计算资源相互隔离,且分别维护第一内存表和/或第二内存表。
可选地,物理表以行列并存的方式存储数据,通过一个写任务将同一数据对应的行数据和列数据写入同一个物理表中,以保证同一数据对应的行数据写入操作和列数据写入操作的原子性。
可选地,计算组包括下述至少一种:用于离线写入的计算组;用于实时写入的计算组;用于提供数据查询服务的计算组;以及用于提供数据分析服务的计算组。
可选地,具有写入功能的计算组维护多个第一内存表,对应于数据存储器上多个物理表;各计算组中维护相应数量的第二内存表,以与具有写入功能的其它计算组中的多个第一内存表相对应。
可选地,响应于计算组不可用,在存在执行相同任务的其它计算组的情况下,将指向不可用计算组的任务请求转为指向执行相同任务的计算组,或者基于元数据存储器中不可用计算组的计算组配置信息,启用新的计算组,以执行相应任务请求。
可选地,响应于计算组内计算子组不可用,在计算组内创建新的计算子组和/或将指向不可用计算子组的任务请求切换到计算组内其它计算子组。
可选地,响应于计算组内计算节点不可用,将指向不可用计算节点的任务请求切换到计算组内其它计算节点,并在其它计算节点上重新构建不可用计算节点上原有的第一内存表和/或第二内存表。
根据本公开的第二个方面,一种数据处理方法,包括:提供多个计算组,各计算组的计算资源相互隔离;具有写入功能的计算组维护第一内存表;具有写入功能的计算组将要写入数据存储器的数据写入第一内存表,并将第一内存表中的数据写入数据存储器中与第一内存表对应的物理表;各计算组还分别维护至少一个第二内存表,各第二内存表分别对应于具有写入功能的其它计算组;以及使第二内存表与其对应的计算组中的第一内存表同步。
根据本公开的第三个方面,提供了一种计算设备,包括:处理器;以及存储器,其上存储有可执行代码,当可执行代码被处理器执行时,使处理器执行如上述第一方面所述的方法。
根据本公开的第四个方面,提供了一种计算机程序产品,包括可执行代码,当所述可执行代码被电子设备的处理器执行时,使所述处理器执行如上述第一方面所述的方法。
根据本公开的第五个方面,提供了一种非暂时性机器可读存储介质,其上存储有可执行代码,当可执行代码被电子设备的处理器执行时,使处理器执行如上述第一方面所述的 方法。
由此,本公开的数据处理系统能够在用于各种场景或任务的计算组共享数据的情况下,方便而又灵活地实现计算资源的隔离。
附图说明
通过结合附图对本公开示例性实施方式进行更详细的描述,本公开的上述以及其它目的、特征和优势将变得更加明显,其中,在本公开示例性实施方式中,相同的参考标号通常代表相同部件。
图1示意性示出了根据本公开的数据处理系统架构。
图2示意性地示出了根据本公开的数据处理方法的流程图。
图3示出了根据本发明一实施例可用于实现上述数据处理方法的计算设备的结构示意图。
具体实施方式
下面将参照附图更详细地描述本公开的优选实施方式。虽然附图中显示了本公开的优选实施方式,然而应该理解,可以以各种形式实现本公开而不应被这里阐述的实施方式所限制。相反,提供这些实施方式是为了使本公开更加透彻和完整,并且能够将本公开的范围完整地传达给本领域的技术人员。
为解决前述计算资源之间负载冲突的问题,本公开提出了一种计算资源隔离的方案,将计算资源分解为不同的计算组。
在一些情况下,计算组也可以称为“虚拟数仓(Virtual Warehouse)”。计算组之间可以共享数据和元数据,数据存储器上的物理文件可以完全复用。
计算组可以由例如数仓系统架构的一组计算节点组成。用户可以十分灵活地按需定制每个计算组的大小。对用户提供资源隔离、弹性、计算多活、高可用等核心能力。
用户可以申请多个计算组,每个计算组之间共享同一份数据。用户可以任意按需扩展计算组的数量和配置,但是只需要存储和操作一份数据。
下面参考附图详细描述根据本公开的数据处理方案。
一、计算资源隔离。
图1示意性示出了根据本公开的数据处理系统架构。
该数据处理系统架构可以用作数仓(即,数据仓库)架构。根据本公开,该系统架构能够支持实时写入和实时分析处理,因此也可以称为“实时数仓”。
如图1所示,该系统架构可以根据用的需求、申请或配置,设置多个计算组,将计算资源分配给这些计算组,各计算组的计算资源相互隔离。
计算组的计算资源例如可以分为多个计算节点,或称为“工作机(worker)”。或者, 计算组的计算资源也可以根据为其分配的CPU(或称为“核(core)”)的数量来衡量,例如4核(4core)、16核、32核等。
在一个实施例中,一个计算组可以包括多个容器组(pod),一个容器组可以作为一个计算节点,或工作机(worker)。例如,可以设定一个容器组有16核,作为基本计算资源组,可以根据需要来设定各计算组中容器组的数量。
计算组中的计算资源还可以进一步分为多个计算子组,或者称为“集簇(cluster)”,各计算子组的计算资源相互隔离。
在实施例中,可以将一个计算组中的多个容器组划分到多个计算子组中。或者说,计算组中的每个计算子组分别包括多个容器组。
这样,计算子组可以具有和一般计算组相当的资源,例如计算节点(容器)、CPU、内存等等,计算子组和计算子组之间,就像计算组和计算组之间一样,计算资源相互隔离。只是属于同一个计算组的计算子组例如可以具有相同的配置,可以处理相同类型或场景的任务请求。系统可以根据需要扩增或缩减计算组中计算子组的数量,以适应任务请求流量的变化。
不同计算组(或不同计算子组)并不会共享同一个计算节点,因此不同计算组(或不同计算子组)上运行的查询或写入任务之间互相不会有资源竞争和性能影响。这样可以保证不同负载的查询和/或写入之间互不影响性能,做到很好的资源隔离。
如前文所述,计算组可以独立地弹性扩展,可以例如进行计算资源的弹性分配,按需创建计算组。
计算组的配置可以根据用户的需求和/或任务请求流量来设置。
这些计算组所拥有的资源既所执行的功能等可以基于相应的场景来进行灵活的配置。
每个计算组上可以并发运行多个查询,每个查询请求会被路由到某个计算组上进行执行。
用户不同负载(例如涉及不同类型、不同场景或不同应用)的查询或写入任务可以运行在不同的计算组上,计算组之间的计算资源是互相独立(也即,隔离)的。
各计算组执行的任务类型等也可以由用户灵活的配置。通过配置不同的计算组来执行不同类型的任务,或者涉及不同场景或不同应用的任务,可以实现不同类型、不同场景或不同应用的任务之间的计算资源隔离。
例如,一些计算组可以专用于执行写入操作,具有向数据存储器写入数据的功能,例如图1中示出的用于执行离线写入任务的写入计算组1、用于执行实时写入任务的写入计算组2。
又例如,一些计算组可以执行读取操作,具有从数据存储器读取数据的功能,例如图1中示出的服务计算组。
又例如,一些计算组可以用于执行数据分析操作,可以具有从数据存储器读取数据的功能,一些情况下,也可以具有向数据存储器写入数据如分析结果等的功能,例如图1中 示出的分析计算组1和分析计算组2。
这样,用户可以基于计算组实现多种隔离方式。
例如,可以实现写写隔离,如,不同的实时写入任务之间相互隔离(分离),不同的离线写入任务之间相互隔离B(分离),以及实时写入任务与离线写入任务之间隔离(分离)。
又例如,可以实现读写隔离,如,实时写入、离线写入等写入任务与服务分析、即席分析、在线分析等读取任务之间的读写隔离。
又例如,可以实现读读隔离,如,服务分析、即席分析、在线分析等不同读取任务之间的读读隔离。
又例如,可以实现应用场景隔离。如,用户的一个实例可能有多个工作部门在使用,多个工作部门可能会以不同的方式、不同的逻辑、不同的侧重点来进行数据分析处理,例如财务部门可能以财务数据分析的角度进行处理,市场部门可能会以产品销售的角度进行分析处理。因此,可以将多个工作部门的数据处理任务按照不同的计算组隔离开,不同的工作部门对应不同的计算组,实现工作部门之间完全的资源隔离,避免应用场景或工作领域之间相关影响。
当然,各个工作部门内部还可以进一步进行写写隔离、读写隔离、读读隔离等,从而可以实现十分灵活的多种资源隔离方式。
例如,一个用户可以对应于一个企业,企业内各部门之间可以共享企业数据。在一些不同企业可以共享数据的特殊情况下,用户创建的一个数据处理系统架构实例也可以供这些能够共享数据的多个企业来共同使用。这种情况下,可以为各个企业分配不同的计算组,将为各企业执行数据处理任务的计算资源隔离开。
二、计算资源隔离下的数据共享。
如前文所述,多个计算组各自的计算资源相互隔离。另一方面,数据存储器中存储的数据应当对相互隔离的各个计算组共享。例如,一个用户申请了多个计算组来处理数据,这些计算组会对数据存储器上的物理表进行数据写入、修改、读取等处理。需要允许这些计算组之间能够实时共享物理表中的数据。
为此,具有写入功能的计算组可以维护第一内存表,将要写入数据存储器的数据写入第一内存表,并将第一内存表中的数据写入数据存储器中与第一内存表对应的物理表。
第一内存表也可以称为“主表”或“引导(leader)表”。
另一方面,各计算组还分别维护至少一个第二内存表,各第二内存表分别对应于具有写入功能的其它计算组中的第一内存表,并且各计算组被配置为使其维护的第二内存表与该第二内存表所对应的第一内存表同步。
第二内存表也可以称为“从表”或“跟随(follower)表”。
实践中,具有写入功能的计算组可以维护多个第一内存表,对应于数据存储器上多个物理表。一个计算组涉及的多个表可以称为“表组(table group)”,多个内存表可以称为 “内存表组”,对应的多个物理表可以称为“物理表组”。一个计算组维护的多个第一内存表可以称为“第一内存表组”。
例如,计算组可以包括4个计算单元,维护包括16个第一内存表的表组,每个计算单元上可以均衡地分配这些第一内存表,即每个计算单元可以维护4个第一内存表。假设在数据存储器上,该计算组对应2个分片(shard)。与该16个第一内存表对应的16个物理表可以均衡地分配在这2个分片中,例如每个分片中8个物理表。
相应地,各计算组中维护相应数量的第二内存表,例如可以称为“第二内存表组”,以与具有写入功能的其它计算组中的多个第一内存表(即第一内存表组)相对应。
另外,在计算组又被分为多个计算子组的情况下,各个计算子组也可以象计算组一样,分别维护第一内存表和/或第二内存表。
一般而言,“表组”的概念包含相对应的内存表组和物理表组。在本公开的上下文中,计算组中的第一内存表组与数据存储器上的物理表组对应,计算组将第一内存表组中的数据写入到对应的物理表组中。而计算组中的第二内存表组通过与第一内存表组关联/同步,而与物理表组相对应。
在图1所示的示例中,用于离线写入的写入计算组1维护第一内存表组“leader tg-a”,其它各计算组皆维护与其对应的第二内存表组“follower tg-a”。
当写入计算组1需要写入数据时,写入计算组1将数据写入第一内存表组“leader tg-a”,并将第一内存表组“leader tg-a”中的数据写入到数据存储器中写入计算组1对应的数据分片(shard)1中对应的物理表组中。
随着第一内存表组“leader tg-a”发生变化,其它各计算组中对应的第二内存表“follower tg-a”都可以迅速与第一内存表组“leader tg-a”同步,以便具有相同的数据。目前,这个同步的延迟已经可以做到足够低,对于99%的数据变化能够在5ms内完成同步。
类似地,写入计算组2维护第一内存表组“leader tg-b”,分析计算组1维护第一内存表组“leader tg-c”,分析计算组2维护第一内存表组“leader tg-d”。各个计算组都分别维护相对应的第二内存表组表“follower tg-b”、“follower tg-c”、“follower tg-d”。
这样,能够在计算资源隔离的情况下,快速有效地实现不同计算组之间内存数据的共享,并从而实现所有数据的共享。
图1所示的示例中,各个具有写入功能的计算组(或计算子组),如写入计算组1中没有维护与其自身维护的第一内存表对应的第二内存表。然而,这不是必须的。在一些情况下,计算组中可以维护与自身维护的第一内存表对应的第二内存表。优选地,同一个计算组(或同一个计算子组)中相对应的第一内存表(如“leader tg-a”)和第二内存表(如“follower tg-a”)可以布置在计算组(或计算子组)不同的计算节点(如工作机(worker))上。
三、总体系统架构。
如图1所示,该数据处理系统架构还可以包括网关、元数据存储器以及数据存储器。
网关将任务请求分配到与该任务请求对应的计算组。
例如,网关可以根据任务请求的类型、任务请求所针对的场景或应用、任务请求中包含的计算组指向信息等等,将任务请求分配到相应的计算组。
数据存储器,也即物理存储系统,例如可以是集中式或分布式的存储系统,用于存储物理表。
本公开的数据处理系统在技术架构上将计算和存储彻底分离。例如可以基于云构建计算系统和存储系统,充分利用云特性,在性能、并发性和易用性等方面都可以具有非常大的优势,从而在很大程度上体现出了云原生架构的特点。
数据存储器上的数据可以分为多个分区(shard)来存储,例如图1中的“分片1”、“分片2”、“分片M”、“分片N”等。一个分片对应于一个计算组。一个计算组可以对应于一个或多个分片。具有写入功能的计算组将数据写入到其所对应的分片中。
元数据存储器管理数据存储器中的物理表的元数据,并为多个计算组提供元数据服务,多个计算组共享元数据。换言之,各个计算组都可以获取数据存储器上各分片中存储的数据的元数据信息,从而读取相应数据。由此,各计算组可以共享数据存储器上存储的数据。
另外,元数据存储器还可以用于管理计算组配置信息,以便于灵活地创建、调整、销毁计算组。
如图1所示,该数据处理系统还可以包括数据引擎控制器,用于对计算组进行管理。
例如,数据引擎控制器可以响应于用户的指令或任务请求的数量(流量),灵活地对计算组执行各种管理操作。
数据引擎控制器可以响应于用户的指令来创建计算组,并将计算组配置信息存储在所述元数据存储器中。用户可以通过数据引擎控制器来根据应用或场景需要来配置各个计算组。
根据需要,数据引擎控制器可以基于元数据存储器中的计算组配置信息,启用新的计算组以执行相应的数据处理。
对于暂时不需要的计算组,数据引擎控制器可以将其挂起,使其不再执行数据处理。
对于不再需要的计算组,数据引擎控制器可以将其销毁,释放分配给该计算组的计算资源,以供其它计算组使用。
数据引擎控制器还可以协助用户实现计算资源的弹性扩缩容,以适配用户的需求或任务请求流量的变化。
四、弹性。
计算组可以在任意时间进行按需地创建、销毁或者重新配置。创建或者销毁计算组不会影响数据存储器(或数据仓库、数仓)中存储的用户数据。
还可以根据计算需求动态地申请计算资源,实现更大程度上的资源弹性。例如响应于 用户的指令或任务请求的数量,可以通过数据引擎控制器来调整分配给计算组或计算子组(Cluster)的计算资源,例如CPU的数量等。
计算组还可以根据负载动态的自动弹性扩缩计算子组(Cluster)。如上文所述,计算组中可以进一步包括计算子组。还可以例如通过数据引擎控制器在计算组中增加或减少计算子组(Cluster),以计算子组(Cluster)为单位来实现计算资源的扩缩容。
增加或减少分配给计算组或计算子组的资源数量,例如CPU数量,可以称为“纵向弹性”、“纵向伸缩”或“纵向扩缩”。增加或减少计算组中计算子组的数量可以称为“横向弹性”、“横向伸缩”或“横向扩缩”。纵向弹性和横向弹性效果接近而各有优劣,在一些不同的场景中,可以根据需要或应用/场景需要来灵活地选择。
用户可以直接发出指令来调整计算资源,例如指令调整分配给计算组或计算子组的计算资源,指令在计算组中增加或减少计算子组。
或者,用户也可以灵活地设置动态调整的策略。
例如,可以设置第一流量阈值,当指向一个计算组的任务请求的数量高于第一流量阈值时,增加该计算组的计算资源,或者增加该计算组中至少一个计算子组的计算资源,或者在该计算组中增加计算子组。可以为用户提供选项,以选择具体采用何种方式来进行计算资源的扩增。也可以为用户提供设置模块,以便用户设置在何种情况下采用何种方式来扩增计算资源,从而在不同情况下采用相应方式来扩增计算资源。
又例如,还可以设置第二流量阈值,当指向一个计算组的任务请求的数量低于第二流量阈值时,减少该计算组的计算资源,或者减少该计算组中至少一个计算子组的计算资源,或者在该计算组中减少计算子组。可以为用户提供选项,以选择具体采用何种方式来进行计算资源的缩减。也可以为用户提供设置模块,以便用户设置在何种情况下采用何种方式来缩减计算资源,从而在不同情况下采用相应方式来缩减计算资源。
这样,可以实现非常灵活的计算组配置与扩缩容。
计算组的弹性可以为用户带来性价比更高的服务,在几乎同样的价格下,用户可以享受更快的性能。
例如:如果某个工作负载在4个计算节点上执行需要花费15个小时,那么在30个计算节点上执行可能只需要花费2个小时。虽然这两种模式的价格差不多,但是带给用户的体验却有着根本的区别:在同样花费的情况下,性能越快用户感受就越好。而计算组的弹性恰恰为用户提供了良好体验的选择,用户可以动态配置计算组,以更快地完成计算任务,但是并不需要额外多的花费开销。
除了前述计算组的弹性,本公开的数据处理系统(数据仓库或数仓)具有天然的计算存储分离架构,因此还可以同时做到计算、存储两方面皆高度可扩展,具有双重弹性。
五、行列共存。
目前,在物理表中存储数据时,往往需要以行的方式进行存储,即行存,又需要以列 的方式进行存储,即列存。行存数据与列存数据各有优劣,可以适用于不同的应用场景。例如,在进行点查处理时,使用行存数据会比较方便。而要进行数据分析时,往往使用列存数据会更加方便。
在常规数据写入方案中,对于同一份数据,用一个行存写入任务和一个列存写入任务来分别执行数据行存和数据列存。这两个任务的生命周期是分别管理的,一个任务完成时,不能确定另一个任务是否也已完成。行数据与列数据的同步、同时有效性不能保证。
而在根据本公开的数据处理方案的实施例中,采用了创新的行列共存方式实现。物理表可以以行列并存的方式存储数据。行列并存是指,计算组通过一个写任务,将同一数据对应的行数据和列数据写入同一个物理表中。行数据和列数据都已完成写入后,该写任务才结束。由此,可以保证同一数据对应的行数据写入操作和列数据写入操作的原子性,即同时生效。
换言之,一份数据同时存储为行存和列存两种共存的格式,数据在进行读写时,行存和列存同时原子生效。用户只需关心数据写入和读取本身即可。
用户不再需要同时建多张表,并对多张表同时进行写入,来实现上述的效果。由此,可以避免多份写入带来的资源开销、资源相互影响、数据一致性等诸多问题。
用户在查询时,数据处理系统可以智能地生成优化的计划方案(Plan)进行查询:在进行点查时会生成走行存点查的计划方案,在进行分析时会生成走列存的计划方案。
在进行分析时,例如用户同时需要行存点查分析和列存多维分析时,系统还可以智能地分析出是否有部分查询任务走点查方案会更优,进而生成可同时查行存数据和列存数据的计划方案,不再需要同时查两张表才能实现,真正做到对一份数据的分析服务一体化。
例如,在一些现有的数据仓库引擎方案中,用户需要同时建两张表并对两张表同时写入来实现同时存储行数据和列数据的效果,会带来两份写入额外的资源开销,且需要用户保证两张表的数据一致性等诸多问题。
相应地,使用这样的现有数据仓库引擎,当用户同时需要行存点查分析和列存多维分析时,需要同时查两张表才能实现,且可能无法智能的生成优化的计划方案,没办法真正对用户实现一份数据的分析服务一体化。
而本公开的实施例通过前述创新的行列共存方案,解决了这样的技术问题。
另外,还可以通过另外一种方式来实现同时存储行数据和列数据,可以称为“行列混合存储”。即,形式上只存一份数据,一定数据量的数据定义为一个数据块(block),数据在数据块中的某一列,是按列存储的。同一个数据块的不同列,又是连续存储的。形象地来描述,就好像这个数据块定义了一个多行多列的表格一样。但是这种实现方式在性能上不如前述行列共存方式好。
六、高可用性。
在本公开的数据存储系统(数据仓库)中,可容忍节点、计算子组、计算组级的不可 用或故障。进一步地,结合数据仓库系统等的容灾实例可实现对集群(AZ)/区域级的不可用的容忍,对用户提供更大程度的高可用能力。
具体说来,响应于计算组不可用,可以在存在执行相同任务的其它计算组的情况下,将指向不可用计算组的任务请求转为指向执行相同任务的计算组。或者,例如在当前不存在执行相同任务的其它计算组的情况下,可以基于元数据存储器中不可用计算组的计算组配置信息,启用新的计算组,以执行相应任务请求。
响应于计算组内计算子组不可用,在计算组中有其它计算子组可用的情况下,可以将指向不可用计算子组的任务请求切换到计算组内其它计算子组。在计算组中没有其它计算子组可用或分配给其它计算子组的任务较为繁忙的情况下,可以在计算组内创建新的计算子组。
响应于计算组内计算节点不可用,可以将指向不可用计算节点的任务请求切换到计算组内其它计算节点,并在所述其它计算节点上重新构建所述不可用计算节点上原有的第一内存表和/或第二内存表。这样,在节点出故障(Failover)后,可以快速将查询切到其它正常的节点上,可大大减少节点故障对用户查询的影响,从而在计算组内部实现查询高可用性。
另外,如果由于计算节点不可用,导致计算组或计算子组内计算资源不足,如上文所述,可以动态调整分配给计算组或计算子组的计算资源如计算节点或CPU等。
七、数据处理方法。
下面参考图2描述根据本公开可由上述数据处理系统执行的数据处理方法。
图2示意性地示出了根据本公开的数据处理方法的流程图。
如图2所示,在步骤S210,提供多个计算组,各计算组的计算资源相互隔离。
在步骤S220,多个计算组中具有写入功能的计算组分别维护第一内存表。
在步骤S230,具有写入功能的计算组将要写入数据存储器的数据写入第一内存表。
在步骤S240,具有写入功能的计算组将第一内存表中的数据写入数据存储器中与第一内存表对应的物理表。
这里,可以在物理表中以行列并存的方式存储数据,通过一个写任务将同一数据对应的行数据和列数据写入同一个物理表中,以保证同一数据对应的行数据写入操作和列数据写入操作的原子性。
在步骤S250,各计算组还分别维护至少一个第二内存表,各第二内存表分别对应于具有写入功能的其它计算组。
在步骤S260,使第二内存表与其对应的计算组中的第一内存表同步。
由此,在计算资源相互隔离的计算组之间实现数据共享。
进一步地,可以例如响应于用户的指令或任务请求的数量,对计算组执行各种操作。
例如,可以创建计算组,并将计算组配置信息存储在元数据存储器中。
还可以基于元数据存储器中的计算组配置信息启用新的计算组以执行相应的数据处理。
还可以挂起计算组,使其不再执行数据处理。
还可以销毁计算组,释放分配给该计算组的计算资源。
由此,可以灵活地管理计算组。
另外,也可以调整分配给计算组的计算资源。
或者,也可以调整分配给计算组中各计算子组的计算资源。
还可以在计算组中增加或减少计算子组。其中,各计算子组的计算资源相互隔离,且分别维护第一内存表和/或第二内存表。
由此,可以灵活地实现计算资源的弹性缩放。
进一步地,还可以在各种层级实现高可用性,可容忍节点、计算子组、计算组级的不可用或故障。
例如,响应于计算组不可用,在存在执行相同任务的其它计算组的情况下,可以将指向不可用计算组的任务请求转为指向执行相同任务的计算组。或者,可以基于元数据存储器中不可用计算组的计算组配置信息,启用新的计算组,以执行相应任务请求。
响应于计算组内计算子组不可用,在计算组中有其它计算子组可用的情况下,将指向不可用计算子组的任务请求切换到计算组内其它计算子组。在计算组中没有其它计算子组可用或分配给其它计算子组的任务较为繁忙的情况下,可以在计算组内创建新的计算子组,
响应于计算组内计算节点不可用,可以将指向不可用计算节点的任务请求切换到计算组内其它计算节点,并在其它计算节点上重新构建不可用计算节点上原有的第一内存表和/或第二内存表。这样,在节点出故障(Failover)后,可以快速将查询切到其它正常的节点上,可大大减少节点故障对用户查询的影响,从而在计算组内部实现查询高可用性。
另外,如果由于计算节点不可用,导致计算组或计算子组内计算资源不足,如上文所述,可以动态调整分配给计算组或计算子组的计算资源如计算节点或CPU等。
由此,本公开提供了一种新的数据处理系统架构,在一些实施例中也可以称为“实时数仓架构”,真正对用户实现了一份数据的分析服务一体化。而且,还可以将实时场景、分析场景、服务场景、离线加工场景等诸多场景的完全资源隔离,实现高吞吐写入和灵活查询互不干扰,服务查询QPS增长的同时,查询抖动可明显减少,有效地解决不同场景的系统负载冲突的问题,大大减少可能出现的不可控风险,实现服务的高可用性。
本公开实施例的数据处理系统能够为用户提供高易用性、高可操作性,而且能够提供高可靠性,不仅具有灵活性(即买即用)、高可用性、资源隔离、高弹性和强扩展性等特点,而且支持事务、标准SQL语法和半结构化、非结构化数据,尽可能解决用户在数据分析上的诸多痛点难点问题,让“数据价值”不再可望不可即,而是越来越接近现实,吹响数据价值的号角。
图3示出了根据本发明一实施例可用于实现上述数据处理方法的计算设备的结构示意图。
参见图3,计算设备300包括存储器310和处理器320。
处理器320可以是一个多核的处理器,也可以包含多个处理器。在一些实施例中,处理器320可以包含一个通用的主处理器以及一个或多个特殊的协处理器,例如图形处理器(GPU)、数字信号处理器(DSP)等等。在一些实施例中,处理器320可以使用定制的电路实现,例如特定用途集成电路(ASIC,Application Specific Integrated Circuit)或者现场可编程逻辑门阵列(FPGA,Field Programmable Gate Arrays)。
存储器310可以包括各种类型的存储单元,例如系统内存、只读存储器(ROM),和永久存储装置。其中,ROM可以存储处理器320或者计算机的其他模块需要的静态数据或者指令。永久存储装置可以是可读写的存储装置。永久存储装置可以是即使计算机断电后也不会失去存储的指令和数据的非易失性存储设备。在一些实施方式中,永久性存储装置采用大容量存储装置(例如磁或光盘、闪存)作为永久存储装置。另外一些实施方式中,永久性存储装置可以是可移除的存储设备(例如软盘、光驱)。系统内存可以是可读写存储设备或者易失性可读写存储设备,例如动态随机访问内存。系统内存可以存储一些或者所有处理器在运行时需要的指令和数据。此外,存储器310可以包括任意计算机可读存储媒介的组合,包括各种类型的半导体存储芯片(DRAM,SRAM,SDRAM,闪存,可编程只读存储器),磁盘和/或光盘也可以采用。在一些实施方式中,存储器310可以包括可读和/或写的可移除的存储设备,例如激光唱片(CD)、只读数字多功能光盘(例如DVD-ROM,双层DVD-ROM)、只读蓝光光盘、超密度光盘、闪存卡(例如SD卡、min SD卡、Micro-SD卡等等)、磁性软盘等等。计算机可读存储媒介不包含载波和通过无线或有线传输的瞬间电子信号。
存储器310上存储有可执行代码,当可执行代码被处理器320处理时,可以使处理器320执行上文述及的数据处理方法。
上文中已经参考附图详细描述了根据本发明的数据处理方案。
此外,根据本发明的方法还可以实现为一种计算机程序或计算机程序产品,该计算机程序或计算机程序产品包括用于执行本发明的上述方法中限定的上述各步骤的计算机程序代码指令。
或者,本发明还可以实施为一种非暂时性机器可读存储介质(或计算机可读存储介质、或机器可读存储介质),其上存储有可执行代码(或计算机程序、或计算机指令代码),当所述可执行代码(或计算机程序、或计算机指令代码)被电子设备(或计算设备、服务器等)的处理器执行时,使所述处理器执行根据本发明的上述方法的各个步骤。
本领域技术人员还将明白的是,结合这里的公开所描述的各种示例性逻辑块、模块、 电路和算法步骤可以被实现为电子硬件、计算机软件或两者的组合。
附图中的流程图和框图显示了根据本发明的多个实施例的系统和方法的可能实现的体系架构、功能和操作。在这点上,流程图或框图中的每个方框可以代表一个模块、程序段或代码的一部分,所述模块、程序段或代码的一部分包含一个或多个用于实现规定的逻辑功能的可执行指令。也应当注意,在有些作为替换的实现中,方框中所标记的功能也可以以不同于附图中所标记的顺序发生。例如,两个连续的方框实际上可以基本并行地执行,它们有时也可以按相反的顺序执行,这依所涉及的功能而定。也要注意的是,框图和/或流程图中的每个方框、以及框图和/或流程图中的方框的组合,可以用执行规定的功能或操作的专用的基于硬件的系统来实现,或者可以用专用硬件与计算机指令的组合来实现。
以上已经描述了本发明的各实施例,上述说明是示例性的,并非穷尽性的,并且也不限于所披露的各实施例。在不偏离所说明的各实施例的范围和精神的情况下,对于本技术领域的普通技术人员来说许多修改和变更都是显而易见的。本文中所用术语的选择,旨在最好地解释各实施例的原理、实际应用或对市场中的技术的改进,或者使本技术领域的其它普通技术人员能理解本文披露的各实施例。

Claims (14)

  1. A data processing system, comprising:
    multiple computing groups, wherein the computing resources of each computing group are isolated from each other,
    wherein a computing group with a write function maintains a first memory table, the computing group with a write function is configured to write data to be written into a data storage into the first memory table, and to write the data in the first memory table into a physical table in the data storage corresponding to the first memory table; and
    each computing group further maintains at least one second memory table, each second memory table corresponds to a first memory table in another computing group with a write function, and the computing group is configured to synchronize the second memory table with its corresponding first memory table.
  2. The data processing system according to claim 1, further comprising:
    a gateway, configured to allocate a task request to the computing group corresponding to the task request; and/or
    a metadata storage, configured to manage metadata of the physical tables in the data storage and to provide a metadata service for the multiple computing groups, wherein the multiple computing groups share the metadata; and/or
    a data storage, configured to store the physical tables.
  3. The data processing system according to claim 2, wherein the metadata storage is further configured to manage computing group configuration information, and the system further comprises a data engine controller configured to perform at least one of the following operations in response to a user instruction or the number of task requests:
    creating a computing group, and storing the computing group configuration information in the metadata storage;
    enabling a new computing group based on the computing group configuration information in the metadata storage to perform corresponding data processing;
    suspending a computing group so that it no longer performs data processing;
    destroying a computing group and releasing the computing resources allocated to the computing group;
    adjusting the computing resources allocated to a computing group;
    adjusting the computing resources allocated to each computing subgroup in a computing group; and
    adding or removing computing subgroups in a computing group, wherein the computing resources of the computing subgroups are isolated from each other and each computing subgroup maintains a first memory table and/or a second memory table.
  4. The data processing system according to claim 1, wherein
    the physical table stores data in a row-column coexistence manner, and the row data and the column data corresponding to the same data are written into the same physical table through one write task, so as to guarantee the atomicity of the row data write operation and the column data write operation corresponding to the same data.
  5. The data processing system according to claim 1, wherein the computing groups comprise at least one of the following:
    a computing group for offline writing;
    a computing group for real-time writing;
    a computing group for providing data query services; and
    a computing group for providing data analysis services.
  6. The data processing system according to claim 1, wherein
    a computing group with a write function maintains multiple first memory tables corresponding to multiple physical tables in the data storage; and
    each computing group maintains a corresponding number of second memory tables to correspond to the multiple first memory tables in the other computing groups with a write function.
  7. The data processing system according to claim 1, wherein
    in response to a computing group becoming unavailable, when another computing group executing the same task exists, a task request directed to the unavailable computing group is redirected to the computing group executing the same task, or a new computing group is enabled based on the computing group configuration information of the unavailable computing group in the metadata storage to execute the corresponding task request; and/or
    in response to a computing subgroup in a computing group becoming unavailable, a new computing subgroup is created in the computing group and/or a task request directed to the unavailable computing subgroup is switched to another computing subgroup in the computing group; and/or
    in response to a computing node in a computing group becoming unavailable, a task request directed to the unavailable computing node is switched to other computing nodes in the computing group, and the first memory table and/or second memory table originally on the unavailable computing node are rebuilt on the other computing nodes.
  8. A data processing method, comprising:
    providing multiple computing groups, wherein the computing resources of each computing group are isolated from each other;
    maintaining, by a computing group with a write function, a first memory table;
    writing, by the computing group with a write function, data to be written into a data storage into the first memory table, and writing the data in the first memory table into a physical table in the data storage corresponding to the first memory table;
    maintaining, by each computing group, at least one second memory table respectively, wherein each second memory table corresponds to another computing group with a write function; and
    synchronizing the second memory table with the first memory table in its corresponding computing group.
  9. The data processing method according to claim 8, further comprising: performing at least one of the following operations in response to a user instruction or the number of task requests:
    creating a computing group, and storing the computing group configuration information in the metadata storage;
    enabling a new computing group based on the computing group configuration information in the metadata storage to perform corresponding data processing;
    suspending a computing group so that it no longer performs data processing;
    destroying a computing group and releasing the computing resources allocated to the computing group;
    adjusting the computing resources allocated to a computing group;
    adjusting the computing resources allocated to each computing subgroup in a computing group; and
    adding or removing computing subgroups in a computing group, wherein the computing resources of the computing subgroups are isolated from each other and each computing subgroup maintains a first memory table and/or a second memory table.
  10. The data processing method according to claim 9, wherein
    data is stored in the physical table in a row-column coexistence manner, and the row data and the column data corresponding to the same data are written into the same physical table through one write task, so as to guarantee the atomicity of the row data write operation and the column data write operation corresponding to the same data.
  11. The data processing method according to claim 8, further comprising:
    in response to a computing group becoming unavailable, when another computing group executing the same task exists, redirecting a task request directed to the unavailable computing group to the computing group executing the same task, or enabling a new computing group based on the computing group configuration information of the unavailable computing group in the metadata storage to execute the corresponding task request; and/or
    in response to a computing subgroup in a computing group becoming unavailable, creating a new computing subgroup in the computing group and/or switching a task request directed to the unavailable computing subgroup to another computing subgroup in the computing group; and/or
    in response to a computing node in a computing group becoming unavailable, switching a task request directed to the unavailable computing node to other computing nodes in the computing group, and rebuilding, on the other computing nodes, the first memory table and/or second memory table originally on the unavailable computing node.
  12. A computing device, comprising:
    a processor; and
    a memory on which executable code is stored, wherein the executable code, when executed by the processor, causes the processor to execute the method according to any one of claims 8 to 11.
  13. A computer program product, comprising executable code which, when executed by a processor of an electronic device, causes the processor to execute the method according to any one of claims 8 to 11.
  14. A non-transitory machine-readable storage medium on which executable code is stored, wherein the executable code, when executed by a processor of an electronic device, causes the processor to execute the method according to any one of claims 8 to 11.
PCT/CN2023/132294 2022-11-30 2023-11-17 数据处理方法和数据处理系统 WO2024114409A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211513575.4 2022-11-30
CN202211513575.4A CN115544025B (zh) 2022-11-30 2022-11-30 数据处理方法和数据处理系统

Publications (1)

Publication Number Publication Date
WO2024114409A1 true WO2024114409A1 (zh) 2024-06-06

Family

ID=84721672

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/132294 WO2024114409A1 (zh) 2022-11-30 2023-11-17 数据处理方法和数据处理系统

Country Status (2)

Country Link
CN (1) CN115544025B (zh)
WO (1) WO2024114409A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115544025B (zh) * 2022-11-30 2023-03-24 阿里云计算有限公司 数据处理方法和数据处理系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795206A (zh) * 2018-08-02 2020-02-14 阿里巴巴集团控股有限公司 用于促进集群级缓存和内存空间的系统和方法
US20200104385A1 (en) * 2018-09-28 2020-04-02 International Business Machines Corporation Sharing container images utilizing a distributed file system
CN111327681A (zh) * 2020-01-21 2020-06-23 北京工业大学 一种基于Kubernetes的云计算数据平台构建方法
US20210124690A1 (en) * 2019-10-25 2021-04-29 Servicenow, Inc. Memory-efficient virtual document object model for structured data
CN115544025A (zh) * 2022-11-30 2022-12-30 阿里云计算有限公司 数据处理方法和数据处理系统

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102378151B (zh) * 2010-08-16 2014-06-04 深圳业拓讯通信科技有限公司 信息共享平台及方法
CN110213352B (zh) * 2019-05-17 2020-12-18 北京航空航天大学 名字空间统一的分散自治存储资源聚合方法
CN112579287A (zh) * 2020-12-16 2021-03-30 跬云(上海)信息科技有限公司 一种基于读写分离及自动伸缩的云编排系统及方法
CN114780252B (zh) * 2022-06-15 2022-11-18 阿里云计算有限公司 数据仓库系统的资源管理方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110795206A (zh) * 2018-08-02 2020-02-14 阿里巴巴集团控股有限公司 用于促进集群级缓存和内存空间的系统和方法
US20200104385A1 (en) * 2018-09-28 2020-04-02 International Business Machines Corporation Sharing container images utilizing a distributed file system
US20210124690A1 (en) * 2019-10-25 2021-04-29 Servicenow, Inc. Memory-efficient virtual document object model for structured data
CN111327681A (zh) * 2020-01-21 2020-06-23 北京工业大学 一种基于Kubernetes的云计算数据平台构建方法
CN115544025A (zh) * 2022-11-30 2022-12-30 阿里云计算有限公司 数据处理方法和数据处理系统

Also Published As

Publication number Publication date
CN115544025B (zh) 2023-03-24
CN115544025A (zh) 2022-12-30

Similar Documents

Publication Publication Date Title
Agrawal et al. Database scalability, elasticity, and autonomy in the cloud
US10747673B2 (en) System and method for facilitating cluster-level cache and memory space
US9996427B2 (en) Parallel backup for distributed database system environments
US11726984B2 (en) Data redistribution method and apparatus, and database cluster
US10157214B1 (en) Process for data migration between document stores
WO2021254135A1 (zh) 任务执行方法及存储设备
WO2024114409A1 (zh) 数据处理方法和数据处理系统
US11379492B2 (en) Internal resource provisioning in database systems
US20100325473A1 (en) Reducing recovery time for business organizations in case of disasters
US9323791B2 (en) Apparatus and method for expanding a shared-nothing system
US20160103845A1 (en) Enhanced Handling Of Intermediate Data Generated During Distributed, Parallel Processing
US11263236B2 (en) Real-time cross-system database replication for hybrid-cloud elastic scaling and high-performance data virtualization
CN103177059A (zh) 用于数据库计算引擎的分离处理路径
JP6412244B2 (ja) 負荷に基づく動的統合
WO2014169649A1 (zh) 一种数据处理方法、装置及计算机系统
US9836516B2 (en) Parallel scanners for log based replication
WO2019109854A1 (zh) 分布式数据库数据处理方法、装置、存储介质及电子装置
WO2020191930A1 (zh) 一种有效降低容器化关系型数据库i/o消耗的方法
TW201738781A (zh) 資料表連接方法及裝置
US11625503B2 (en) Data integrity procedure
JP2019191951A (ja) 情報処理システム及びボリューム割当て方法
EP4057160A1 (en) Data reading and writing method and device for database
CN117677943A (zh) 用于混合数据处理的数据一致性机制
CN104410531A (zh) 冗余的系统架构方法
US20240037104A1 (en) A system and method for hierarchical database operation accelerator