WO2023077451A1 - Streaming data processing method and system based on a column-store database - Google Patents

Streaming data processing method and system based on a column-store database Download PDF

Info

Publication number
WO2023077451A1
WO2023077451A1 PCT/CN2021/129076 CN2021129076W WO2023077451A1 WO 2023077451 A1 WO2023077451 A1 WO 2023077451A1 CN 2021129076 W CN2021129076 W CN 2021129076W WO 2023077451 A1 WO2023077451 A1 WO 2023077451A1
Authority
WO
WIPO (PCT)
Prior art keywords
window
data
time
processing
batch
Prior art date
Application number
PCT/CN2021/129076
Other languages
English (en)
French (fr)
Inventor
程学旗
郭嘉丰
李冰
邱强
张志斌
Original Assignee
中国科学院计算技术研究所
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院计算技术研究所 filed Critical 中国科学院计算技术研究所
Priority to PCT/CN2021/129076 priority Critical patent/WO2023077451A1/zh
Publication of WO2023077451A1 publication Critical patent/WO2023077451A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2455Query execution

Definitions

  • The invention belongs to the field of distributed computing, is specifically applied to distributed streaming data computation, and particularly relates to a method and system for processing streaming data based on a column-store database.
  • Streaming data computing engines are gradually emerging and penetrating into various industries.
  • At present, almost all cloud service providers offer streaming data computing engines, which can be used in scenarios such as data aggregation, data association, data monitoring, and data analysis.
  • the current mainstream streaming data computing engines are represented by systems such as Apache Flink, Apache Spark Streaming, and Storm. They use directed acyclic graphs to represent user jobs, and their programming models are more flexible than MapReduce.
  • the contemporary streaming data computing engine implements data aggregation in the time dimension through window technology, and supports out-of-order message processing through event messages.
  • the streaming data computing engine uses window technology to aggregate data in the time dimension.
  • Common windows include rolling and sliding windows.
  • A rolling (tumbling) window, also called a fixed time window, aggregates data at fixed time intervals, for example summarizing data once per day.
  • A sliding window, also called a hopping window, defines a window of fixed size that slides at a fixed time interval; for example, it can be used to generate a statistics table covering the last week once per day.
  • When the window size equals the sliding interval, the sliding time window degenerates into a rolling time window; when the sliding interval is smaller than the window size, sliding windows overlap and a single record may belong to multiple different windows.
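  • As a concrete illustration of the two window types just described, the sketch below (Python; not part of the patent text, all function and variable names are my own) maps an event timestamp to the tumbling window that contains it and to every overlapping sliding window.

```python
# Minimal sketch of tumbling- and sliding-window assignment (illustrative only).

def tumbling_window(ts: int, size: int) -> tuple[int, int]:
    """Return (start, end) of the fixed window containing timestamp ts."""
    start = ts - ts % size
    return start, start + size

def sliding_windows(ts: int, size: int, slide: int) -> list[tuple[int, int]]:
    """Return every (start, end) sliding window that contains timestamp ts."""
    windows = []
    start = ts - ts % slide           # last window starting at or before ts
    while start + size > ts:          # window still covers ts
        windows.append((start, start + size))
        start -= slide
    return windows

DAY = 24 * 3600
print(tumbling_window(ts=2_500_000, size=DAY))
print(sliding_windows(ts=2_500_000, size=3 * DAY, slide=DAY))
# When size == slide the sliding case degenerates to the tumbling case,
# and when slide < size one timestamp belongs to several windows.
```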
  • Streaming data computing engines process data in the time dimension, and usually support two types of time semantics, processing time and event time.
  • the processing time is the time when the message enters the computing engine, and the data is bound with increasing timestamps according to the order in which it enters the system. Since the processing time semantics uses the machine physical time, the window can be triggered according to the machine time. Data processing in this mode is relatively simple.
  • the event time refers to the time when the data actually occurred. However, after the data is generated, it may arrive at the server out of order due to network instability and other reasons, or it may not be able to reach the server due to network failure. Therefore, in the event time processing mode, the system cannot use machine time to judge whether all the data in the window is complete.
  • A common current practice is to use a watermark mechanism to judge whether the data of a window is complete. The watermark is a marker estimated by the system using a specific algorithm, indicating that at that moment all data belonging to a certain window has arrived; it is usually derived with a heuristic algorithm.
  • Because future data is unknown, the system cannot predict the exact watermark position, so late data may still arrive after the watermark, and this part of the data is handled separately. Since data may be delayed for hours or even days, and discarding data is unacceptable in fields such as finance, how to cache a large amount of window data in late-data scenarios is a major challenge for streaming data computing systems.
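  • To make the watermark discussion concrete, here is a small sketch (my own illustration, not code from the patent) of a bounded-out-of-orderness watermark: the watermark trails the maximum observed event time by a fixed delay, and anything older than the watermark is routed to a separate late-data path instead of being dropped.

```python
# Heuristic bounded watermark with a separate late-data path (illustrative only).

class BoundedWatermark:
    def __init__(self, max_delay: int):
        self.max_delay = max_delay
        self.max_event_time = None

    def observe(self, event_time: int) -> int:
        """Update with a new event and return the current watermark."""
        if self.max_event_time is None or event_time > self.max_event_time:
            self.max_event_time = event_time
        return self.max_event_time - self.max_delay

wm = BoundedWatermark(max_delay=5)
on_time, late = [], []
for t in [1, 5, 3, 4, 12, 2]:          # out-of-order event times
    watermark = wm.observe(t)
    (late if t < watermark else on_time).append(t)
print(watermark, on_time, late)        # 7 [1, 5, 3, 4, 12] [2]
```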
  • the storage and computing modes of the streaming computing engine are divided into row-based and column-based storage.
  • The row-based mode means that the system stores data and performs calculations in units of data table rows, as shown in Figure 1.
  • the row-based storage mode is a very intuitive storage mode, and its storage mode is similar to the table storage mode that humans are used to. Its advantage is that each data attribute of the same record can be efficiently manipulated, and it is friendly to transaction operations.
  • Because the row storage mode must read all the data of each record row by row, if a query only needs some attributes of the data records, this mode causes irrelevant read and write overhead; when a data record has very many attributes, this overhead can severely degrade system performance. In addition, in scenarios that aggregate the whole data set by some attribute, the row storage mode must read every field of each record, which is unfriendly to memory and results in poor performance.
  • Mainstream streaming data computing engines such as Apache Flink and Apache Spark Streaming use the row storage mode, which gives the system low latency in scenarios such as data cleaning, filtering, and transformation.
  • the column storage mode means that the system maintains data records and performs calculations according to the columns of the data table.
  • Each column of the data table represents an attribute of the data record, and all data records are sorted by attributes and stored in memory, as shown in Figure 2. Its storage mode is not as intuitive as the row storage mode.
  • The column-based storage mode was created to improve performance in data analysis scenarios. Since the attributes of each data record are stored non-contiguously, operating on a single data record is slower than in the row storage mode, and the mode is not friendly to transactional operations. However, because the column storage mode can retrieve only the specified data attributes without reading all the data, it greatly reduces read and write overhead in scenarios that require data filtering and is memory-friendly in data aggregation scenarios.
  • The column storage mode therefore has unique advantages in data analysis scenarios and is widely used in data analysis engines such as HBase and ClickHouse.
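  • The difference between the two layouts can be sketched in a few lines (illustrative Python, not from the patent): with a row layout, answering a single-attribute query touches every field of every record, while a column layout touches only the one array that is needed.

```python
# Row layout: one dict per record; column layout: one array per attribute.
rows = [{"id": i, "age": 20 + i % 40, "name": f"u{i}", "city": "x"} for i in range(8)]
columns = {
    "id":   [r["id"] for r in rows],
    "age":  [r["age"] for r in rows],
    "name": [r["name"] for r in rows],
    "city": [r["city"] for r in rows],
}

# Average age, row-oriented: every record (all four fields) is touched.
avg_row = sum(r["age"] for r in rows) / len(rows)

# Average age, column-oriented: only the contiguous "age" array is read.
ages = columns["age"]
avg_col = sum(ages) / len(ages)
assert avg_row == avg_col
```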
  • (1) Low performance of the row storage mode in data analysis scenarios. Mainstream streaming data computing engines are designed and optimized for log data processing and adopt row-based storage and computing to achieve real-time message processing.
  • However, the row-storage computing mode performs poorly in data analysis scenarios.
  • Studies have shown that the throughput of mainstream streaming data computing engines can be 500 times or more lower than that of column-store data analysis engines such as SQL Server and Shark. Because a column storage engine can use hardware resources more efficiently in scenarios such as data sorting and aggregation, it has unique advantages in big data analysis.
  • However, because mainstream databases lack support for incremental computing models, they cannot support streaming data computation.
  • the purpose of the present invention is to improve the computing efficiency of a streaming data computing system in a data analysis scenario, and propose a streaming data computing method and system using column storage and a computing engine.
  • The present invention proposes a streaming data processing method based on column-store data, which includes:
  • Step 1: obtain the column-store streaming data to be processed and its corresponding processing tasks, divide the streaming data into batch data blocks along the time dimension, and assign a window number to each piece of data in a batch data block according to the preset window mode;
  • Step 2: divide the batch data block into multiple intermediate data blocks, where each intermediate data block contains only data with the same window number, perform pre-aggregation on the data of each intermediate data block, and generate a pre-aggregated intermediate state;
  • Step 3: according to the preset streaming-data time processing mode, extract the pre-aggregated intermediate state of the corresponding window number from internal storage, execute the corresponding processing task, and output the task execution result as the streaming data processing result.
  • Step 2 includes: when performing the pre-aggregation, window-expired data is either discarded immediately or discarded a specified time after the window expires.
  • the stream data processing method based on column storage data, wherein the stream data time processing mode in step 3 is processing time or event time processing mode;
  • In the processing-time mode, a trigger is set using the machine time of the computer that executes the processing task, so that when the machine time reaches the window end time, the window processing command is invoked, and the pre-aggregated intermediate state of the window corresponding to that end time is selected and its corresponding processing task executed;
  • In the event-time mode, a trigger is set using the watermark mechanism, taking the maximum time of all streaming data as the watermark; when the watermark satisfies the trigger condition, the pre-aggregated intermediate state of the window corresponding to the window end time is selected and its corresponding processing task executed.
  • Step 1 includes:
  • when the window mode is a rolling window, the sum of the window start time of the data in the batch data block and the window size is used as the window end time, and the window number is assigned based on that window end time;
  • when the window mode is a sliding window, the start time of the window containing the data is computed from the sliding interval, and the sum of that start time and the sliding interval is used as the window end time; a temporary window is then set whose size is the greatest common factor of the window size and the sliding interval and whose start time is that window end time, and the temporary window is slid in the direction of decreasing time until the smallest-numbered window containing the data in the batch data block is found, whose end time is taken as the window number.
  • the stream data processing method based on column storage data, wherein the stream data is physiological data, image data or log text data collected by sensors in real time; the processing task corresponding to the stream data is database statistics task.
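  • Steps 1 to 3 above can be strung together in a compact sketch (my own minimal illustration in Python; the real system operates on columnar blocks and SQL tasks): a batch block is split by window number, each group is pre-aggregated into an intermediate state, and the result for a window is read out when that window is triggered.

```python
from collections import defaultdict

DAY = 86400

def window_number(ts: int, size: int = DAY) -> int:
    """Step 1: a record's window number is the end time of its tumbling window."""
    return ts - ts % size + size

def pre_aggregate(batch):
    """Step 2: split a batch block by window number and pre-aggregate (a count here)."""
    states = defaultdict(int)
    for ts, _value in batch:
        states[window_number(ts)] += 1
    return states

store = defaultdict(int)                       # internal storage of intermediate states
for batch in [[(10, "a"), (20, "b")], [(30, "c"), (DAY + 5, "d")]]:
    for win, partial in pre_aggregate(batch).items():
        store[win] += partial                  # merge with any previously stored state

# Step 3: when a window is triggered, its stored state is read out as the result.
print(dict(store))                             # {86400: 3, 172800: 1}
```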
  • The present invention also proposes a streaming data processing system based on column-store data, which includes:
  • Module 1 is used to obtain the column-store streaming data to be processed and its corresponding processing tasks, divide the streaming data into batch data blocks along the time dimension, and assign a window number to each piece of data in a batch data block according to the preset window mode;
  • Module 2 is used to divide the batch data block into multiple intermediate data blocks, where each intermediate data block contains only data with the same window number, perform pre-aggregation on the data of each intermediate data block, and generate a pre-aggregated intermediate state;
  • Module 3 is used to extract, according to the preset streaming-data time processing mode, the pre-aggregated intermediate state of the window number corresponding to the window from internal storage, execute the corresponding processing task, and output the task execution result as the streaming data processing result.
  • Module 2 is used to, when performing the pre-aggregation, discard window-expired data immediately or discard it a specified time after the window expires.
  • the stream data processing system based on column storage data, wherein the stream data time processing mode in module 3 is processing time or event time processing mode;
  • In the processing-time mode, a trigger is set using the machine time of the computer that executes the processing task, so that when the machine time reaches the window end time, the window processing command is invoked, and the pre-aggregated intermediate state of the window corresponding to that end time is selected and its corresponding processing task executed;
  • In the event-time mode, a trigger is set using the watermark mechanism, taking the maximum time of all streaming data as the watermark; when the watermark satisfies the trigger condition, the pre-aggregated intermediate state of the window corresponding to the window end time is selected and its corresponding processing task executed.
  • The streaming data processing system based on column-store data, wherein module 1 is used to:
  • when the window mode is a rolling window, use the sum of the window start time of the data in the batch data block and the window size as the window end time, and assign the window number based on that window end time;
  • when the window mode is a sliding window, compute the start time of the window containing the data from the sliding interval, use the sum of that start time and the sliding interval as the window end time, set a temporary window whose size is the greatest common factor of the window size and the sliding interval and whose start time is that window end time, and slide the temporary window in the direction of decreasing time until the smallest-numbered window containing the data is found, whose end time is taken as the window number.
  • the stream data processing system based on column storage data, wherein the stream data is physiological data, image data or log text data collected by sensors in real time; the processing tasks corresponding to the stream data are database statistical tasks.
  • the present invention has the advantages of:
  • the invention proposes a streaming data computing system using a column storage engine. Compared with existing technologies, the system improves the throughput of data analysis scenarios while maintaining low latency by using columnar storage and computing engines, combined with pre-aggregation technology.
  • the throughput of the system in the Yahoo streaming data computing benchmark test is 14.8 times that of Apache Flink, a well-known system in the industry. In a typical data analysis scenario using the New York taxi dataset, the throughput exceeds Flink and Apache Spark Streaming by more than 2,700 times.
  • Figure 1 is a schematic diagram of the row storage mode;
  • Figure 2 is a schematic diagram of the column storage mode;
  • Figure 3 is a diagram of the system usage pattern;
  • Figure 4 is a schematic diagram of the streaming data processing flow;
  • Figure 5 is a syntax diagram for creating a WindowView;
  • Figure 6 is an example diagram of watermark usage;
  • Figure 7 is an example diagram of the lateness strategy;
  • Figure 8 is a definition diagram of the TUMBLE function;
  • Figure 9 is an example diagram of TUMBLE function usage;
  • Figure 10 is a definition diagram of the HOP function;
  • Figure 11 is an example diagram of HOP function usage.
  • The inventors therefore propose a streaming data computing system based on a column storage engine, which reduces the processing latency of the column storage engine through techniques such as window splitting, window ID compression, and pre-aggregation of window computing state, and uses storage engine optimization to persist expired windows so that expired data need never be discarded.
  • Key point 1: a streaming data computing system that uses a column-store computing engine. Technical effect: the system divides streaming data into batch data blocks along the time dimension and uses data blocks, rather than single records, as the unit of computation, making full use of column storage and computing techniques to accelerate aggregation operations.
  • Key point 2: window pre-aggregation. Technical effect: computing tasks are pre-aggregated into intermediate computing states, which reduces the amount of computation when a window fires and lowers computing latency.
  • Key point 3: sliding-window splitting and reuse of computing state. Technical effect: overlapping sliding windows are split into non-overlapping contiguous sub-windows that are pre-aggregated; when a window fires, the pre-aggregated states are reused, which avoids repeated computation across overlapping windows and lowers computing latency.
  • The system implements streaming data processing under structured query (SQL) semantics through views.
  • The system of the present invention converts a relational source data table into streaming data by defining a WindowView view table; after the data is processed in streaming form inside the WindowView, the processing results are output to a target table, as shown in Figure 3.
  • WindowView will monitor the source data table and automatically read the newly inserted data when data is inserted.
  • The source data table can be any table in the system, such as an ordinary relational data table, or special tables such as distributed tables, Kafka tables, file tables, and Null tables. Distributed computing can be realized through distributed tables, and data can be inserted directly into the WindowView through a Null table so that streaming data is processed without being written to disk.
  • Figure 4 shows the WindowView streaming data processing flow.
  • Process 1 Create a WindowView table using SQL statements.
  • the syntax for creating a WindowView is similar to creating a database view table, as shown in Figure 5. See Table 1 for keyword descriptions.
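  • Figure 5 itself is not reproduced here, so the exact grammar is unknown; purely as an illustration of the kind of statement the text describes, a hypothetical creation statement might look like the string below. The table names, the WATERMARK keyword, and the client call are assumptions of this sketch, not the patent's verified syntax.

```python
# Hypothetical WindowView creation statement, for illustration only; the real
# grammar is defined in Figure 5 of the patent and may differ.
create_window_view = """
CREATE WINDOW VIEW daily_clicks
TO click_stats                      -- optional target table (TO keyword)
WATERMARK = ASCENDING               -- watermark policy (see Figure 6)
AS
SELECT TUMBLE(event_time, INTERVAL '1' DAY) AS wnd, count(*) AS clicks
FROM clicks
GROUP BY wnd
"""

# `client` stands for whatever SQL client the deployment uses (assumed here).
# client.execute(create_window_view)
print(create_window_view.strip())
```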
  • The system supports the following watermark mechanisms; a usage example is shown in Figure 6:
  • STRICTLY_ASCENDING: the watermark is committed at the maximum time observed by the system; data whose time is less than the maximum observed time is not considered late.
  • Here the maximum time is the "latest time" among all records the system has observed. If the system observes the timestamp sequence 1, 5, 3, 4, then the "maximum time" is 5. "Maximum time" is used instead of "latest time" because time is represented in the system as a timestamp, and a larger number means a newer time.
  • ASCENDING: the watermark is committed at the maximum time observed by the system minus 1; data whose time is not greater than the maximum observed time is not considered late.
  • BOUNDED: the watermark is committed at the maximum time observed by the system minus a fixed time interval.
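  • A compact sketch of the three policies as described above (illustrative Python; the string constants mirror the keywords above, everything else is my own):

```python
def watermark(max_observed: int, policy: str, bound: int = 0) -> int:
    """Watermark position under the three policies described above."""
    if policy == "STRICTLY_ASCENDING":
        return max_observed
    if policy == "ASCENDING":
        return max_observed - 1
    if policy == "BOUNDED":
        return max_observed - bound
    raise ValueError(policy)

max_seen = 0
for ts in [1, 5, 3, 4]:                            # the log sequence from the example
    max_seen = max(max_seen, ts)
print(watermark(max_seen, "STRICTLY_ASCENDING"))   # 5
print(watermark(max_seen, "ASCENDING"))            # 4
print(watermark(max_seen, "BOUNDED", bound=2))     # 3
```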
  • The system uses window functions to assign window numbers to the data set.
  • A window number is a unique identifier used to identify a window.
  • The system supports the TUMBLE (tumbling) and HOP (sliding) window functions.
  • The TUMBLE window function defines a window that tumbles at a fixed time interval along the time dimension; its definition is shown in Figure 8.
  • The parameter time_attr is the timestamp contained in the data, and the function now() can also be used to set the data time to the current system time; the parameter interval specifies the window size; the parameter timezone is optional and specifies a time zone different from the system's, defaulting to the system time zone.
  • Figure 9 is an example of the use of the TUMBLE function, which defines a tumbling time window of size one day.
  • The HOP window function defines a window of fixed size that slides along the time dimension; its definition is shown in Figure 10.
  • The parameter time_attr is the timestamp contained in the data, and the function now() can also be used to set the data time to the current system time;
  • the parameter hop_interval is the sliding interval of the window;
  • the parameter window_interval is the window size. When the window size is greater than the sliding interval, sliding windows overlap; when the window size equals the sliding interval, the window degenerates into a rolling window; when the window size is smaller than the sliding interval, the windows become discontinuous, and since the system does not support discontinuous windows, the window size must not be smaller than the sliding interval. The parameter timezone is optional and specifies a time zone different from the system's, defaulting to the system time zone.
  • Figure 11 is an example of using the HOP function; it defines a time window with a size of three days and a sliding interval of one day, which can be used to produce daily statistics over the last three days.
  • Process 2: during streaming data processing, newly arrived data can be appended to the system's source data table by a user application, or the system can automatically monitor data sources such as Kafka and insert new data into the source data table automatically when it arrives.
  • Process 3 WindowView automatically monitors the update of the source data table, and the newly inserted data is automatically pushed to WindowView when the source data table is updated.
  • Process 4 In order to give full play to the advantages of the column storage engine, the data will be temporarily cached after being inserted into WindowView. After a certain amount of data has been accumulated, WindowView will package the accumulated data into data blocks and process them in units of data blocks.
  • the data block packaging strategy can be configured to trigger a packaging operation according to the number of data entries, the size of the data volume, and the time interval.
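  • A minimal sketch of the packaging policy described above (illustrative Python; thresholds and names are assumptions): rows are buffered and flushed into a block when any of the three configured triggers, row count, payload size, or elapsed time, is reached.

```python
import time

class BlockBuffer:
    """Buffer rows and emit a block when a count, byte-size, or age trigger fires."""
    def __init__(self, max_rows=1000, max_bytes=1 << 20, max_age_s=1.0):
        self.max_rows, self.max_bytes, self.max_age_s = max_rows, max_bytes, max_age_s
        self._rows, self._bytes, self._born = [], 0, time.monotonic()

    def append(self, row: bytes):
        self._rows.append(row)
        self._bytes += len(row)
        if (len(self._rows) >= self.max_rows
                or self._bytes >= self.max_bytes
                or time.monotonic() - self._born >= self.max_age_s):
            return self.flush()
        return None

    def flush(self):
        block, self._rows, self._bytes = self._rows, [], 0
        self._born = time.monotonic()
        return block

buf = BlockBuffer(max_rows=3)
for i in range(7):
    block = buf.append(f"row-{i}".encode())
    if block:
        print([r.decode() for r in block])   # blocks of three rows each
```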
  • Process 5: if the user's computing task includes window aggregation operations, window-expired data in the data block is filtered out.
  • The system supports discarding expired data immediately, or discarding it a period of time after the window expires; the length of this period can be specified in the WindowView creation statement.
  • Process 6 Calculate and assign a window number for each piece of data in the data block, the steps are as follows, where the timestamp is the processing time or event time of the data record:
  • Process 6.1 If the window is a rolling window, get the window start time.
  • the window start time can be calculated using, for example, the method in Table 2 below.
  • Process 6.2: use the start time obtained in process 6.1 plus the window size as the window end time.
  • Process 6.3: assign the window end time obtained in process 6.2 as the window number.
  • Process 6.4 If the window is a sliding window, the calculation method in Table 2 below can be used to calculate the window start time with the sliding interval as the window size.
  • Process 6.5 Use the window start time + sliding interval obtained in process 6.4 as the window end time
  • Process 6.6 Due to the overlapping of sliding windows, in order to avoid double calculation caused by overlapping windows, when dividing the sliding window, the window is divided into continuous non-overlapping small windows.
  • Process 6.7: compute the greatest common factor of the window size and the sliding interval, and use it as the size of the non-overlapping sub-windows described in process 6.6.
  • Process 6.8: set a temporary window whose start time is the window end time obtained in process 6.5 and whose size is the greatest common factor obtained in process 6.7, and slide this temporary window in the direction of decreasing time until the first window whose end time is less than the data timestamp is found.
  • The purpose of this step is to find the first sub-window that contains the timestamp of the target data; because that window cannot be obtained directly by numeric calculation, the temporary window is slid until the first window whose end time is less than the target timestamp is found, and then slid one unit in the direction of increasing time.
  • Process 6.9: use the window end time obtained in process 6.8 plus the greatest common factor obtained in process 6.7 as the window number.
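  • Read literally, processes 6.4 to 6.9 assign each record the end time of the GCD-sized sub-window that covers its timestamp. A compact sketch of that procedure (my own Python rendering of the steps above, not the patent's code):

```python
from math import gcd

def sliding_window_number(ts: int, window_size: int, slide: int) -> int:
    """Window (sub-window) number assignment following processes 6.4-6.9."""
    # 6.4 / 6.5: treat the sliding interval as the window size to get an upper bound.
    start = ts - ts % slide
    end = start + slide
    # 6.7: sub-window size = gcd(window size, sliding interval).
    g = gcd(window_size, slide)
    # 6.8: slide a g-sized temporary window towards smaller times until its
    # end time is less than the data timestamp.
    tmp_start = end
    while tmp_start + g >= ts:
        tmp_start -= g
    # 6.9: the window number is that end time plus the sub-window size,
    # i.e. the smallest multiple of g that is >= ts.
    return (tmp_start + g) + g

DAY = 86400
# 3-day windows sliding by 1 day -> 1-day sub-windows identified by their end time.
print(sliding_window_number(2 * DAY + 3600, window_size=3 * DAY, slide=DAY) // DAY)  # 3
```

  • At trigger time, a full sliding window of size window_size is then reconstructed from the window_size / g consecutive sub-window states ending at its end time, which is how overlapping windows reuse the same pre-aggregated state.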
  • Process 7 Divide the data block into multiple intermediate data blocks based on the window serial number allocated in process 6, and each intermediate data block only contains data with the same window serial number. Then pre-aggregation calculation is performed on the data of each intermediate data block to generate a pre-aggregation intermediate state.
  • When the system pre-aggregates a data block, column storage allows it to read only the columns required by the aggregation operation, reducing disk read time. For example, to count in each window the total number of users older than 30, the system first reads the age column and filters out users aged 30 or younger, then reads the window-number column and aggregates the sum grouped by window number; the whole operation never reads the other columns of the data table, reducing disk overhead. Because the data touched is more compact, the process is also friendlier to the CPU cache, which speeds up the computation.
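  • The example in the previous bullet, counting users older than 30 per window while touching only two columns, looks roughly like this in column-oriented form (illustrative Python; the real engine works on its own block format):

```python
from collections import Counter

# A column-oriented intermediate block: one array per attribute.
block = {
    "window_no": [100, 100, 100, 200, 200],
    "age":       [25,  42,  33,  31,  29],
    "name":      ["a", "b", "c", "d", "e"],   # never read by this query
}

# Read only the two needed columns; other columns stay untouched on disk.
ages, windows = block["age"], block["window_no"]
per_window = Counter(w for w, age in zip(windows, ages) if age > 30)
print(dict(per_window))          # {100: 2, 200: 1}
```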
  • As an example of the pre-aggregation technique, suppose the computing task is a numeric sum.
  • A data stream delivers four numbers in succession: 1, 2, 3, and 4.
  • With pre-aggregation, the system performs one computation as each number arrives.
  • The pre-aggregated intermediate states after each arrival are 1, 3, 6, and 10, that is, after the 1st, 2nd, 3rd, and 4th number respectively.
  • When the system triggers the final calculation, it directly reads the latest, i.e., the fourth, pre-aggregated intermediate state, and 10 is the final result.
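  • The running-sum example in the bullets above, written out (a trivial Python illustration):

```python
state = 0
intermediate_states = []
for value in [1, 2, 3, 4]:           # numbers arriving one by one
    state += value                   # pre-aggregate on arrival
    intermediate_states.append(state)
print(intermediate_states)           # [1, 3, 6, 10]
print(intermediate_states[-1])       # final result read at trigger time: 10
```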
  • Process 8 Write the pre-aggregated intermediate state to the internal storage engine.
  • Process 9 In streaming data processing, data arrives continuously, so it is necessary to use background tasks to perform multiple merge operations from time to time.
  • the system uses background tasks to automatically pre-aggregate data blocks with the same window number in the storage engine when the calculation is idle, and merge multiple data blocks into a single data block.
  • Process 10: in the processing-time mode, the system sets a trigger using the machine time of the computer; when the machine time reaches the window end time, it invokes the window processing command to compute the data of the window corresponding to that moment.
  • In the event-time mode, the system sets the trigger using the watermark mechanism, takes the maximum time of all messages observed so far as the watermark, and invokes the corresponding window processing command when the watermark satisfies the trigger condition.
  • the specific execution steps of the window processing command are as follows:
  • Process 10.1 Extract the pre-aggregated intermediate state of the window number corresponding to the window from the internal storage, each rolling window corresponds to a window number, and the sliding window corresponds to one or more window numbers due to the use of window segmentation.
  • Process 10.2 If the pre-aggregation intermediate state extracted in process 10.1 is multiple data blocks, perform pre-aggregation calculation and merge them into a single data block.
  • Process 10.3 Calculate the pre-aggregated intermediate state of a single data block as the final calculation result through the final calculation operation.
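  • Processes 10.1 to 10.3 (and the background merging of process 9) amount to merging partial states with the same window number and then running a final step. A minimal sketch, using an average as the processing task (illustrative Python; the state layout is my own assumption):

```python
def merge(states):
    """Merge several (sum, count) partial states for one window (process 9 / 10.2)."""
    total = sum(s for s, _ in states)
    count = sum(c for _, c in states)
    return total, count

def finalize(state):
    """Final calculation (process 10.3): here the task is an average."""
    total, count = state
    return total / count if count else None

# Pre-aggregated intermediate states stored for one window number,
# e.g. produced by different batch blocks or by a sliding window's sub-windows.
partials = [(6.0, 3), (10.0, 1), (4.0, 2)]
print(finalize(merge(partials)))     # (6+10+4) / (3+1+2) = 3.333...
```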
  • Process 11 If the TO keyword is specified when WindowView is created, the final calculation result is output to the target table.
  • Process 12: if a client uses the WATCH keyword to monitor the WindowView, the final calculation result is also output to the client terminal.
  • Process 13: repeat processes 3 to 12 as new data arrives.
  • Process 14: the system uses background tasks to periodically clean up expired window data and release storage space according to the late-data handling strategy.
  • this system divides all processing tasks (calculation operations) into two steps: calculation to the pre-aggregation intermediate state, and merging of the pre-aggregation intermediate state to generate the final calculation result.
  • Calculation operations can be common database operations such as summation, averaging, counting, and classification. Take summing 100 records as an example and assume the machine has 10 computing threads; the system assigns 10 records to each computing thread. Step 1: each computing thread aggregates its 10 assigned records, and the sum of those 10 records is its pre-aggregated intermediate state. Step 2: the 10 partial sums produced by the 10 threads are merged to generate the "final calculation state", which is the sum of all 100 records.
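  • The 100-record example above, written as a two-step computation (illustrative Python; a thread pool mirrors the 10 computing threads, though the same result is obtained sequentially):

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 101))                              # 100 records
chunks = [data[i:i + 10] for i in range(0, 100, 10)]    # 10 records per thread

with ThreadPoolExecutor(max_workers=10) as pool:
    partial_sums = list(pool.map(sum, chunks))          # step 1: intermediate states

final = sum(partial_sums)                               # step 2: merge into final state
print(partial_sums)                                     # 10 intermediate sums
print(final)                                            # 5050, the sum of all 100 records
```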
  • The present invention also proposes a streaming data processing system based on column-store data, which includes:
  • Module 1 is used to obtain the column-store streaming data to be processed and its corresponding processing tasks, divide the streaming data into batch data blocks along the time dimension, and assign a window number to each piece of data in a batch data block according to the preset window mode;
  • Module 2 is used to divide the batch data block into multiple intermediate data blocks, where each intermediate data block contains only data with the same window number, perform pre-aggregation on the data of each intermediate data block, and generate a pre-aggregated intermediate state;
  • Module 3 is used to extract, according to the preset streaming-data time processing mode, the pre-aggregated intermediate state of the window number corresponding to the window from internal storage, execute the corresponding processing task, and output the task execution result as the streaming data processing result.
  • Module 2 is used to, when performing the pre-aggregation, discard window-expired data immediately or discard it a specified time after the window expires.
  • the stream data processing system based on column storage data, wherein the stream data time processing mode in module 3 is processing time or event time processing mode;
  • In the processing-time mode, a trigger is set using the machine time of the computer that executes the processing task, so that when the machine time reaches the window end time, the window processing command is invoked, and the pre-aggregated intermediate state of the window corresponding to that end time is selected and its corresponding processing task executed;
  • In the event-time mode, a trigger is set using the watermark mechanism, taking the maximum time of all streaming data as the watermark; when the watermark satisfies the trigger condition, the pre-aggregated intermediate state of the window corresponding to the window end time is selected and its corresponding processing task executed.
  • The streaming data processing system based on column-store data, wherein module 1 is used to:
  • when the window mode is a rolling window, use the sum of the window start time of the data in the batch data block and the window size as the window end time, and assign the window number based on that window end time;
  • when the window mode is a sliding window, compute the start time of the window containing the data from the sliding interval, use the sum of that start time and the sliding interval as the window end time, set a temporary window whose size is the greatest common factor of the window size and the sliding interval and whose start time is that window end time, and slide the temporary window in the direction of decreasing time until the smallest-numbered window containing the data is found, whose end time is taken as the window number.
  • the stream data processing system based on column storage data, wherein the stream data is physiological data, image data or log text data collected by sensors in real time; the processing tasks corresponding to the stream data are database statistical tasks.
  • In summary, the present invention proposes a streaming data processing method and system based on column-store data, comprising: obtaining the column-store streaming data to be processed and its corresponding processing tasks, dividing the streaming data into batch data blocks along the time dimension, and assigning a window number to each piece of data in a batch data block according to the preset window mode; dividing the batch data block into multiple intermediate data blocks, each containing only data with the same window number;
  • pre-aggregating the data of each intermediate data block to generate a pre-aggregated intermediate state; and, according to the preset streaming-data time processing mode, extracting the pre-aggregated intermediate state of the corresponding window number from internal storage, executing the corresponding processing task, and outputting the task execution result as the streaming data processing result.
  • By using column-oriented storage and a column-store computing engine combined with pre-aggregation, the present invention improves throughput in data analysis scenarios while maintaining low latency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A streaming data processing method and system based on column-store data, comprising: obtaining the column-store streaming data to be processed and its corresponding processing tasks, dividing the streaming data into batch data blocks along the time dimension, and assigning a window number to each piece of data in a batch data block according to a preset window mode; dividing the batch data block into multiple intermediate data blocks, each containing only data with the same window number, and pre-aggregating the data of each intermediate data block to generate a pre-aggregated intermediate state; and, according to a preset streaming-data time processing mode, extracting from internal storage the pre-aggregated intermediate state of the window number corresponding to a window, executing the corresponding processing task, and outputting the task execution result as the streaming data processing result. By using column-oriented storage and a column-store computing engine combined with pre-aggregation, the method improves throughput in data analysis scenarios while maintaining low latency.

Description

Streaming data processing method and system based on a column-store database
Technical Field
The present invention belongs to the field of distributed computing, is specifically applied to distributed streaming data computation, and particularly relates to a streaming data processing method and system based on a column-store database.
Background Art
Streaming data computing engines are gradually emerging and penetrating into various industries. At present, almost all cloud service providers offer streaming data computing engines, which can be used in scenarios such as data aggregation, data association, data monitoring, and data analysis. The current mainstream streaming data computing engines are represented by systems such as Apache Flink, Apache Spark Streaming, and Storm; they represent user jobs as directed acyclic graphs, and their programming models are more flexible than MapReduce. Contemporary streaming data computing engines implement time-dimension data aggregation through window techniques and support out-of-order message processing through event messages.
Overview of window techniques:
Streaming data computing engines use window techniques to aggregate data along the time dimension. Common windows include rolling (tumbling) and sliding windows. A rolling window, also called a fixed time window, aggregates data at fixed time intervals, for example summarizing data once per day; a sliding window, also called a hopping window, defines a window of fixed size that slides at a fixed time interval, and can be used, for example, to generate a statistics table covering the last week once per day. When the time window size equals the sliding interval, the sliding time window degenerates into a rolling time window; when the sliding interval is smaller than the window size, sliding windows overlap, and a single record may then belong to multiple different windows.
Overview of time semantics:
Streaming data computing engines process data along the time dimension and usually support two kinds of time semantics: processing time and event time. Processing time is the time at which a message enters the computing engine; data is bound to increasing timestamps in the order in which it enters the system. Because processing-time semantics uses the physical machine time, windows can simply be triggered by machine time, and data processing in this mode is relatively simple. Event time is the time at which the data actually occurred; however, after data is generated it may arrive at the server out of order because of network instability, or fail to reach the server because of network failures. Therefore, in the event-time processing mode, the system cannot use machine time to judge whether all the data of a window has arrived. A common current practice is to use a watermark mechanism to judge whether the data is complete. The watermark is a marker estimated by the system using a specific algorithm, indicating that at that moment all data belonging to a certain window has arrived; it is usually derived with a heuristic algorithm. Because future data is unknown, the system cannot predict the exact watermark position, so late data may still arrive after the watermark, and this part of the data is handled separately. Since data may be delayed for hours or even days, and discarding data is unacceptable in fields such as finance, how to cache a large amount of window data in late-data scenarios is a major challenge for streaming data computing systems.
Overview of storage and computing modes:
The storage and computing modes of streaming computing engines fall into two categories, row-based and column-based. The row-based mode means that the system stores data and performs calculations in units of data table rows, as shown in Figure 1. Row-based storage is a very intuitive storage mode, similar to the tabular layout people are used to. Its advantage is that the attributes of a single record can be manipulated efficiently, and it is friendly to transactional operations. However, because the row storage mode must read all the data of each record row by row, if a query only needs some attributes of the data records, this mode causes irrelevant read and write overhead, and when records have very many attributes this overhead can severely degrade system performance. In addition, in scenarios that aggregate the whole data set by some attribute, the row storage mode must read every field of each record, which is unfriendly to memory and results in poor performance. Mainstream streaming data computing engines such as Apache Flink and Apache Spark Streaming use the row storage mode, which gives the system low latency in scenarios such as data cleaning, filtering, and transformation.
The column storage mode means that the system maintains data records and performs calculations by the columns of the data table; each column represents one attribute of the data records, and all records are stored in memory sorted by attribute, as shown in Figure 2. This storage mode is less intuitive than the row storage mode. Column-based storage was created to improve performance in data analysis scenarios. Since the attributes of each record are stored non-contiguously, operating on a single record is slower than in the row storage mode, and the mode is not friendly to transactional operations. However, because the column storage mode can retrieve only the specified attributes without reading all the data, it greatly reduces read and write overhead in scenarios that require data filtering and is memory-friendly in data aggregation scenarios, so it has unique advantages in data analysis. Column-based storage is widely used in data analysis engines such as HBase and ClickHouse.
In summary, the prior art has the following problems and shortcomings:
(1) Low performance of the row storage mode in data analysis scenarios. Mainstream streaming data computing engines are designed and optimized for log data processing and adopt row-based storage and computing to achieve real-time message processing. However, the row-storage computing mode performs poorly in data analysis scenarios; studies have shown that the throughput of mainstream streaming data computing engines can be 500 times or more lower than that of column-store data analysis engines such as SQL Server and Shark. Because a column storage engine can use hardware resources more efficiently in scenarios such as data sorting and aggregation, it has unique advantages in big data analysis. However, because mainstream databases lack support for incremental computing models, they cannot support streaming data computation.
(2) Difficulty of using multiple systems, and performance loss from overheads such as data copying. Many analytical tasks, such as real-time recommendation, online machine learning, or streaming graph computation, have complex computing patterns and usually need to aggregate data from several different systems, for example from a streaming data computing engine, a database, and a content cache. For instance, an advertising analysis system uses advertiser and user data stored in a relational database and uses that data in streaming data processing tasks; likewise, online machine learning or graph computing tasks may access a database to obtain training data and other information. Using multiple systems increases the user's learning cost, makes the system logic complex and hard to maintain, and, because data must flow between several different systems, introduces data copying as well as serialization and deserialization overhead. Mainstream streaming data computing systems do not support database storage, so a database system must be deployed alongside them to complete the business described above, and a message queue often has to be introduced as well so that the streaming system and the database can communicate.
Disclosure of the Invention
The purpose of the present invention is to improve the computing efficiency of streaming data computing systems in data analysis scenarios, and to propose a streaming data computing method and system that use column-oriented storage and a column-store computing engine.
In view of the shortcomings of the prior art, the present invention proposes a streaming data processing method based on column-store data, comprising:
Step 1: obtaining the column-store streaming data to be processed and its corresponding processing tasks, dividing the streaming data into batch data blocks along the time dimension, and assigning a window number to each piece of data in a batch data block according to a preset window mode;
Step 2: dividing the batch data block into multiple intermediate data blocks, each intermediate data block containing only data with the same window number, and performing pre-aggregation on the data of each intermediate data block to generate a pre-aggregated intermediate state;
Step 3: according to a preset streaming-data time processing mode, extracting from internal storage the pre-aggregated intermediate state of the window number corresponding to a window, executing the corresponding processing task, and outputting the task execution result as the streaming data processing result.
In the described streaming data processing method based on column-store data, step 2 includes: when performing the pre-aggregation, discarding window-expired data immediately or discarding it a specified time after the window expires.
In the described streaming data processing method based on column-store data, the streaming-data time processing mode in step 3 is the processing-time or event-time processing mode;
in the processing-time mode, a trigger is set using the machine time of the computer that executes the processing task, so that when the machine time reaches the window end time, the window processing command is invoked, and the pre-aggregated intermediate state of the window corresponding to that end time is selected and its corresponding processing task executed;
in the event-time mode, a trigger is set using the watermark mechanism, taking the maximum time of all streaming data as the watermark; when the watermark satisfies the trigger condition, the pre-aggregated intermediate state of the window corresponding to the window end time is selected and its corresponding processing task executed.
In the described streaming data processing method based on column-store data, step 1 includes:
when the window mode is a rolling window, using the sum of the window start time of the data in the batch data block and the window size as the window end time, and assigning the window number based on that window end time;
when the window mode is a sliding window, computing the start time of the window containing the data in the batch data block from the sliding interval, and using the sum of that start time and the sliding interval as the window end time;
setting a temporary window whose size is the greatest common factor of the window size and the sliding interval and whose start time is that window end time, and sliding the temporary window in the direction of decreasing time until the smallest-numbered window containing the data in the batch data block is found, whose end time is taken as the window number.
In the described streaming data processing method based on column-store data, the streaming data is physiological data, image data, or log text data collected in real time by sensors, and the processing tasks corresponding to the streaming data are database statistics tasks.
The present invention also proposes a streaming data processing system based on column-store data, comprising:
Module 1, configured to obtain the column-store streaming data to be processed and its corresponding processing tasks, divide the streaming data into batch data blocks along the time dimension, and assign a window number to each piece of data in a batch data block according to a preset window mode;
Module 2, configured to divide the batch data block into multiple intermediate data blocks, each intermediate data block containing only data with the same window number, and perform pre-aggregation on the data of each intermediate data block to generate a pre-aggregated intermediate state;
Module 3, configured to extract, according to a preset streaming-data time processing mode, the pre-aggregated intermediate state of the window number corresponding to a window from internal storage, execute the corresponding processing task, and output the task execution result as the streaming data processing result.
In the described streaming data processing system based on column-store data, module 2 is configured to, when performing the pre-aggregation, discard window-expired data immediately or discard it a specified time after the window expires.
In the described streaming data processing system based on column-store data, the streaming-data time processing mode in module 3 is the processing-time or event-time processing mode;
in the processing-time mode, a trigger is set using the machine time of the computer that executes the processing task, so that when the machine time reaches the window end time, the window processing command is invoked, and the pre-aggregated intermediate state of the window corresponding to that end time is selected and its corresponding processing task executed;
in the event-time mode, a trigger is set using the watermark mechanism, taking the maximum time of all streaming data as the watermark; when the watermark satisfies the trigger condition, the pre-aggregated intermediate state of the window corresponding to the window end time is selected and its corresponding processing task executed.
In the described streaming data processing system based on column-store data, module 1 is configured to:
when the window mode is a rolling window, use the sum of the window start time of the data in the batch data block and the window size as the window end time, and assign the window number based on that window end time;
when the window mode is a sliding window, compute the start time of the window containing the data in the batch data block from the sliding interval, and use the sum of that start time and the sliding interval as the window end time;
set a temporary window whose size is the greatest common factor of the window size and the sliding interval and whose start time is that window end time, and slide the temporary window in the direction of decreasing time until the smallest-numbered window containing the data in the batch data block is found, whose end time is taken as the window number.
In the described streaming data processing system based on column-store data, the streaming data is physiological data, image data, or log text data collected in real time by sensors, and the processing tasks corresponding to the streaming data are database statistics tasks.
As can be seen from the above solutions, the advantages of the present invention are as follows:
The invention proposes a streaming data computing system that uses a column storage engine. Compared with the prior art, by using column-oriented storage and a column-store computing engine combined with pre-aggregation, the system improves throughput in data analysis scenarios while maintaining low latency. In the Yahoo streaming benchmark, the system's throughput reaches 14.8 times that of the well-known industry system Apache Flink, and in a typical data analysis scenario using the New York taxi data set its throughput exceeds Flink and Apache Spark Streaming by more than 2,700 times.
Brief Description of the Drawings
Figure 1 is a schematic diagram of the row storage mode;
Figure 2 is a schematic diagram of the column storage mode;
Figure 3 is a diagram of the system usage pattern;
Figure 4 is a schematic diagram of the streaming data processing flow;
Figure 5 is a syntax diagram for creating a WindowView;
Figure 6 is an example diagram of watermark usage;
Figure 7 is an example diagram of the lateness strategy;
Figure 8 is a definition diagram of the TUMBLE function;
Figure 9 is an example diagram of TUMBLE function usage;
Figure 10 is a definition diagram of the HOP function;
Figure 11 is an example diagram of HOP function usage.
Best Mode for Carrying Out the Invention
Many users find that the throughput of streaming data computing tasks in data analysis scenarios is significantly lower than the throughput of traditional database computing tasks. While researching streaming computing engines, the inventors found that this defect of the prior art is caused by the row-based storage and processing engines used by streaming computing engines: a row storage engine computes on single data records, so it is difficult to exploit the relationships among records to accelerate aggregation. Mainstream streaming data computing engines do not adopt column storage engines because the row storage mode processes data record by record with low latency, whereas adopting a column storage mode would increase processing latency. After studying the prior art, the inventors propose a streaming data computing system based on a column storage engine, which reduces the processing latency of the column storage engine through techniques such as window splitting, window ID compression, and pre-aggregation of window computing state, and uses storage engine optimization to persist expired windows so that expired data need never be discarded.
Specifically, the present application involves the following key technical points:
Key point 1: a streaming data computing system that uses a column-store computing engine. Technical effect: the system divides streaming data into batch data blocks along the time dimension and uses data blocks, rather than single records, as the unit of computation, making full use of column storage and computing techniques to accelerate aggregation operations.
Key point 2: window pre-aggregation. Technical effect: computing tasks are pre-aggregated into intermediate computing states, which reduces the amount of computation when a window fires and lowers computing latency.
Key point 3: sliding-window splitting and reuse of computing state. Technical effect: overlapping sliding windows are split into non-overlapping contiguous sub-windows that are pre-aggregated; when a window fires, the pre-aggregated states are reused, which avoids repeated computation across overlapping windows and lowers computing latency.
To make the above features and effects of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
The system implements streaming data processing under structured query (SQL) semantics through views. By defining a WindowView view table, the system of the present invention converts a relational source data table into streaming data; after the data is processed in streaming form inside the WindowView, the processing results are output to a target table, as shown in Figure 3. Like a traditional database view, the WindowView monitors the source data table and automatically reads newly inserted data when data is inserted. The source data table can be any table in the system, such as an ordinary relational data table, or special tables such as distributed tables, Kafka tables, file tables, and Null tables. Distributed computing can be realized through distributed tables, and data can be inserted directly into the WindowView through a Null table so that streaming data is processed without being written to disk. Figure 4 shows the WindowView streaming data processing flow.
Process 1: create a WindowView table using an SQL statement. The syntax for creating a WindowView is similar to that for creating a database view, as shown in Figure 5; see Table 1 for keyword descriptions.
Table 1: WindowView keyword descriptions:
Figure PCTCN2021129076-appb-000001
Figure PCTCN2021129076-appb-000002
The system supports the following watermark mechanisms; a usage example is shown in Figure 6:
STRICTLY_ASCENDING: the watermark is committed at the maximum time observed by the system; data whose time is less than the maximum observed time is not considered late. Here the maximum time is the "latest time" among all records the system has observed; if the system observes the timestamp sequence 1, 5, 3, 4, then the "maximum time" is 5. "Maximum time" is used instead of "latest time" because time is represented in the system as a timestamp, and a larger number means a newer time.
ASCENDING: the watermark is committed at the maximum time observed by the system minus 1; data whose time is not greater than the maximum observed time is not considered late.
BOUNDED: the watermark is committed at the maximum time observed by the system minus a fixed time interval.
The system uses window functions to assign window numbers to the data set; a window number is a unique identifier used to identify a window. The system supports the TUMBLE (tumbling) and HOP (sliding) window functions.
The TUMBLE window function defines a window that tumbles at a fixed time interval along the time dimension; its definition is shown in Figure 8. The parameter time_attr is the timestamp contained in the data, and the function now() can also be used to set the data time to the current system time; the parameter interval specifies the window size; the parameter timezone is optional and specifies a time zone different from the system's, defaulting to the system time zone. Figure 9 is an example of using the TUMBLE function; it defines a tumbling time window of one day.
The HOP window function defines a window of fixed size that slides along the time dimension; its definition is shown in Figure 10. The parameter time_attr is the timestamp contained in the data, and the function now() can also be used to set the data time to the current system time; the parameter hop_interval is the sliding interval of the window; the parameter window_interval is the window size. When the window size is greater than the sliding interval, sliding windows overlap; when the window size equals the sliding interval, the window degenerates into a rolling window; when the window size is smaller than the sliding interval, the windows become discontinuous, and since the system does not support discontinuous windows, the window size must not be smaller than the sliding interval. The parameter timezone is optional and specifies a time zone different from the system's, defaulting to the system time zone. Figure 11 is an example of using the HOP function; it defines a time window with a size of three days and a sliding interval of one day, which can be used to produce daily statistics over the last three days.
Process 2: during streaming data processing, newly arrived data can be appended to the system's source data table by a user application, or the system can automatically monitor data sources such as Kafka and insert new data into the source data table automatically when it arrives.
Process 3: the WindowView automatically monitors updates of the source data table; when the source data table is updated, the newly inserted data is automatically pushed to the WindowView.
Process 4: to make full use of the advantages of the column storage engine, data is briefly buffered after being inserted into the WindowView; once a sufficient amount of data has accumulated, the WindowView packs the accumulated data into data blocks and processes it in units of data blocks. The packaging strategy can be configured to trigger a packaging operation based on the number of data entries, the data volume, or a time interval.
Process 5: if the user's computing task includes window aggregation operations, window-expired data in the data block is filtered out. The system supports discarding expired data immediately, or discarding it a period of time after the window expires; the length of this period can be specified in the WindowView creation statement.
Process 6: compute and assign a window number for each piece of data in the data block. The steps are as follows, where the timestamp is the processing time or event time of the data record:
Process 6.1: if the window is a rolling window, obtain the window start time; the window start time can be calculated, for example, using the method in Table 2 below.
Process 6.2: use the start time obtained in process 6.1 plus the window size as the window end time.
Process 6.3: assign the window end time obtained in process 6.2 as the window number.
Process 6.4: if the window is a sliding window, the calculation method in Table 2 below can be used to compute the window start time, with the sliding interval taken as the window size.
Process 6.5: use the window start time obtained in process 6.4 plus the sliding interval as the window end time.
Process 6.6: because sliding windows overlap, to avoid repeated computation caused by overlapping windows, the window is split into contiguous, non-overlapping sub-windows when sliding windows are divided.
Process 6.7: compute the greatest common factor of the window size and the sliding interval, and use it as the size of the non-overlapping sub-windows described in process 6.6.
Process 6.8: set a temporary window whose start time is the window end time obtained in process 6.5 and whose size is the greatest common factor obtained in process 6.7, and slide the temporary window in the direction of decreasing time until the first window whose end time is less than the data timestamp is found. The purpose of this step is to find the first sub-window that contains the timestamp of the target data; because that window cannot be obtained directly by numeric calculation, the temporary window is slid until the first window whose end time is less than the target timestamp is found, and then slid one unit in the direction of increasing time.
Process 6.9: use the window end time obtained in process 6.8 plus the greatest common factor obtained in process 6.7 as the window number.
Table 2: Method for calculating the window start time
Figure PCTCN2021129076-appb-000003
Process 7: using the window numbers assigned in process 6, split the data block into multiple intermediate data blocks, each containing only data with the same window number; then pre-aggregate the data of each intermediate data block to generate a pre-aggregated intermediate state.
When pre-aggregating a data block, the system uses column storage to read only the columns required by the aggregation operation, reducing disk read time. For example, to count in each window the total number of users older than 30, the system first reads the age column and filters out users aged 30 or younger, then reads the window-number column and aggregates the sum grouped by window number; the whole operation never reads the other columns of the data table, reducing disk overhead. Because the data touched is more compact, the process is also friendlier to the CPU cache, which speeds up the computation.
As an example of the pre-aggregation technique, suppose the computing task is a numeric sum and a data stream delivers four numbers in succession: 1, 2, 3, and 4. With pre-aggregation, the system performs one computation as each number arrives, and the pre-aggregated intermediate states after each arrival are 1, 3, 6, and 10 respectively. When the system triggers the final calculation, it directly reads the latest, i.e., the fourth, pre-aggregated intermediate state, and 10 is the final result.
Process 8: write the pre-aggregated intermediate state to the internal storage engine.
Process 9: because data arrives continuously in streaming processing, background tasks are needed to perform merge operations repeatedly from time to time. Using background tasks, the system automatically pre-aggregates data blocks with the same window number in the storage engine when computation is idle, merging multiple data blocks into a single data block.
Process 10: in the processing-time mode, the system sets a trigger using the machine time of the computer; when the machine time reaches the window end time, it invokes the window processing command to compute the data of the window corresponding to that moment. In the event-time mode, the system sets the trigger using the watermark mechanism, takes the maximum time of all messages observed so far as the watermark, and invokes the corresponding window processing command when the watermark satisfies the trigger condition. The window processing command executes the following steps:
Process 10.1: extract from internal storage the pre-aggregated intermediate states of the window numbers corresponding to the window; each rolling window corresponds to one window number, while a sliding window, because of window splitting, corresponds to one or more window numbers.
Process 10.2: if the pre-aggregated intermediate state extracted in process 10.1 consists of multiple data blocks, perform pre-aggregation to merge them into a single data block.
Process 10.3: through the final calculation operation, compute the pre-aggregated intermediate state of the single data block into the final calculation result.
Process 11: if the TO keyword was specified when the WindowView was created, output the final calculation result to the target table.
Process 12: if a client monitors the WindowView using the WATCH keyword, output the final calculation result to the client terminal.
Process 13: repeat processes 3 to 12 as new data arrives.
Process 14: using background tasks, the system periodically cleans up expired window data and releases storage space according to the late-data handling strategy.
In summary, the system divides every processing task (calculation operation) into two steps: computing to a pre-aggregated intermediate state, and merging pre-aggregated intermediate states to produce the final calculation result. Calculation operations can be common database operations such as summation, averaging, counting, and classification. Take summing 100 records as an example and assume the machine has 10 computing threads; the system assigns 10 records to each thread. Step 1: each computing thread aggregates its 10 assigned records, and the sum of those 10 records is its pre-aggregated intermediate state. Step 2: the 10 partial sums produced by the 10 threads are merged to generate the "final calculation state", which is the sum of all 100 records.
The following is a system embodiment corresponding to the method embodiment above; the two embodiments can be implemented in cooperation with each other. The relevant technical details mentioned in the embodiment above remain valid in this embodiment and are not repeated here in order to reduce repetition; correspondingly, the relevant technical details mentioned in this embodiment can also be applied to the embodiment above.
The present invention also proposes a streaming data processing system based on column-store data, comprising:
Module 1, configured to obtain the column-store streaming data to be processed and its corresponding processing tasks, divide the streaming data into batch data blocks along the time dimension, and assign a window number to each piece of data in a batch data block according to a preset window mode;
Module 2, configured to divide the batch data block into multiple intermediate data blocks, each intermediate data block containing only data with the same window number, and perform pre-aggregation on the data of each intermediate data block to generate a pre-aggregated intermediate state;
Module 3, configured to extract, according to a preset streaming-data time processing mode, the pre-aggregated intermediate state of the window number corresponding to a window from internal storage, execute the corresponding processing task, and output the task execution result as the streaming data processing result.
In the described streaming data processing system based on column-store data, module 2 is configured to, when performing the pre-aggregation, discard window-expired data immediately or discard it a specified time after the window expires.
In the described streaming data processing system based on column-store data, the streaming-data time processing mode in module 3 is the processing-time or event-time processing mode;
in the processing-time mode, a trigger is set using the machine time of the computer that executes the processing task, so that when the machine time reaches the window end time, the window processing command is invoked, and the pre-aggregated intermediate state of the window corresponding to that end time is selected and its corresponding processing task executed;
in the event-time mode, a trigger is set using the watermark mechanism, taking the maximum time of all streaming data as the watermark; when the watermark satisfies the trigger condition, the pre-aggregated intermediate state of the window corresponding to the window end time is selected and its corresponding processing task executed.
In the described streaming data processing system based on column-store data, module 1 is configured to:
when the window mode is a rolling window, use the sum of the window start time of the data in the batch data block and the window size as the window end time, and assign the window number based on that window end time;
when the window mode is a sliding window, compute the start time of the window containing the data in the batch data block from the sliding interval, and use the sum of that start time and the sliding interval as the window end time;
set a temporary window whose size is the greatest common factor of the window size and the sliding interval and whose start time is that window end time, and slide the temporary window in the direction of decreasing time until the smallest-numbered window containing the data in the batch data block is found, whose end time is taken as the window number.
In the described streaming data processing system based on column-store data, the streaming data is physiological data, image data, or log text data collected in real time by sensors, and the processing tasks corresponding to the streaming data are database statistics tasks.
Industrial Applicability
The present invention proposes a streaming data processing method and system based on column-store data, comprising: obtaining the column-store streaming data to be processed and its corresponding processing tasks, dividing the streaming data into batch data blocks along the time dimension, and assigning a window number to each piece of data in a batch data block according to a preset window mode; dividing the batch data block into multiple intermediate data blocks, each containing only data with the same window number, and pre-aggregating the data of each intermediate data block to generate a pre-aggregated intermediate state; and, according to a preset streaming-data time processing mode, extracting from internal storage the pre-aggregated intermediate state of the window number corresponding to a window, executing the corresponding processing task, and outputting the task execution result as the streaming data processing result. By using column-oriented storage and a column-store computing engine combined with pre-aggregation, the present invention improves throughput in data analysis scenarios while maintaining low latency.

Claims (10)

  1. A streaming data processing method based on column-store data, characterized by comprising:
    step 1: obtaining the column-store streaming data to be processed and its corresponding processing tasks, dividing the streaming data into batch data blocks along the time dimension, and assigning a window number to each piece of data in a batch data block according to a preset window mode;
    step 2: dividing the batch data block into multiple intermediate data blocks, each intermediate data block containing only data with the same window number, and performing pre-aggregation on the data of each intermediate data block to generate a pre-aggregated intermediate state;
    step 3: according to a preset streaming-data time processing mode, extracting from internal storage the pre-aggregated intermediate state of the window number corresponding to a window, executing the corresponding processing task, and outputting the task execution result as the streaming data processing result.
  2. The streaming data processing method based on column-store data according to claim 1, characterized in that step 2 comprises: when performing the pre-aggregation, discarding window-expired data immediately or discarding it a specified time after the window expires.
  3. The streaming data processing method based on column-store data according to claim 1, characterized in that the streaming-data time processing mode in step 3 is the processing-time or event-time processing mode;
    in the processing-time mode, a trigger is set using the machine time of the computer that executes the processing task, so that when the machine time reaches the window end time, the window processing command is invoked, and the pre-aggregated intermediate state of the window corresponding to that end time is selected and its corresponding processing task executed;
    in the event-time mode, a trigger is set using the watermark mechanism, taking the maximum time of all streaming data as the watermark; when the watermark satisfies the trigger condition, the pre-aggregated intermediate state of the window corresponding to the window end time is selected and its corresponding processing task executed.
  4. The streaming data processing method based on column-store data according to claim 1, characterized in that step 1 comprises:
    when the window mode is a rolling window, using the sum of the window start time of the data in the batch data block and the window size as the window end time, and assigning the window number based on that window end time;
    when the window mode is a sliding window, computing the start time of the window containing the data in the batch data block from the sliding interval, and using the sum of that start time and the sliding interval as the window end time;
    setting a temporary window whose size is the greatest common factor of the window size and the sliding interval and whose start time is that window end time, and sliding the temporary window in the direction of decreasing time until the smallest-numbered window containing the data in the batch data block is found, whose end time is taken as the window number.
  5. The streaming data processing method based on column-store data according to claim 1, characterized in that the streaming data is physiological data, image data, or log text data collected in real time by sensors, and the processing tasks corresponding to the streaming data are database statistics tasks.
  6. A streaming data processing system based on column-store data, characterized by comprising:
    module 1, configured to obtain the column-store streaming data to be processed and its corresponding processing tasks, divide the streaming data into batch data blocks along the time dimension, and assign a window number to each piece of data in a batch data block according to a preset window mode;
    module 2, configured to divide the batch data block into multiple intermediate data blocks, each intermediate data block containing only data with the same window number, and perform pre-aggregation on the data of each intermediate data block to generate a pre-aggregated intermediate state;
    module 3, configured to extract, according to a preset streaming-data time processing mode, the pre-aggregated intermediate state of the window number corresponding to a window from internal storage, execute the corresponding processing task, and output the task execution result as the streaming data processing result.
  7. The streaming data processing system based on column-store data according to claim 6, characterized in that module 2 is configured to, when performing the pre-aggregation, discard window-expired data immediately or discard it a specified time after the window expires.
  8. The streaming data processing system based on column-store data according to claim 6, characterized in that the streaming-data time processing mode in module 3 is the processing-time or event-time processing mode;
    in the processing-time mode, a trigger is set using the machine time of the computer that executes the processing task, so that when the machine time reaches the window end time, the window processing command is invoked, and the pre-aggregated intermediate state of the window corresponding to that end time is selected and its corresponding processing task executed;
    in the event-time mode, a trigger is set using the watermark mechanism, taking the maximum time of all streaming data as the watermark; when the watermark satisfies the trigger condition, the pre-aggregated intermediate state of the window corresponding to the window end time is selected and its corresponding processing task executed.
  9. The streaming data processing system based on column-store data according to claim 6, characterized in that module 1 is configured to:
    when the window mode is a rolling window, use the sum of the window start time of the data in the batch data block and the window size as the window end time, and assign the window number based on that window end time;
    when the window mode is a sliding window, compute the start time of the window containing the data in the batch data block from the sliding interval, and use the sum of that start time and the sliding interval as the window end time;
    set a temporary window whose size is the greatest common factor of the window size and the sliding interval and whose start time is that window end time, and slide the temporary window in the direction of decreasing time until the smallest-numbered window containing the data in the batch data block is found, whose end time is taken as the window number.
  10. The streaming data processing system based on column-store data according to claim 6, characterized in that the streaming data is physiological data, image data, or log text data collected in real time by sensors, and the processing tasks corresponding to the streaming data are database statistics tasks.
PCT/CN2021/129076 2021-11-05 2021-11-05 Streaming data processing method and system based on a column-store database WO2023077451A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/129076 WO2023077451A1 (zh) 2021-11-05 2021-11-05 Streaming data processing method and system based on a column-store database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/129076 WO2023077451A1 (zh) 2021-11-05 2021-11-05 Streaming data processing method and system based on a column-store database

Publications (1)

Publication Number Publication Date
WO2023077451A1 true WO2023077451A1 (zh) 2023-05-11

Family

ID=86240407

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/129076 WO2023077451A1 (zh) Streaming data processing method and system based on a column-store database

Country Status (1)

Country Link
WO (1) WO2023077451A1 (zh)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331255A (zh) * 2014-11-17 2015-02-04 中国科学院声学研究所 一种基于嵌入式文件系统的流式数据读取方法
WO2017185576A1 (zh) * 2016-04-25 2017-11-02 百度在线网络技术(北京)有限公司 一种多流流式数据的处理方法、系统、存储介质及设备
WO2018072618A1 (zh) * 2016-10-18 2018-04-26 阿里巴巴集团控股有限公司 流式计算任务的分配方法和控制服务器
CN109033439A (zh) * 2018-08-15 2018-12-18 中科驭数(北京)科技有限公司 流式数据的处理方法和装置
CN109196494A (zh) * 2016-08-26 2019-01-11 华为技术有限公司 用于对数据流执行信息处理的设备和方法
CN110019386A (zh) * 2017-09-05 2019-07-16 中国移动通信有限公司研究院 一种流数据处理方法及设备
CN112286582A (zh) * 2020-12-31 2021-01-29 浙江岩华文化科技有限公司 基于流式计算框架的多线程数据处理方法、装置和介质
CN112398906A (zh) * 2020-10-14 2021-02-23 上海海典软件股份有限公司 一种互联网平台数据交互方法及装置
CN112667170A (zh) * 2021-01-12 2021-04-16 北京工业大学 一种面向滑动窗口数据分析的Spark数据缓存方法

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331255A (zh) * 2014-11-17 2015-02-04 中国科学院声学研究所 一种基于嵌入式文件系统的流式数据读取方法
WO2017185576A1 (zh) * 2016-04-25 2017-11-02 百度在线网络技术(北京)有限公司 一种多流流式数据的处理方法、系统、存储介质及设备
CN109196494A (zh) * 2016-08-26 2019-01-11 华为技术有限公司 用于对数据流执行信息处理的设备和方法
CN112148753A (zh) * 2016-08-26 2020-12-29 华为技术有限公司 用于对数据流执行信息处理的设备和方法
WO2018072618A1 (zh) * 2016-10-18 2018-04-26 阿里巴巴集团控股有限公司 流式计算任务的分配方法和控制服务器
CN110019386A (zh) * 2017-09-05 2019-07-16 中国移动通信有限公司研究院 一种流数据处理方法及设备
CN109033439A (zh) * 2018-08-15 2018-12-18 中科驭数(北京)科技有限公司 流式数据的处理方法和装置
CN112398906A (zh) * 2020-10-14 2021-02-23 上海海典软件股份有限公司 一种互联网平台数据交互方法及装置
CN112286582A (zh) * 2020-12-31 2021-01-29 浙江岩华文化科技有限公司 基于流式计算框架的多线程数据处理方法、装置和介质
CN112667170A (zh) * 2021-01-12 2021-04-16 北京工业大学 一种面向滑动窗口数据分析的Spark数据缓存方法

Similar Documents

Publication Publication Date Title
US11882054B2 (en) Terminating data server nodes
Li et al. No pane, no gain: efficient evaluation of sliding-window aggregates over data streams
Traub et al. Efficient Window Aggregation with General Stream Slicing.
CN106648904B (zh) 一种流式数据处理自适应速率控制方法
Arasu et al. Stream: The stanford data stream management system
WO2020211300A1 (zh) 资源分配方法、装置、计算机设备和存储介质
US7673291B2 (en) Automatic database diagnostic monitor architecture
US20140156636A1 (en) Dynamic parallel aggregation with hybrid batch flushing
US7376682B2 (en) Time model
CN107623639B (zh) 基于emd距离的数据流分布式相似性连接方法
WO2017185576A1 (zh) 一种多流流式数据的处理方法、系统、存储介质及设备
CN107766413B (zh) 一种实时数据流聚合查询的实现方法
CN114185885A (zh) 一种基于列存数据库的流式数据处理方法及系统
Chen et al. Popularity-aware differentiated distributed stream processing on skewed streams
Liu et al. Optimizing shuffle in wide-area data analytics
Cao et al. Timon: A timestamped event database for efficient telemetry data processing and analytics
Maier et al. Capturing episodes: may the frame be with you
WO2023077451A1 (zh) 一种基于列存数据库的流式数据处理方法及系统
Falk et al. Query-able kafka: An agile data analytics pipeline for mobile wireless networks
Marcu et al. Exploring shared state in key-value store for window-based multi-pattern streaming analytics
Shaikh et al. Smart scheme: an efficient query execution scheme for event-driven stream processing
CN114185884A (zh) 基于列存数据的流式数据处理方法及系统
Gomes et al. Railgun: managing large streaming windows under MAD requirements
Chen et al. GDSW: a general framework for distributed sliding window over data streams
Watanabe et al. Query result caching for multiple event-driven continuous queries

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21962964

Country of ref document: EP

Kind code of ref document: A1