CN110990059A - Streaming computing engine operating method and system for skewed data - Google Patents

Streaming computing engine operating method and system for skewed data

Info

Publication number
CN110990059A
CN110990059A
Authority
CN
China
Prior art keywords
stateless
data
stateful
task
operators
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911191154.2A
Other languages
Chinese (zh)
Other versions
CN110990059B (en)
Inventor
Cheng Xueqi (程学旗)
Li Bing (李冰)
Zhang Zhibin (张志斌)
Guo Jiafeng (郭嘉丰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201911191154.2A priority Critical patent/CN110990059B/en
Publication of CN110990059A publication Critical patent/CN110990059A/en
Application granted granted Critical
Publication of CN110990059B publication Critical patent/CN110990059B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines

Abstract

The invention provides a streaming computing engine operating method and system for skewed data. The method comprises the following steps: obtaining a user job containing skewed data, converting the user job into a directed acyclic graph representing it, and dividing the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator; wrapping all stateful operators into stateful tasks whose input is the stateful data stream, and wrapping all stateless operators into stateless tasks whose input is the stateless data stream; copying and sending the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results; converting the data of the stateful data stream into data records containing keys and values, and distributing records containing the same key to the same computing node to obtain stateful processing results; and collecting the stateless and stateful processing results as the result of the user job.

Description

Streaming computing engine operating method and system for skewed data
Technical Field
The invention belongs to the field of distributed computing, and in particular to distributed streaming data computing.
Background
In a streaming computing engine, input data keeps arriving continuously after the user job starts, so the distribution of the data source is unknown in advance. In real scenarios, data is skewed: in a social network, for example, a celebrity has more followers, and a hot event receives more comments. The distribution of skewed data can also change dynamically; in social networks, hot events change over time, which makes the data distribution uncertain.
In a stream computing engine, computing operators are divided into two types: stateful operators and stateless operators. A stateless operator considers only the current input datum, needs no context or historical computation information, and always produces the same output for the same input. A stateful operator must save computation state during processing in order to perform more complex operations, and is used in scenarios such as machine-learning model training and data aggregation. Because stream computing systems are latency-sensitive, the computation state of a stateful operator can only be accessed locally, which means that data with the same key must be sent to the same node for computation. Under skewed data, a stateless operator can process any datum and be scheduled to any node, so it does not cause load imbalance. For a stateful operator, because input data with the same key can only be sent to the same node, a skewed data scenario can cause system load imbalance.
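The distinction above can be made concrete with a small sketch (illustrative only, not code from the patent): a stateless operator's output depends solely on the current record, while a stateful operator keeps per-key state, which forces records with the same key onto the same node.

```python
def stateless_upper(record):
    # Stateless: same input always yields the same output; no context needed.
    return record.upper()

class StatefulCounter:
    """Stateful: the per-key computation state must be accessed locally."""
    def __init__(self):
        self.state = {}  # key -> running total

    def process(self, key, value):
        self.state[key] = self.state.get(key, 0) + value
        return self.state[key]

def route_to_node(key, num_nodes):
    # Key-based routing: identical keys always map to the same node, which
    # is exactly what makes load imbalance possible under skewed keys.
    return hash(key) % num_nodes
```

Under skew (e.g. one "celebrity" key dominating the stream), every record for that key is routed to one node by `route_to_node`, while stateless work can be spread freely.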
A streaming computing engine therefore partitions the input data of stateful operators so that data with the same key is distributed to the same node. Traditional streaming computing engines such as Flink and Spark Streaming both use a static data partitioning strategy. Because the data source is unknown and its distribution uncertain, a static partitioning strategy is difficult to adjust dynamically when the distribution changes, so the system load becomes unbalanced and optimal computing performance cannot be reached.
Existing stream computing engines use a static data partitioning strategy: after the data is partitioned, each partition must correspond to a physical computing node. In a skewed scenario the partition sizes are uneven, and the physical node holding a larger partition carries a higher load, so the cluster load is unbalanced and computing resources cannot be fully utilized. Moreover, a static partitioning strategy cannot cope with dynamic changes in the data distribution: when the skewed distribution changes, the system must re-partition to restore resource utilization, and the re-partitioning operation must suspend the computing tasks of the whole cluster, which has a large impact on system performance. How to solve load imbalance under skewed data sets is therefore a technical problem to be solved in this field.
Disclosure of Invention
The invention aims to improve the resource utilization of existing systems under data skew, and provides a streaming computing engine that supports skewed data.
In view of the deficiencies of the prior art, the present invention provides a method for operating a streaming computing engine for skewed data, comprising:
step 1, obtaining a user job containing skewed data, converting the user job into a directed acyclic graph representing it, and dividing the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator;
step 2, wrapping all stateful operators into stateful tasks whose input is the stateful data stream, and wrapping all stateless operators into stateless tasks whose input is the stateless data stream;
step 3, copying and sending the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results;
step 4, converting the data of the stateful data stream into data records containing keys and values, distributing records containing the same key to the same computing node to obtain stateful processing results, and collecting the stateless and stateful processing results as the result of the user job.
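The four steps above can be sketched end to end. This is a hedged, minimal illustration: the names `Operator`, `uses_state`, `wrap`, and `route` are assumptions for exposition, not identifiers from the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    uses_state: bool  # derived from the operator's business processing logic

def classify(dag_nodes):
    # Step 1: split DAG nodes into stateful and stateless operators.
    stateful = [op for op in dag_nodes if op.uses_state]
    stateless = [op for op in dag_nodes if not op.uses_state]
    return stateful, stateless

def wrap(operators, kind):
    # Step 2: wrap a group of operators into a task of the given kind.
    return {"kind": kind, "operators": [op.name for op in operators]}

def route(records, num_nodes):
    # Step 4: records with the same key always land on the same node.
    return {key: hash(key) % num_nodes for key, _value in records}
```

Stateless tasks (step 3) can be copied to any node, so only the stateful records need the key-based `route`.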
In the above method, step 4 comprises: dividing the stateful data stream into a fixed number of data slices, which can be processed by the computing nodes in parallel or scheduled to other computing nodes according to the scheduler's decision.
In the above method, wrapping the stateful operators into stateful tasks and wrapping the stateless operators into stateless tasks in step 2 both include merging associated operators using an OP fusion rule.
In the above method, step 4 comprises:
step 41, at a fixed time interval, the computing node picks the stateful task with the heaviest load and reallocates CPU resources to it according to the following formula:
[Formula image not reproduced; it computes p_next from p_now, c_last and N_cpu.]
where p_next is the newly allocated number of CPU threads for the task, p_now is the task's current thread count, c_last is the thread-count increase at the task's last scheduling, and N_cpu is the number of CPU cores.
In the above method, step 4 comprises: at a fixed time interval, selecting one stateful task from the computing node with the highest load and migrating it to the computing node with the lowest load.
The invention also provides a streaming computing engine operating system for skewed data, comprising:
module 1, which obtains a user job containing skewed data, converts the user job into a directed acyclic graph representing it, and divides the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator;
module 2, which wraps all stateful operators into stateful tasks whose input is the stateful data stream, and wraps all stateless operators into stateless tasks whose input is the stateless data stream;
module 3, which copies and sends the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results;
module 4, which converts the data of the stateful data stream into data records containing keys and values, distributes records containing the same key to the same computing node to obtain stateful processing results, and collects the stateless and stateful processing results as the result of the user job.
In the above system, module 4 comprises: dividing the stateful data stream into a fixed number of data slices, which can be processed by the computing nodes in parallel or scheduled to other computing nodes according to the scheduler's decision.
In the above system, wrapping the stateful operators into stateful tasks and wrapping the stateless operators into stateless tasks in module 2 both include merging associated operators using an OP fusion rule.
In the above system, module 4 comprises:
module 41, in which the computing node, at a fixed time interval, picks the stateful task with the heaviest load and reallocates CPU resources to it according to the following formula:
[Formula image not reproduced; it computes p_next from p_now, c_last and N_cpu.]
where p_next is the newly allocated number of CPU threads for the task, p_now is the task's current thread count, c_last is the thread-count increase at the task's last scheduling, and N_cpu is the number of CPU cores.
In the above system, module 4 comprises: at a fixed time interval, selecting one stateful task from the computing node with the highest load and migrating it to the computing node with the lowest load.
Compared with the prior art, the advantages of the invention are as follows:
The invention provides a streaming computing engine that supports skewed data. Compared with the prior art, the engine achieves load balancing under skew and responds in time to dynamic changes in the data distribution; in an extreme skew scenario (Zipf skew = 2.0), the system's throughput on 8 nodes reaches 78.5 times that of Flink.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a diagram of stateful data stream slicing according to the present invention;
FIG. 3 is a schematic diagram of the OP fusion rule according to the present invention;
FIG. 4 is a schematic diagram of task-level migration according to the present invention;
FIG. 5 is a flow chart of the method of the present invention.
Detailed Description
In the prior art, load imbalance of jobs under skewed data sets is caused by the static partitioning strategy used by streaming computing engines: on the one hand, a static partitioning strategy makes it difficult for the system to reach load balance under skew; on the other hand, it cannot react quickly to changes in the data distribution. The invention proposes virtual data partitioning and a two-stage dynamic scheduling technique. Virtual data partitioning reduces the performance impact of data re-partitioning in traditional systems and provides support for the two-stage dynamic scheduling strategy. The two-stage dynamic scheduling technique consists of vertical scaling and horizontal migration: vertical scaling is cheap and completes within milliseconds; horizontal migration solves load imbalance between machines at a higher cost than vertical scaling, but does not need to be triggered frequently. Compared with the traditional single-stage scheduling mode, two-stage scheduling has lower overhead, and the scheduling frequency can be increased to adapt better to dynamic changes in task load.
The key technical points of the invention include:
Key point 1: a streaming computing engine using virtual data slicing. Technical effect: the system slices the data stream into virtual blocks, so when the data distribution changes, the slices only need to be rescheduled rather than re-partitioned.
Key point 2: a stream computing engine using a two-stage dynamic scheduling technique. Technical effect: the system uses a vertical-plus-horizontal two-stage scheduling strategy. Vertical scheduling is cheap and does not require suspending the system, so it can be used frequently to keep the system's resource utilization optimal; horizontal scheduling is more expensive but migrates tasks between machines, so it can maximize the resource utilization of the whole cluster. Compared with the traditional single-stage scheduling strategy, the two-stage strategy reduces scheduling overhead and increases scheduling response speed.
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
The master-slave architecture offers high throughput and low latency, and the master node in this system does not become a performance bottleneck because it is responsible only for task scheduling, not for task computation; scheduling carries no large computational overhead. The system therefore adopts a master-slave architecture (the master is the JobManager, the slaves are TaskManagers). The architecture, shown in FIG. 1, consists of three parts: the job manager (JobManager), the task manager (TaskManager), and the state manager (State Service). The JobManager is responsible for cluster computing resources and user job management; the TaskManager is responsible for executing computing tasks and collecting statistics; the State Service is responsible for managing computation state, including providing a state access interface and performing backup, migration, and recovery of the state. Referring to FIG. 5 (the purpose of stateful data stream slicing in FIG. 5 is to cooperate with horizontal and vertical scheduling: the corresponding data slices are scheduled automatically during scheduling), the method of the invention comprises:
the first process is as follows: in the system, each data in the data stream is converted into a data record containing "key" and "value" fields. The "key" field is used as a data record partition, specified by the user job. The "value" field records the original information of the data stream. When the user job has no partition requirement, the 'key' field can be null, and the system can automatically allocate the records containing the same 'key' to the same node. The relation between the job and the task is called the job from the user perspective and called the task from the system perspective.
Process two: after entering the system, data streams are divided into two types, stateful data streams and stateless data streams:
In a stateless data stream, the data is stateless: no context or historical computation information is needed and there is no partitioning requirement, so the same input stream always produces the same output stream. The system can therefore schedule the data to any node for processing, which makes elastic scaling and load balancing easy; stateless data streams are scheduled first when the system schedules. Stateless jobs do not use the "key" field.
In a stateful data stream, the data carries partitioning requirements: data with the same "key" must be scheduled to the same node for processing, which places high demands on the system scheduler. The system therefore divides the stateful data stream into a fixed number of data slices, as shown in FIG. 2. A data slice is the basic scheduling and processing unit in the system; slices can be processed by the system in parallel or scheduled to other machines according to the scheduler's decision. Dividing the data stream into slices simplifies the system design and reduces the burden on the scheduler. In addition, during failure recovery, batch transmission and processing of slices is more efficient, which reduces system downtime.
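The virtual-slice idea can be sketched as below. This is an illustrative assumption-laden sketch: `NUM_SLICES` is an assumed configuration value, and the scheduler interface is invented for exposition; the point is that a key's slice is fixed while a slice's node assignment can change, so rescheduling never requires re-partitioning keys.

```python
NUM_SLICES = 64  # fixed number of data slices (assumed value)

def slice_of(key):
    # A key is permanently bound to one slice ...
    return hash(key) % NUM_SLICES

class SliceScheduler:
    def __init__(self, nodes):
        # ... but a slice's node assignment can change at runtime,
        # which is what makes rescheduling cheap under skew.
        self.assignment = {s: nodes[s % len(nodes)] for s in range(NUM_SLICES)}

    def node_for(self, key):
        return self.assignment[slice_of(key)]

    def migrate(self, slice_id, target_node):
        # Move a whole slice to another machine per the scheduler's decision.
        self.assignment[slice_id] = target_node
```

When a hot key appears, the scheduler only moves the affected slice; all other keys keep their placements.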
The scheduling of stateless tasks is determined by the scheduling of stateful tasks: the system first decides how the stateful tasks are deployed, and then fills the remaining computing resources with stateless tasks. (When the load changes, stateless tasks are rescheduled first, vacating computing resources for stateful tasks.) The system deploys the stateless data stream processing tasks on all nodes in advance, so when a data stream needs to be rescheduled, the tasks do not have to be rebuilt. The advantage is that switching a stateless data stream is cheap.
Process three: the system converts the user job into a directed acyclic graph (DAG) representing the job. Operators are the nodes of the DAG, each representing a piece of business processing logic in the user job. Operators are divided into stateful operators, which process the stateful data stream, and stateless operators, which process the stateless data stream. After the user job is submitted, the system analyzes the job code: a UDF (user-defined function; a user job consists of UDFs and the connecting edges that define the relations between them) that needs to use computation state is converted into a stateful operator whose input is a stateful data stream; an operator that does not need computation state is converted into a stateless operator whose input stream is a stateless data stream. For example, a user submits a word-count job consisting of:
UDF-1, which parses a JSON log and extracts words;
UDF-2, which counts the number of occurrences of each word.
The business logic of UDF-1 needs no computation state, so the system converts it into a stateless operator whose input stream is a stateless data stream. The business logic of UDF-2 must record the number of occurrences of each word and therefore needs computation state, so it becomes a stateful operator whose input stream is a stateful data stream.
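A minimal sketch of that word-count job follows: UDF-1 parses a JSON log and extracts words (stateless), UDF-2 counts occurrences (stateful). The JSON field name `"msg"` is an assumption for illustration, not part of the patent.

```python
import json

def udf1_extract_words(json_log):
    # Stateless: output depends only on the current input record.
    return json.loads(json_log)["msg"].split()

class Udf2WordCount:
    """Stateful: must remember how many times each word has appeared."""
    def __init__(self):
        self.counts = {}  # computation state: occurrences per word

    def process(self, word):
        self.counts[word] = self.counts.get(word, 0) + 1
        return self.counts[word]
```

Records flowing from UDF-1 to UDF-2 would be keyed by word, so each word's count lives on exactly one node.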
Process four: the system wraps the stateful operators into stateful stream tasks and the stateless operators into stateless stream tasks; during wrapping, associated operators are automatically merged using the OP fusion rule, as shown in FIG. 3.
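A hedged sketch of an OP-fusion pass: adjacent operators in a linear chain are merged into one task so records pass between them by function call rather than by shuffle. The fusion condition used here (adjacent operators with the same statefulness) is an illustrative assumption; the patent's exact rules are shown in FIG. 3.

```python
def fuse_chain(operators):
    # operators: list of (name, is_stateful) pairs in linear topological order.
    tasks, current = [], [operators[0]]
    for op in operators[1:]:
        if op[1] == current[-1][1]:   # same statefulness: fuse into current task
            current.append(op)
        else:                         # statefulness boundary: start a new task
            tasks.append(current)
            current = [op]
    tasks.append(current)
    return tasks
```

For the word-count example, the stateless parse/split operators would fuse into one task, with the stateful counter as a separate task.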
Process five: the system sends the wrapped tasks to the computing nodes. Since a stateless task can be sent to and executed on any node, the system, to reduce scheduling overhead, copies each stateless task to all computing nodes. When a stateless data stream is switched, the target node must already hold the corresponding stateless task: because the system schedules stateless tasks first (their scheduling overhead is low) and stateless data streams are switched frequently, deploying stateless tasks to all nodes in advance removes the task-construction step at switching time and speeds up switching.
Process six: after receiving tasks, a computing node automatically performs vertical scaling on them to maximize resource utilization, adjusting the CPU occupancy of each computing task. The node counts the backlog of input messages of each task; a heavier backlog indicates a heavier task. At a fixed time interval, the node picks the computing task with the heaviest load and reallocates CPU resources according to the following formula, where p_next is the newly allocated number of CPU threads for the task, p_now is the task's current thread count, c_last is the thread-count increase at the task's last scheduling, and N_cpu is the number of CPU cores.
[Formula image not reproduced; it computes p_next from p_now, c_last and N_cpu.]
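The exact reallocation formula appears only as an image in the patent and is not reproduced here. The sketch below is an assumed additive-increase rule built from the variables the text defines (p_now, c_last, N_cpu), capped at the number of CPU cores; treat it as illustrative, not as the patented formula.

```python
def reallocate_threads(p_now, c_last, n_cpu):
    # Grow the heaviest task's thread count, never past the core count.
    # (Illustrative rule only; the patent's formula is in the figure.)
    return min(p_now + max(c_last, 1), n_cpu)

def pick_heaviest(backlogs):
    # Backlog = queued input messages per task; heavier backlog = heavier task.
    return max(backlogs, key=backlogs.get)
```

At each interval the node would call `pick_heaviest` over the measured backlogs and then apply the reallocation rule to the winner.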
Process seven: the system's JobManager counts the computing-resource utilization of each node. When utilization is unbalanced, the system executes the horizontal scheduling strategy: the JobManager selects a task from the node with the highest load and migrates it to the node with the lowest load, preferring the task with the least computation state to reduce migration time. The migration process is shown in FIG. 4: when task T1 needs to migrate from node 1 to node 2, the JobManager first notifies T1's upstream node to suspend its work, then rebuilds task T1 on node 2; during this process T1's computation state DB1 automatically migrates to node 2. After the migration completes, node 2 sends a migration-completion message to the JobManager, and finally the JobManager rewrites T1's upstream routing table and resumes its work.
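The migration protocol just described can be sketched as a four-step coordinator. All class and method names are illustrative assumptions; `rebuild` stands in for reconstructing the task (and its state) on the target node.

```python
class JobManager:
    def __init__(self, routing_table):
        self.routing = routing_table   # task -> current node
        self.paused = set()            # tasks whose upstream is suspended

    def migrate(self, task, target, rebuild):
        self.paused.add(task)          # 1. upstream of the task suspends sending
        rebuild(task, target)          # 2. rebuild task on target; state migrates
        self.routing[task] = target    # 3. rewrite the upstream routing table
        self.paused.discard(task)      # 4. resume after the completion message
        return self.routing[task]
```

The pause window covers only one task's upstream, so the rest of the cluster keeps running during migration.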
The following is the system embodiment corresponding to the method embodiment above; this embodiment can be implemented in cooperation with the embodiments above. The technical details mentioned in the embodiments above remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the technical details mentioned in this embodiment also apply to the embodiments above.
The invention also provides a streaming computing engine operating system for skewed data, comprising:
module 1, which obtains a user job containing skewed data, converts the user job into a directed acyclic graph representing it, and divides the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator;
module 2, which wraps all stateful operators into stateful tasks whose input is the stateful data stream, and wraps all stateless operators into stateless tasks whose input is the stateless data stream;
module 3, which copies and sends the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results;
module 4, which converts the data of the stateful data stream into data records containing keys and values, distributes records containing the same key to the same computing node to obtain stateful processing results, and collects the stateless and stateful processing results as the result of the user job.
In the above system, module 4 comprises: dividing the stateful data stream into a fixed number of data slices, which can be processed by the computing nodes in parallel or scheduled to other computing nodes according to the scheduler's decision.
In the above system, wrapping the stateful operators into stateful tasks and wrapping the stateless operators into stateless tasks in module 2 both include merging associated operators using an OP fusion rule.
In the above system, module 4 comprises:
module 41, in which the computing node, at a fixed time interval, picks the stateful task with the heaviest load and reallocates CPU resources to it according to the following formula:
[Formula image not reproduced; it computes p_next from p_now, c_last and N_cpu.]
where p_next is the newly allocated number of CPU threads for the task, p_now is the task's current thread count, c_last is the thread-count increase at the task's last scheduling, and N_cpu is the number of CPU cores.
In the above system, module 4 comprises: at a fixed time interval, selecting one stateful task from the computing node with the highest load and migrating it to the computing node with the lowest load.

Claims (10)

1. A method for operating a streaming computing engine for skewed data, comprising:
step 1, obtaining a user job containing skewed data, converting the user job into a directed acyclic graph representing it, and dividing the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator;
step 2, wrapping all stateful operators into stateful tasks whose input is the stateful data stream, and wrapping all stateless operators into stateless tasks whose input is the stateless data stream;
step 3, copying and sending the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results;
step 4, converting the data of the stateful data stream into data records containing keys and values, distributing records containing the same key to the same computing node to obtain stateful processing results, and collecting the stateless and stateful processing results as the result of the user job.
2. The method for operating a streaming computing engine for skewed data according to claim 1, wherein step 4 comprises: dividing the stateful data stream into a fixed number of data slices, which can be processed by the computing nodes in parallel or scheduled to other computing nodes according to the scheduler's decision.
3. The method according to claim 1, wherein wrapping the stateful operators into stateful tasks and wrapping the stateless operators into stateless tasks in step 2 both comprise merging associated operators using an OP fusion rule.
4. The method for operating a streaming computing engine for skewed data according to claim 1, wherein step 4 comprises:
step 41, at a fixed time interval, the computing node picks the stateful task with the heaviest load and reallocates CPU resources to it according to the following formula:
[Formula image not reproduced; it computes p_next from p_now, c_last and N_cpu.]
where p_next is the newly allocated number of CPU threads for the task, p_now is the task's current thread count, c_last is the thread-count increase at the task's last scheduling, and N_cpu is the number of CPU cores.
5. The method for operating a streaming computing engine for skewed data according to claim 1, wherein step 4 comprises: at a fixed time interval, selecting one stateful task from the computing node with the highest load and migrating it to the computing node with the lowest load.
6. A streaming computing engine operating system for skewed data, comprising:
module 1, which obtains a user job containing skewed data, converts the user job into a directed acyclic graph representing it, and divides the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator;
module 2, which wraps all stateful operators into stateful tasks whose input is the stateful data stream, and wraps all stateless operators into stateless tasks whose input is the stateless data stream;
module 3, which copies and sends the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results;
module 4, which converts the data of the stateful data stream into data records containing keys and values, distributes records containing the same key to the same computing node to obtain stateful processing results, and collects the stateless and stateful processing results as the result of the user job.
7. The streaming computing engine operating system for skewed data as recited in claim 6, wherein module 4 comprises: dividing the stateful data stream into a fixed number of data shards, which the computing nodes either process in parallel or hand off to other computing nodes according to the scheduler's decision.
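The key-based routing of module 4 and the fixed sharding of claim 7 both reduce to a deterministic key-to-shard mapping, so that every record with the same key lands in the same place. The patent does not name a hash function; a stable hash is an assumption of this sketch:

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a record's key to a compute node so that all records sharing a key
    reach the same node (the stateful routing of module 4). MD5 is used here
    only because it is stable across processes; the patent does not specify
    a hash function."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def shard_stream(records, num_shards: int):
    """Split a stateful stream of (key, value) records into a fixed number of
    shards (claim 7); each shard can then be processed in parallel or handed
    to another node by the scheduler."""
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        shards[node_for_key(key, num_shards)].append((key, value))
    return shards
```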
8. The streaming computing engine operating system for skewed data as recited in claim 6, wherein the wrapping of stateful operators into stateful tasks and of stateless operators into stateless tasks in module 2 comprises merging associated operators according to OP fusion rules.
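The "OP fusion rules" of claim 8 are not spelled out in the text. Reading fusion as plain function composition of associated operators (a common interpretation in streaming engines, but an assumption here), a minimal sketch is:

```python
def fuse_operators(ops):
    """Merge a chain of associated operators into one task, in the spirit of
    claim 8's OP fusion. The concrete fusion rules are not given in the
    patent text; sequential function composition is this sketch's assumption.
    Fusing avoids per-operator queue hops for each record."""
    def fused(record):
        for op in ops:
            record = op(record)  # each operator transforms the record in turn
        return record
    return fused
```

For example, fusing a doubling operator with an incrementing operator yields one task that applies both per record.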
9. The streaming computing engine operating system for skewed data as recited in claim 6, wherein module 4 comprises:
module 41: at fixed time intervals, the computing node selects the stateful task with the heaviest load and reallocates CPU resources to it according to the following formula:
(formula published only as image FDA0002293603510000021)
wherein p_next denotes the newly allocated number of CPU threads for the task, p_now is the task's current number of threads, c_last is the thread-count increase applied the last time the task was scheduled, and N_cpu is the number of CPU cores.
10. The streaming computing engine operating system for skewed data as recited in claim 6, wherein module 4 comprises: at fixed time intervals, selecting one stateful task on the computing node with the highest load and migrating it to the computing node with the lowest load.
CN201911191154.2A 2019-11-28 2019-11-28 Stream type calculation engine operation method and system for tilt data Active CN110990059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911191154.2A CN110990059B (en) 2019-11-28 2019-11-28 Stream type calculation engine operation method and system for tilt data

Publications (2)

Publication Number Publication Date
CN110990059A true CN110990059A (en) 2020-04-10
CN110990059B CN110990059B (en) 2021-11-19

Family

ID=70087973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911191154.2A Active CN110990059B (en) 2019-11-28 2019-11-28 Stream type calculation engine operation method and system for tilt data

Country Status (1)

Country Link
CN (1) CN110990059B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201296A (en) * 2021-12-09 2022-03-18 厦门美亚亿安信息科技有限公司 Data balancing method and system based on streaming processing platform

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103491024A (en) * 2013-09-27 2014-01-01 中国科学院信息工程研究所 Job scheduling method and device for streaming data
CN103853766A (en) * 2012-12-03 2014-06-11 中国科学院计算技术研究所 Online processing method and system oriented to streamed data
CN105354089A (en) * 2015-10-15 2016-02-24 北京航空航天大学 Streaming data processing model and system supporting iterative calculation
CN105956021A (en) * 2016-04-22 2016-09-21 华中科技大学 Automated task parallel method suitable for distributed machine learning and system thereof
US20160277542A1 (en) * 2015-03-18 2016-09-22 Alcatel-Lucent Usa Inc. Data stream load balancing utilizing multiple sets of servers
CN106155791A (en) * 2016-06-30 2016-11-23 电子科技大学 A kind of workflow task dispatching method under distributed environment
CN106815071A (en) * 2017-01-12 2017-06-09 上海轻维软件有限公司 Big data job scheduling system based on directed acyclic graph
CN108241530A (en) * 2016-12-23 2018-07-03 西北大学 A kind of streaming computing bipartite graph method for scheduling task based on Storm
CN108388474A (en) * 2018-02-06 2018-08-10 北京易沃特科技有限公司 Intelligent distributed management of computing system and method based on DAG
CN110069441A (en) * 2019-03-21 2019-07-30 中国科学院计算技术研究所 A kind of FPGA network and stream calculation System and method for for stream calculation
CN110402431A (en) * 2017-03-23 2019-11-01 亚马逊科技公司 Event driven scheduling is carried out using directed acyclic graph

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SMRITI R. RAMAKRISHNAN et al.: "Balancing reducer skew in MapReduce workloads using progressive sampling", SoCC '12: Proceedings of the Third ACM Symposium on Cloud Computing *
SUN Dawei et al.: "Big data stream computing: key technologies and system instances", Journal of Software (《软件学报》) *
CHENG Xueqi et al.: "Survey of big data systems and analysis technologies", Journal of Software (《软件学报》) *
LONG Xiao et al.: "Dynamic load balancing strategy for Storm clusters based on the cuckoo search algorithm", Computer Technology and Development (《计算机技术与发展》) *

Also Published As

Publication number Publication date
CN110990059B (en) 2021-11-19

Similar Documents

Publication Publication Date Title
US11650971B2 (en) System and method for large-scale data processing using an application-independent framework
CN107431696B (en) Method and cloud management node for application automation deployment
Krishnamurthy et al. Pratyaastha: an efficient elastic distributed sdn control plane
Xu et al. Stela: Enabling stream processing systems to scale-in and scale-out on-demand
Gehring et al. MARS—A framework for minimizing the job execution time in a metacomputing environment
US11474874B2 (en) Systems and methods for auto-scaling a big data system
Rupprecht et al. SquirrelJoin: Network-aware distributed join processing with lazy partitioning
CN106250233B (en) MapReduce performance optimization system and optimization method
CN108572873A (en) A kind of load-balancing method and device solving the problems, such as Spark data skews
Zhang et al. The real-time scheduling strategy based on traffic and load balancing in storm
Madsen et al. Integrative dynamic reconfiguration in a parallel stream processing engine
Li et al. Enabling elastic stream processing in shared clusters
Liu et al. An adaptive online scheme for scheduling and resource enforcement in storm
Ye et al. vHadoop: A scalable Hadoop virtual cluster platform for MapReduce-based parallel machine learning with performance consideration
US20150331724A1 (en) Workload balancing to handle skews for big data analytics
CN107291550A (en) A kind of Spark platform resources dynamic allocation method and system for iterated application
Madsen et al. Dynamic resource management in a massively parallel stream processing engine
CN108089918B (en) Graph computation load balancing method for heterogeneous server structure
US20200159594A1 (en) Systems and methods for dynamic partitioning in distributed environments
Su et al. Passive and partially active fault tolerance for massively parallel stream processing engines
Madsen et al. Integrating fault-tolerance and elasticity in a distributed data stream processing system
CN110990059B (en) Stream type calculation engine operation method and system for tilt data
CN114900525A (en) Method and system for deflecting data stream
CN116302574B (en) Concurrent processing method based on MapReduce
KhudaBukhsh et al. Generalized cost-based job scheduling in very large heterogeneous cluster systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant