CN115412501A - Multi-level collaborative reconfiguration stream processing system based on Flink and processing method thereof - Google Patents


Info

Publication number
CN115412501A
CN115412501A
Authority
CN
China
Prior art keywords
reconfiguration
flink
processing system
load
migration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211047958.7A
Other languages
Chinese (zh)
Inventor
张展
左德承
封威
冯懿
舒燕君
温东新
Current Assignee
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Harbin Institute of Technology filed Critical Harbin Institute of Technology
Priority to CN202211047958.7A
Publication of CN115412501A
Legal status: Pending

Classifications

    • H — ELECTRICITY › H04 — ELECTRIC COMMUNICATION TECHNIQUE › H04L — TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION › H04L 47/00 — Traffic control in data switching networks › H04L 47/10 — Flow control; Congestion control
    • H04L 47/125 — Avoiding congestion; recovering from congestion by balancing the load, e.g. traffic engineering
    • H04L 47/2425 — Traffic characterised by specific attributes, e.g. priority or QoS, for supporting services specification, e.g. SLA
    • H04L 47/25 — Flow control; congestion control with rate being modified by the source upon detecting a change of network conditions
    • H04L 47/283 — Flow control; congestion control in relation to timing considerations, in response to processing delays, e.g. caused by jitter or round trip time [RTT]

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Multi Processors (AREA)

Abstract

A Flink-based multi-level collaborative reconfiguration stream processing system and a processing method thereof belong to the technical field of computer data processing, and aim to optimize the performance of a stream processing system facing complex situations such as data skew, load fluctuation, and resource changes. On the basis of the original components of the Flink stream processing platform, an index monitor, a collaborative reconfiguration manager, a reconfiguration coordinator, a repartition executor, and a subtask configuration manager are added; the original adaptive scheduler is modified into a horizontal elasticity executor, and the original resource slot allocator is modified into a rescheduling executor. The index monitor is connected to the collaborative reconfiguration manager and the subtask configuration manager, the collaborative reconfiguration manager is connected to the reconfiguration coordinator, and the reconfiguration coordinator is connected respectively to the repartition executor, the horizontal elasticity executor, the rescheduling executor, and the subtask configuration manager. The present invention optimizes a stream processing system.

Description

Multi-level collaborative reconfiguration stream processing system based on Flink and processing method thereof
Technical Field
The invention belongs to the technical field of computer data processing, and particularly relates to a Flink-based multi-level collaborative reconfiguration stream processing system and a processing method thereof.
Background
Real-time streaming data is an important form of data organization in the big-data era, characterized by high velocity, high variability, high unpredictability, and high data value. To extract that value more efficiently, stream computing has in recent years become the standard for processing continuous, unbounded data streams thanks to its low latency and high throughput. Many modern distributed stream processing systems (DSPS) have been developed and deployed at scale in various fields, such as Google's real-time large-scale stream processing system MillWheel and Apache Spark, which implements micro-batch processing on top of resilient distributed datasets (RDDs). Among them, Apache Flink has become one of the best-known open-source frameworks in distributed stream computing thanks to its native real-time stream computation and excellent state management. However, a one-time optimization performed by a developer, or at initial deployment, cannot guarantee that a stream processing application maintains a consistent service level throughout its life cycle. The workload characteristics (e.g., traffic bursts, fluctuations, or data skew) and the operating environment (e.g., network environment, node processing capacity, node failures) faced by a DSPS may change at any time. A DSPS must continually optimize one or more service objectives (e.g., end-to-end delay, throughput) throughout the lifetime of an application, which presents a significant challenge to its adaptive reconfiguration capability.
Indeed, many studies have attempted to optimize the performance of stream processing systems in different ways, including providing application elasticity (adjusting the number of parallel instances), adjusting data partitioning (changing the distribution of a data stream among multiple parallel instances of the same operator), and rescheduling (changing the placement of operators on compute nodes). These schemes introduce adaptive reconfiguration capability into streaming systems through a wide range of mechanisms, architectures, and methods. However, they rely on different assumptions and only address one or a few of the problems a stream processing application may face during execution. In the more widely applied case of stateful computation, executing any adaptive policy requires a state migration mechanism to ensure computational consistency; an inappropriate adaptive policy not only fails to solve the problem but also incurs extra migration overhead, consumes large amounts of system resources, and adds extra latency. It is therefore important to study adaptive reconfiguration strategies that comprehensively consider data parallelism and task parallelism, so that the factors affecting stream processing system performance can be adjusted and optimized at multiple levels.
The patent published as CN114675969A, titled "Elastic scaling stream processing method and system based on adaptive load partition", discloses an elastic scaling stream processing method based on adaptive load partitioning, which comprises: constructing a prototype stream processing system based on Flink; constructing a DKG model for distributing data to downstream operator instances and managing the computing state in those instances; constructing a metric collector model to collect, store, and share performance metric data of the stream processing system; constructing a discriminator model for calculating an elastic scaling strategy implementation factor and a load partition strategy implementation factor; constructing the corresponding elastic scaling strategy and load partitioning strategy; and constructing a reconfiguration controller module to apply the strategies to the stream processing system, completing elastic scaling stream processing based on adaptive load partitioning. That invention also discloses a system implementing the method. It can achieve lower end-to-end processing delay and higher throughput on both balanced and skewed data streams, with high reliability and good practical effect. However, the elastic scaling and load partitioning strategies it proposes cannot effectively reduce the impact of fluctuations in a fluctuating load environment; the adjustment granularity is coarse, and facing a complex environment and fluctuating data streams, reconfiguration may be triggered frequently at high cost.
Another prior invention discloses a distributed skewed-stream processing method and system based on high-frequency key counting. Its basic idea is to count each data item in the data stream with a counting Bloom filter and classify items by frequency as high-frequency keys, potential high-frequency keys, or low-frequency keys, thereby obtaining the distribution of different data items; high-frequency keys are assigned downstream instances using a random-suffix plus group-and-aggregate strategy, while non-high-frequency keys are assigned downstream instances using key-group partitioning, achieving load balance among downstream instances and improving system performance. That invention can solve the problems of extremely high memory overhead of randomly-grouped downstream instances and load imbalance among key-grouped downstream instances in skewed-stream processing. However, it only considers the data partitioning strategy, addressing the data skew faced by a single operator in a stream application; such a single adaptive strategy struggles to maintain stability under complex conditions such as fluctuating load, and the state migration overhead of stateful operators is not taken into account.
Disclosure of Invention
The invention aims to optimize the performance of a stream processing system facing complex conditions such as data skew, load fluctuation, and resource changes, and provides a Flink-based multi-level collaborative reconfiguration stream processing system and a processing method thereof.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a multi-layer collaborative reconfiguration flow processing system based on Flink is characterized in that an index monitor, a collaborative reconfiguration manager, a reconfiguration coordinator, a re-partition executor and a subtask configuration manager are added to the multi-layer collaborative reconfiguration flow processing system based on the original component of a Flink flow processing platform, and the original collaborative self-adaptive scheduler is modified into a horizontal elastic executor and the original resource slot distributor is modified into a re-scheduling executor;
the re-partition executor, the horizontal elastic executor and the re-scheduling executor form a re-configuration execution module;
the index monitor is connected with a cooperative reconfiguration manager and a subtask configuration manager, the cooperative reconfiguration manager is connected with a reconfiguration coordinator, and the reconfiguration coordinator is respectively connected with a re-partition executor, a horizontal elastic executor, a re-scheduling executor and a subtask configuration manager.
Further, the index monitor monitors the stream application and the related runtime metrics of the Flink-based multi-level collaborative reconfiguration stream processing system; when the metrics cannot satisfy the service level agreement (SLA) requirements, it notifies the collaborative reconfiguration manager that the system is overloaded, supporting the operation of the multi-level collaborative reconfiguration stream processing system;
the collaborative reconfiguration manager makes reconfiguration decisions based on the monitoring information provided by the index monitor and sends reconfiguration instructions to the reconfiguration coordinator to direct the Flink-based multi-level collaborative reconfiguration stream processing system to execute reconfiguration;
the reconfiguration coordinator is the central coordinator during reconfiguration, responsible for receiving reconfiguration instructions from the collaborative reconfiguration manager, returning execution results, and invoking the services provided by the different reconfiguration execution modules according to the chosen reconfiguration option;
within the reconfiguration execution module, the repartition executor performs data stream repartitioning, the horizontal elasticity executor performs application/resource scaling, and the rescheduling executor performs task rescheduling; the reconfiguration execution module is responsible for generating reconfiguration schemes and passing them to the collaborative reconfiguration manager for evaluating overhead and benefit;
the subtask configuration manager is responsible for managing the configuration of a specific subtask, and each subtask in the stream maintains one. In the monitoring stage, the subtask configuration manager collects the subtask's monitoring metrics, including the element distribution in the stream and the task's input/output conditions, and uploads them to the index monitor; in the execution stage, it receives remote instructions from the reconfiguration coordinator and updates the configuration on the task side accordingly.
A processing method of the Flink-based multi-level collaborative reconfiguration stream processing system, implemented on the basis of the above system, comprising the following steps:
S1, constructing a multilevel cooperative control strategy that minimizes migration overhead;
S2, constructing a computing-resource-aware stream application elasticity strategy;
S3, constructing a stream partitioning strategy based on fine-grained asynchronous migration;
S4, constructing a communication-overhead-minimizing load-balancing task rescheduling strategy;
S5, when the Flink-based multi-level collaborative reconfiguration stream processing system is in a stable state, the index monitor periodically acquires monitoring data from the cluster and checks the configured threshold metrics, triggering the collaborative reconfiguration manager when a metric is overloaded;
S6, the collaborative reconfiguration manager acquires the relevant cluster metrics from the index monitor, obtains candidate schemes from each reconfiguration execution module according to the strategy of step S1, and selects a suitable scheme to optimize the cluster configuration;
S7, the collaborative reconfiguration manager sends the reconfiguration option to the reconfiguration coordinator;
S8, the reconfiguration coordinator invokes the corresponding reconfiguration execution module according to the required reconfiguration option and begins optimization according to the strategies of steps S2-S4;
and S9, the reconfiguration execution module cooperates with the subtask configuration manager to complete the optimization of the data stream partitions, and the Flink-based multi-level collaborative reconfiguration stream processing system returns to a stable state, waiting for the next reconfiguration to be triggered.
Further, the specific method for constructing the multilevel cooperative control strategy that minimizes migration overhead in step S1 comprises the following steps:
S1.1, the index monitor monitors cluster data and checks the configured threshold metrics to judge whether the cluster metrics violate the SLA constraints; if not, return to checking the threshold metrics; if so, further judge whether the rate fluctuation of the cluster metrics exceeds the limit;
S1.2, judge whether the rate fluctuation exceeds the limit; if not, acquire the set of overloaded nodes from the index monitor; if so, execute the computing-resource-aware stream application elasticity strategy to adjust parallelism, then return to checking the threshold metrics;
S1.3, judge whether overloaded nodes exist in the overloaded node set obtained in step S1.2; if not, return to checking the threshold metrics; if so, execute the stream partitioning strategy based on fine-grained asynchronous migration to balance the instances on the nodes, then further judge whether the nodes are still overloaded;
and S1.4, for the judgment in step S1.3, if the nodes are no longer overloaded, return to checking the threshold metrics; if they are still overloaded, execute the communication-overhead-minimizing load-balancing task rescheduling strategy to reconfigure, looping until the cluster load balance reaches the threshold range set by the user.
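The S1.1-S1.4 cascade can be sketched as a small decision function. This is an illustrative simplification, not the patent's implementation: the metric field names (sla_violated, rate_fluctuation, node_loads, still_overloaded) are hypothetical monitoring keys, and the returned action labels stand in for the three reconfiguration executors.

```python
def select_reconfiguration(metrics, fluctuation_limit):
    """Sketch of the S1.1-S1.4 decision cascade (hypothetical field names)."""
    actions = []
    # S1.1: no SLA violation -> keep monitoring, no reconfiguration needed
    if not metrics["sla_violated"]:
        return actions
    # S1.2: excessive rate fluctuation -> adjust parallelism first (strategy S2)
    if metrics["rate_fluctuation"] > fluctuation_limit:
        return ["horizontal_elasticity"]
    # S1.3: overloaded nodes -> rebalance instances by repartitioning (strategy S3)
    overloaded = [n for n, load in metrics["node_loads"].items()
                  if load > metrics["overload_threshold"]]
    if not overloaded:
        return actions
    actions.append("stream_repartition")
    # S1.4: if nodes remain overloaded, fall back to task rescheduling (strategy S4)
    if metrics.get("still_overloaded", False):
        actions.append("task_rescheduling")
    return actions
```

The ordering encodes the cost hierarchy of the patent's strategy: cheaper adjustments (elasticity, repartitioning) are tried before the expensive global rescheduling.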
Further, in step S1, to prevent a fixed threshold from frequently triggering elastic adjustment under bursty load fluctuations, a sliding window is used to linearly smooth the input rate.
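A minimal sketch of such sliding-window smoothing, assuming a simple arithmetic mean over the last few observed rates (the window size and averaging scheme here are illustrative, not taken from the patent):

```python
from collections import deque

class SmoothedRate:
    """Sliding-window average of the observed input rate, so that a short
    burst does not immediately cross the elasticity trigger threshold."""

    def __init__(self, window_size=5):
        self.window = deque(maxlen=window_size)  # keeps only the last N rates

    def observe(self, rate):
        self.window.append(rate)
        return sum(self.window) / len(self.window)
```

A burst of one period is thus damped: after five periods at rate 100, a single spike to 1000 yields a smoothed rate of only 280, well below the raw spike, so elasticity is not triggered by a transient.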
Further, the specific method for constructing the computing-resource-aware stream application elasticity strategy in step S2 comprises the following steps:
s2.1, judging whether the cluster index data acquired by the horizontal elastic actuator is a null value, if not, reading user configuration and loading parallelism configuration, updating the parallelism to all operators, and outputting application topology;
s2.2, judging whether the cluster index data acquired by the horizontal elastic actuator is a null value, if so, acquiring the input speed of the current period and calculating an adjustment proportion under the condition of current reconfiguration, adjusting the parallelism of the original operators and calculating the parallelism of all the operators, and outputting the application topology.
Further, the method for loading the parallelism configuration in step S2 includes the following steps:
s2.1.1, setting the application topology G generated by optimizing the application layer of the stream processing system after the user code is submitted LT ,V(G LT ) For the set of vertices of the topology, vertex V i ∈V(G LT ) Set up c vi And m vi Respectively being vertex V i Computing resource consumption and parallelism;
s2.1.2, for initial scheduling, setting the processing capacity of all nodes in a cluster to be the same, so that the proportion of the parallelism of the nodes is proportional to the consumption of the computing resources of the nodes, and the formula is as follows:
c v1 :m v1 =c v2 :m v2 =…=c vn :m vn
s2.1.3, for reconfiguration, setting p vi Is a vertex V i The processing capacity of the located compute node, i.e., the total amount of compute resources, to balance the compute load on each compute node, V (G) LT ) Each vertex in (2) needs to satisfy a constraint condition, that is, the calculation amount distributed in the unit calculation node is proportional to the number of instances, and the node parallelism ratio is determined by the following formula:
Figure BDA0003822873710000051
wherein n is | V (G) LT ) Number of vertices in |;
s2.1.4, determining the parallelism proportion of the nodes according to the steps S2.1.2 and S2.1.3, calculating the parallelism of the rest operators according to the parallelism proportion of the nodes by determining the parallelism of any operator in the topology, setting the parallelism of a source operator to be 1 in initial scheduling, and then monitoring the data stream input rate of the source operator to be lambda in the reconfiguration process new Adjusting the parallelism of a source operator according to the input rate ratio Diff of the current period to the previous period, and calculating a new ratio to update the parallelism of the rest operators according to the following formula after adjustment:
Figure BDA0003822873710000052
wherein the input rate of the data stream in the last period is lambda old
S2.1.5, with the user configuration UserConfig, output the application topology G_LT with the updated parallelism.
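The S2.1.4 update can be illustrated with a short sketch: scale the source operator's parallelism by Diff = λ_new / λ_old, then keep c_v : m_v in the same proportion for the remaining operators (the S2.1.2 constraint with uniform node capacity). The data shapes and function name are hypothetical; per-node capacities p_vi from S2.1.3 are omitted for brevity.

```python
import math

def update_parallelism(resource_cost, source_parallelism, rate_old, rate_new,
                       source="source"):
    """Sketch of step S2.1.4 under uniform node capacity (assumption).
    resource_cost maps operator name -> computing resource consumption c_v."""
    diff = rate_new / rate_old                  # Diff = lambda_new / lambda_old
    m_source = max(1, math.ceil(source_parallelism * diff))
    unit = resource_cost[source] / m_source     # resources per parallel instance
    parallelism = {source: m_source}
    for op, c in resource_cost.items():
        if op != source:
            # keep c_v : m_v constant across operators (constraint S2.1.2)
            parallelism[op] = max(1, math.ceil(c / unit))
    return parallelism
```

Doubling the input rate thus doubles the source parallelism and scales the downstream operators by the same resource-proportional rule.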
Further, the specific method for constructing the stream partitioning strategy based on fine-grained asynchronous migration in step S3 comprises the following steps:
s3.1, determining a candidate migration key value in the candidate migration virtual instance set V to serve as a candidate migration instance set Om;
s3.1.1, acquiring overload instances and overload thresholds of the candidate migration virtual instance set V by the aid of the re-partition executer;
s3.1.2, an upstream operator acquires the total frequency of key values in each downstream partition, simultaneously calculates candidate migration key values v in a data stream in a virtual instance mode, and calculates state quantities f (v) and s (v) contained in each downstream instance by using a HyperLog algorithm, wherein f (v) is the frequency of the candidate migration key values v, and s (v) is the migration cost of the candidate migration key values v;
s3.1.3, sorting the virtual instances in the order of f (v)/s (v) from high to low;
s3.1.4, judging whether node load exceeds a threshold value, if so, migrating the virtual instance, and reducing the node load to be below the threshold value;
s3.1.5, if the judgment result is negative, outputting a candidate emigration instance set Om;
S3.2, with the candidate migration key values determined, update the routing of the downstream partitions using the routing table update algorithm, migrating the virtual instances holding high-cost candidate migration keys to lightly loaded instances, and output the new routing table;
s3.2.1, acquiring a candidate migration virtual instance set V and a candidate migration instance set Om by a rescheduling actuator;
s3.2.2, arranging the candidate migration virtual instance sets V in a descending order according to f (V)/s (V), and arranging the candidate migration instance sets Om in an ascending order according to the load of the operator instances;
s3.2.3, judging whether unbalance and overload phenomena occur after the optional instance and the node detection key value are migrated, if so, calculating a routing table, and if not, continuing to judge the next group;
s3.2.4, statistics of new routing table H new
Further, constructing the communication-overhead-minimizing load-balancing task rescheduling strategy in step S4 comprises the following steps:
s4.1, defining constraint conditions of maximum migration cost and maximum load distance;
S4.2, traverse the topology, calculating whether each operator pair should be recorded in the collocated group set to reduce network overhead: let t_jj' denote the output rate from subtask j to subtask j' within ΔT; for an operator instance pair <T_i, T_j>, determine whether adding it to a collocated group helps reduce the overall network overhead: the rate t_ij from T_i to T_j
should exceed avg(T_j) · SF, where avg(T_j) is the average input rate to the current instance from all its upstream operator instances and SF is a scoring factor; SF = 1 means the input received from the upstream instance exceeds the average, while SF = 2 requires the upstream input rate to exceed twice the average. If the condition holds, continue to step S4.3; if not, continue with step S4.2 until all operators in the topology have been traversed, then terminate;
S4.3, determine the scheduling units: first merge the collocated group sets into the minimum number of sets, then split them into scheduling units satisfying the configured maximum migration overhead and maximum load distance; the splitting strategy uses a greedy approach to divide a collocated group into several subsets that satisfy the maximum migration overhead and maximum load distance, yielding new collocated groups;
S4.4, refine the collocated groups: for newly produced collocated groups, determine their placement in the new scheme according to load; specifically, operator instance pairs that were not previously placed on the same node but have large data exchange volumes are assigned, in descending order of data exchange volume, to nodes with lower load;
and S4.5, solve the resulting mixed integer linear programming (MILP) problem: solve the constrained MILP, compute the load distance of the resulting allocation, and if the load distance is greater than the predefined maximum load distance maxLD, form more partitions by reducing the maximum unit load maxll.
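The S4.2 collocation test can be stated in a few lines. This is an illustrative sketch only: the per-edge rate list is assumed monitoring data over ΔT, and the function name is hypothetical.

```python
def should_collocate(upstream_rates_to_j, t_ij, scoring_factor=1.0):
    """Sketch of the S4.2 collocation test: place instances T_i and T_j in the
    same collocated group when the traffic t_ij from T_i exceeds SF times the
    average input rate avg(T_j) that T_j receives from all upstream instances."""
    avg_tj = sum(upstream_rates_to_j) / len(upstream_rates_to_j)
    return t_ij > avg_tj * scoring_factor
```

A larger scoring factor makes collocation more selective, so only the heaviest communication edges are pinned to the same node before the MILP placement is solved.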
The invention has the beneficial effects that:
the multi-level collaborative reconfiguration flow processing system based on the Flink realizes an index monitor based on the original index system of the Flink and the key value frequency monitoring function in a subtask configuration manager, the collaborative reconfiguration manager monitors overload and triggers a multi-level collaborative control strategy for minimizing migration overhead by registering the collaborative reconfiguration manager in the index monitor, and then constructs an environmental state according to collected environmental indexes, so that a proper reconfiguration option is selected and sent to the reconfiguration coordinator and is specifically executed.
In the Flink-based multi-level cooperative reconfiguration stream processing system of the present invention (MCR-Flink, Multi-level Cooperative Reconfiguration Flink), the reconfiguration execution module contains a dedicated executor for the reconfiguration method at each level: a repartition executor, a rescheduling executor, and a horizontal elasticity executor, with concrete implementations given on top of Flink's scheduler:
and the repartition executor updates the flow subareas according to the synchronous, migration and recovery processes for realizing local pause and state migration in operation, and updates the routing table according to the old routing table and the new monitoring data based on the flow subarea strategy of fine-grained asynchronous migration. The next execution section then updates the configuration according to the sync-pause-migration-resume flow. The synchronization and recovery function depends on the reconfiguration coordinator and the subtask configuration manager to inject marks into the stream, so as to realize the functions of synchronizing and suspending the data processing of the instances; the migration function is realized by depending on statistical data collected in the index monitor and a fine-grained asynchronous migration algorithm, after synchronization is completed, the related operator instance is in a pause state, the re-partition executor is used as a transfer, and the state update of the operator instance is realized by collecting and distributing the state to be migrated.
The rescheduling executor updates task deployment based on Flink's stop-and-restart mechanism; the communication-overhead-minimizing load-balancing task rescheduling strategy is implemented in Flink's slot allocator, producing an optimized task deployment from the current cluster state. Restarting tasks and updating the execution graph are achieved by directly calling methods in the Flink scheduler.
The horizontal elasticity executor is implemented on top of Flink's adaptive scheduler and provides application elasticity via its resource-change detection: applying the computing-resource-aware stream application elasticity strategy, it computes the required parallelism from the cluster's input rate in the current period. Once the resources are determined, the Flink scheduler applies a scheduling algorithm to pre-compute the resources required by all operators under the current parallelism; the communication-overhead-minimizing load-balancing task rescheduling strategy and the monitoring metrics of the previous period are used to determine a suitable operator deployment scheme. A corresponding number of task managers are started or stopped by invoking a pre-written resource-change script from Java, and the adaptive scheduler responds to the resource change and restarts the job, achieving horizontal scaling of application parallelism and resource quantity.
For the Flink-based multi-level collaborative reconfiguration stream processing system of the present invention, experiments were conducted on the performance indices optimized by the system, using a WordCount application with a simulated data source and the NEXMark benchmark. Compared against other algorithms and models, the effectiveness of the algorithms in the MCR-Flink cooperative framework was verified in three respects: system resource utilization, cluster load balance, and tuple processing delay:
(1) From the variation of delay it can be seen that load changes significantly affect the end-to-end delay of a stream processing system; facing changing data streams and skewed load, the proposed multi-level cooperative reconfiguration algorithm can perform effective reconfiguration and keep the delay within the level required by the SLAs.
(2) Regarding the problems of adjustment jitter and excessive load-balancing overhead, the migration-overhead-minimizing multilevel cooperative control strategy provided by the processing method can select an appropriate moment to trigger elasticity, avoiding the performance degradation caused by repeated scale-in and scale-out under bursty load.
(3) In the operation process, load balancing can be effectively carried out through a flow partition strategy based on fine-grained asynchronous migration instead of a task scheduling algorithm, delay peaks caused by a large number of state migrations and global reconfiguration are effectively reduced, and the effect of the multi-level cooperative reconfiguration strategy of the processing method of the multi-level cooperative reconfiguration flow processing system based on the Flink on the aspect of optimizing the processing delay of the system is verified.
Drawings
Fig. 1 is a schematic structural diagram of a Flink-based multi-level cooperative reconfiguration flow processing system according to the present invention;
fig. 2 is a flow chart of a multilevel cooperative control strategy for minimizing migration overhead of a processing method of a Flink-based multilevel cooperative reconfiguration flow processing system according to the present invention;
FIG. 3 is a flowchart of the flow application elastic policy based on computing resource perception of a processing method of a Flink-based multi-level cooperative reconfiguration flow processing system according to the present invention;
FIG. 4 is a flowchart illustrating the determination of migratable virtual instances based on the fine-grained asynchronous migration flow partitioning policy in the processing method of a Flink-based multi-level cooperative reconfiguration flow processing system according to the present invention;
FIG. 5 is a flow chart of a fine-grained asynchronous migration-based routing table update of a flow partitioning policy of a processing method of a Flink-based multi-level cooperative reconfiguration flow processing system according to the present invention;
fig. 6 is a flowchart of a load balancing task rescheduling strategy for minimizing communication overhead of a processing method of a Flink-based multi-level cooperative reconfiguration flow processing system according to the present invention.
FIG. 7 is a graph comparing the average end-to-end delay with the input rate of different systems for the same tilt level according to the present invention;
FIG. 8 is a graph comparing the input rate of different systems and the average thread busy time ratio under the same inclination degree;
FIG. 9 is a graph comparing the average load imbalance of different systems (Flink and MCR-Flink) under the same skew degree according to the present invention;
FIG. 10 is a graph comparing the average end-to-end delay with different system skew levels for the same input rate according to the present invention;
FIG. 11 is a graph comparing the system inclination and the average thread busy time ratio under the same input rate;
FIG. 12 is a graph comparing the degree of system skew with the degree of cluster load imbalance for the same input rate;
FIG. 13 is a graph showing the step and pulse load performance of a comparative example;
FIG. 14 is a graph showing the step and pulse load performance of the present invention;
FIG. 15 is a performance-optimized query delay comparison curve of the NEXMark benchmark program of the present invention for Q1 with stateless computation;
FIG. 16 is a performance-optimized query delay comparison curve of the NEXMark benchmark program of the present invention for Q3 with stateful computation.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and the detailed description. It is to be understood that the embodiments described herein are illustrative only and are not limiting, i.e., that the embodiments described are only a few embodiments, rather than all, of the present invention. While the components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations, the present invention is capable of other embodiments.
Thus, the following detailed description of specific embodiments of the present invention, presented in the accompanying drawings, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the description of the invention without inventive step, are within the scope of protection of the invention.
In order to further understand the contents, features and effects of the present invention, the following embodiments are illustrated and described in detail with reference to fig. 1-16:
the first specific implementation way is as follows:
a multi-layer collaborative reconfiguration flow processing system based on Flink is disclosed, which is characterized in that an index monitor 1, a collaborative reconfiguration manager 2, a reconfiguration coordinator 3, a re-partition executor 4 and a subtask configuration manager 7 are added on the basis of an original component of a Flink flow processing platform, and an original collaborative adaptive scheduler is modified into a horizontal elastic executor 5 and an original resource slot distributor is modified into a re-scheduling executor 6;
the re-partition executor 4, the horizontal elastic executor 5 and the re-scheduling executor 6 form a re-configuration execution module;
the index monitor 1 is connected with a collaborative reconfiguration manager 2 and a subtask configuration manager 7, the collaborative reconfiguration manager 2 is connected with a reconfiguration coordinator 3, and the reconfiguration coordinator 3 is respectively connected with a re-partition executor 4, a horizontal elastic executor 5, a re-scheduling executor 6 and a subtask configuration manager 7.
Further, the index monitor 1 is configured to monitor a flow application and related indexes of the Flink-based multi-level collaborative reconfiguration flow processing system during operation, and is configured to remind the collaborative reconfiguration manager 2 that the system is overloaded when the indexes cannot meet the requirements of the Service Level Agreements (SLAs), and to support the operation of the multi-level collaborative reconfiguration flow processing system;
the collaborative reconfiguration manager 2 is used for making reconfiguration decisions based on the monitoring information provided by the index monitor 1 and sending reconfiguration instructions to the reconfiguration coordinator 3, so as to instruct the Flink-based multi-level collaborative reconfiguration stream processing system to execute reconfiguration;
the reconfiguration coordinator 3 is a central coordinator when executing reconfiguration, and is responsible for receiving the reconfiguration instruction from the coordinated reconfiguration manager 2, returning an execution result, and calling services provided by different types of reconfiguration execution modules to execute reconfiguration according to different reconfiguration options;
the re-partition executor 4 in the reconfiguration execution module is used for executing data stream re-partition, the horizontal elastic executor 5 is used for application/resource expansion and contraction, and the rescheduling executor 6 is used for task rescheduling; the reconfiguration execution module is responsible for generating a reconfiguration scheme and transmitting the reconfiguration scheme to the collaborative reconfiguration manager 2 for evaluating the overhead and the profit;
the subtask configuration manager 7 is responsible for managing the configuration of a specific subtask, each subtask in the stream maintains one subtask configuration manager 7, in the monitoring stage, the subtask configuration manager 7 is responsible for collecting monitoring indexes of the subtask, including element distribution in the stream and task input and output conditions, and uploading the monitoring indexes to the index monitor 1, and in the execution stage, the subtask configuration manager 7 is responsible for receiving a remote instruction from the reconfiguration coordinator 3 and correspondingly updating the configuration at the task end.
The second embodiment is as follows:
a processing method of a multi-level collaborative reconfiguration flow processing system based on Flink is realized by relying on the multi-level collaborative reconfiguration flow processing system based on Flink in the first embodiment, and is characterized by comprising the following steps:
s1, constructing a multilevel cooperative control strategy for minimizing migration overhead;
further, the specific method for constructing the multilevel cooperative control strategy for minimizing the migration overhead in step S1 includes the following steps:
S1.1, the index monitor monitors cluster data and checks the set threshold indexes, judging whether any cluster index violates the SLA constraints; if not, return to checking the set threshold indexes; if so, further judge whether the rate fluctuation of the cluster index exceeds the standard;
s1.2, judging whether the speed fluctuation exceeds the standard, if not, acquiring an overload node set from an index monitor, if so, executing flow application elastic strategy adjustment parallelism based on computing resource perception, and then returning to the step of checking the set threshold index;
s1.3, judging whether the overload node still exists in the overload node set obtained in the step S1.2, if not, returning to the step of checking the set threshold index, if so, executing a flow partition strategy based on fine-grained asynchronous migration to balance the node instances, and then further judging whether the node still is overloaded;
and S1.4, judging whether the node is still overloaded in the step S1.3, if not, returning to the step of checking the set threshold index, and if so, executing a load balancing task rescheduling strategy for minimizing communication overhead to reconfigure, and if so, circulating until the cluster load balancing degree reaches the threshold range set by a user.
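The decision cascade of steps S1.1-S1.4 can be sketched as follows. The function name, the boolean flags standing in for the monitor's threshold checks, and the `actions` callback table are illustrative assumptions of this sketch:

```python
def cooperative_control_step(violates_sla, rate_fluctuation_high,
                             overloaded_nodes, actions):
    """One pass of the migration-overhead-minimizing control cascade (S1.1-S1.4).

    `actions` maps strategy names to callables; `stream_repartition` returns
    True if nodes remain overloaded afterwards (S1.4's check). Returns the
    strategies triggered, cheapest first.
    """
    triggered = []
    if not violates_sla:
        return triggered                      # S1.1: keep monitoring
    if rate_fluctuation_high:
        triggered.append("elastic_scaling")   # S1.2: resource-aware elasticity
        actions["elastic_scaling"]()
        return triggered
    if overloaded_nodes:                      # S1.3: fine-grained repartition
        triggered.append("stream_repartition")
        still_overloaded = actions["stream_repartition"]()
        if still_overloaded:                  # S1.4: task rescheduling
            triggered.append("task_rescheduling")
            actions["task_rescheduling"]()
    return triggered
```

The ordering encodes the strategy's core idea: cheaper reconfigurations (repartitioning) are tried before expensive ones (rescheduling), and scaling is reserved for genuine rate changes.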
Further, in step S1, in order to reduce frequent triggering of elastic adjustment due to a fixed threshold in the face of a load burst fluctuation, a sliding window is used to linearly smooth the input rate.
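The linear smoothing mentioned above can be sketched as follows. The window size and the linear (oldest-to-newest) weighting scheme are illustrative assumptions:

```python
from collections import deque

class SmoothedRate:
    """Linearly weighted sliding-window smoothing of the input rate.

    Recent samples receive higher weights, damping burst fluctuations so a
    fixed threshold does not trigger elastic adjustment on every spike.
    """
    def __init__(self, window=5):
        self.samples = deque(maxlen=window)

    def update(self, rate):
        """Add a new rate sample and return the smoothed rate."""
        self.samples.append(rate)
        weights = range(1, len(self.samples) + 1)   # oldest -> newest
        total = sum(w * s for w, s in zip(weights, self.samples))
        return total / sum(weights)
```

A burst from 1000 to 4000 tuples/second is thus reported as an intermediate value first, giving the controller time to decide whether the change persists.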
S2, constructing a flow application elastic strategy based on computing resource perception;
further, the specific method for constructing the flow application elasticity strategy based on the computing resource perception in step S2 includes the following steps:
S2.1, judge whether the cluster index data acquired by the horizontal elastic executor is null; if it is null (initial scheduling), read the user configuration and load the parallelism configuration, update the parallelism to all operators, and output the application topology;
further, the method for loading the parallelism configuration in step S2 includes the following steps:
S2.1.1, after the user code is submitted, the application topology generated by application-layer optimization of the stream processing system is G_LT, and V(G_LT) is the vertex set of the topology; for a vertex V_i ∈ V(G_LT), let c_vi and m_vi be the computing resource consumption and the parallelism of vertex V_i, respectively;
S2.1.2, for initial scheduling, the processing capacity of all nodes in the cluster is assumed to be the same, so the parallelism of each vertex is proportional to its computing resource consumption:
c_v1 : m_v1 = c_v2 : m_v2 = … = c_vn : m_vn
S2.1.3, for reconfiguration, let p_vi be the processing capacity, i.e., the total amount of computing resources, of the compute node where vertex V_i is located. To balance the computing load on each compute node, each vertex in V(G_LT) must satisfy the constraint that the computation assigned per unit of compute resource is proportional to the number of instances; the node parallelism ratio is then determined by:
c_v1/p_v1 : m_v1 = c_v2/p_v2 : m_v2 = … = c_vn/p_vn : m_vn
where n = |V(G_LT)| is the number of vertices;
S2.1.4, with the node parallelism ratio determined as in steps S2.1.2 and S2.1.3, fixing the parallelism of any one operator in the topology allows the parallelism of the remaining operators to be computed from the ratio. In initial scheduling, the parallelism of the source operator is set to 1; during reconfiguration, the monitored data stream input rate of the source operator is λ_new, the parallelism of the source operator is adjusted according to the input rate ratio Diff of the current period to the previous period, and after the adjustment a new ratio is calculated to update the parallelism of the remaining operators:
Diff = λ_new / λ_old
where λ_old is the input rate of the data stream in the previous period;
S2.1.5, the user supplies the configuration UserConfig, and the application topology G_LT with updated parallelism is output.
S2.2, if the cluster index data acquired by the horizontal elastic executor is not null (reconfiguration), acquire the input rate of the current period and calculate the adjustment ratio, adjust the parallelism of the original operators, calculate the parallelism of all operators under the current reconfiguration, and output the application topology.
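The Diff-based parallelism update of steps S2.1.4 and S2.2 can be sketched as follows. The dictionary representation and the ceil-based rounding are assumptions of this sketch:

```python
import math

def update_parallelism(parallelism, rate_new, rate_old):
    """Scale every operator's parallelism by Diff = rate_new / rate_old.

    Scaling all entries by the same factor preserves the per-operator
    ratios established at initial scheduling (S2.1.2); each operator keeps
    a parallelism of at least 1.
    """
    diff = rate_new / rate_old
    return {op: max(1, math.ceil(m * diff)) for op, m in parallelism.items()}
```

For instance, if the input rate doubles from 2000 to 4000 tuples/second, every operator's parallelism doubles; if it falls back, parallelism shrinks but never below 1.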
S3, constructing a flow partition strategy based on fine-grained asynchronous migration;
s3, the specific method for constructing the flow partition strategy based on fine-grained asynchronous migration comprises the following steps:
s3.1, determining candidate migration key values in the candidate migration virtual instance set V to serve as a candidate migration instance set Om;
s3.1.1, acquiring overload instances and overload thresholds of the candidate migrated virtual instance set V by the aid of the re-partition executor;
S3.1.2, the upstream operator acquires the total frequency of key values in each downstream partition and treats candidate migration key values v in the data stream as virtual instances; using the HyperLogLog algorithm, it estimates for each downstream instance the quantities f(v) and s(v), where f(v) is the frequency of the candidate migration key value v and s(v) is the migration cost of v;
s3.1.3, sorting the virtual instances in the order of f (v)/s (v) from high to low;
s3.1.4, judging whether node load exceeds a threshold value, if so, migrating the virtual instance, and reducing the node load to be below the threshold value;
s3.1.5, if the judgment is negative, outputting a candidate migration instance set Om;
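The candidate selection of steps S3.1.1-S3.1.5 can be sketched as follows. The data structures (an instance id mapped to `(key, f, s)` tuples) are illustrative assumptions; the ordering by f(v)/s(v) follows step S3.1.3:

```python
def select_candidates(instances, threshold):
    """Pick virtual instances to migrate away from overloaded operator instances.

    Keys are taken in descending f/s order (high frequency relieved per unit
    migration cost) until the instance's load, measured here as the sum of
    key frequencies, drops to the overload threshold or below.
    """
    migrations = []
    for inst, keys in instances.items():
        load = sum(f for _, f, _ in keys)
        for key, f, s in sorted(keys, key=lambda k: k[1] / k[2], reverse=True):
            if load <= threshold:
                break                      # node no longer overloaded (S3.1.4)
            migrations.append((inst, key))  # emit to candidate set Om (S3.1.5)
            load -= f
    return migrations
```

Taking the highest f/s keys first removes the most load per unit of state moved, which is the point of the fine-grained strategy.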
s3.2, determining candidate migration key values, updating the routing of the downstream partition by applying a routing table updating algorithm, migrating the virtual instance where the candidate migration key value with high cost is located into a light-load instance, and outputting a new routing table;
s3.2.1, acquiring a candidate migration virtual instance set V and a candidate migration instance set Om by a rescheduling actuator;
s3.2.2, arranging the candidate migration virtual instance sets V in a descending order according to f (V)/s (V), and arranging the candidate migration instance sets Om in an ascending order according to the load of the operator instances;
S3.2.3, for each candidate pair of target instance and key value, judge whether imbalance or overload would occur after the key value is migrated; if not, compute the routing table entry; if so, continue to judge the next pair;
S3.2.4, output the new routing table H_new.
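The routing table update of steps S3.2.1-S3.2.4 can be sketched as follows. The dictionary shapes and the lightest-instance target choice (mirroring the ascending load ordering of S3.2.2) are assumptions of this sketch:

```python
def update_routing(routing, loads, moves):
    """Produce the new routing table H_new after key migrations.

    `routing` maps key -> operator instance, `loads` maps instance -> load,
    and `moves` lists (key, frequency) pairs to relocate. Each key is routed
    to the currently lightest-loaded instance; loads are kept up to date so
    successive moves do not overload the same target.
    """
    new_routing = dict(routing)
    new_loads = dict(loads)
    for key, freq in moves:
        src = new_routing[key]
        dst = min(new_loads, key=new_loads.get)   # lightest-loaded instance
        if dst != src:
            new_routing[key] = dst
            new_loads[src] -= freq
            new_loads[dst] += freq
    return new_routing
```

Because only the routing entries of the migrated keys change, downstream instances that hold no affected state continue processing undisturbed, which is what makes the migration asynchronous and fine-grained.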
S4, constructing a load balancing task rescheduling strategy for minimizing communication overhead;
further, the step S4 of constructing the load balancing task rescheduling strategy that minimizes the communication overhead includes the following steps:
s4.1, defining constraint conditions of maximum migration cost and maximum load distance;
S4.2, traverse the topology and judge whether each operator instance pair should be recorded in the collocated group set to reduce network overhead: let t_jj' denote the output rate from subtask j to subtask j' within ΔT; an operator instance pair <Ti, Tj> is added to a collocated group when doing so helps reduce the overall network overhead, i.e. when t_ij exceeds avg(T_j) · SF, where avg(T_j) is the average input rate from all upstream operator instances to the current instance and SF is a scoring factor: SF = 1 means the pair is collocated when the input received from the upstream instance exceeds the average, while SF = 2 requires the upstream input rate to exceed twice the average. If the condition holds, continue with step S4.3; otherwise repeat step S4.2 until all operators in the topology have been traversed, then terminate;
S4.3, determine the scheduling units: first process the collocated group set into the minimum number of sets, then split it into scheduling units that satisfy the set maximum migration overhead and maximum load distance; the splitting strategy uses a greedy policy to divide a collocated group set into several subsets satisfying the maximum migration overhead and maximum load distance, yielding newly formed collocated groups;
S4.4, refine the collocated groups: for a newly formed collocated group, determine its placement in the new scheme according to load; specifically, operator instance pairs that were not previously placed on the same node but have a large data exchange volume are assigned, in descending order of data exchange volume, to nodes with lower load;
S4.5, solve the mixed integer linear programming problem (MILP): solve the constrained MILP, calculate the load distance of the obtained assignment, and if the load distance is greater than the predefined maximum load distance maxLD, form more partitions by reducing the maximum unit load maxll;
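The collocation test of step S4.2 and the load-aware placement of step S4.4 can be sketched as follows. The dictionary shapes, function names, and the heaviest-group-first greedy order are illustrative assumptions; the full strategy additionally enforces the migration overhead and load distance constraints via the MILP:

```python
def collocate(pairs, avg_input, sf=1.0):
    """Choose operator instance pairs worth collocating (S4.2).

    `pairs` maps (Ti, Tj) to the traffic t_ij observed in the window; a pair
    joins a collocated group when t_ij exceeds avg_input[Tj] * SF.
    """
    return [p for p, t in pairs.items() if t > avg_input[p[1]] * sf]

def place_groups(groups, node_loads):
    """Greedy placement (S4.4): heaviest group first onto the lightest node."""
    placement = {}
    loads = dict(node_loads)
    for name, weight in sorted(groups.items(), key=lambda g: g[1], reverse=True):
        node = min(loads, key=loads.get)
        placement[name] = node
        loads[node] += weight
    return placement
```

Collocating only the heavy-traffic pairs keeps high-volume edges on one node (saving network transfer), while the greedy placement spreads the resulting groups so no node accumulates a disproportionate load.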
s5, the multi-level collaborative reconfiguration flow processing system based on the Flink is in a stable state, the index monitor periodically acquires monitoring data from the cluster, checks a set threshold index, and triggers the collaborative reconfiguration manager when a certain index is overloaded;
s6, the cooperative reconfiguration manager acquires cluster related indexes from the index monitor, acquires alternative schemes from each reconfiguration execution module according to the strategy of the step S1, and selects a proper scheme to optimize cluster configuration;
s7, the coordinated reconfiguration manager sends the reconfiguration options to the reconfiguration coordinator;
s8, the reconfiguration coordinator calls the corresponding reconfiguration execution module according to the needed reconfiguration option and starts to optimize the partitions according to the strategies of the steps S2-S4;
and S9, the reconfiguration executing module cooperates with the subtask configuration manager to complete the optimization of the data stream partition, and the multi-layer cooperative reconfiguration stream processing system based on the Flink recovers the stable state and waits for triggering the next reconfiguration.
The experimental effect of the invention is further verified:
1. effect of input Rate on Performance
When the skew degree of the Zipf distribution is fixed at 0.6, the optimization effect of MCR-Flink relative to Flink under different input rates is shown in the experimental results of FIG. 7, FIG. 8 and FIG. 9. The performance of Flink and MCR-Flink was tested under stable loads with input rates from 500 tuples/second to 5000 tuples/second in steps of 500 tuples/second, evaluated by three indexes: AL, AMTU (average maximum thread busy time) and ALB.
First, as shown in FIG. 7, the end-to-end delay of MCR-Flink and Mc-Stream is almost always lower than that of Flink at different data stream input rates. When the input rate is in the range of [500, 2000] tuples/second, the difference caused by reconfiguration is not obvious because the input has not reached the bottleneck; when the input rate is in the range of [2500, 4000] tuples/second, the optimization effect of MCR-Flink relative to Flink is clearly visible. Combined with the values at the corresponding input rates in FIG. 8 and FIG. 9, this is because MCR-Flink balances the cluster load, so the single-node bottleneck appears later and the average delay is lower.
It can be seen from FIG. 9 and FIG. 7 that when the load balancing effects of MCR-Flink and Mc-Stream are similar, the end-to-end delay of MCR-Flink is lower, showing that the reconfiguration execution overhead introduced by the present invention is smaller while achieving the same optimization effect.
2. Effect of inclination on Performance
As can be seen from FIG. 10, as the skew degree increases, the AL of Flink rises rapidly while the delays of MCR-Flink and Mc-Stream rise only slightly. From the corresponding load skew degree and maximum resource utilization analysis in FIG. 11, when the skew degree of the data distribution in the Zipf data stream reaches 0.6, one node in Flink has already become a performance bottleneck, whereas the two reconfiguration algorithms can still maintain a normal service level thanks to their balancing effect. MCR-Flink also outperforms the Mc-Stream algorithm in delay, which shows that, with a similar balancing effect, MCR-Flink's selectivity over reconfiguration algorithms lets it affect delay less.
3. Runtime end-to-end delay comparison and analysis
The experimental load for testing the cooperative control algorithm was a mixed load whose input rate varied regularly among 1000, 2000, 4000 and 1000 tuples/second, with upward and downward burst variations added.
FIGS. 13 and 14 show the delay monitoring of the mixed load on the native Flink platform and the MCR-Flink platform. FIG. 13 shows the performance of the Flink platform under this load: after the load step rises at second 35, the number of input tuples increases in a burst, causing the 90th and 95th percentile tuple delays to rise, but the median tuple delay can still be kept within the normal range. Then, after the input rate rises to 4000 tuples/second at second 95, the upstream map operator immediately generates back pressure, so that the delay markers cannot be propagated. The source operator then also generates back pressure, and the measured delay continues rising above 30000 ms.
FIG. 14 shows the performance of the multilevel cooperative control (MLCC) under the above load. First, when the load bursts upward at second 35, the MLCC algorithm still triggers elastic scale-out and at the same time invokes the scheduling algorithm to determine the deployment for the new configuration, producing a delay spike spanning about 4 seconds; a second spike then appears, presumably because the accumulated tuples still need to be consumed, and the delay drops slowly after scale-out. The MLCC algorithm does not scale in immediately after the burst load subsides, but continues to consume the backlog of tuples with the new parallelism, eliminating the second delay spike that adjustment jitter would otherwise cause. At second 65, the input rate rises to 2000 tuples/second, which the new parallelism accommodates. The delay rises slowly, possibly because the input data stream is skewed; several delay peaks are then observed, and comparison with the logs shows that the fine-grained migration strategy was triggered to adjust the partitions, after which the delay drops back to the normal level once the partition load is balanced. At second 95, the input rate rises to 4000 tuples/second; MLCC triggers elastic scale-out, bringing the first delay spike, then a large number of tuples accumulate during the process and the measured delay rises to around 9000 ms. The delay then trends downward, presumably because the new configuration can consume tuples more efficiently. Afterwards, the delay gradually returns to the normal light-load value of 16 ms; overall the delay shows a periodic rise-then-fall pattern, attributed to the continuously changing distribution of the input data stream.
Overall, in the face of load fluctuations and sudden changes, the coordinated control algorithm can achieve better latency performance than the non-coordinated algorithm.
4. Analysis of optimization effect of algorithm under NEXMark load
The reconfiguration effects of MCR-Flink and Mc-Stream were tested using the NEXMark benchmark program and load, with Flink applying an initial parallelism of 64. The performance optimization of MCR-Flink was tested with Q1 (stateless computation) and Q3 (stateful computation). Since the load variation of the NEXMark program can be controlled, a NEXMark benchmark lasting 60 seconds was set up to generate a load with varying input rate and skew degree. The results are also compared with a non-cooperative elastic algorithm that scales out based on the maximum load value, denoted NoC.
As can be seen from FIG. 15, at second 15 the end-to-end delay of Flink first rises and then falls; checking the log shows this is due to the increased input rate. Then, at second 45, a small delay spike appears in Flink; observing the resource utilization of each node shows that a single node has become a resource bottleneck at this moment. Next we analyze the impact of the reconfiguration of the cooperative and non-cooperative policies on the end-to-end delay under this change. At the first delay peak of Flink, MCR-Flink and the non-cooperative elastic algorithm show similar delay behavior: both trigger elastic scale-out and incur a large delay spike, but because Q1 is stateless, this overhead consists mainly of the execution overhead of restarting and deploying tasks and contains no state migration overhead. At the second delay peak, we observe that the adjustment spike of MCR-Flink is significantly smaller than those of Flink and the non-cooperative elastic algorithm, because the good load balancing of MCR-Flink avoids a new elastic scale-out.
FIG. 16 uses the same load variation pattern as Q1 to examine the difference between stateful and stateless computation. The end-to-end delay of Flink shows a similar pattern, with two delay peaks. Observing the reconfiguration performance of MCR-Flink and the non-cooperative elastic algorithm: at the first delay peak of Flink, both behave similarly to Q1 and also trigger elastic scale-out, but the delay spike of Q3 exceeds 150 ms, larger than the spike in Q1; the additional delay rise is presumably due to state migration. At the second delay peak, the non-cooperative algorithm keeps scaling out until the delay peak reaches nearly 200 ms, because as Q3 runs its accumulated state keeps growing, and a larger state brings a larger migration overhead. Similarly, MCR-Flink shows higher delay spikes when reconfiguring stateful operators than stateless ones, but they are still smaller than the delay peaks of the non-cooperative algorithm.
For Q1, the average delay of Flink is 21.52ms, the average delay of MCR-Flink is 19.26ms, optimized for 10.5%. For Q3, the average delay of Flink is 24.54ms, the average delay of MCR-Flink is 20.51ms, optimized for 16.41%.
It is noted that relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" or "comprising" does not exclude the presence of additional like elements in a process, method, article, or apparatus that comprises the element.
While the application has been described above with reference to specific embodiments, various modifications may be made and equivalents may be substituted for elements thereof without departing from the scope of the application. In particular, the various features of the embodiments disclosed herein can be used in any combination with one another as long as no structural conflict exists, and the combination is not exhaustive in this specification for reasons of brevity and resource economy. Therefore, it is intended that the application not be limited to the particular embodiments disclosed, but that the application will include all embodiments falling within the scope of the appended claims.

Claims (9)

1. A multi-level collaborative reconfiguration flow processing system based on Flink is characterized in that: the multi-level collaborative reconfiguration flow processing system based on the Flink adds an index monitor (1), a collaborative reconfiguration manager (2), a reconfiguration coordinator (3), a re-partition executor (4) and a subtask configuration manager (7) on the basis of an original component of a Flink flow processing platform, modifies an original collaborative adaptive scheduler into a horizontal elastic executor (5), and modifies an original resource slot distributor into a re-scheduling executor (6);
the re-partition actuator (4), the horizontal elastic actuator (5) and the re-scheduling actuator (6) form a re-configuration execution module;
the index monitor (1) is connected with a collaborative reconfiguration manager (2) and a subtask configuration manager (7), the collaborative reconfiguration manager (2) is connected with a reconfiguration coordinator (3), and the reconfiguration coordinator (3) is respectively connected with a re-partition executor (4), a horizontal elastic executor (5), a re-scheduling executor (6) and a subtask configuration manager (7).
2. The Flink-based multi-level cooperative reconfiguration flow processing system according to claim 1, wherein: the index monitor (1) is used for monitoring the flow application and related indexes of the Flink-based multi-level collaborative reconfiguration flow processing system during operation, and is used for reminding the collaborative reconfiguration manager (2) that the system is overloaded when the indexes cannot meet the requirements of the service level agreements (SLAs), supporting the operation of the multi-level collaborative reconfiguration flow processing system;
the coordinated reconfiguration manager is used for making reconfiguration indication on the monitoring information provided by the index monitor (1) and sending a reconfiguration instruction to the reconfiguration coordinator (3) so as to indicate the multi-layer coordinated reconfiguration stream processing system based on the Flink to execute reconfiguration;
the reconfiguration coordinator (3) is a central coordinator when executing reconfiguration, and is responsible for receiving the reconfiguration instruction from the collaborative reconfiguration manager (2), returning an execution result, and calling services provided by different types of reconfiguration execution modules to execute reconfiguration according to different reconfiguration options;
the re-partition executor (4) in the reconfiguration execution module is used for executing data stream re-partition, the horizontal elastic executor (5) is used for application/resource contraction and expansion, and the rescheduling executor (6) is used for task rescheduling; the reconfiguration execution module is responsible for generating a reconfiguration scheme and transmitting the reconfiguration scheme to the collaborative reconfiguration manager (2) for evaluating the overhead and the income;
the subtask configuration manager (7) is responsible for managing the configuration of a specific subtask, each subtask in the stream maintains one subtask configuration manager (7), in the monitoring stage, the subtask configuration manager (7) is responsible for collecting monitoring indexes of the subtask, including element distribution in the stream and task input and output conditions, and uploading the monitoring indexes to the index monitor (1), and in the execution stage, the subtask configuration manager (7) is responsible for receiving a remote instruction from the reconfiguration coordinator (3) and correspondingly updating the configuration at the task end.
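A minimal sketch of the monitor → manager → coordinator interaction described in claims 1-2. All class and method names here are illustrative stand-ins, not APIs of the patent's implementation or of Flink itself:

```python
# Sketch of the component pipeline: the monitor flags SLA violations, the
# manager turns them into a reconfiguration option, and the coordinator
# dispatches to an execution module. Names and the decision rule are
# illustrative only.

class IndexMonitor:
    def __init__(self, sla_limits):
        self.sla_limits = sla_limits  # e.g. {"latency_ms": 500}

    def check(self, metrics):
        # Return the names of the indexes that violate the SLA.
        return [k for k, limit in self.sla_limits.items()
                if metrics.get(k, 0) > limit]

class ReconfigurationCoordinator:
    def __init__(self, executors):
        self.executors = executors  # option name -> executor callable

    def execute(self, option):
        return self.executors[option](option)

class CollaborativeReconfigManager:
    def __init__(self, coordinator):
        self.coordinator = coordinator

    def on_overload(self, violated):
        # Trivial decision rule for illustration: repartition on any violation.
        if not violated:
            return "stable"
        return self.coordinator.execute("repartition")

coordinator = ReconfigurationCoordinator(
    {"repartition": lambda opt: f"executed {opt}"})
manager = CollaborativeReconfigManager(coordinator)
monitor = IndexMonitor({"latency_ms": 500})

result = manager.on_overload(monitor.check({"latency_ms": 800}))
```

In the real system the "executor callable" would be one of the three reconfiguration execution modules; here a lambda stands in so the control flow stays visible.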
3. A processing method of a Flink-based multi-level collaborative reconfiguration stream processing system, implemented on the Flink-based multi-level collaborative reconfiguration stream processing system according to any one of claims 1-2, characterized by comprising the following steps:
S1, construct a multi-level collaborative control strategy that minimizes migration overhead;
S2, construct a computing-resource-aware stream application elasticity strategy;
S3, construct a stream partitioning strategy based on fine-grained asynchronous migration;
S4, construct a load-balancing task rescheduling strategy that minimizes communication overhead;
S5, while the Flink-based multi-level collaborative reconfiguration stream processing system is in a stable state, the index monitor periodically collects monitoring data from the cluster and checks the configured threshold indexes; when some index is overloaded, it triggers the collaborative reconfiguration manager;
S6, the collaborative reconfiguration manager obtains the relevant cluster indexes from the index monitor, obtains candidate plans from each reconfiguration execution module according to the strategy of step S1, and selects a suitable plan to optimize the cluster configuration;
S7, the collaborative reconfiguration manager sends the chosen reconfiguration option to the reconfiguration coordinator;
S8, the reconfiguration coordinator invokes the corresponding reconfiguration execution module according to the required reconfiguration option and begins optimizing the partitions according to the strategies of steps S2-S4;
S9, the reconfiguration execution module cooperates with the subtask configuration manager to complete the optimization of the data stream partitions; the Flink-based multi-level collaborative reconfiguration stream processing system then returns to the stable state and waits for the next reconfiguration to be triggered.
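Step S6 selects among candidate plans from the execution modules; a hedged sketch of one plausible selection rule (the patent does not specify the exact criterion, so the benefit/overhead comparison below is an assumption):

```python
# Sketch of the S5-S9 cycle's plan selection: gather candidate plans from the
# reconfiguration execution modules and keep the one with the best net gain.
# The 'benefit'/'overhead' fields and the max-net-gain rule are illustrative
# assumptions, not the patent's stated criterion.

def select_plan(candidates):
    """candidates: list of dicts with 'name', 'benefit', 'overhead'."""
    viable = [c for c in candidates if c["benefit"] > c["overhead"]]
    if not viable:
        return None  # no plan pays for its own migration overhead
    # Prefer the plan with the largest net gain (benefit minus overhead).
    return max(viable, key=lambda c: c["benefit"] - c["overhead"])

plans = [
    {"name": "elastic_scaling", "benefit": 10.0, "overhead": 6.0},
    {"name": "repartition",     "benefit": 8.0,  "overhead": 2.0},
    {"name": "reschedule",      "benefit": 9.0,  "overhead": 9.5},
]
best = select_plan(plans)
```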
4. The processing method of the Flink-based multi-level collaborative reconfiguration stream processing system according to claim 3, characterized in that: the specific method for constructing the multi-level collaborative control strategy that minimizes migration overhead in step S1 is:
S1.1, the index monitor monitors the cluster data and checks the configured threshold indexes, judging whether any cluster index violates its SLA constraint; if not, return to checking the threshold indexes; if so, further judge whether the rate fluctuation of the cluster index exceeds the limit;
S1.2, judge whether the rate fluctuation exceeds the limit; if not, obtain the set of overloaded nodes from the index monitor; if so, execute the computing-resource-aware stream application elasticity strategy to adjust the parallelism, and then return to checking the threshold indexes;
S1.3, judge whether overloaded nodes still exist in the overloaded-node set obtained in step S1.2; if not, return to checking the threshold indexes; if so, execute the stream partitioning strategy based on fine-grained asynchronous migration to balance the instances across nodes, and then further judge whether any node is still overloaded;
S1.4, judge whether a node is still overloaded after step S1.3; if not, return to checking the threshold indexes; if so, execute the load-balancing task rescheduling strategy that minimizes communication overhead to reconfigure, looping until the cluster load-balance degree reaches the threshold range set by the user.
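The S1.1-S1.4 cascade escalates from the cheapest remedy to the most expensive one only while overload persists. A minimal sketch of that decision order, with the monitoring inputs passed in as plain booleans/lists for illustration:

```python
# Sketch of the S1.1-S1.4 escalation cascade. Each branch corresponds to one
# sub-step; the predicate arguments stand in for live monitoring data.

def multilevel_control(sla_violated, rate_fluctuation_high, overloaded_nodes,
                       still_overloaded_after_partition):
    """Return the ordered list of remedies the cascade would apply."""
    applied = []
    if not sla_violated:
        return applied                         # S1.1: stay in monitoring
    if rate_fluctuation_high:
        applied.append("elastic_scaling")      # S1.2: adjust parallelism
        return applied
    if overloaded_nodes:
        applied.append("stream_repartition")   # S1.3: balance instances
        if still_overloaded_after_partition:
            applied.append("task_reschedule")  # S1.4: last resort
    return applied
```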
5. The processing method of the Flink-based multi-level collaborative reconfiguration stream processing system according to claim 4, characterized in that: in step S1, to reduce the frequent triggering of elastic adjustment that a fixed threshold causes under bursty load fluctuations, a sliding window is used to linearly smooth the input rate.
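A sketch of the sliding-window smoothing in claim 5: the trigger compares the window mean rather than the raw rate against the threshold, so an isolated burst does not fire a reconfiguration. The window size is an illustrative choice, not a value from the patent:

```python
# Sliding-window linear smoothing of the input rate. A single 5x burst only
# moves the smoothed value to the window mean, damping spurious triggers.

from collections import deque

class SmoothedRate:
    def __init__(self, window=5):
        self.samples = deque(maxlen=window)  # ring buffer of recent rates

    def observe(self, rate):
        self.samples.append(rate)
        return sum(self.samples) / len(self.samples)

s = SmoothedRate(window=4)
for r in [100, 100, 100]:
    s.observe(r)
burst = s.observe(500)   # raw rate 500, smoothed mean is only 200
```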
6. The processing method of the Flink-based multi-level collaborative reconfiguration stream processing system according to claim 5, characterized in that: the specific method for constructing the computing-resource-aware stream application elasticity strategy in step S2 is:
S2.1, judge whether the cluster index data obtained by the horizontal elasticity executor is null; if it is null (initial scheduling), read the user configuration and load the parallelism configuration, update the parallelism of all operators, and output the application topology;
S2.2, if the cluster index data obtained by the horizontal elasticity executor is not null (reconfiguration), obtain the input rate of the current period and compute the adjustment ratio, adjust the original operator parallelism and compute the parallelism of all operators under the current reconfiguration, and output the application topology.
7. The processing method of the Flink-based multi-level collaborative reconfiguration stream processing system according to claim 6, characterized in that: the method for loading the parallelism configuration in step S2 is:
S2.1.1, after the user code is submitted, the application topology generated by the application-layer optimization of the stream processing system is G_LT, and V(G_LT) is the vertex set of the topology; for a vertex V_i ∈ V(G_LT), let c_{v_i} and m_{v_i} be the computing resource consumption and the parallelism of vertex V_i, respectively;
S2.1.2, for the initial scheduling, the processing capacity of all nodes in the cluster is assumed to be the same, so the parallelism of each vertex is proportional to its computing resource consumption:

c_{v_1} : m_{v_1} = c_{v_2} : m_{v_2} = … = c_{v_n} : m_{v_n}

S2.1.3, for reconfiguration, let p_{v_i} be the processing capacity, i.e. the total amount of computing resources, of the computing node where vertex V_i is located; to balance the computing load on each computing node, every vertex in V(G_LT) must satisfy the constraint that the computation assigned per unit of node capacity is proportional to the number of instances, which determines the parallelism ratio of the vertices:

c_{v_1} / (p_{v_1} · m_{v_1}) = c_{v_2} / (p_{v_2} · m_{v_2}) = … = c_{v_n} / (p_{v_n} · m_{v_n})

where n = |V(G_LT)| is the number of vertices;
S2.1.4, with the parallelism ratios determined by steps S2.1.2 and S2.1.3, fixing the parallelism of any one operator in the topology determines the parallelism of all remaining operators according to the ratio; in the initial scheduling, the parallelism of the source operator is set to 1; during reconfiguration, the monitored data stream input rate of the source operator is λ_new, the parallelism of the source operator is adjusted according to the input rate ratio Diff between the current period and the previous period, and after the adjustment the new ratio is used to update the parallelism of the remaining operators:

Diff = λ_new / λ_old

where λ_old is the input rate of the data stream in the previous period;
S2.1.5, according to the user configuration userConfig, output the application topology G_LT with the updated parallelism.
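The parallelism computation of claim 7 can be sketched as follows: with the source operator's parallelism fixed, every other operator's parallelism follows the ratio m_{v_i} ∝ c_{v_i}/p_{v_i} (equal load per unit of node capacity). The numbers and the ceiling-based rounding are illustrative assumptions:

```python
# Sketch of the S2.1.2-S2.1.4 parallelism rule. Vertex 0 is the source; its
# parallelism anchors the ratio, and the rest are scaled so that
# c_vi / (p_vi * m_vi) is (approximately, after rounding) equal for all i.

import math

def parallelism_from_ratio(costs, capacities, source_parallelism):
    """costs[i] = c_vi, capacities[i] = p_vi; vertex 0 is the source."""
    base = costs[0] / capacities[0] / source_parallelism  # per-instance load
    return [max(1, math.ceil((c / p) / base))
            for c, p in zip(costs, capacities)]

def adjusted_source_parallelism(m_old, rate_new, rate_old):
    """Scale the source parallelism by Diff = λ_new / λ_old (S2.1.4)."""
    return max(1, math.ceil(m_old * rate_new / rate_old))

# Reconfiguration example: the input rate doubled, so the source operator
# goes from parallelism 2 to 4, and the other operators keep the c/p ratio.
m_src = adjusted_source_parallelism(2, rate_new=2000, rate_old=1000)
plan = parallelism_from_ratio(costs=[10, 20, 40],
                              capacities=[1, 1, 2],
                              source_parallelism=m_src)
```

Rounding up with `ceil` is a design choice for the sketch: it errs toward over-provisioning rather than violating the load-balance constraint.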
8. The processing method of the Flink-based multi-level collaborative reconfiguration stream processing system according to claim 7, characterized in that: the specific method for constructing the stream partitioning strategy based on fine-grained asynchronous migration in step S3 is:
S3.1, determine the candidate migration key values within the candidate-migration virtual instance set V, forming the candidate migration instance set Om:
S3.1.1, the re-partition executor obtains the overloaded instances and the overload threshold of the candidate-migration virtual instance set V;
S3.1.2, the upstream operator obtains the total frequency of the key values in each downstream partition, treats the candidate migration key values v in the data stream as virtual instances, and uses the HyperLogLog algorithm to compute for each downstream instance the quantities f(v) and s(v), where f(v) is the frequency of candidate migration key value v and s(v) is the migration cost of candidate migration key value v;
S3.1.3, sort the virtual instances in descending order of f(v)/s(v);
S3.1.4, judge whether the node load exceeds the threshold; if so, migrate virtual instances to bring the node load back below the threshold;
S3.1.5, if not, output the candidate migration instance set Om;
S3.2, with the candidate migration key values determined, apply the routing-table update algorithm to update the routing of the downstream partitions, migrate the virtual instances holding high-cost candidate key values into lightly loaded instances, and output the new routing table:
S3.2.1, the rescheduling executor obtains the candidate-migration virtual instance set V and the candidate migration instance set Om;
S3.2.2, sort the candidate-migration virtual instance set V in descending order of f(v)/s(v), and sort the candidate migration instance set Om in ascending order of operator-instance load;
S3.2.3, for each candidate instance/node pair, judge whether imbalance or overload would occur after the key value is migrated; if not, compute the routing-table entry; if so, continue judging the next pair;
S3.2.4, produce the new routing table H_new.
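The core of S3.1 is ranking candidate key values by f(v)/s(v), i.e. by how much load a migration sheds per unit of state moved, and migrating until the node falls below its threshold. A hedged sketch (in practice f and s would come from HyperLogLog-style statistics; here they are supplied directly):

```python
# Sketch of S3.1.3-S3.1.4: greedily pick the key values with the highest
# frequency-to-migration-cost ratio until the node load is under the
# threshold. Candidate data is illustrative.

def select_migrations(candidates, node_load, threshold):
    """candidates: list of (key, f, s) tuples; migrating a key sheds f load.

    Returns the keys chosen for migration and the resulting node load."""
    ranked = sorted(candidates, key=lambda kfs: kfs[1] / kfs[2], reverse=True)
    chosen = []
    for key, f, s in ranked:
        if node_load <= threshold:
            break
        chosen.append(key)
        node_load -= f
    return chosen, node_load

chosen, load = select_migrations(
    candidates=[("a", 30, 1), ("b", 50, 10), ("c", 20, 1)],
    node_load=100, threshold=60)
```

Key "b" sheds the most load but carries ten units of state, so the ratio ordering prefers the cheap-to-move keys "a" and "c" first.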
9. The processing method of the Flink-based multi-level collaborative reconfiguration stream processing system according to claim 8, characterized in that: constructing the load-balancing task rescheduling strategy that minimizes communication overhead in step S4 comprises the following steps:
S4.1, define the constraints of maximum migration overhead and maximum load distance;
S4.2, traverse the topology, and compute and judge whether an operator pair should be recorded in the collocated-group set to reduce network overhead: let t_{jj'} denote the output rate from a subtask j to a subtask j' within ΔT; an operator instance pair <T_i, T_j> helps to reduce the overall network overhead, and is added to the collocated-group set, when the rate

t_{ij}

exceeds avg(T_j) · SF, where avg(T_j) is the average input rate to the current instance over all of its upstream operator instances and SF is a scoring factor: SF = 1 means the pair qualifies when the input received from the upstream instance exceeds the average, while SF = 2 requires the upstream instance's input rate to exceed twice the average; if the pair qualifies, continue to step S4.3; otherwise repeat step S4.2 until all operators in the topology have been traversed, then terminate;
S4.3, determine the scheduling units: first merge the collocated-group set into the minimum number of sets, then split them into scheduling units that satisfy the configured maximum migration overhead and maximum load distance; the splitting strategy uses a greedy approach to divide a collocated group into several subsets satisfying the maximum migration overhead and maximum load distance, which yields new collocated groups;
S4.4, refine the collocated groups: for each newly produced collocated group, determine its placement in the new plan according to load; specifically, operator instance pairs that were not previously placed on the same node but have a large data exchange volume are assigned, in descending order of data exchange volume, to the nodes with lower load;
S4.5, solve the resulting mixed integer linear programming (MILP) problem: solve the constrained MILP and compute the load distance of the resulting assignment; if that load distance is greater than the predefined maximum load distance maxLD, form more partitions by reducing the maximum unit load maxLU.
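The S4.2 collocation test can be sketched directly from its definition: a pair <T_i, T_j> is a collocation candidate when the rate t_{ij} exceeds SF times the average input rate of T_j over all its upstream instances. The rates and SF value below are illustrative:

```python
# Sketch of the S4.2 collocation criterion: t_ij > avg(T_j) * SF.
# 'upstream_rates' plays the role of the per-upstream input rates to T_j
# measured over the window ΔT.

def should_collocate(t_ij, upstream_rates, sf=1.0):
    """True when the edge T_i -> T_j is hot enough to collocate the pair."""
    avg = sum(upstream_rates) / len(upstream_rates)
    return t_ij > avg * sf

# T_j receives 100, 300 and 20 units from its three upstream instances
# (average 140): with SF = 2 only the 300-unit edge qualifies.
rates = [100, 300, 20]
hot = [r for r in rates if should_collocate(r, rates, sf=2.0)]
```

Raising SF makes the criterion stricter, trading fewer collocation constraints in the MILP of S4.5 against higher residual network traffic.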
CN202211047958.7A 2022-08-30 2022-08-30 Multi-level collaborative reconfiguration stream processing system based on Flink and processing method thereof Pending CN115412501A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211047958.7A CN115412501A (en) 2022-08-30 2022-08-30 Multi-level collaborative reconfiguration stream processing system based on Flink and processing method thereof


Publications (1)

Publication Number Publication Date
CN115412501A true CN115412501A (en) 2022-11-29

Family

ID=84164758



Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116225696A (en) * 2023-02-06 2023-06-06 北京邮电大学 Operator concurrency optimization method and device for stream processing system
CN116225696B (en) * 2023-02-06 2024-06-07 北京邮电大学 Operator concurrency optimization method and device for stream processing system
CN116302578A (en) * 2023-05-25 2023-06-23 中国地质大学(北京) QoS (quality of service) constraint stream application delay ensuring method and system
CN116319381A (en) * 2023-05-25 2023-06-23 中国地质大学(北京) Communication and resource-aware data stream grouping method and system
CN116319381B (en) * 2023-05-25 2023-07-25 中国地质大学(北京) Communication and resource-aware data stream grouping method and system
CN116302578B (en) * 2023-05-25 2023-08-08 中国地质大学(北京) QoS (quality of service) constraint stream application delay ensuring method and system


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination