CN110990059A - Streaming computing engine operating method and system for skewed data - Google Patents

Streaming computing engine operating method and system for skewed data

Info

Publication number
CN110990059A
CN110990059A
Authority
CN
China
Prior art keywords
stateless
data
stateful
task
operators
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911191154.2A
Other languages
Chinese (zh)
Other versions
CN110990059B (en)
Inventor
Cheng Xueqi (程学旗)
Li Bing (李冰)
Zhang Zhibin (张志斌)
Guo Jiafeng (郭嘉丰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN201911191154.2A priority Critical patent/CN110990059B/en
Publication of CN110990059A publication Critical patent/CN110990059A/en
Application granted granted Critical
Publication of CN110990059B publication Critical patent/CN110990059B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F 9/3867 Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines

Abstract

The invention provides a streaming computing engine operating method and system for skewed data. The method comprises the following steps: obtaining a user job containing skewed data, converting the user job into a directed acyclic graph representing it, and dividing the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator; wrapping all stateful operators into stateful tasks whose input is the stateful data stream, and wrapping all stateless operators into stateless tasks whose input is the stateless data stream; copying and sending the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results; converting the data of the stateful data stream into data records containing keys and values, and distributing records containing the same key to the same computing node to obtain stateful processing results; and collecting the stateless and stateful processing results as the result of the user job.

Description

Streaming computing engine operating method and system for skewed data
Technical Field
The invention belongs to the field of distributed computing, and in particular to distributed streaming data computing.
Background
In a streaming computing engine, input data keeps arriving continuously after the user job starts, so the distribution of the data source is unknown in advance. In real scenarios, data is skewed: in a social network, for example, a celebrity has more followers, and a hot event receives more comments. The distribution of skewed data can also change dynamically; in social networks, hot events change over time, which makes the data distribution uncertain.
In a stream computing engine, computing operators are divided into two types: stateful operators and stateless operators. A stateless operator considers only the current input datum, needs no context or historical computation information, and always produces the same output for the same input. A stateful operator must save computation state during processing in order to perform more complex operations, and is used in scenarios such as machine-learning model training and data aggregation. Because stream computing systems are latency-sensitive, the computation state of a stateful operator can only be accessed locally, which means that data with the same key must be sent to the same node for computation. Under skewed data, a stateless operator can process any datum and be scheduled to any node, so it does not cause load imbalance. For a stateful operator, because input data with the same key can only be sent to the same node, a skewed data scenario can cause system load imbalance.
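The distinction above can be made concrete with a small sketch (illustrative only, not code from the patent): a stateless operator's output depends solely on the current record, while a stateful operator keeps per-key state, which forces records with the same key onto the same node.

```python
def stateless_upper(record):
    # Stateless: same input always yields the same output; no context needed.
    return record.upper()

class StatefulCounter:
    """Stateful: the per-key computation state must be accessed locally."""
    def __init__(self):
        self.state = {}  # key -> running total

    def process(self, key, value):
        self.state[key] = self.state.get(key, 0) + value
        return self.state[key]

def route_to_node(key, num_nodes):
    # Key-based routing: identical keys always map to the same node, which
    # is exactly what makes load imbalance possible under skewed keys.
    return hash(key) % num_nodes
```

Under skew (e.g. one "celebrity" key dominating the stream), every record for that key is routed to one node by `route_to_node`, while stateless work can be spread freely.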
A streaming computing engine therefore partitions the input data of stateful operators so that data with the same key is distributed to the same node. Traditional streaming computing engines such as Flink and Spark Streaming both use a static data partitioning strategy. Because the data source is unknown and its distribution uncertain, a static partitioning strategy is difficult to adjust dynamically when the distribution changes, so the system load becomes unbalanced and optimal computing performance cannot be reached.
Existing stream computing engines use a static data partitioning strategy: after the data is partitioned, each partition must correspond to a physical computing node. In a skewed scenario the partition sizes are uneven, and the physical node holding a larger partition carries a higher load, so the cluster load is unbalanced and computing resources cannot be fully utilized. Moreover, a static partitioning strategy cannot cope with dynamic changes in the data distribution: when the skewed distribution changes, the system must re-partition to restore resource utilization, and the re-partitioning operation must suspend the computing tasks of the whole cluster, which has a large impact on system performance. How to solve load imbalance under skewed data sets is therefore a technical problem to be solved in this field.
Disclosure of Invention
The invention aims to improve the resource utilization of existing systems under data skew, and provides a streaming computing engine that supports skewed data.
In view of the deficiencies of the prior art, the present invention provides a method for operating a streaming computing engine for skewed data, comprising:
step 1, obtaining a user job containing skewed data, converting the user job into a directed acyclic graph representing it, and dividing the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator;
step 2, wrapping all stateful operators into stateful tasks whose input is the stateful data stream, and wrapping all stateless operators into stateless tasks whose input is the stateless data stream;
step 3, copying and sending the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results;
step 4, converting the data of the stateful data stream into data records containing keys and values, distributing records containing the same key to the same computing node to obtain stateful processing results, and collecting the stateless and stateful processing results as the result of the user job.
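The four steps above can be sketched end to end. This is a hedged, minimal illustration: the names `Operator`, `uses_state`, `wrap`, and `route` are assumptions for exposition, not identifiers from the patent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    uses_state: bool  # derived from the operator's business processing logic

def classify(dag_nodes):
    # Step 1: split DAG nodes into stateful and stateless operators.
    stateful = [op for op in dag_nodes if op.uses_state]
    stateless = [op for op in dag_nodes if not op.uses_state]
    return stateful, stateless

def wrap(operators, kind):
    # Step 2: wrap a group of operators into a task of the given kind.
    return {"kind": kind, "operators": [op.name for op in operators]}

def route(records, num_nodes):
    # Step 4: records with the same key always land on the same node.
    return {key: hash(key) % num_nodes for key, _value in records}
```

Stateless tasks (step 3) can be copied to any node, so only the stateful records need the key-based `route`.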
In the above method, step 4 comprises: dividing the stateful data stream into a fixed number of data slices, which can be processed by the computing nodes in parallel or scheduled to other computing nodes according to the scheduler's decision.
In the above method, wrapping the stateful operators into stateful tasks and wrapping the stateless operators into stateless tasks in step 2 both include merging associated operators using an OP fusion rule.
In the above method, step 4 comprises:
step 41, at a fixed time interval, the computing node picks the stateful task with the heaviest load and reallocates CPU resources to it according to the following formula:
[Formula image not reproduced; it computes p_next from p_now, c_last and N_cpu.]
where p_next is the newly allocated number of CPU threads for the task, p_now is the task's current thread count, c_last is the thread-count increase at the task's last scheduling, and N_cpu is the number of CPU cores.
In the above method, step 4 comprises: at a fixed time interval, selecting one stateful task from the computing node with the highest load and migrating it to the computing node with the lowest load.
The invention also provides a streaming computing engine operating system for skewed data, comprising:
module 1, which obtains a user job containing skewed data, converts the user job into a directed acyclic graph representing it, and divides the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator;
module 2, which wraps all stateful operators into stateful tasks whose input is the stateful data stream, and wraps all stateless operators into stateless tasks whose input is the stateless data stream;
module 3, which copies and sends the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results;
module 4, which converts the data of the stateful data stream into data records containing keys and values, distributes records containing the same key to the same computing node to obtain stateful processing results, and collects the stateless and stateful processing results as the result of the user job.
In the above system, module 4 comprises: dividing the stateful data stream into a fixed number of data slices, which can be processed by the computing nodes in parallel or scheduled to other computing nodes according to the scheduler's decision.
In the above system, wrapping the stateful operators into stateful tasks and wrapping the stateless operators into stateless tasks in module 2 both include merging associated operators using an OP fusion rule.
In the above system, module 4 comprises:
module 41, in which the computing node, at a fixed time interval, picks the stateful task with the heaviest load and reallocates CPU resources to it according to the following formula:
[Formula image not reproduced; it computes p_next from p_now, c_last and N_cpu.]
where p_next is the newly allocated number of CPU threads for the task, p_now is the task's current thread count, c_last is the thread-count increase at the task's last scheduling, and N_cpu is the number of CPU cores.
In the above system, module 4 comprises: at a fixed time interval, selecting one stateful task from the computing node with the highest load and migrating it to the computing node with the lowest load.
Compared with the prior art, the advantages of the invention are as follows:
The invention provides a streaming computing engine that supports skewed data. Compared with the prior art, the engine achieves load balancing under skew and responds in time to dynamic changes in the data distribution; in an extreme skew scenario (Zipf skew = 2.0), the system's throughput on 8 nodes reaches 78.5 times that of Flink.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a diagram of stateful data stream slicing according to the present invention;
FIG. 3 is a schematic diagram of the OP fusion rule according to the present invention;
FIG. 4 is a schematic diagram of task-level migration according to the present invention;
FIG. 5 is a flow chart of the method of the present invention.
Detailed Description
In the prior art, load imbalance of jobs under skewed data sets is caused by the static partitioning strategy used by streaming computing engines: on the one hand, a static partitioning strategy makes it difficult for the system to reach load balance under skew; on the other hand, it cannot react quickly to changes in the data distribution. The invention proposes virtual data partitioning and a two-stage dynamic scheduling technique. Virtual data partitioning reduces the performance impact of data re-partitioning in traditional systems and provides support for the two-stage dynamic scheduling strategy. The two-stage dynamic scheduling technique consists of vertical scaling and horizontal migration: vertical scaling is cheap and completes within milliseconds; horizontal migration solves load imbalance between machines at a higher cost than vertical scaling, but does not need to be triggered frequently. Compared with the traditional single-stage scheduling mode, two-stage scheduling has lower overhead, and the scheduling frequency can be increased to adapt better to dynamic changes in task load.
The key technical points of the invention include:
Key point 1: a streaming computing engine using virtual data slicing. Technical effect: the system slices the data stream into virtual blocks, so when the data distribution changes, the slices only need to be rescheduled rather than re-partitioned.
Key point 2: a stream computing engine using a two-stage dynamic scheduling technique. Technical effect: the system uses a vertical-plus-horizontal two-stage scheduling strategy. Vertical scheduling is cheap and does not require suspending the system, so it can be used frequently to keep the system's resource utilization optimal; horizontal scheduling is more expensive but migrates tasks between machines, so it can maximize the resource utilization of the whole cluster. Compared with the traditional single-stage scheduling strategy, the two-stage strategy reduces scheduling overhead and increases scheduling response speed.
In order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
The master-slave architecture offers high throughput and low latency, and the master node in this system does not become a performance bottleneck because it is responsible only for task scheduling, not for task computation; scheduling carries no large computational overhead. The system therefore adopts a master-slave architecture (the master is the JobManager, the slaves are TaskManagers). The architecture, shown in FIG. 1, consists of three parts: the job manager (JobManager), the task manager (TaskManager), and the state manager (State Service). The JobManager is responsible for cluster computing resources and user job management; the TaskManager is responsible for executing computing tasks and collecting statistics; the State Service is responsible for managing computation state, including providing a state access interface and performing backup, migration, and recovery of the state. Referring to FIG. 5 (the purpose of stateful data stream slicing in FIG. 5 is to cooperate with horizontal and vertical scheduling: the corresponding data slices are scheduled automatically during scheduling), the method of the invention comprises:
the first process is as follows: in the system, each data in the data stream is converted into a data record containing "key" and "value" fields. The "key" field is used as a data record partition, specified by the user job. The "value" field records the original information of the data stream. When the user job has no partition requirement, the 'key' field can be null, and the system can automatically allocate the records containing the same 'key' to the same node. The relation between the job and the task is called the job from the user perspective and called the task from the system perspective.
Process two: after entering the system, data streams are divided into two types, stateful data streams and stateless data streams:
In a stateless data stream, the data is stateless: no context or historical computation information is needed and there is no partitioning requirement, so the same input stream always produces the same output stream. The system can therefore schedule the data to any node for processing, which makes elastic scaling and load balancing easy; stateless data streams are scheduled first when the system schedules. Stateless jobs do not use the "key" field.
In a stateful data stream, the data carries partitioning requirements: data with the same "key" must be scheduled to the same node for processing, which places high demands on the system scheduler. The system therefore divides the stateful data stream into a fixed number of data slices, as shown in FIG. 2. A data slice is the basic scheduling and processing unit in the system; slices can be processed by the system in parallel or scheduled to other machines according to the scheduler's decision. Dividing the data stream into slices simplifies the system design and reduces the burden on the scheduler. In addition, during failure recovery, batch transmission and processing of slices is more efficient, which reduces system downtime.
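The virtual-slice idea can be sketched as below. This is an illustrative assumption-laden sketch: `NUM_SLICES` is an assumed configuration value, and the scheduler interface is invented for exposition; the point is that a key's slice is fixed while a slice's node assignment can change, so rescheduling never requires re-partitioning keys.

```python
NUM_SLICES = 64  # fixed number of data slices (assumed value)

def slice_of(key):
    # A key is permanently bound to one slice ...
    return hash(key) % NUM_SLICES

class SliceScheduler:
    def __init__(self, nodes):
        # ... but a slice's node assignment can change at runtime,
        # which is what makes rescheduling cheap under skew.
        self.assignment = {s: nodes[s % len(nodes)] for s in range(NUM_SLICES)}

    def node_for(self, key):
        return self.assignment[slice_of(key)]

    def migrate(self, slice_id, target_node):
        # Move a whole slice to another machine per the scheduler's decision.
        self.assignment[slice_id] = target_node
```

When a hot key appears, the scheduler only moves the affected slice; all other keys keep their placements.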
The scheduling of stateless tasks is determined by the scheduling of stateful tasks: the system first decides how the stateful tasks are deployed, and then fills the remaining computing resources with stateless tasks. (When the load changes, stateless tasks are rescheduled first, vacating computing resources for stateful tasks.) The system deploys the stateless data stream processing tasks on all nodes in advance, so when a data stream needs to be rescheduled, the tasks do not have to be rebuilt. The advantage is that switching a stateless data stream is cheap.
Process three: the system converts the user job into a directed acyclic graph (DAG) representing the job. Operators are the nodes of the DAG, each representing a piece of business processing logic in the user job. Operators are divided into stateful operators, which process the stateful data stream, and stateless operators, which process the stateless data stream. After the user job is submitted, the system analyzes the job code: a UDF (user-defined function; a user job consists of UDFs and the connecting edges that define the relations between them) that needs to use computation state is converted into a stateful operator whose input is a stateful data stream; an operator that does not need computation state is converted into a stateless operator whose input stream is a stateless data stream. For example, a user submits a word-count job consisting of:
UDF-1, which parses a JSON log and extracts words;
UDF-2, which counts the number of occurrences of each word.
The business logic of UDF-1 needs no computation state, so the system converts it into a stateless operator whose input stream is a stateless data stream. The business logic of UDF-2 must record the number of occurrences of each word and therefore needs computation state, so it becomes a stateful operator whose input stream is a stateful data stream.
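A minimal sketch of that word-count job follows: UDF-1 parses a JSON log and extracts words (stateless), UDF-2 counts occurrences (stateful). The JSON field name `"msg"` is an assumption for illustration, not part of the patent.

```python
import json

def udf1_extract_words(json_log):
    # Stateless: output depends only on the current input record.
    return json.loads(json_log)["msg"].split()

class Udf2WordCount:
    """Stateful: must remember how many times each word has appeared."""
    def __init__(self):
        self.counts = {}  # computation state: occurrences per word

    def process(self, word):
        self.counts[word] = self.counts.get(word, 0) + 1
        return self.counts[word]
```

Records flowing from UDF-1 to UDF-2 would be keyed by word, so each word's count lives on exactly one node.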
Process four: the system wraps the stateful operators into stateful stream tasks and the stateless operators into stateless stream tasks; during wrapping, associated operators are automatically merged using the OP fusion rule, as shown in FIG. 3.
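A hedged sketch of an OP-fusion pass: adjacent operators in a linear chain are merged into one task so records pass between them by function call rather than by shuffle. The fusion condition used here (adjacent operators with the same statefulness) is an illustrative assumption; the patent's exact rules are shown in FIG. 3.

```python
def fuse_chain(operators):
    # operators: list of (name, is_stateful) pairs in linear topological order.
    tasks, current = [], [operators[0]]
    for op in operators[1:]:
        if op[1] == current[-1][1]:   # same statefulness: fuse into current task
            current.append(op)
        else:                         # statefulness boundary: start a new task
            tasks.append(current)
            current = [op]
    tasks.append(current)
    return tasks
```

For the word-count example, the stateless parse/split operators would fuse into one task, with the stateful counter as a separate task.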
Process five: the system sends the wrapped tasks to the computing nodes. Since a stateless task can be sent to and executed on any node, the system, to reduce scheduling overhead, copies each stateless task to all computing nodes. When a stateless data stream is switched, the target node must already hold the corresponding stateless task: because the system schedules stateless tasks first (their scheduling overhead is low) and stateless data streams are switched frequently, deploying stateless tasks to all nodes in advance removes the task-construction step at switching time and speeds up switching.
Process six: after receiving tasks, a computing node automatically performs vertical scaling on them to maximize resource utilization, adjusting the CPU occupancy of each computing task. The node counts the backlog of input messages of each task; a heavier backlog indicates a heavier task. At a fixed time interval, the node picks the computing task with the heaviest load and reallocates CPU resources according to the following formula, where p_next is the newly allocated number of CPU threads for the task, p_now is the task's current thread count, c_last is the thread-count increase at the task's last scheduling, and N_cpu is the number of CPU cores.
[Formula image not reproduced; it computes p_next from p_now, c_last and N_cpu.]
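The exact reallocation formula appears only as an image in the patent and is not reproduced here. The sketch below is an assumed additive-increase rule built from the variables the text defines (p_now, c_last, N_cpu), capped at the number of CPU cores; treat it as illustrative, not as the patented formula.

```python
def reallocate_threads(p_now, c_last, n_cpu):
    # Grow the heaviest task's thread count, never past the core count.
    # (Illustrative rule only; the patent's formula is in the figure.)
    return min(p_now + max(c_last, 1), n_cpu)

def pick_heaviest(backlogs):
    # Backlog = queued input messages per task; heavier backlog = heavier task.
    return max(backlogs, key=backlogs.get)
```

At each interval the node would call `pick_heaviest` over the measured backlogs and then apply the reallocation rule to the winner.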
Process seven: the system's JobManager counts the computing-resource utilization of each node. When utilization is unbalanced, the system executes the horizontal scheduling strategy: the JobManager selects a task from the node with the highest load and migrates it to the node with the lowest load, preferring the task with the least computation state to reduce migration time. The migration process is shown in FIG. 4: when task T1 needs to migrate from node 1 to node 2, the JobManager first notifies T1's upstream node to suspend its work, then rebuilds task T1 on node 2; during this process T1's computation state DB1 automatically migrates to node 2. After the migration completes, node 2 sends a migration-completion message to the JobManager, and finally the JobManager rewrites T1's upstream routing table and resumes its work.
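The migration protocol just described can be sketched as a four-step coordinator. All class and method names are illustrative assumptions; `rebuild` stands in for reconstructing the task (and its state) on the target node.

```python
class JobManager:
    def __init__(self, routing_table):
        self.routing = routing_table   # task -> current node
        self.paused = set()            # tasks whose upstream is suspended

    def migrate(self, task, target, rebuild):
        self.paused.add(task)          # 1. upstream of the task suspends sending
        rebuild(task, target)          # 2. rebuild task on target; state migrates
        self.routing[task] = target    # 3. rewrite the upstream routing table
        self.paused.discard(task)      # 4. resume after the completion message
        return self.routing[task]
```

The pause window covers only one task's upstream, so the rest of the cluster keeps running during migration.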
The following is the system embodiment corresponding to the method embodiment above; this embodiment can be implemented in cooperation with the embodiments above. The technical details mentioned in the embodiments above remain valid in this embodiment and, to reduce repetition, are not repeated here; correspondingly, the technical details mentioned in this embodiment also apply to the embodiments above.
The invention also provides a streaming computing engine operating system for skewed data, comprising:
module 1, which obtains a user job containing skewed data, converts the user job into a directed acyclic graph representing it, and divides the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator;
module 2, which wraps all stateful operators into stateful tasks whose input is the stateful data stream, and wraps all stateless operators into stateless tasks whose input is the stateless data stream;
module 3, which copies and sends the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results;
module 4, which converts the data of the stateful data stream into data records containing keys and values, distributes records containing the same key to the same computing node to obtain stateful processing results, and collects the stateless and stateful processing results as the result of the user job.
In the above system, module 4 comprises: dividing the stateful data stream into a fixed number of data slices, which can be processed by the computing nodes in parallel or scheduled to other computing nodes according to the scheduler's decision.
In the above system, wrapping the stateful operators into stateful tasks and wrapping the stateless operators into stateless tasks in module 2 both include merging associated operators using an OP fusion rule.
In the above system, module 4 comprises:
module 41, in which the computing node, at a fixed time interval, picks the stateful task with the heaviest load and reallocates CPU resources to it according to the following formula:
[Formula image not reproduced; it computes p_next from p_now, c_last and N_cpu.]
where p_next is the newly allocated number of CPU threads for the task, p_now is the task's current thread count, c_last is the thread-count increase at the task's last scheduling, and N_cpu is the number of CPU cores.
In the above system, module 4 comprises: at a fixed time interval, selecting one stateful task from the computing node with the highest load and migrating it to the computing node with the lowest load.

Claims (10)

1. A method for operating a streaming computing engine for skewed data, comprising:
step 1, obtaining a user job containing skewed data, converting the user job into a directed acyclic graph representing it, and dividing the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator;
step 2, wrapping all stateful operators into stateful tasks whose input is the stateful data stream, and wrapping all stateless operators into stateless tasks whose input is the stateless data stream;
step 3, copying and sending the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results;
step 4, converting the data of the stateful data stream into data records containing keys and values, distributing records containing the same key to the same computing node to obtain stateful processing results, and collecting the stateless and stateful processing results as the result of the user job.
2. The method for operating a streaming computing engine for skewed data according to claim 1, wherein step 4 comprises: dividing the stateful data stream into a fixed number of data slices, which can be processed by the computing nodes in parallel or scheduled to other computing nodes according to the scheduler's decision.
3. The method according to claim 1, wherein wrapping the stateful operators into stateful tasks and wrapping the stateless operators into stateless tasks in step 2 both comprise merging associated operators using an OP fusion rule.
4. The method for operating a streaming computing engine for skewed data according to claim 1, wherein step 4 comprises:
step 41, at a fixed time interval, the computing node picks the stateful task with the heaviest load and reallocates CPU resources to it according to the following formula:
[Formula image not reproduced; it computes p_next from p_now, c_last and N_cpu.]
where p_next is the newly allocated number of CPU threads for the task, p_now is the task's current thread count, c_last is the thread-count increase at the task's last scheduling, and N_cpu is the number of CPU cores.
5. The method for operating a streaming computing engine for skewed data according to claim 1, wherein step 4 comprises: at a fixed time interval, selecting one stateful task from the computing node with the highest load and migrating it to the computing node with the lowest load.
6. A streaming computing engine operating system for skewed data, comprising:
module 1, which obtains a user job containing skewed data, converts the user job into a directed acyclic graph representing it, and divides the nodes in the directed acyclic graph into stateful operators and stateless operators according to the business processing logic of each operator;
module 2, which wraps all stateful operators into stateful tasks whose input is the stateful data stream, and wraps all stateless operators into stateless tasks whose input is the stateless data stream;
module 3, which copies and sends the stateless tasks of the stateless data stream to any computing node to obtain stateless processing results;
module 4, which converts the data of the stateful data stream into data records containing keys and values, distributes records containing the same key to the same computing node to obtain stateful processing results, and collects the stateless and stateful processing results as the result of the user job.
7. The streaming computing engine operating system for skewed data as recited in claim 6, wherein module 4 comprises: dividing the stateful data stream into a fixed number of data shards, which the computing nodes either process in parallel or hand off to other computing nodes according to the scheduler's decision.
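The key-based routing of module 4 and the fixed sharding of claim 7 both reduce to a deterministic key-to-shard mapping, so that every record with the same key lands in the same place. The patent does not name a hash function; a stable hash is an assumption of this sketch:

```python
import hashlib


def node_for_key(key: str, num_nodes: int) -> int:
    """Map a record's key to a compute node so that all records sharing a key
    reach the same node (the stateful routing of module 4). MD5 is used here
    only because it is stable across processes; the patent does not specify
    a hash function."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def shard_stream(records, num_shards: int):
    """Split a stateful stream of (key, value) records into a fixed number of
    shards (claim 7); each shard can then be processed in parallel or handed
    to another node by the scheduler."""
    shards = [[] for _ in range(num_shards)]
    for key, value in records:
        shards[node_for_key(key, num_shards)].append((key, value))
    return shards
```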
8. The streaming computing engine operating system for skewed data as recited in claim 6, wherein the wrapping of stateful operators into stateful tasks and of stateless operators into stateless tasks in module 2 comprises merging associated operators according to OP fusion rules.
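The "OP fusion rules" of claim 8 are not spelled out in the text. Reading fusion as plain function composition of associated operators (a common interpretation in streaming engines, but an assumption here), a minimal sketch is:

```python
def fuse_operators(ops):
    """Merge a chain of associated operators into one task, in the spirit of
    claim 8's OP fusion. The concrete fusion rules are not given in the
    patent text; sequential function composition is this sketch's assumption.
    Fusing avoids per-operator queue hops for each record."""
    def fused(record):
        for op in ops:
            record = op(record)  # each operator transforms the record in turn
        return record
    return fused
```

For example, fusing a doubling operator with an incrementing operator yields one task that applies both per record.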
9. The streaming computing engine operating system for skewed data as recited in claim 6, wherein module 4 comprises:
module 41: at fixed time intervals, the computing node selects the stateful task with the heaviest load and reallocates CPU resources to it according to the following formula:
(formula published only as image FDA0002293603510000021)
wherein p_next denotes the newly allocated number of CPU threads for the task, p_now is the task's current number of threads, c_last is the thread-count increase applied the last time the task was scheduled, and N_cpu is the number of CPU cores.
10. The streaming computing engine operating system for skewed data as recited in claim 6, wherein module 4 comprises: at fixed time intervals, selecting one stateful task on the computing node with the highest load and migrating it to the computing node with the lowest load.
CN201911191154.2A 2019-11-28 2019-11-28 Stream type calculation engine operation method and system for tilt data Active CN110990059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911191154.2A CN110990059B (en) 2019-11-28 2019-11-28 Stream type calculation engine operation method and system for tilt data

Publications (2)

Publication Number Publication Date
CN110990059A true CN110990059A (en) 2020-04-10
CN110990059B CN110990059B (en) 2021-11-19

Family

ID=70087973

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911191154.2A Active CN110990059B (en) 2019-11-28 2019-11-28 Stream type calculation engine operation method and system for tilt data

Country Status (1)

Country Link
CN (1) CN110990059B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201296A (en) * 2021-12-09 2022-03-18 厦门美亚亿安信息科技有限公司 Data balancing method and system based on streaming processing platform

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103491024A (en) * 2013-09-27 2014-01-01 中国科学院信息工程研究所 Job scheduling method and device for streaming data
CN103853766A (en) * 2012-12-03 2014-06-11 中国科学院计算技术研究所 Online processing method and system oriented to streamed data
CN105354089A (en) * 2015-10-15 2016-02-24 北京航空航天大学 Streaming data processing model and system supporting iterative calculation
CN105956021A (en) * 2016-04-22 2016-09-21 华中科技大学 Automated task parallel method suitable for distributed machine learning and system thereof
US20160277542A1 (en) * 2015-03-18 2016-09-22 Alcatel-Lucent Usa Inc. Data stream load balancing utilizing multiple sets of servers
CN106155791A (en) * 2016-06-30 2016-11-23 电子科技大学 A kind of workflow task dispatching method under distributed environment
CN106815071A (en) * 2017-01-12 2017-06-09 上海轻维软件有限公司 Big data job scheduling system based on directed acyclic graph
CN108241530A (en) * 2016-12-23 2018-07-03 西北大学 A kind of streaming computing bipartite graph method for scheduling task based on Storm
CN108388474A (en) * 2018-02-06 2018-08-10 北京易沃特科技有限公司 Intelligent distributed management of computing system and method based on DAG
CN110069441A (en) * 2019-03-21 2019-07-30 中国科学院计算技术研究所 A kind of FPGA network and stream calculation System and method for for stream calculation
CN110402431A (en) * 2017-03-23 2019-11-01 亚马逊科技公司 Event driven scheduling is carried out using directed acyclic graph

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SMRITI R. RAMAKRISHNAN et al.: "Balancing reducer skew in MapReduce workloads using progressive sampling", SoCC '12: Proceedings of the Third ACM Symposium on Cloud Computing *
SUN Dawei et al.: "Big data stream computing: key technologies and system instances", Journal of Software (《软件学报》) *
CHENG Xueqi et al.: "Survey of big data systems and analysis technologies", Journal of Software (《软件学报》) *
LONG Xiao et al.: "Dynamic load balancing strategy for Storm clusters based on the cuckoo search algorithm", Computer Technology and Development (《计算机技术与发展》) *

Also Published As

Publication number Publication date
CN110990059B (en) 2021-11-19

Similar Documents

Publication Publication Date Title
US11650971B2 (en) System and method for large-scale data processing using an application-independent framework
CN107431696B (en) Method and cloud management node for application automation deployment
Krishnamurthy et al. Pratyaastha: an efficient elastic distributed sdn control plane
Xu et al. Stela: Enabling stream processing systems to scale-in and scale-out on-demand
Gehring et al. MARS—A framework for minimizing the job execution time in a metacomputing environment
US11474874B2 (en) Systems and methods for auto-scaling a big data system
Rupprecht et al. SquirrelJoin: Network-aware distributed join processing with lazy partitioning
CN106250233B (en) MapReduce performance optimization system and optimization method
CN108572873A (en) A kind of load-balancing method and device solving the problems, such as Spark data skews
Zhang et al. The real-time scheduling strategy based on traffic and load balancing in storm
Madsen et al. Integrative dynamic reconfiguration in a parallel stream processing engine
Li et al. Enabling elastic stream processing in shared clusters
Liu et al. An adaptive online scheme for scheduling and resource enforcement in storm
Ye et al. vHadoop: A scalable Hadoop virtual cluster platform for MapReduce-based parallel machine learning with performance consideration
US20150331724A1 (en) Workload balancing to handle skews for big data analytics
CN107291550A (en) A kind of Spark platform resources dynamic allocation method and system for iterated application
Madsen et al. Dynamic resource management in a massively parallel stream processing engine
CN108089918B (en) Graph computation load balancing method for heterogeneous server structure
US20200159594A1 (en) Systems and methods for dynamic partitioning in distributed environments
Su et al. Passive and partially active fault tolerance for massively parallel stream processing engines
Madsen et al. Integrating fault-tolerance and elasticity in a distributed data stream processing system
CN110990059B (en) Stream type calculation engine operation method and system for tilt data
CN114900525A (en) Method and system for deflecting data stream
CN116302574B (en) Concurrent processing method based on MapReduce
KhudaBukhsh et al. Generalized cost-based job scheduling in very large heterogeneous cluster systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant