WO2015196940A1 - A stream processing method, apparatus and system - Google Patents

A stream processing method, apparatus and system

Info

Publication number
WO2015196940A1
Authority
WO
WIPO (PCT)
Prior art keywords
stream processing
processing component
stream
computing
component
Prior art date
Application number
PCT/CN2015/081533
Other languages
English (en)
French (fr)
Inventor
钱剑锋
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2015196940A1


Classifications

    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 - Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/50 - Network service management, e.g. ensuring proper service fulfilment according to agreements
    • H04L41/5041 - Network service management characterised by the time relationship between creation and deployment of a service
    • H04L41/5054 - Automatic deployment of services triggered by the service manager, e.g. service implementation by automatic configuration of network components
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 - Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 - Network streaming of media packets
    • H04L65/61 - Network streaming of media packets for supporting one-way streaming services, e.g. Internet radio
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 - Allocation of resources to service a request
    • G06F9/5027 - Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 - Arrangements for program control, e.g. control units
    • G06F9/06 - Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 - Multiprogramming arrangements
    • G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 - Partitioning or combining of resources
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 - Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60 - Network streaming of media packets
    • H04L65/75 - Media network packet handling
    • H04L65/765 - Media network packet handling intermediate

Definitions

  • the present invention relates to the field of computer technologies, and in particular, to a stream processing method, apparatus, and system.
  • Stream processing technology is widely used in real-time processing systems in various fields, such as stock exchanges, network monitoring, web applications, and communication data management.
  • The common feature of such systems is that the data is highly real-time, very large in volume, bursty, continuous, and constantly changing.
  • Stream processing technology therefore requires real-time monitoring of continuous data streams, analyzing the data in real time as it changes, capturing information that may be useful to users, and responding to and processing emergencies in real time.
  • Stream data processing mainly uses distributed computing.
  • A distributed stream processing system includes a plurality of computing nodes, and the processing of the stream data is completed jointly by these computing nodes.
  • The distributed stream processing system allocates the stream processing components of a stream processing task to the computing nodes, where each stream processing component contains the computation logic for the data, so that the computing nodes can process the stream data according to the computation logic of the stream processing components allocated to them.
  • Embodiments of the present invention provide a stream processing method, apparatus, and system for allocating the stream processing components included in a stream processing task, which can effectively reduce the probability that the computing resources required by the stream processing components allocated to a computing node exceed the computing resources the node can provide and thereby cause system instability and data processing failures, improving system performance.
  • A first aspect of the present invention provides a stream processing method, the method comprising:
  • receiving a first stream processing task, the first stream processing task including one or more stream processing components, the data input and output relationships of the stream processing components, and an identifier of a stream data source, and calculating the computing resources required by each of the one or more stream processing components included in the first stream processing task;
  • if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy a preset constraint, copying one or more second stream processing components having the same computing logic as the first stream processing component and adding the second stream processing components to the first stream processing task to obtain a second stream processing task; in the second stream processing task, the second stream processing component has the same data input and output relationships as the first stream processing component; if the first stream processing task contains a third stream processing component that sends data to the first stream processing component, the third stream processing component sends, according to a first data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; if the first stream processing task contains a stream data source, corresponding to the stream data source identifier, that sends data to the first stream processing component, the stream data source sends, according to a second data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; and the computing resources required by the first stream processing component in the first stream processing task are divided, according to a resource allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task; and
  • allocating the stream processing components of the second stream processing task to computing nodes in the stream processing system that satisfy the computing resources required by those stream processing components.
  • The first stream processing task further includes an operator estimated calculation amount and a stream transmission estimated calculation amount for each stream processing component;
  • calculating the computing resources required by each stream processing component in the first stream processing task then includes:
  • calculating, according to the operator estimated calculation amount and the stream transmission estimated calculation amount of each stream processing component in the first stream processing task, the computing resources required by each stream processing component, specifically:
  • using the sum of the operator calculation amount of each stream processing component and the stream transmission calculation amount of that stream processing component as the computing resources required by that stream processing component.
  • Allocating the stream processing components to computing nodes in the stream processing system that satisfy the computing resources required by the stream processing components includes:
  • allocating each stream processing component in the second stream processing task to a computing node that satisfies the computing resources required by that stream processing component.
  • The constraint condition is that the computing resources required by the stream processing component are less than or equal to a preset value, or the computing resources required by the stream processing component are smaller than the largest idle computing resources that any computing node can provide, or the computing resources required by the stream processing component are smaller than the average of the idle computing resources of the computing nodes.
  • the first data allocation policy is an average allocation policy
  • the second data allocation policy is an average allocation policy
  • the resource allocation policy is an average allocation policy
  • a second aspect of the present invention provides a stream processing apparatus, including:
  • a receiving unit configured to receive a first stream processing task, where the first stream processing task includes one or more stream processing components, a data input and output relationship of the stream processing component, and an identifier of the stream data source;
  • a calculating unit configured to calculate, after the receiving unit receives the first stream processing task, computing resources required by each of the one or more stream processing components included in the first stream processing task ;
  • a copy update unit configured to: after the calculating unit obtains the computing resources required by each stream processing component, if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint, copy one or more second stream processing components having the same computing logic as the first stream processing component and add the second stream processing components to the first stream processing task to obtain a second stream processing task; in the second stream processing task, the second stream processing component has the same data input and output relationships as the first stream processing component; if the first stream processing task contains a third stream processing component that sends data to the first stream processing component, the third stream processing component sends, according to the first data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; if the first stream processing task contains a stream data source, corresponding to the stream data source identifier, that sends data to the first stream processing component, the stream data source sends, according to the second data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; and the computing resources required by the first stream processing component in the first stream processing task are divided, according to the resource allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task;
  • an allocating unit configured to allocate, after the copy update unit obtains the second stream processing task, the stream processing components in the second stream processing task to computing nodes in the stream processing system that satisfy the computing resources required by those stream processing components.
  • The first stream processing task further includes an operator estimated calculation amount and a stream transmission estimated calculation amount of each stream processing component;
  • the calculating unit is specifically configured to calculate, according to the operator estimated calculation amount and the stream transmission estimated calculation amount of each stream processing component in the first stream processing task, the computing resources required by each stream processing component.
  • the calculating unit includes:
  • a first calculating unit configured to: after the receiving unit receives the first stream processing task, calculate the operator calculation amount of each stream processing component according to a preset operator calculation amount prediction function, using the operator estimated calculation amount of each stream processing component and the estimated calculation amount of the source code of the stream processing component;
  • a second calculating unit configured to: after the first calculating unit calculates the operator calculation amount of each stream processing component, calculate the stream transmission calculation amount of each stream processing component according to a preset stream transmission calculation amount prediction function, using the stream transmission estimated calculation amount of each stream processing component;
  • a third calculating unit configured to: after the second calculating unit calculates the stream transmission calculation amount of each stream processing component, use the sum of the operator calculation amount of each stream processing component and the stream transmission calculation amount of that stream processing component as the computing resources required by that stream processing component.
  • the allocating unit includes:
  • a sorting unit configured to sort, after the copy update unit obtains the second stream processing task, the stream processing components in descending order of the computing resources required by the stream processing components in the second stream processing task;
  • a component allocation unit configured to allocate, after the sorting unit performs the sorting and according to the sorted order, each stream processing component to the computing node in the stream processing system that satisfies the computing resources required by the stream processing component, where the allocated computing node is the computing node on which the computing resource ratio of the stream processing component is the smallest, and the computing resource ratio is the ratio of the sum of the computing resources required by the stream processing component and the computing resources already used by the computing node to the total computing resources of the computing node.
  • the allocating unit includes:
  • a determining unit configured to determine, after the copy update unit obtains the second stream processing task, the type of the second stream processing task according to a preset classification model;
  • a searching unit configured to: after the determining unit determines the type of the second stream processing task, search a preset correspondence table between task types and allocation manners and determine the allocation manner corresponding to the type of the second stream processing task;
  • a node allocating unit configured to allocate, after the searching unit determines the corresponding allocation manner and according to that allocation manner, the stream processing components in the second stream processing task to computing nodes that satisfy the computing resources required by those stream processing components.
  • The constraint condition is that the computing resources required by the stream processing component are less than or equal to a preset value, or the computing resources required by the stream processing component are smaller than the largest remaining computing resources that any computing node can provide, or the computing resources required by the stream processing component are smaller than the average of the remaining computing resources of the computing nodes.
  • the first data allocation policy is an average allocation policy
  • the second data allocation policy is an average allocation policy
  • the resource allocation policy is an average allocation policy
  • a third aspect of the present invention provides a stream processing system including: a stream processing apparatus and a plurality of computing nodes, wherein:
  • The stream processing device is configured to: receive a first stream processing task, where the first stream processing task includes one or more stream processing components, the data input and output relationships of the stream processing components, and an identifier of a stream data source; calculate the computing resources required by each of the one or more stream processing components included in the first stream processing task; if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy a preset constraint, copy one or more second stream processing components having the same computing logic as the first stream processing component and add the second stream processing components to the first stream processing task to obtain a second stream processing task, where, in the second stream processing task, the second stream processing component has the same data input and output relationships as the first stream processing component; if the first stream processing task contains a third stream processing component that sends data to the first stream processing component, the third stream processing component sends, according to a first data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; if the first stream processing task contains a stream data source, corresponding to the stream data source identifier, that sends data to the first stream processing component, the stream data source sends, according to a second data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; the computing resources required by the first stream processing component in the first stream processing task are divided, according to a resource allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task; and allocate the stream processing components in the second stream processing task to computing nodes in the stream processing system that satisfy the computing resources required by those stream processing components;
  • the computing node is configured to: accept a stream processing component allocated by the stream processing device, and process data sent to the stream processing component according to a computing logic of the stream processing component.
  • the first stream processing task further includes an operator estimation calculation amount and a stream transmission estimation calculation amount of the stream processing component;
  • The stream processing device is specifically configured to: calculate the operator calculation amount of each stream processing component according to a preset operator calculation amount prediction function, using the operator estimated calculation amount of each stream processing component and the estimated calculation amount of the source code of the stream processing component; calculate the stream transmission calculation amount of each stream processing component according to a preset stream transmission calculation amount prediction function, using the stream transmission estimated calculation amount of each stream processing component; and use the sum of the operator calculation amount of each stream processing component and the stream transmission calculation amount of that stream processing component as the computing resources required by that stream processing component.
  • The constraint condition is that the computing resources required by the stream processing component are less than or equal to a preset value, or the computing resources required by the stream processing component are smaller than the largest idle computing resources that any computing node can provide, or the computing resources required by the stream processing component are smaller than the average of the idle computing resources of the computing nodes.
  • the first data allocation policy is an average allocation policy
  • the second data allocation policy is an average allocation policy
  • the resource allocation policy is an average allocation policy
  • The stream processing device receives a first stream processing task, where the first stream processing task includes one or more stream processing components, the data input and output relationships of the stream processing components, and an identifier of a stream data source, and calculates the computing resources required by each stream processing component in the first stream processing task; if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint, the stream processing device copies one or more second stream processing components having the same computing logic as the first stream processing component and adds the second stream processing components to the first stream processing task to obtain a second stream processing task;
  • in the second stream processing task, the second stream processing component has the same data input and output relationships as the first stream processing component; if the first stream processing task contains a third stream processing component that sends data to the first stream processing component, the third stream processing component sends, according to the first data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; if the first stream processing task contains a stream data source, corresponding to the stream data source identifier, that sends data to the first stream processing component, the stream data source sends, according to the second data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component;
  • the stream processing components in the second stream processing task are then allocated to computing nodes in the stream processing system that satisfy the computing resources required by those stream processing components.
  • FIG. 1 is a schematic diagram of a stream processing task in the prior art
  • FIG. 2 is a schematic diagram of a stream processing method according to an embodiment of the present invention.
  • FIG. 3a is a schematic diagram of a first stream processing task according to an embodiment of the present invention.
  • FIG. 3b is a schematic diagram of a second stream processing task according to an embodiment of the present invention.
  • FIG. 4 is another schematic diagram of a stream processing method according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram showing the structure of a stream processing apparatus according to an embodiment of the present invention.
  • FIG. 6 is another schematic diagram of a structure of a stream processing apparatus according to an embodiment of the present invention.
  • FIG. 7 is a schematic diagram showing the structure of a stream processing system according to an embodiment of the present invention.
  • FIG. 8 is another schematic diagram showing the structure of a stream processing apparatus according to an embodiment of the present invention.
  • Embodiments of the present invention provide a stream processing method, apparatus, and system for allocating the stream processing components included in a stream processing task, which can effectively reduce the probability that the computing resources required by the stream processing components allocated to a computing node exceed the computing resources the node can provide and thereby cause system instability and data processing failures, improving system performance.
  • a stream processing component can receive output data of a stream data source or other stream processing component, and a stream data source provides a data stream
  • the stream data source includes a stream data source A and B
  • the output data of the stream data source A is sent to the stream processing component A
  • the output data of the stream data source B is sent to the stream processing components A and B
  • the stream processing components A and B send their output data to the stream processing component C, where a stream processing component can be constructed from a plurality of stream processing units; in FIG. 1, the stream processing component includes stream processing units A1 to Ai.
  • A stream processing method according to an embodiment of the present invention is applied to a stream processing apparatus.
  • the method includes:
  • the user may submit a first stream processing task to the stream processing device, where the first stream processing task includes one or more stream processing components, a data input and output relationship of the stream processing component, and an identifier of the stream data source.
  • the first stream processing task may further include an identifier of the storage device, where the stream processing component carries calculation logic for processing the data, for example, the calculation logic may be data filtering, summation, averaging, selecting feature values, and the like.
  • The data input and output relationship of a stream processing component refers to which stream processing components or stream data sources input data to the stream processing component, and to which stream processing components or storage devices the output data of the stream processing component is sent.
  • For example, if the stream processing component A and the stream data source B input data to the stream processing component C, and the data is sent to the stream processing component D after being processed by the stream processing component C, then the input relationship of the stream processing component C includes the stream processing component A and the stream data source B, and the output relationship of the stream processing component C includes the stream processing component D.
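  • As an illustration only (the patent does not prescribe any concrete data structure), a first stream processing task such as the one in the example above could be represented as a small graph of components, stream data source identifiers, and input/output relationships. All names below (StreamProcessingTask, inputs_of, the "filter"/"sum"/"store" labels) are hypothetical and not taken from the original filing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class StreamProcessingTask:
    """Hypothetical container for a stream processing task: its stream processing
    components, stream data source identifiers, and data input/output relationships."""
    components: Dict[str, dict] = field(default_factory=dict)   # name -> {"logic": label}
    sources: List[str] = field(default_factory=list)            # identifiers of stream data sources
    edges: List[Tuple[str, str]] = field(default_factory=list)  # (upstream, downstream) data flow

    def inputs_of(self, name: str) -> List[str]:
        return [u for (u, v) in self.edges if v == name]

    def outputs_of(self, name: str) -> List[str]:
        return [v for (u, v) in self.edges if u == name]

# The example from the text: component A and stream data source B feed component C,
# and C's output is sent to component D.
task = StreamProcessingTask()
task.components.update({"A": {"logic": "filter"}, "C": {"logic": "sum"}, "D": {"logic": "store"}})
task.sources.append("source-B")
task.edges += [("A", "C"), ("source-B", "C"), ("C", "D")]

assert task.inputs_of("C") == ["A", "source-B"]   # input relationship of component C
assert task.outputs_of("C") == ["D"]              # output relationship of component C
```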
  • The stream processing device may calculate the computing resources required by each of the one or more stream processing components included in the first stream processing task; specifically, the stream processing device may calculate the computing resources required by each stream processing component in the stream processing task based on a preset prediction function, where the preset prediction function can be set as needed, which is not limited by the embodiment of the present invention.
  • If the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint, copy one or more second stream processing components having the same computing logic as the first stream processing component, and add the second stream processing components to the first stream processing task to obtain a second stream processing task;
  • after the stream processing device obtains the computing resources required by each stream processing component in the first stream processing task, if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint, the stream processing device copies one or more second stream processing components having the same computing logic as the first stream processing component and adds the second stream processing components to the first stream processing task to obtain a second stream processing task; in the second stream processing task, the second stream processing component has the same data input and output relationships as the first stream processing component; if the first stream processing task contains a third stream processing component that sends data to the first stream processing component, the third stream processing component sends, according to the first data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; if the first stream processing task contains a stream data source, corresponding to the stream data source identifier, that sends data to the first stream processing component, the stream data source sends, according to the second data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; and the computing resources required by the first stream processing component in the first stream processing task are divided, according to the resource allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task.
  • The first data distribution policy may be an average allocation policy: if the first stream processing task contains a third stream processing component that sends data to the first stream processing component, the third stream processing component sends, according to the average allocation policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component. For example, if the first stream processing component in the first stream processing task requires computing resources K, and the total number of the first stream processing component and the second stream processing components having the same computing logic in the second stream processing task is N, then the computing resources required by each of the first stream processing component and the second stream processing components in the second stream processing task are K/N.
  • The second data distribution policy may also be an average allocation policy: if the first stream processing task contains a stream data source, corresponding to the stream data source identifier, that sends data to the first stream processing component, the stream data source sends, according to the average allocation policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component.
  • The resource allocation policy may also be an average allocation policy: the computing resources required by the first stream processing component in the first stream processing task are divided equally, according to the average allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task.
  • The first data distribution policy may also be a random allocation policy, a parity allocation policy, or an allocation according to a preset ratio; the second data distribution policy may likewise be a random allocation policy, a parity allocation policy, or an allocation according to a preset ratio; the first data distribution policy and the second data distribution policy may be the same or different.
  • The resource allocation policy is related to the first data distribution policy and the second data distribution policy; for example, if both the first data distribution policy and the second data distribution policy are average allocation policies, the resource allocation policy is also an average allocation policy, and if the first data distribution policy is a random allocation policy and the second data distribution policy is a parity allocation policy, the resource allocation policy is an average allocation policy.
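  • A minimal sketch of the distribution policies discussed above, assuming that the average allocation policy is realized as a round-robin split between the first and second stream processing components; the random and preset-ratio variants are shown only for comparison, and all function names are illustrative rather than taken from the patent.

```python
import itertools
import random

def average_policy(replicas):
    """Average allocation: rotate through the first and second stream processing
    components so each one receives an equal share of the incoming data."""
    cycle = itertools.cycle(replicas)
    return lambda record: next(cycle)

def random_policy(replicas):
    """Random allocation: each record goes to a randomly chosen replica."""
    return lambda record: random.choice(replicas)

def ratio_policy(replicas, weights):
    """Allocation according to a preset ratio, e.g. 2:1 between two replicas."""
    population = [r for r, w in zip(replicas, weights) for _ in range(w)]
    return lambda record: random.choice(population)

# First stream processing component plus one copied second component with the same computing logic.
replicas = ["first_component", "second_component"]
route = average_policy(replicas)
counts = {r: 0 for r in replicas}
for record in range(10):
    counts[route(record)] += 1
print(counts)  # {'first_component': 5, 'second_component': 5}

# Resource split: if the first component required K and there are N replicas in total,
# each of the first and second components is assigned K / N under the average policy.
K, N = 12.0, len(replicas)
print(K / N)   # 6.0 computing resource units per replica
```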
  • the constraint may be preset according to specific needs.
  • the preset constraint may be that the computing resources required by the stream processing component are less than or equal to a preset value, or that the computing resources required by the stream processing component are smaller than the largest idle computing resources that the computing nodes in the stream processing device can provide, or that the computing resources required by the stream processing component are smaller than the average of the idle computing resources of the computing nodes in the stream processing device.
  • constraints can be set according to specific needs, which are not limited herein.
  • correspondingly, a stream processing component does not satisfy the preset constraint if, for example, the computing resources it requires are greater than the preset value, or the computing resources it requires are greater than or equal to the largest idle computing resources that the computing nodes in the stream processing device can provide.
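  • A small sketch, under assumed inputs, of the three example constraint formulations above: required resources no greater than a preset value, smaller than the largest idle computing resources any node can provide, or smaller than the average idle computing resources of the nodes. The function name and the numbers are illustrative only.

```python
def satisfies_constraint(required, idle_per_node, mode="preset", preset_value=8.0):
    """Check the preset constraint for one stream processing component.
    required: computing resources the component needs.
    idle_per_node: idle (remaining) computing resources of each computing node."""
    if mode == "preset":       # required <= preset value
        return required <= preset_value
    if mode == "max_idle":     # required < largest idle resources of any node
        return required < max(idle_per_node)
    if mode == "avg_idle":     # required < average idle resources of the nodes
        return required < sum(idle_per_node) / len(idle_per_node)
    raise ValueError(mode)

idle = [6.0, 10.0, 4.0]
print(satisfies_constraint(12.0, idle, mode="preset"))    # False: 12.0 > 8.0
print(satisfies_constraint(9.0, idle, mode="max_idle"))   # True: 9.0 < 10.0
print(satisfies_constraint(9.0, idle, mode="avg_idle"))   # False: 9.0 >= 6.67
```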
  • The stream processing device allocates the stream processing components in the obtained second stream processing task to computing nodes in the stream processing device that satisfy the computing resources required by those stream processing components. It should be noted that the direction of data input and output between the computing nodes is the same as the direction of data input and output between the stream processing components allocated to those computing nodes.
  • In this embodiment, the stream processing device receives the first stream processing task and calculates the computing resources required by each stream processing component in the first stream processing task; if the first stream processing task contains a first stream processing component whose required computing resources do not satisfy the preset constraint, the stream processing device copies a second stream processing component having the same computing logic as the first stream processing component and adds the second stream processing component to the first stream processing task to obtain a second stream processing task;
  • in the second stream processing task, the third stream processing component sends, according to the first data distribution policy, its data to the first stream processing component and the second stream processing component, and if there is a stream data source that sends data to the first stream processing component, the stream data source likewise distributes its data between the first stream processing component and the second stream processing component according to the second data distribution policy, so that the computing resources required by the first stream processing component are reduced.
  • FIG. 4 shows another embodiment of the stream processing method according to an embodiment of the present invention, which includes:
  • The stream processing apparatus may receive a first stream processing task submitted by the user, where the first stream processing task includes one or more stream processing components, the data input and output relationships of the stream processing components, an identifier of the stream data source, and the operator estimated calculation amount and the stream transmission estimated calculation amount of the stream processing components.
  • The operator estimated calculation amount refers to the calculation amount estimated to be required for processing unit data according to the computing logic of the stream processing component; the stream transmission estimated calculation amount refers to the calculation amount estimated to be required for transmitting unit data; the unit data refers to the data transmitted in a unit of time and is related to the rate at which the stream data source outputs data.
  • The operator estimated calculation amount and the stream transmission estimated calculation amount of the stream processing components included in the first stream processing task can be used to calculate the computing resources required by the stream processing components; if other resources of the stream processing components, such as memory resources or network bandwidth resources, need to be calculated, the parameters related to those resource types can also be carried in the stream processing task.
  • The embodiment of the present invention describes the technical solution by taking the calculation of the computing resources required by the stream processing components as an example; in an actual application, the user can specify the types of resources required by the stream processing components by setting parameters in the stream processing task, which is not limited herein.
  • The first stream processing task includes the operator estimated calculation amount of each stream processing component and the stream transmission estimated calculation amount of each stream processing component, and the stream processing device uses the operator estimated calculation amount and the stream transmission estimated calculation amount of each stream processing component in the first stream processing task to calculate the computing resources required by each stream processing component.
  • The computing resources required by each stream processing component may be calculated by using a preset prediction function; the prediction function to be used differs depending on the types of resources of the stream processing component that need to be calculated, and in an actual application the prediction functions required for calculating different resources are preset in the stream processing device, which is not limited herein.
  • Specifically, the stream processing device may calculate the computing resources required by each stream processing component as follows:
  • after receiving the first stream processing task, the stream processing device may estimate the calculation amount of the source code of each stream processing component included in the first stream processing task; as one feasible calculation manner,
  • the preset operator calculation amount prediction function can be a function of the following quantities, where i denotes the i-th stream processing component in the first stream processing task, a and b are preset adjustment parameters, Vi represents the operator calculation amount of the i-th stream processing component, Pi represents the operator estimated calculation amount of the i-th stream processing component, and Mi represents the estimated calculation amount of the source code of the i-th stream processing component obtained by the stream processing device.
  • Alternatively, the stream processing device can also monitor each stream processing component, obtain the monitored operator calculation amount of each stream processing component, and determine the operator calculation amount of the stream processing component based on the monitored value; in this case the preset operator calculation amount prediction function can be a function of Ki, where Ki represents the monitored operator calculation amount of the i-th stream processing component.
  • The preset stream transmission calculation amount prediction function can be a function of the following quantities, where Ei represents the stream transmission calculation amount of the i-th stream processing component, d is a preset adjustment parameter, and Fi represents the stream transmission estimated calculation amount of the i-th stream processing component.
  • Alternatively, the stream processing device may also monitor each stream processing component to obtain the monitored stream transmission calculation amount of each stream processing component and determine the stream transmission calculation amount of the stream processing component based on the monitored value; in this case the stream transmission calculation amount prediction function can be a function of Gi, where e is a preset adjustment parameter and Gi represents the monitored stream transmission calculation amount of the i-th stream processing component.
  • the stream processing device can use the sum of Ei and Vi as the computing resource required by the i-th stream processing component.
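  • The concrete prediction functions appear as formulas in the original filing and are not reproduced in this text; the linear forms used below (an assumed Vi = a*Pi + b*Mi, a monitored variant, and an assumed Ei = d*Fi) are assumptions made only to illustrate how the operator calculation amount, the stream transmission calculation amount, and their sum as the required computing resources could be evaluated.

```python
def operator_amount(P_i, M_i, a=1.0, b=0.5):
    """Assumed operator calculation amount prediction function: P_i is the operator
    estimated calculation amount carried in the task, M_i is the estimated calculation
    amount of the component's source code, and a, b are preset adjustment parameters.
    The exact formula in the patent is not reproduced here; a linear form is assumed."""
    return a * P_i + b * M_i

def operator_amount_monitored(K_i, c=1.0):
    """Assumed variant based on the monitored operator calculation amount K_i."""
    return c * K_i

def stream_amount(F_i, d=1.0):
    """Assumed stream transmission calculation amount prediction function: F_i is the
    stream transmission estimated calculation amount and d is a preset parameter."""
    return d * F_i

def required_resources(V_i, E_i):
    """Per the text, the computing resources required by the i-th stream processing
    component are the sum of its operator calculation amount V_i and its stream
    transmission calculation amount E_i."""
    return V_i + E_i

V = operator_amount(P_i=4.0, M_i=2.0)   # 5.0
E = stream_amount(F_i=3.0)              # 3.0
print(required_resources(V, E))         # 8.0
```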
  • If the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint, copy one or more second stream processing components having the same computing logic as the first stream processing component, and add the second stream processing components to the first stream processing task to obtain a second stream processing task;
  • the stream processing device determines whether the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint, and if such a first stream processing component exists, copies one or more second stream processing components having the same computing logic as the first stream processing component and adds the second stream processing components to the first stream processing task to obtain a second stream processing task.
  • In the second stream processing task, the second stream processing component has the same data input and output relationships as the first stream processing component; if the first stream processing task contains a third stream processing component that sends data to the first stream processing component, the third stream processing component sends, according to the first data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; if the first stream processing task contains a stream data source, corresponding to the stream data source identifier, that sends data to the first stream processing component, the stream data source sends, according to the second data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; and the computing resources required by the first stream processing component and the second stream processing component in the second stream processing task are obtained by dividing the computing resources required by the first stream processing component in the first stream processing task according to the resource allocation policy.
  • The first stream processing component in the embodiment of the present invention may be one stream processing component or multiple stream processing components; if there are multiple such stream processing components, the stream processing device copies, for each of them, a stream processing component with the same computing logic.
  • The number of second stream processing components having the same computing logic as the first stream processing component may be set in advance or set as needed, which is not limited herein.
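  • Continuing the hypothetical StreamProcessingTask sketch given earlier, the following is one possible (not authoritative) rendering of this copy-and-update step: a first stream processing component that does not satisfy the preset constraint is replicated, the replicas are given the same data input and output relationships, and the required computing resources are divided among the replicas as under the average resource allocation policy. All function and variable names are illustrative.

```python
import copy

def copy_update(task, resources, constraint_ok, num_copies=1):
    """Return a second stream processing task in which every first stream processing
    component violating the constraint is replicated num_copies times; the replicas
    share the original data input/output relationships, and the original required
    computing resources are divided equally among the replicas."""
    new_task = copy.deepcopy(task)
    new_resources = dict(resources)
    for name, need in resources.items():
        if constraint_ok(need):
            continue                                  # constraint satisfied, nothing to do
        replicas = [name]
        for j in range(1, num_copies + 1):
            copy_name = f"{name}_copy{j}"             # second component, same computing logic
            new_task.components[copy_name] = dict(task.components[name])
            # same data input and output relationships as the first component
            for upstream in task.inputs_of(name):
                new_task.edges.append((upstream, copy_name))
            for downstream in task.outputs_of(name):
                new_task.edges.append((copy_name, downstream))
            replicas.append(copy_name)
        for r in replicas:                            # divide K over N replicas
            new_resources[r] = need / len(replicas)
    return new_task, new_resources

# e.g. component "C" needs 12 units but the constraint is "at most 8 units per component"
resources = {"A": 3.0, "C": 12.0, "D": 2.0}
second_task, second_resources = copy_update(task, resources, lambda r: r <= 8.0)
print(sorted(second_resources.items()))  # C and C_copy1 each now need 6.0
```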
  • The stream processing device allocates the stream processing components in the second stream processing task to computing nodes in the stream processing system that satisfy the computing resources required by those stream processing components.
  • Specifically, this includes: sorting the stream processing components in descending order of the computing resources required by the stream processing components in the second stream processing task, and allocating, according to the sorted order, each stream processing component to the computing node in the stream processing system that satisfies the computing resources required by the stream processing component, where the computing node allocated to a stream processing component is the computing node on which the computing resource ratio of the stream processing component is the smallest, and the computing resource ratio is the ratio of the sum of the computing resources required by the stream processing component and the computing resources already used by the computing node to the total computing resources of the computing node.
  • Step 404 may also be performed according to the following process:
  • 1) the stream processing device sorts the stream processing components in the second stream processing task in descending order of the computing resources they require, obtaining a sorted stream processing component set S;
  • 2) with i taking an initial value of 1 and i less than or equal to N, where N is the number of stream processing components included in the stream processing component set S and H is the set of computing resources of the computing nodes, the following steps are performed for the stream processing component Si:
  • 3) the stream processing device calculates, for each computing node in the set Hi, the computing resource ratio that the computing resources required by the stream processing component Si would occupy on that node, that is, Tik = (B'k + SCost(Si)) / Bk, where Tik represents the computing resource ratio of the stream processing component Si on the k-th computing node of the set Hi, B'k represents the computing resources already used by the k-th computing node of the set Hi, Bk represents the total computing resources of the k-th computing node of the set Hi, and SCost(Si) represents the computing resources required by the stream processing component Si;
  • 4) after obtaining the computing resource ratio of the stream processing component Si on each computing node in the set Hi, the stream processing device allocates the stream processing component Si to the computing node with the smallest computing resource ratio in the set Hi;
  • 5) let i = i + 1 and repeat steps 3) and 4) until every stream processing component in the set S has been allocated.
  • The stream processing apparatus may allocate each stream processing component in the second stream processing task to a computing node according to the above steps 1) to 5), and the computing nodes can satisfy the computing resources required by the stream processing components allocated to them.
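  • The following is a compact sketch of steps 1) to 5) above using plain Python containers; SCost(Si), B'k and Bk follow the definitions in the text, and Tik is computed as (B'k + SCost(Si)) / Bk, i.e. the ratio of the node's used resources plus the component's required resources to the node's total resources. The concrete numbers are illustrative.

```python
def allocate(component_costs, node_total):
    """Greedy allocation: sort components by required computing resources in descending
    order, then place each one on the node where its computing resource ratio
    T_ik = (B'_k + SCost(S_i)) / B_k is smallest."""
    node_used = {k: 0.0 for k in node_total}          # B'_k, initially nothing allocated
    assignment = {}
    # step 1): sorted component set S, largest required resources first
    S = sorted(component_costs, key=component_costs.get, reverse=True)
    for s_i in S:                                     # steps 2) and 5): iterate over S
        cost = component_costs[s_i]                   # SCost(S_i)
        # step 3): computing resource ratio on every candidate node
        ratios = {k: (node_used[k] + cost) / node_total[k] for k in node_total}
        # step 4): choose the node with the smallest ratio
        best = min(ratios, key=ratios.get)
        assignment[s_i] = best
        node_used[best] += cost
    return assignment

costs = {"A": 3.0, "C": 6.0, "C_copy1": 6.0, "D": 2.0}
nodes = {"node1": 10.0, "node2": 10.0}
print(allocate(costs, nodes))
# e.g. {'C': 'node1', 'C_copy1': 'node2', 'A': 'node1', 'D': 'node2'}
```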
  • the foregoing steps 1) to 5) are only a feasible allocation manner of the stream processing component.
  • The stream processing components in the second stream processing task may also be allocated according to the type of the second stream processing task.
  • In this case, allocating the stream processing components in the second stream processing task to computing nodes in the stream processing device that satisfy the computing resources required by the stream processing components is specifically: determining the type of the second stream processing task according to a preset classification model, searching a preset correspondence table between task types and allocation manners to determine the allocation manner corresponding to the type of the second stream processing task, and allocating, according to the corresponding allocation manner, the stream processing components in the second stream processing task to computing nodes that satisfy the computing resources required by the stream processing components.
  • The preset classification model is a model obtained by applying a classification algorithm to the features of multiple stream processing tasks, where the classification algorithm may include a decision tree, a Bayes classifier, a support vector machine, and the like; in the process of using the classification model to determine the type of a stream processing task, the stream processing device can also improve the classification model through learning.
  • The allocation manner corresponding to a task type may be set according to specific needs, and the above-mentioned steps 1) to 5) are one of the feasible allocation manners, which is not limited herein.
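  • A minimal sketch of this alternative: a stand-in for the preset classification model determines the type of the second stream processing task, and a preset correspondence table maps the task type to an allocation manner. The feature names, task types, and threshold below are invented for illustration; the text only requires that the model be obtained with an algorithm such as a decision tree, Bayes classifier, or support vector machine trained on features of multiple stream processing tasks.

```python
def classify_task(features):
    """Stand-in for the preset classification model (e.g. a decision tree, Bayes
    classifier, or SVM trained on features of past stream processing tasks).
    The single rule below is purely illustrative."""
    if features["stream_rate"] > 1000:
        return "transmission_heavy"
    return "computation_heavy"

# Preset correspondence table between task type and allocation manner.
ALLOCATION_TABLE = {
    "computation_heavy": "greedy_smallest_ratio",   # e.g. steps 1) to 5) above
    "transmission_heavy": "locality_first",         # hypothetical alternative manner
}

def choose_allocation(features):
    task_type = classify_task(features)
    return ALLOCATION_TABLE[task_type]

print(choose_allocation({"stream_rate": 50, "num_components": 4}))
# -> 'greedy_smallest_ratio'
```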
  • If, in the process of allocating the stream processing components in the second stream processing task to the computing nodes, there are still stream processing components in the second stream processing task that do not satisfy the preset constraint, the stream processing device can continue to copy stream processing components having the same computing logic as those non-satisfying stream processing components.
  • In this embodiment, the stream processing device uses the operator estimated calculation amount and the stream transmission estimated calculation amount of each stream processing component included in the first stream processing task to calculate the computing resources required by each stream processing component, and if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint, copies at least one second stream processing component having the same computing logic as the first stream processing component and adds the second stream processing component to the first stream processing task to obtain a second stream processing task.
  • Because, in the second stream processing task, the second stream processing component has the same data input and output relationships as the first stream processing component, and because the third stream processing component (if any) sends, according to the first data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component, the stream data source corresponding to the stream data source identifier (if any) sends, according to the second data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component, and the computing resources required by the first stream processing component in the first stream processing task are allocated, according to the resource allocation policy, to the first stream processing component and the second stream processing component in the second stream processing task, the data originally sent to the first stream processing component is distributed between the first stream processing component and the second stream processing component.
  • This reduces the computing resources required by the first stream processing component and can effectively reduce the probability that the computing resources required by the stream processing components allocated to a computing node exceed the computing resources the computing node can provide and thereby cause system instability and data processing failures, thereby improving system performance.
  • FIG. 5 is a schematic structural diagram of a stream processing apparatus according to an embodiment of the present invention, which includes:
  • the receiving unit 501 is configured to receive a first stream processing task, where the first stream processing task includes one or more stream processing components, a data input and output relationship of the stream processing component, and an identifier of the stream data source.
  • the calculating unit 502 is configured to calculate, after the receiving unit 501 receives the first stream processing task, the computing resources required by each of the one or more stream processing components included in the first stream processing task;
  • the copy update unit 503 is configured to: after the calculating unit 502 obtains the computing resources required by each stream processing component, if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint, copy one or more second stream processing components having the same computing logic as the first stream processing component and add the second stream processing components to the first stream processing task to obtain a second stream processing task; in the second stream processing task, the second stream processing component has the same data input and output relationships as the first stream processing component; if the first stream processing task contains a third stream processing component that sends data to the first stream processing component, the third stream processing component sends, according to the first data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; if the first stream processing task contains a stream data source, corresponding to the stream data source identifier, that sends data to the first stream processing component, the stream data source sends, according to the second data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; and the computing resources required by the first stream processing component in the first stream processing task are divided, according to the resource allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task;
  • the allocating unit 504 is configured to allocate, after the copy update unit 503 obtains the second stream processing task, the stream processing components in the second stream processing task to computing nodes in the stream processing system that satisfy the computing resources required by those stream processing components.
  • In this embodiment, the receiving unit 501 receives the first stream processing task, where the first stream processing task includes one or more stream processing components, the data input and output relationships of the stream processing components, and an identifier of the stream data source; the calculating unit 502 calculates the computing resources required by each stream processing component in the first stream processing task; if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint, the copy update unit 503 copies at least one second stream processing component having the same computing logic as the first stream processing component and adds the second stream processing component to the first stream processing task to obtain a second stream processing task, where the second stream processing component has the same data input and output relationships as the first stream processing component; if the first stream processing task contains a third stream processing component that sends data to the first stream processing component, the third stream processing component sends, according to the first data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component; and the allocating unit 504 allocates the stream processing components in the second stream processing task to computing nodes that satisfy the computing resources required by those stream processing components.
  • The stream processing device receives the first stream processing task and calculates the computing resources required by each stream processing component in the first stream processing task; if the first stream processing task contains a first stream processing component whose required computing resources do not satisfy the preset constraint, the stream processing device copies at least one second stream processing component having the same computing logic as the first stream processing component, adds the second stream processing component to the first stream processing task to obtain a second stream processing task, and allocates the stream processing components in the second stream processing task to computing nodes in the stream processing device that satisfy the computing resources required by those stream processing components.
  • Because the second stream processing component has the same data input and output relationships as the first stream processing component, the third stream processing component (if any) sends, according to the first data distribution policy, the data it previously sent to the first stream processing component to the first stream processing component and the second stream processing component, the stream data source (if any) does likewise according to the second data distribution policy, and the computing resources required by the first stream processing component in the first stream processing task are allocated, according to the resource allocation policy, to the first stream processing component and the second stream processing component in the second stream processing task.
  • In this way, the computing resources required by the first stream processing component that does not satisfy the constraint are shared between the first stream processing component and the second stream processing component, which reduces the computing resources required by the first stream processing component and can effectively reduce the probability that the computing resources required by the stream processing components allocated to a computing node exceed the computing resources the computing node can provide and thereby cause system instability and data processing failures, thereby improving system performance.
  • Another embodiment of the structure of a stream processing apparatus according to an embodiment of the present invention includes a receiving unit 501, a calculating unit 502, a copy update unit 503, and an allocating unit 504, which are similar to those described in the embodiment shown in FIG. 5 and will not be described again here.
  • In this embodiment of the present invention, the first stream processing task further includes an operator estimated calculation amount and a stream transmission estimated calculation amount for each stream processing component.
  • The calculating unit 502 is specifically configured to calculate the computing resources required by each stream processing component according to the operator estimated calculation amount and the stream transmission estimated calculation amount of each stream processing component in the first stream processing task.
  • The calculating unit 502 includes:
  • a first calculating unit 601, configured to calculate, after the receiving unit 501 receives the first stream processing task, an operator calculation amount of each stream processing component according to a preset operator calculation amount prediction function, using the operator estimated calculation amount of the component and the estimated calculation amount of the component's source code;
  • a second calculating unit 602, configured to calculate, after the first calculating unit 601 calculates the operator calculation amount of each stream processing component, a stream transmission calculation amount of each stream processing component according to a preset stream transmission calculation amount prediction function, using the stream transmission estimated calculation amount of the component; and
  • a third calculating unit 603, configured to take, after the second calculating unit 602 calculates the stream transmission calculation amount of each stream processing component, the sum of the operator calculation amount and the stream transmission calculation amount of each stream processing component as the computing resources required by that component (a sketch of this three-step estimate follows).
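As an illustration of the three-step estimate performed by units 601-603, the sketch below follows the prediction-function forms given later in this description (an operator amount of the form a*P + b*M and a stream transmission amount of the form d*F). The coefficient values, function names, and argument names are assumptions made for illustration only and are not part of the patent.

```python
def operator_cost(op_estimate: float, source_code_estimate: float,
                  a: float = 1.0, b: float = 0.5) -> float:
    """First calculating unit 601: operator calculation amount V = a*P + b*M."""
    return a * op_estimate + b * source_code_estimate


def streaming_cost(stream_estimate: float, d: float = 1.0) -> float:
    """Second calculating unit 602: stream transmission calculation amount E = d*F."""
    return d * stream_estimate


def required_resources(op_estimate: float, source_code_estimate: float,
                       stream_estimate: float) -> float:
    """Third calculating unit 603: the sum V + E is the component's required resource."""
    return (operator_cost(op_estimate, source_code_estimate)
            + streaming_cost(stream_estimate))


print(required_resources(op_estimate=2.0, source_code_estimate=1.0, stream_estimate=0.5))
```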
  • The allocating unit 504 includes:
  • a sorting unit 604, configured to sort, after the copy updating unit 503 obtains the second stream processing task, the stream processing components in the second stream processing task in descending order of the computing resources they require; and
  • a component allocating unit 605, configured to allocate, after the sorting unit 604 performs the sorting, the stream processing components in that order to computing nodes in the stream processing system that satisfy the computing resources required by each component, where the chosen computing node is the one on which the component's computing resource proportion is smallest, the computing resource proportion being the ratio of the sum of the computing resources required by the component and the computing resources already used by the node to the total computing resources of the node (sketched below).
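A compact sketch of this sort-then-place strategy, using the resource-proportion ratio described above ((used + required) / total). The data structures and names are assumptions, and a real implementation would also reject placements that exceed a node's capacity.

```python
def allocate_by_ratio(components, nodes):
    """components: list of (name, required); nodes: dict of name -> {'used', 'total'}."""
    placement = {}
    # take components in descending order of required computing resources
    for name, required in sorted(components, key=lambda c: c[1], reverse=True):
        # pick the node where (used + required) / total, the resource proportion, is smallest
        best = min(nodes, key=lambda k: (nodes[k]['used'] + required) / nodes[k]['total'])
        placement[name] = best
        nodes[best]['used'] += required   # the placed component now counts as used capacity
    return placement


nodes = {'node1': {'used': 2.0, 'total': 8.0}, 'node2': {'used': 1.0, 'total': 4.0}}
print(allocate_by_ratio([('C', 3.0), ('D', 1.5), ("D'", 1.5)], nodes))
```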
  • Alternatively, the allocating unit includes:
  • a determining unit 606, configured to determine, after the copy updating unit 503 obtains the second stream processing task, the type of the second stream processing task by using a preset classification model;
  • a searching unit 607, configured to search, after the determining unit 606 determines the type of the second stream processing task, a preset correspondence table of task types and allocation manners, and determine the allocation manner corresponding to the type of the second stream processing task; and
  • a node allocating unit 608, configured to allocate, after the searching unit 607 determines the corresponding allocation manner, computing nodes that satisfy the computing resources required by the stream processing components in the second stream processing task according to that allocation manner (sketched below).
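The type-based allocation path of units 606-608 can be read as a classify-then-dispatch step. The sketch below is purely illustrative: the task types, the classification rule, and the two allocation manners are invented placeholders; the patent only requires that a preset classification model and a preset type-to-manner correspondence table exist.

```python
def classify_task(task):
    # Stand-in for the preset classification model (e.g. a decision tree, naive Bayes
    # classifier, or SVM trained on features of earlier stream processing tasks).
    return 'latency_sensitive' if task.get('max_latency_ms', float('inf')) < 100 else 'throughput_heavy'


def spread_evenly(components, nodes):
    """One hypothetical allocation manner: spread components across nodes round-robin."""
    order = sorted(nodes)
    return {name: order[i % len(order)] for i, (name, _) in enumerate(components)}


def pack_tightly(components, nodes):
    """Another hypothetical allocation manner: keep components together on one node."""
    target = sorted(nodes)[0]
    return {name: target for name, _ in components}


# preset correspondence table between task types and allocation manners
ALLOCATION_TABLE = {'latency_sensitive': spread_evenly, 'throughput_heavy': pack_tightly}


def allocate_by_type(task, components, nodes):
    return ALLOCATION_TABLE[classify_task(task)](components, nodes)


print(allocate_by_type({'max_latency_ms': 50}, [('C', 1.0), ('D', 2.0)], ['node1', 'node2']))
```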
  • In this embodiment, the receiving unit 501 receives the first stream processing task, which includes one or more stream processing components and the data input and output relationships of those components. The calculating unit 502 then calculates the computing resources required by each stream processing component in the first stream processing task: specifically, the first calculating unit 601 calculates the operator calculation amount of each stream processing component according to the preset operator calculation amount prediction function, using the operator estimated calculation amount of the component and the estimated calculation amount of the component's source code; the second calculating unit 602 then calculates the stream transmission calculation amount of each stream processing component according to the preset stream transmission calculation amount prediction function, using the stream transmission estimated calculation amount of the component; and the third calculating unit 603 obtains the computing resources required by each stream processing component from the operator calculation amount and the stream transmission calculation amount of that component.
  • If the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint condition, the copy updating unit 503 copies at least one second stream processing component having the same computing logic as the first stream processing component and adds it to the first stream processing task to obtain a second stream processing task. In the second stream processing task, the second stream processing component has the same data input and output relationship as the first stream processing component; if a third stream processing component that sends data to the first stream processing component exists in the first stream processing task, the third stream processing component sends that data to both the first stream processing component and the second stream processing component according to the first data distribution policy; if a stream data source corresponding to a stream data source identifier that sends data to the first stream processing component exists in the first stream processing task, the stream data source sends that data to both the first stream processing component and the second stream processing component according to the second data distribution policy; and the computing resources required by the first stream processing component in the first stream processing task are divided, according to the resource allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task (a sketch of this replicate-and-rewire step is given below). Finally, the allocating unit 504 allocates the stream processing components in the second stream processing task to computing nodes in the stream processing system that satisfy the computing resources those components require.
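To make the replicate-and-rewire step concrete, the sketch below copies one component that violates the constraint, gives the copy the same upstream and downstream edges, redirects upstream senders to all replicas, and divides the required resources evenly (the K/N division described for the average policies). The task representation and all names are assumptions for illustration.

```python
import copy


def replicate_component(task, name, copies=1):
    """task: dict of component name -> {'required', 'upstream', 'downstream'} (assumed shape)."""
    replicas = [name] + [f"{name}_copy{i}" for i in range(1, copies + 1)]
    share = task[name]['required'] / len(replicas)   # resource split under an even policy
    for new in replicas[1:]:
        task[new] = copy.deepcopy(task[name])        # same computing logic, same edges
    for r in replicas:
        task[r]['required'] = share
    # every upstream sender now distributes its data across all replicas; a stream data
    # source (not present in `task`) would be reconfigured the same way by its own policy
    for up in task[name]['upstream']:
        if up in task:
            task[up]['downstream'] = [d for d in task[up]['downstream'] if d != name] + replicas
    return task


task = {
    'C': {'required': 1.0, 'upstream': [], 'downstream': ['D']},
    'D': {'required': 6.0, 'upstream': ['C'], 'downstream': ['G']},
}
print(replicate_component(task, 'D')['C']['downstream'])   # ['D', 'D_copy1']
```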
  • The allocating unit 504 may allocate the stream processing components as follows:
  • the sorting unit 604 sorts the stream processing components in the second stream processing task in descending order of the computing resources they require, and the component allocating unit 605 allocates the stream processing components in that order to computing nodes in the stream processing system that satisfy the computing resources required by each component, where the chosen computing node is the one on which the component's computing resource proportion is smallest, the computing resource proportion being the ratio of the sum of the computing resources required by the component and the computing resources already used by the node to the total computing resources of the node.
  • Alternatively, the allocating unit 504 may allocate the stream processing components as follows:
  • the determining unit 606 determines the type of the second stream processing task by using the preset classification model; the searching unit 607 then searches the preset correspondence table of task types and allocation manners and determines the allocation manner corresponding to the type of the second stream processing task; and the node allocating unit 608 allocates, according to that allocation manner, computing nodes that satisfy the computing resources required by the stream processing components in the second stream processing task.
  • In this embodiment, after receiving the first stream processing task, the stream processing apparatus uses the operator estimated calculation amount and the stream transmission estimated calculation amount of each stream processing component included in the first stream processing task to calculate the computing resources required by each stream processing component. If the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint condition, the apparatus copies at least one second stream processing component having the same computing logic as the first stream processing component and adds it to the first stream processing task to obtain a second stream processing task. Because, in the second stream processing task, the second stream processing component has the same data input and output relationship as the first stream processing component, a third stream processing component that sends data to the first stream processing component sends that data to both the first stream processing component and the second stream processing component according to the first data distribution policy, a stream data source identified as sending data to the first stream processing component sends that data to both components according to the second data distribution policy, and the computing resources required by the first stream processing component in the first stream processing task are divided, according to the resource allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task. Data originally sent to the first stream processing component can therefore be distributed between the first stream processing component and the second stream processing component, which reduces the computing resources required by the first stream processing component, effectively reduces the probability that the computing resources required by a stream processing component allocated to a computing node exceed the computing resources the node can provide and cause system instability and data processing failures, and thereby improves system performance.
  • FIG. 7 is a structural diagram of a stream processing system according to an embodiment of the present invention, including a stream processing apparatus 701 and a plurality of computing nodes 702.
  • A computing node provided in this embodiment of the present invention may be a cloud server of a cloud computing center, a data processing server of a general data processing center, a data processing server of a big data processing center, or the like, which is not limited in this embodiment of the present invention.
  • The stream processing apparatus 701 is configured to: receive a first stream processing task, where the first stream processing task includes one or more stream processing components, the data input and output relationships of the stream processing components, and identifiers of stream data sources; calculate the computing resources required by each stream processing component in the first stream processing task; and, if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy a preset constraint condition, copy at least one second stream processing component having the same computing logic as the first stream processing component and add the second stream processing component to the first stream processing task to obtain a second stream processing task. In the second stream processing task, the second stream processing component has the same data input and output relationship as the first stream processing component; if a third stream processing component that sends data to the first stream processing component exists in the first stream processing task, the third stream processing component sends the data it sends to the first stream processing component to both the first stream processing component and the second stream processing component according to a first data distribution policy; if a stream data source corresponding to a stream data source identifier that sends data to the first stream processing component exists in the first stream processing task, the stream data source sends the data it sends to the first stream processing component to both the first stream processing component and the second stream processing component according to a second data distribution policy; and the computing resources required by the first stream processing component in the first stream processing task are divided, according to a resource allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task. The stream processing apparatus 701 then allocates the stream processing components in the second stream processing task to computing nodes that satisfy the computing resources required by those components.
  • The computing node 702 is configured to: accept a stream processing component allocated by the stream processing apparatus 701, and process data sent to the stream processing component according to the computing logic of that component, as sketched below.
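A toy model of the node side of this contract; the class, names, and the list-at-a-time processing model are assumed purely for illustration.

```python
class ComputeNode:
    """Simplified computing node 702: it accepts allocated components and applies their logic."""

    def __init__(self, total_resources):
        self.total = total_resources
        self.used = 0.0
        self.components = {}               # component name -> computing logic (a callable)

    def accept(self, name, logic, required):
        """Accept a stream processing component allocated by the stream processing apparatus."""
        self.components[name] = logic
        self.used += required

    def process(self, name, data):
        """Process data sent to a component according to that component's computing logic."""
        return self.components[name](data)


node = ComputeNode(total_resources=8.0)
node.accept('sum_component', logic=sum, required=1.5)
print(node.process('sum_component', [1, 2, 3]))   # -> 6
```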
  • In this embodiment of the present invention, the first stream processing task further includes an operator estimated calculation amount and a stream transmission estimated calculation amount of each stream processing component.
  • The stream processing component is then specifically configured to: calculate the operator calculation amount of each stream processing component according to the preset operator calculation amount prediction function, using the operator estimated calculation amount of the component and the estimated calculation amount of the component's source code; calculate the stream transmission calculation amount of each stream processing component according to the preset stream transmission calculation amount prediction function, using the stream transmission estimated calculation amount of the component; and take the sum of the operator calculation amount and the stream transmission calculation amount of each stream processing component as the computing resources required by that component.
  • In this embodiment of the present invention, the first data distribution policy may be an average distribution policy, the second data distribution policy may be an average distribution policy, and the resource allocation policy may be an average distribution policy; other distribution policies may be adopted in other embodiments and are not described again here. A brief illustration of the average policies follows.
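The average policies amount to an even split of both the data and the resource figure: the earlier method embodiment notes that a component that required K units of computing resources, once copied into N replicas, needs about K/N per replica. A small sketch, with illustrative names:

```python
from itertools import cycle


def split_evenly(data, replicas):
    """Round-robin the data previously sent to one component across its replicas."""
    assignment = {r: [] for r in replicas}
    for target, item in zip(cycle(replicas), data):
        assignment[target].append(item)
    return assignment


replicas = ['D', "D'"]
print(split_evenly([10, 11, 12, 13, 14], replicas))   # about half of the stream to each replica

K, N = 6.0, len(replicas)       # resources originally required by the first component
print(K / N)                    # each replica now needs roughly K/N computing resources
```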
  • The constraint condition is that the computing resources required by a stream processing component are less than or equal to a preset value, or that the computing resources required by a stream processing component are less than the largest idle computing resource that any computing node can provide, or that the computing resources required by a stream processing component are less than the average of the idle computing resources of the computing nodes.
  • It can be seen that, in this embodiment, the stream processing apparatus 701 receives the first stream processing task and calculates the computing resources required by each stream processing component in it. If the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint condition, the apparatus copies at least one second stream processing component having the same computing logic as the first stream processing component, adds it to the first stream processing task to obtain a second stream processing task, and allocates the stream processing components in the second stream processing task to computing nodes that satisfy the computing resources those components require. Because the second stream processing component has the same data input and output relationship as the first stream processing component, a third stream processing component that sends data to the first stream processing component distributes that data between the first and second stream processing components according to the first data distribution policy, a stream data source identified as sending data to the first stream processing component distributes that data between them according to the second data distribution policy, and the computing resources required by the first stream processing component are divided, according to the resource allocation policy, between the first and second stream processing components in the second stream processing task. The data to be processed by the first stream processing component and the computing resources it requires are therefore reduced, which effectively reduces the probability that the computing resources required by a stream processing component allocated to a computing node exceed the computing resources the node can provide and cause system instability and data processing failures, thereby improving system performance.
  • Referring to FIG. 8, an embodiment of the structure of a stream processor according to an embodiment of the present invention includes: a processor 801, a receiving device 802, a sending device 803, and a memory 804.
  • the receiving device 802 is configured to receive a first stream processing task, where the first stream processing task includes one or more stream processing components, a data input and output relationship of the stream processing component, and an identifier of the stream data source.
  • The memory 804 is configured to store a computer program.
  • The processor 801 is configured to read the computer program stored in the memory and perform the following processing: calculate the computing resources required by each stream processing component in the first stream processing task; if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy a preset constraint condition, copy at least one second stream processing component having the same computing logic as the first stream processing component and add the second stream processing component to the first stream processing task to obtain a second stream processing task, where, in the second stream processing task, the second stream processing component has the same data input and output relationship as the first stream processing component; if a third stream processing component that sends data to the first stream processing component exists in the first stream processing task, the third stream processing component sends the data it sends to the first stream processing component to both the first stream processing component and the second stream processing component according to a first data distribution policy; if a stream data source corresponding to a stream data source identifier that sends data to the first stream processing component exists in the first stream processing task, the stream data source sends the data it sends to the first stream processing component to both the first stream processing component and the second stream processing component according to a second data distribution policy; the computing resources required by the first stream processing component in the first stream processing task are divided, according to a resource allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task; and allocate the stream processing components in the second stream processing task to computing nodes that satisfy the computing resources required by those components.
  • Specifically, in an embodiment, the processor 801 is specifically configured to calculate, according to a preset prediction function, the computing resources required by each stream processing component in the first stream processing task. The prediction function may be set according to the actual situation, which is not limited in this embodiment of the present invention.
  • Optionally, in an embodiment, the processor 801 is further configured to: if the first stream processing task further includes an operator estimated calculation amount and a stream transmission estimated calculation amount of each stream processing component, calculate the computing resources required by each stream processing component according to the operator estimated calculation amount and the stream transmission estimated calculation amount of each stream processing component in the first stream processing task.
  • Further, the processor 801 is further configured to: calculate the operator calculation amount of each stream processing component according to the preset operator calculation amount prediction function, using the operator estimated calculation amount of the component and the estimated calculation amount of the component's source code; calculate the stream transmission calculation amount of each stream processing component according to the preset stream transmission calculation amount prediction function, using the stream transmission estimated calculation amount of the component; and take the sum of the operator calculation amount and the stream transmission calculation amount of each stream processing component as the computing resources required by that component.
  • As an optional embodiment, the processor 801 is further configured to: sort the stream processing components in the second stream processing task in descending order of the computing resources they require, and allocate the stream processing components in that order to computing nodes in the stream processing system that satisfy the computing resources required by each component, where the chosen computing node is the one on which the component's computing resource proportion is smallest, the computing resource proportion being the ratio of the sum of the computing resources required by the component and the computing resources already used by the node to the total computing resources of the node.
  • As another optional embodiment, the processor 801 is further configured to: determine the type of the second stream processing task according to a preset classification model; search a preset correspondence table of task types and allocation manners and determine the allocation manner corresponding to the type of the second stream processing task; and allocate, according to that allocation manner, computing nodes that satisfy the computing resources required by the stream processing components in the second stream processing task.
  • Further, the foregoing constraint condition is that the computing resources required by a stream processing component are less than or equal to a preset value, or that the computing resources required by a stream processing component are less than the largest idle computing resource that any computing node can provide, or that the computing resources required by a stream processing component are less than the average of the idle computing resources of the computing nodes. A sketch of such a check is given below.
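A minimal check covering these alternatives, assuming a simple node representation and an arbitrary preset limit:

```python
def satisfies_constraint(required, nodes, preset_limit=None):
    """Check one of the constraint variants named above (the node shape is assumed)."""
    idle = [n['total'] - n['used'] for n in nodes]
    if preset_limit is not None:
        return required <= preset_limit          # variant 1: fixed preset value
    return required < max(idle)                  # variant 2: largest idle resource of any node
    # variant 3 would instead compare against sum(idle) / len(idle), the average idle resource


nodes = [{'total': 8.0, 'used': 6.0}, {'total': 4.0, 'used': 1.0}]
print(satisfies_constraint(2.5, nodes))                      # True: 2.5 < 3.0
print(satisfies_constraint(2.5, nodes, preset_limit=2.0))    # False under a preset limit of 2.0
```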
  • In this embodiment of the present invention, the first data distribution policy may be an average distribution policy, the second data distribution policy may be an average distribution policy, and the resource allocation policy may be an average distribution policy. Other distribution policies may be adopted in other embodiments and are not described again here.
  • The sending device 803 is configured to send each stream processing component in the second stream processing task to the computing node allocated to that component.
  • The memory 804 may also be used to store the first stream processing task, the second stream processing task, the identifiers of the computing nodes, and the computing resources used by each computing node and its total computing resources. A sketch that ties these steps together follows.
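Putting the pieces of this processor embodiment together, a deliberately simplified orchestration might look like the sketch below. It is not the patented implementation: the constraint is reduced to a single preset limit, exactly one copy is made per violating component, and the data rewiring and sending steps are only hinted at in comments.

```python
def run_stream_processing_task(task, nodes, limit=4.0):
    """End-to-end toy flow: estimate, replicate violators once, allocate by smallest ratio."""
    # 1. estimate required resources per component from its operator and streaming estimates
    for comp in task.values():
        comp['required'] = comp['op_est'] + comp['stream_est']
    # 2. replicate any component whose requirement violates the preset constraint
    for name in list(task):
        if task[name]['required'] > limit:
            copy_name = name + '_copy'
            task[copy_name] = dict(task[name])
            task[name]['required'] = task[copy_name]['required'] = task[name]['required'] / 2
    # 3. allocate components, largest first, to the node with the smallest resource proportion
    placement = {}
    for name in sorted(task, key=lambda n: task[n]['required'], reverse=True):
        best = min(nodes, key=lambda k: (nodes[k]['used'] + task[name]['required']) / nodes[k]['total'])
        nodes[best]['used'] += task[name]['required']
        placement[name] = best       # 4. the sending device would then ship the component here
    return placement


task = {'C': {'op_est': 1.0, 'stream_est': 0.5}, 'D': {'op_est': 4.0, 'stream_est': 2.0}}
nodes = {'node1': {'used': 0.0, 'total': 8.0}, 'node2': {'used': 0.0, 'total': 8.0}}
print(run_stream_processing_task(task, nodes))
```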
  • It can be seen that, with the foregoing solution, after receiving the first stream processing task, the stream processor uses the operator estimated calculation amount and the stream transmission estimated calculation amount of each stream processing component included in the first stream processing task to calculate the computing resources required by each stream processing component. If the first stream processing task includes a first stream processing component whose required computing resources do not satisfy the preset constraint condition, the stream processor copies at least one second stream processing component having the same computing logic as the first stream processing component and adds it to the first stream processing task to obtain a second stream processing task. Because, in the second stream processing task, the second stream processing component has the same data input and output relationship as the first stream processing component, a third stream processing component that sends data to the first stream processing component sends that data to both the first stream processing component and the second stream processing component according to the first data distribution policy, a stream data source identified as sending data to the first stream processing component sends that data to both components according to the second data distribution policy, and the computing resources required by the first stream processing component in the first stream processing task are divided, according to the resource allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task. Data originally sent to the first stream processing component can therefore be distributed between the first stream processing component and the second stream processing component, which reduces the computing resources required by the first stream processing component and, to some extent, reduces the probability that the computing resources required by a stream processing component allocated to a computing node exceed the computing resources the node can provide and cause system instability and data processing failures, thereby improving system performance.
  • the above stream processor can be applied to a personal computer or a server.
  • It should be noted that the apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • In addition, in the accompanying drawings of the apparatus embodiments provided by the present invention, the connection relationships between modules indicate that they have communication connections with each other, which may be specifically implemented as one or more communication buses or signal lines.
  • Based on the descriptions of the foregoing implementations, a person skilled in the art can clearly understand that the present invention may be implemented by software plus necessary general-purpose hardware, and certainly may also be implemented by dedicated hardware, including an application-specific integrated circuit, a dedicated CPU, a dedicated memory, dedicated components, and the like. Generally, any function performed by a computer program can be easily implemented by corresponding hardware, and the specific hardware structure used to implement the same function may also take various forms, such as an analog circuit, a digital circuit, or a dedicated circuit. However, for the present invention, a software program implementation is the better implementation in most cases.
  • Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a read-only memory (ROM, Read-Only Memory), a random access memory (RAM, Random Access Memory), a magnetic disk, or an optical disc of a computer, and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A stream processing method, apparatus, and system. The method includes: receiving a first stream processing task (201), where the first stream processing task includes stream processing components, the data input and output relationships of the stream processing components, and identifiers of stream data sources; calculating the computing resources required by each of the one or more stream processing components included in the first stream processing task (202); and, if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy a preset constraint condition, copying at least one second stream processing component having the same computing logic as the first stream processing component, so that data originally input to the first stream processing component can be distributed between the first stream processing component and the second stream processing component, which to some extent avoids the probability of system instability and data processing failures caused when the computing resources required by a stream processing component allocated to a computing node exceed the computing resources that the node can provide.

Description

一种流处理方法、装置及系统
本申请要求于2014年6月23日提交中国专利局、申请号为201410284343.5、发明名称为“一种流处理方法、装置及系统”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及计算机技术领域,尤其涉及一种流处理方法、装置及系统。
背景技术
流处理技术广泛应用在各种领域的实时处理系统中,例如股票证券交易中心、网络监视、web应用、通信数据管理,这类系统的共同特点是,数据实时性强、数据量极大,具有相当高的突发性、连续发生并不断变化。流处理技术需要实时监测连续的数据流,在数据不断变化的过程中实时地进行数据分析,捕捉到可能对用户有用的信息,对紧急情况快速响应,实时处理。
目前,流数据处理主要采用分布式的计算方式。分布式流处理系统中包含多个计算节点,可以由该多个计算节点完成流数据的处理过程。用户提交流处理任务之后,该分布式流系统将该流处理任务中的流处理组件分配给该多个计算节点,其中,流处理组件包含数据的计算逻辑,使得该多台计算节点能够按照分配得到的流处理组件的计算逻辑对流数据进行处理。
然而利用上述的分布式流处理系统处理流数据时经常会出现分布在计算节点的流处理组件所需要的计算资源超过该计算节点能够提供的计算资源的情况,容易造成系统不稳定及数据处理的故障,降低系统性能。
发明内容
本发明实施例提供了一种流处理方法、装置及系统,用于对流处理任务中包含的流处理组件进行分配,能够有效的降低因计算节点分配的流处理组件所需要的计算资源超过该计算节点能提供的计算资源所导致出现系统不稳定及数据处理故障的概率,从而改善系统性能。
本发明第一方面提供了一种流处理方法,所述方法包括:
接收第一流处理任务,所述第一流处理任务中包含一个或多个流处理组 件、所述流处理组件的数据输入及输出关系、流数据源的标识;
计算所述第一流处理任务包含的所述一个或多个流处理组件中的每一个流处理组件所需要的计算资源;
若所述第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制与所述第一流处理组件具有相同计算逻辑的第二流处理组件,所述第二流处理组件的个数为一个或多个,并将所述第二流处理组件添加到所述第一流处理任务中,得到第二流处理任务;在所述第二流处理任务中,所述第二流处理组件与所述第一流处理组件具有相同的数据输入及输出关系,且,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的第三流处理组件,则所述第三流处理组件根据第一数据分配策略将所述第三流处理组件发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的所述流数据源标识对应的流数据源,则所述流数据源根据第二数据分配策略将所述流数据源发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,且所述第一流处理任务中的所述第一流处理组件所需要的计算资源根据资源分配策略划分给所述第二流处理任务中的所述第一流处理组件和所述第二流处理组件;
将所述第二流处理任务中的流处理组件分配给流处理系统中的满足所述流处理组件所需要的计算资源的计算节点。
在第一方面第一种可能的实现方式中,所述第一流处理任务中还包括流处理组件的算子估计计算量及流传输估计计算量;
所述计算所述第一流处理任务中的每一个流处理组件所需要的计算资源包括:
根据所述第一流处理任务中的每一个流处理组件的算子估计计算量及流传输估计计算量,计算所述每一个流处理组件所需要的计算资源。
结合第一方面第一种可能的实现方式,在第一方面第二种可能的实现方式中,所述根据所述第一流处理任务中的每一个流处理组件的算子估计计算量及流传输估计计算量,计算所述每一个流处理组件所需要的计算资源,包括:
根据所述每一个流处理组件的算子估计计算量及所述流处理组件的源代码的估计计算量,按照预先设置的算子计算量预测函数计算所述每一个流处理 组件的算子计算量;
根据所述每一个流处理组件的流传输估计计算量,按照预先设置的流传输计算量预测函数计算所述每一个流处理组件的流传输计算量;
将所述每一个流处理组件的算子计算量与所述每一个流处理组件的流传输计算量的和作为所述每一个流处理组件所需要的计算资源。
结合第一方面或者第一方面第一种可能的实现方式或者第一方面第二种可能的实现方式,在第一方面第三种可能的实现方式中,所述将所述第二流处理任务中的流处理组件分配给所述流处理系统中的满足所述流处理组件所需要的计算资源的计算节点包括:
按照所述第二流处理任务中的流处理组件所需要的计算资源从大到小的顺序将所述流处理组件排序;
将所述流处理组件按照所述排序分配给所述流处理系统中的满足所述流处理组件所需要的计算资源的计算节点,其中所述计算节点为所述流处理组件在各个计算节点上的计算资源比例最小的计算节点,所述计算资源比例为所述流处理组件所需要的计算资源与所述计算节点已使用的计算资源的和占所述计算节点总的计算资源的比例。
结合第一方面或者第一方面第一种可能的实现方式或者第一方面第二种可能的实现方式,在第一方面第四种可能的实现方式中,所述将所述第二流处理任务中的流处理组件分配给所述流处理系统中的满足所述流处理组件所需要的计算资源的计算节点包括:
根据预先设置的分类模型确定所述第二流处理任务的类型;
查找预先设置的任务类型与分配方式的对应关系表,确定与所述第二流处理任务的类型对应的分配方式;
按照所述分配方式,为所述第二流处理任务中的流处理组件分配满足所述流处理组件所需要的计算资源的计算节点。
结合第一方面或者第一方面第一种可能的实现方式或者第一方面第二种可能的实现方式或者第一方面第三种可能的实现方式或者第一方面第四种可能的实现方式,在第一方面第五种可能的实现方式中,所述约束条件为所述流处理组件所需要的计算资源小于或等于预先设置的数值,或者,所述流处理组件所需要的计算资源小于各个计算节点能提供的最大的空闲计算资源,或者, 所述流处理组件所需要的计算资源小于各个计算节点的空闲计算资源的平均值。
结合第一方面或者第一方面第一种可能的实现方式或者第一方面第二种可能的实现方式或者第一方面第三种可能的实现方式或者第一方面第四种可能的实现方式,在第一方面第六种可能的实现方式中,所述第一数据分配策略为平均分配策略,所述第二数据分配策略为平均分配策略,所述资源分配策略为平均分配策略。
本发明第二方面提供了一种流处理装置,包括:
接收单元,用于接收第一流处理任务,所述第一流处理任务中包含一个或多个流处理组件、所述流处理组件的数据输入及输出关系、流数据源的标识;
计算单元,用于在所述接收单元接收所述第一流处理任务之后,计算所述第一流处理任务中包含的所述一个或多个流处理组件中的每一个流处理组件所需要的计算资源;
复制更新单元,用于在所述计算单元得到所述每一个流处理组件所需要的计算资源之后,若所述第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制与所述第一流处理组件具有相同计算逻辑的第二流处理组件,所述第二流处理组件的个数为一个或多个,并将所述第二流处理组件添加到所述第一流处理任务中,得到第二流处理任务;在所述第二流处理任务中,所述第二流处理组件与所述第一流处理组件具有相同的数据输入及输出关系,且,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的第三流处理组件,则所述第三流处理组件根据第一数据分配策略将所述第三流处理组件发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的所述流数据源标识对应的流数据源,则所述流数据源根据第二数据分配策略将所述流数据源发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,且所述第一流处理任务中的所述第一流处理组件所需要的计算资源根据资源分配策略划分给所述第二流处理任务中的所述第一流处理组件和所述第二流处理组件;
分配单元,用于在所述复制更新单元得到所述第二流处理任务之后,将所述第二流处理任务中的流处理组件分配给所述流处理系统中的满足所述流处 理组件所需要的计算资源的计算节点。
在第二方面第一种可能的实现方式中,所述第一流处理任务中还包括流处理组件的算子估计计算量及流处理组件的流传输估计计算量;
则所述计算单元具体用于根据所述第一流处理任务中的每一个流处理组件对应的所述流处理组件的算子估计计算量及所述流处理组件的流传输估计计算量,计算所述每一个流处理组件所需要的计算资源。
结合第二方面第二种可能的实现方式,所述计算单元包括:
第一计算单元,用于在所述接收单元接收所述第一流处理任务之后,根据所述每一个流处理组件的算子估计计算量及所述流处理组件的源代码的估计计算量,按照预先设置的算子计算量预测函数计算所述每一个流处理组件的算子计算量;
第二计算单元,用于在所述第一计算单元计算所述每一个流处理组件的算子计算量之后,根据所述每一个流处理组件的流传输估计计算量,按照预先设置的流传输计算量预测函数计算所述每一个流处理组件的流传输计算量;
第三计算单元,用于在所述第二计算单元计算所述每一个流处理组件的流传输计算量之后,将所述每一个流处理组件的算子计算量与所述每一个流处理组件的流传输计算量的和作为所述每一个流处理组件所需要的计算资源。
结合第二方面或者第二方面第一种可能的实现方式或者第一方面第二种可能的实现方式,在第二方面第三种可能的实现方式中,所述分配单元包括:
排序单元,用于在所述复制更新单元得到所述第二流处理任务之后,按照所述第二流处理任务中的流处理组件所需要的计算资源从大到小的顺序将所述流处理组件排序;
组件分配单元,用于在所述排序单元进行排序后,将所述流处理组件按照所述排序分配给所述流处理系统中满足所述流处理组件所需要的计算资源的计算节点,所述计算节点为所述流处理组件在各个计算节点上的计算资源比例最小的计算节点,所述计算资源比例为所述流处理组件所需要的计算资源与所述计算资源节点已使用的计算资源的和占所述计算节点总的计算资源的比例。
结合第二方面或者第二方面第一种可能的实现方式或者第一方面第二种可能的实现方式,在第二方面第四种可能的实现方式中,所述分配单元包括:
确定单元,用于在所述复制更新单元得到所述第二流处理任务之后,根据 预先设置的分类模型确定所述第二流处理任务的类型;
查找单元,用于在所述确定单元确定所述第二流处理任务的类型之后,查找预先设置的任务类型与分配方式的对应关系表,确定与所述第二流处理任务的类型对应的分配方式;
节点分配单元,用于在所述查找单元确定所述对应的分配方式之后,按照所述对应的分配方式,为所述第二流处理任务中的流处理组件分配满足所述流处理组件所需要的计算资源的计算节点。
结合第二方面或者第二方面第一种可能的实现方式或者第一方面第二种可能的实现方式或者第二方面第三种可能的实现方式或者第二方面第四种可能的实现方式,在第二方面第五种可能的实现方式中,所述约束条件为流处理组件所需要的计算资源小于或等于预先设置的数值,或者,流处理组件所需要的计算资源小于所有计算节点中剩余计算资源最大的计算节点所能提供的计算资源,或者流处理组件所需要的计算资源小于计算节点的剩余计算资源的平均值。
结合第二方面或者第二方面第一种可能的实现方式或者第一方面第二种可能的实现方式或者第二方面第三种可能的实现方式或者第二方面第四种可能的实现方式,在第二方面第六种可能的实现方式中,所述第一数据分配策略为平均分配策略,所述第二数据分配策略为平均分配策略,且所述资源分配策略为平均分配策略。
本发明第三方面提供了一种流处理系统,包括:流处理装置和多个计算节点,其中:
所述流处理装置用于:接收第一流处理任务,所述第一流处理任务中包含一个或多个流处理组件、所述流处理组件的数据输入及输出关系、流数据源的标识;计算所述第一流处理任务包含的所述一个或多个流处理组件中的每一个流处理组件所需要的计算资源;若所述第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制与所述第一流处理组件具有相同计算逻辑的第二流处理组件,所述第二流处理组件的个数为一个或多个,并将所述第二流处理组件添加到所述第一流处理任务中,得到第二流处理任务;在所述第二流处理任务中,所述第二流处理组件与所述第一流处理组件具有相同的数据输入及输出关系,且,若在所述第一流处理任务中存在向所 述第一流处理组件发送数据的第三流处理组件,则所述第三流处理组件根据第一数据分配策略将所述第三流处理组件发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的所述流数据源标识对应的流数据源,则所述流数据源根据第二数据分配策略将所述流数据源发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,且,所述第一流处理任务中的所述第一流处理组件所需要的计算资源根据资源分配策略划分给所述第二流处理任务中的所述第一流处理组件和所述第二流处理组件;
所述计算节点用于:接受所述流处理装置分配的流处理组件,按照所述流处理组件的计算逻辑对发送给所述流处理组件的数据进行处理。
在第三方面第一种可能的实现方式中,所述第一流处理任务中还包括流处理组件的算子估计计算量及流传输估计计算量;
则所述流处理组件具体用于:根据所述每一个流处理组件的算子估计计算量及所述流处理组件的源代码的估计计算量,按照预先设置的算子计算量预测函数计算所述每一个流处理组件的算子计算量;根据所述每一个流处理组件的流传输估计计算量,按照预先设置的流传输计算量预测函数计算所述每一个流处理组件的流传输计算量;将所述每一个流处理组件的算子计算量与所述每一个流处理组件的流传输计算量的和作为所述每一个流处理组件所需要的计算资源。
结合第三方面或者第三方面第一种可能的实现方式,在第三方面第二种可能的实现方式中,所述约束条件为所述流处理组件所需要的计算资源小于或等于预先设置的数值,或者,所述流处理组件所需要的计算资源小于各个计算节点能提供的最大的空闲计算资源,或者,所述流处理组件所需要的计算资源小于各个计算节点的空闲计算资源的平均值。
结合第三方面或者第三方面第一种可能的实现方式,在第三方面第三种可能的实现方式中,所述第一数据分配策略为平均分配策略,所述第二数据分配策略为平均分配策略,且所述资源分配策略为平均分配策略。
从以上技术方案可以看出,本发明实施例具有以下优点:
流处理装置接收第一流处理任务,该第一流处理任务中包含一个或多个流处理组件、流处理组件的数据输入及输出关系、流数据源的标识,计算该第一 流处理任务中的每一个流处理组件所需要的计算资源,若该第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制与该第一流处理组件有相同计算逻辑的第二流处理组件,该第二流处理组件的个数为一个或多个,将该第二流处理组件添加到该第一流处理任务中,得到第二流处理任务;在该第二流处理任务中,第二流处理组件具有与第一流处理组件相同的数据输入及输出关系,且若在第一流处理任务中存在向第一流处理组件发送数据的第三流处理组件,则第三流处理组件根据第一数据分配策略将第三流处理组件发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,若在第一流处理任务中存在向第一流处理组件发送数据的流数据源标识对应的流数据源,则流数据源根据第二数据分配策略将流数据源发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,且第一流处理任务中的第一流处理组件所需要的计算资源根据资源分配策略划分给第二流处理任务中的第一流处理组件和第二流处理组件;将第二流处理任务中的流处理组件分配给流处理装置中的满足流处理组件所需要的计算资源的计算节点。通过复制至少一个与第一流处理组件具有相同计算逻辑的第二流处理组件,使得原本输入到第一流处理组件的数据可以在第一流处理组件及第二流处理组件间分配,第一流处理组件所需要的计算资源减少,一定程度避免因分配给计算节点的流处理组件所需要的计算资源超过该计算节点能提供的计算资源所导致出现系统不稳定及数据处理故障的概率,从而改善系统性能。
附图说明
图1为现有技术中流处理任务的示意图;
图2为本发明实施例中流处理方法的一个示意图;
图3a为本发明实施例中第一流处理任务的示意图;
图3b为本发明实施例中第二流处理任务的示意图;
图4为本发明实施例中流处理方法的另一示意图;
图5为本发明实施例中流处理装置的结构的一个示意图;
图6为本发明实施例中流处理装置的结构的另一示意图;
图7为本发明实施例中流处理系统的结构的一个示意图;
图8为本发明实施例中流处理装置的结构的另一示意图。
具体实施方式
本发明实施例提供了一种流处理方法、装置及系统,用于对流处理任务中包含的流处理组件进行分配,能够有效的降低因计算节点分配的流处理组件所需要的计算资源超过该计算节点能提供的计算资源所导致出现系统不稳定及数据处理故障的概率,从而改善系统性能。
下面通过具体实施例,分别进行详细的说明。
为使得本发明的发明目的、特征、优点能够更加的明显和易懂,下面将结合本发明实施例中的附图,对本发明实施例中的技术方案进行清楚、完整地描述,显然,下面所描述的实施例仅仅是本发明一部分实施例,而非全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其它实施例,都属于本发明保护的范围。
本发明的说明书和权利要求书及上述附图中的术语“第一”、“第二”、“第三”“第四”等(如果存在)是用于区别类似的对象,而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换,以便这里描述的本发明的实施例例如能够以除了在这里图示或描述的那些以外的顺序实施。此外,术语“包括”和“具有”以及他们的任何变形,意图在于覆盖不排他的包含,例如,包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元,而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。
请参阅图1,为现有技术中流处理任务的示意图,其中,流处理组件可以接收流数据源或者其他流处理组件的输出数据,流数据源提供数据流,其中,流数据源包含流数据源A和B,其中,流数据源A的输出数据发送给流处理组件A,流数据源B的输出数据发送给流处理组件A和B,且流处理组件A和B将输出数据发送给流处理组件C,其中,流处理组件可以有多个流处理单元构成,在图1中,流处理组件包含流处理单元A1至Ai。
请参阅图2,为本发明实施例中一种流处理方法,该流处理方法应用于流处理装置,该方法包括:
201、接收第一流处理任务;
在本发明实施例中,用户可向流处理装置提交第一流处理任务,该第一流处理任务中包含一个或多个流处理组件、流处理组件的数据输入及输出关系、流数据源的标识,此外,第一流处理任务中还可包括存储设备的标识,其中,流处理组件承载对数据进行处理的计算逻辑,例如计算逻辑可以是数据筛选、求和、求平均值、选取特征值等等。其中,流处理组件的数据输入及输出关系是指流处理组件的输入数据是由哪个流处理组件或者流数据源输入的,及流处理组件的输出数据是发送给哪个流处理组件或者存储设备的,或者,流处理组件的数据输入及输出关系时指流处理组件的输入数据是由哪个流处理组件和流数据源输入的,及流处理组件的输出数据是发送给哪个流处理组件或者存储设备的。例如:流处理组件的A和流数据源B将数据输入至流处理组件C,数据经过流处理组件C之后发送给流处理组件D,则流处理组件C的输入关系包括流处理组件A和流数据源,流处理组件C的输出关系包括流处理组件D。
202、计算第一流处理任务包含的一个或多个流处理组件中的每一个流处理组件所需要的计算资源;
在本发明实施例中,流处理装置可计算第一流处理任务包含的一个或多个流处理组件中的每一个流处理组件所需要的计算资源,具体可以是,流处理装置根据预先设置的预测函数计算流处理任务中的每个流处理组件所需要的计算资源,其中,预先设置的预测函数可以根据需要设置,本发明实施例对此不作限定。
203、若第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制与第一流处理组件具有相同计算逻辑的第二流处理组件,第二流处理组件的个数为一个或多个,并将第二流处理组件添加到第一流处理任务中,得到第二流处理任务;
在本发明实施例中,流处理装置在得到第一流处理任务中的每一个流处理组件所需要的计算资源之后,若该第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制与该第一流处理组件具有相同计算逻辑的第二流处理组件,该第二流处理组件的个数为一个或多个,并将第二流处理组件添加到第一流处理任务中,得到第二流处理任务,在该第二流处理任务中,第二流处理组件与第一流处理组件具有相同的数据输入及输出 关系,且,若在第一流处理任务中存在向第一流处理组件发送数据的第三流处理组件,则第三流处理组件根据第一数据分配策略将第三流处理组件发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,若在第一流处理任务中存在向第一流处理组件发送数据的流数据源标识对应的流数据源,则流数据源根据第二数据分配策略将流数据源发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,且第一流处理任务中的第一流处理组件所需要的计算资源根据资源分配策略划分给第二流处理任务中的第一流处理组件和第二流处理组件。
在本发明实施例中,第一数据分配策略可以为平均分配策略,则若在第一流处理任务中存在向第一流处理组件发送数据的第三流处理组件,则第三流处理组件根据平均分配策略将第三流处理组件发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件;例如:若第一流处理任务中的第一流处理组件所需要的计算资源为K,第二流处理任务中第一流处理组件及与该第一流处理组件具有相同的计算逻辑的第二流处理组件的个数和为N,则第二流处理任务中的第一流处理组件和第二流处理组件所需要的计算资源均为K/N。
其中,第二数据分配策略也可以为平均分配策略,则若在第一流处理任务中存在向第一流处理组件发送数据的流数据源标识对应的流数据源,则流数据源根据平均分配策略将流数据源发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件。
在第一数据分配策略和第二数据分配策略均为平均分配策略的时,资源分配策略也可以为平均分配策略,第一流处理任务中的第一流处理组件所需要的计算资源按照平均分配策略划分给第二流处理任务中的第一流处理组件和第二流处理组件。
需要说明的是,在本发明实施例中,第一数据分配策略还可以为随机分配策略,奇偶分配策略,或者按照预先设置的比例进行分配等等,且第二数据分配策略也可以为随机分配策略、奇偶分配策略或者参照预先设置的比例进行分配等等,且第一数据分配策略和第二数据分配策略可以相同也可以不同。
需要说明的是,在本发明实施例中,资源分配策略与第一数据分配策略及第二数据分配策略有关,例如,若第一数据分配策略和第二数据分配策略均为 平均分配策略,则资源分配策略也为平均分配策略,若第一数据分配策略为随机分配策略,第二数据分配策略为奇偶分配策略,则资源分配策略为平均分配策略。
为了更好的理解,请参阅图3a,为本发明实施例中第一流处理任务一个示例的结构示意图,其中,节点A和B表示流数据源,节点C、D、E、F、H表示流处理组件,箭头方向表示数据的流向,节点G表示存储设备,若节点D所需要的计算资源不满足预先设置的约束条件,则复制一个与节点D具有相同计算逻辑流处理组件,表示为节点D’,得到第二流处理任务,请参阅图3b,为本发明实施例中第二流处理组件的一个示例的结构示意图,在图3b中,节点D’与节点D具有相同的数据输入及输出关系,且节点D的上一层节点C、H将按照平均分配策略的方式将数据发送给节点D和D’。
在本发明实施例中,可根据具体的需要预先设置约束条件,例如:该预先设置的约束条件可以为:流处理组件所需要的计算资源小于或等于预先设置的数值,或者,流处理组件所需要的计算资源小于流处理装置中的各个计算节点能够提供的最大的空闲计算资源,或者流处理组件所需要的计算资源小于流处理装置中的各个计算节点的空闲计算资源的平均值。在实际应用中,可根据具体的需要设置约束条件,此处不做限定。
需要说明的是,在本发明实施例中,若预先设置的约束条件为:流处理组件所需要的计算资源大于预先设置的数值,或者流处理组件所需要的计算资源大于或等于流处理装置中的各个计算节点能够提供的最大的空闲计算资源,或者流处理组件所需要的计算资源大于或等于流处理装置中的各个计算节点的空闲计算资源的平均值,则流处理组件可在第一流处理任务中包含所需要的计算资源满足预先设置的约束条件的第一流处理组件的情况下,复制与第一流处理组件具有相同计算逻辑的第二流处理组件。
204、将第二流处理任务中的流处理组件分配给流处理系统中的满足流处理组件所需要的计算资源的计算节点。
在本发明实施例中,流处理装置将得到的第二流处理任务中的流处理组件分配给流处理装置中的满足流处理组件所需要的计算资源的计算节点,需要说明的是,数据流在计算节点上的数据输入及输出的方向与计算节点分配的流处 理组件之间的数据输入输出的方向一致。
在本发明实施例中,流处理装置接收第一流处理任务,计算该第一流处理任务中的每一个流处理组件所需要的计算资源,若该第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制与该第一流处理组件有相同计算逻辑的第二流处理组件,并将第二流处理组件添加到第一流处理任务中,得到第二流处理任务,并将该第二流处理任务中的流处理组件分配给流处理装置中的满足该流处理组件所需要的计算资源的计算节点,通过复制至少一个与不满足约束条件的第一流处理组件具有相同计算逻辑的第二流处理组件,且由于第二流处理组件与第一流处理组件具有相同的数据输入及输出关系,且若在第一流处理任务中存在向第一流处理组件发送数据的第三流处理组件,则第三流处理组件按照第一数据分配策略将数据发送给第一流处理组件和第二流处理组件,若在第一流处理任务中存在向第一流处理组件发送数据的流数据源标识对应的流数据源,则流数据源将根据第二数据分配策略将数据分组发送给第一流组件和第二流处理组件,使得可将不满足约束条件的第一流处理组件所需要的计算资源在第一流处理组件及第二流处理组件间分配,降低第一流处理组件所需要的计算资源,能够有效的降低因计算节点分配的流处理组件所需要的计算资源超过该计算节点能提供的计算资源所导致出现系统不稳定及数据处理故障的概率,从而改善系统性能。
为了更好的理解本发明实施例中的技术方案,请参阅图4,为本发明实施例中一种流处理方法的实施例,包括:
401、接收第一流处理任务;
在本发明实施例中,流处理装置可接收用户提交的第一流处理任务,该第一流处理任务中包含一个或多个流处理组件、流处理组件的数据输入及输出关系、流数据源的标识、流处理组件的算子计算量及流传输计算量。
其中,算子估计计算量是指按照流处理组件的计算逻辑对单位数据进行处理估计需要的计算量,流传输估计计算量是指对单位数据进行传输估计需要的计算量,其中,单位数据是指单位时间内传输的数据,该单位数据与流数据源输出数据的速度有关。
需要说明的是,第一流处理任务中包含的流处理组件的算子估计计算量及 流处理组件的流传输估计计算量可以用于计算流处理组件所需要的计算资源,若需要计算流处理组件的其他资源,例如内存资源、网络带宽资源等等,可以在流处理任务中携带与所需要计算的资源类型相关的参数,本发明实施例中是以计算流处理组件所需要的计算资源描述的技术方案,在实际应用中,用户可通过设置流处理任务中的参数设置流处理组件所需要资源的具体类型,此处不做限定。
402、根据第一流处理任务中的每一个流处理组件的算子估计计算量及流传输估计计算量,计算每一个流处理组件所需要的计算资源;
在本发明实施例中,流处理任务中包含了第一流处理任务中的每一个流处理组件的算子估计计算量及流处理组件的流传输估计计算量,流处理装置将利用第一流处理任务中的每一个流处理组件的算子估计计算量及流传输估计计算量,计算每一个流处理组件所需要的计算资源。
需要说明的是,在本发明实施例中,可使用预先设置的预测函数计算每一个流处理组件所需要的计算资源,且需要计算的流处理组件的资源的类型不同,所需要使用的预测函数也是不同的,在实际应用中,流处理组件中已预先设置计算不同资源所需要使用的预测函数,此处不做限定。
在本发明实施例中,流处理装置计算每一个流处理组件所需要的计算资源可以是:
1)根据每一个流处理组件的算子估计计算量及流处理组件的源代码的估计计算量,按照预先设置的算子计算量预测函数计算每一个流处理组件的算子计算量;
其中,流处理装置在接收到流处理任务之后,可预先估计该流处理任务中包含的第一流处理组件中的每一个流处理组件的源代码的估计计算量,且作为可参考的计算方式,预先设置的算子计算量预测函数可以为:
Vi=a*Pi+b*Mi
其中,i表示第一流处理任务中的第i个流处理组件,a和b为预先设置的调整参数,Vi表示第i个流处理组件的算子计算量,Pi表示的第i个流处理组件的算子估计计算量,Mi表示流处理装置第i个流处理组件的源代码的估计计算量。
优选的,为了在流处理装置执行接收到的流处理任务的过程中能够对流处 理组件的分配情况进行调整,流处理装置还可监测每一个流处理组件,获取每一个流处理组件的监测到的算子计算量,并基于该监测到的算子计算量确定流处理组件的算子计算量,则预先设置的算子计算量预测函数可以为:
Vi=a*Pi+b*Mi+c*Ki
其中,c为预先设置的调整参数,Ki表示第i个流处理组件的监测到的算子计算量。
2)根据每一个流处理组件的流传输估计计算量,按照预先设置的流传输计算量预测函数得到每一个流处理组件的流传输计算量;
其中,流传输计算量预测函数可参考如下:
Ei=d*Fi
其中,Ei表示第i个流处理组件的流传输计算量,d为预先设置的调整参数,Fi表示第i个流处理组件的流传输估计计算量。
优选的,为了在流处理装置执行接收到的流处理任务的过程中能够对流处理组件的分配情况进行调整,流处理装置还可监测每一个流处理组件,获取每一个流处理组件的监测到的流传输计算量,并基于该监测到的流传输计算量确定流处理组件的流传输计算量,则流传输计算量预测函数可参考如下:
Ei=d*Fi+e*Gi
其中,e为预先设置的调整参数,Gi表示监测到的第i个流处理组件的流传输计算量。
3)将每一个流处理组件的算子计算量与每一个流处理组件的流传输计算量的和作为每一个流处理组件所需要的计算资源。
其中,流处理装置可将Ei与Vi的和作为第i个流处理组件所需要的计算资源。
403、若第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,复制与第一流处理组件具有相同计算逻辑的第二流处理组件,第二流处理组件的个数为一个或多个,并将第二流处理组件添加到第一流处理任务中,得到第二流处理任务;
在本发明实施例中,流处理装置在得到第一流处理任务中每一个流处理组件所需要的计算资源之后,将判断第一流处理任务中是否包含所需要的计算资 源不满足预先设置的约束条件的第一流处理组件,且若第一流处理任务不满足预先设置的约束条件,则复制与第一流处理组件具有相同计算逻辑的第二流处理组件,且第二流处理组件的个数为一个或多个,并将第二流处理组件添加到第一流处理任务中,得到第二流处理任务。在该第二流处理任务中,第二流处理组件与第一流处理组件具有相同的数据输入及输出关系,且,若在第一流处理任务中存在向第一流处理组件发送数据的第三流处理组件,则第三流处理组件根据第一数据分配策略将第三流处理组件发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,若在第一流处理任务中存在向第一流处理组件发送数据的流数据源标识对应的流数据源,则流数据源根据第二数据分配策略将流数据源发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,且第二流处理任务中的第一流处理组件和第二流处理组件所需要的计算资源是根据资源分配策略将第一流处理任务中第一流处理组件所需要的计算资源划分得到的。
需要说明的是,本发明实施例中的第一流处理组件可以是一个流处理组件,也可以是多个流处理组件,且若为多个流处理组件,流处理装置将分别为该多个流处理组件中的每一个流处理组件复制与其具有相同计算逻辑的流处理组件。
需要说明的是,在本发明实施例中,复制的与第一流处理组件具有相同计算逻辑的第二流处理组件的个数可以预先设置或者根据需要进行设置,此处不做限定。
404、将第二流处理任务中的流处理组件分配给流处理系统中的满足流处理组件所需要的计算资源的计算节点。
在本发明实施例中,流处理装置将第二流处理任务中的流处理组件分配给流处理系统中的满足该流处理组件所需要的计算资源的计算节点。具体的包括:按照第二流处理任务中的流处理组件所需要的计算资源从大到小的顺序将流处理组件排序,将流处理组件按照排序分配给流处理系统中的满足流处理组件所需要的计算资源的计算节点,其中,流处理组件分配的计算节点为该流处理组件在各个计算节点上的计算资源比例最小的计算节点,该计算资源比例为流处理组件所需要的计算资源与该计算资源已使用的计算资源的和占该计算 节点总的计算资源的比例。
其中,步骤404还按照以下流程执行:
1)流处理装置按照第二流处理任务中的流处理组件所需要的计算资源从大到小的顺序排序,得到排序后的流处理组件集合S;
i初始值1,i小于或等于N,N为流处理组件集合S中包含的流处理组件的个数,H为计算节点的计算资源的集合,执行以下步骤:
2)流处理装置计算流处理组件Si所需要的计算资源在集合Hi中的各个计算节点上占用的计算资源比例;
其中,Si所需要的计算资源在第K个计算节点上占用的计算资源比例的计算公式为:
Tik=(B’k+SCost(Si))/Bk
其中,Tik表示流处理组件Si在集合Hi的第k个计算节点上的计算资源比例,B’k表示集合Hi的第k个计算节点已使用的计算资源,Bk表示集合Hi的第k个计算节点的总的计算资源,SCost(Si)表示流处理组件Si所需要的计算资源。
3)将流处理组件Si分配给集合Hi中计算资源比例最小的计算节点;
其中,流处理装置得到流处理组件Si在集合Hi中的各个计算节点上占用的计算资源比例之后,将流处理组件Si分配给集合Hi中计算资源比例最小的计算节点。
4)更新集合Hi,得到更新后的集合Hi+1中,且已分配给流处理组件Si的计算算节点上,分配给流处理组件Si的计算资源为已使用的计算资源,若i小于N,令i=i+1,返回执行计算流处理组件Si所需要的计算资源在集合Hi中的各个计算节点上的占用的计算资源比例的步骤;
5)若i=N,则停止流处理组件的分配。
在本发明实施例中,流处理装置可按照上述步骤1)至5)将第二流处理任务中的每一个流处理组件分配给计算节点,且计算节点均能够满足分配得到的流处理组件所需要的计算资源。且在流处理装置的计算节点中可以存在分配两个或多个流处理组件的计算节点。
需要说明的是,在本发明实施例中,上述步骤1)至5)仅为流处理组件可行的一种分配方式,在实际应用中,还可按照第二流处理任务的类型对第二流 处理任务中的流处理组件进行分配,因此,上述的将第二流处理任务中的流处理组件分配给流处理装置中的满足该流处理组件所需要的计算资源的计算节点,具体为:根据预先设置的分类模型确定第二流处理任务的类型,并查找预先设置的任务类型与分配方式的对应关系表,确定与第二流处理任务的类型对应的分配方式,按照该对应的分配方式,为第二流处理任务中的流处理组件分配满足该流处理组件所需要的计算资源的计算节点。其中,预先设置的分类模型是基于多个流处理任务的特征分类算法得到的模型,其中,分类算法可包括决策树、叶贝斯分类器、支持向量机等,且流处理装置在使用该分类模型确定流处理任务的过程中,还可通过学习的方式改善该分类模型。
其中,可根据具体的需要设置与任务类型对应的分配方式,上述步骤1)至5)中为其中可行的一种分配方式,此处不做限定。
需要说明的是,在本发明实施例中,若在将第二流处理任务中的流处理组件分配给计算节点的过程中,第二流处理任务中,仍然存在不满足于预先设置的约束条件的流处理组件,则可继续复制与该不满足流处理组件具有相同计算逻辑的流处理组件。
在本发明实施例中,流处理装置在接收第一流处理任务之后,利用该第一流处理任务中包含的每一个流处理组件的算子估计计算量及流传输估计计算量,计算每一个流处理组件所需要的计算资源,且若第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制至少一个与该第一流处理组件具有相同计算逻辑的第二流处理组件,并将第二流处理组件添加到第一流处理任务中,得到第二流处理任务,由于在第二流处理任务中,第二流处理组件的数据输入及输出关系与第一流处理组件相同,且若在第一流处理任务中存在向第一流处理组件发送数据的第三流处理组件,则第三流处理组件按照第一数据分配策略将第三流处理组件发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,若在第一流处理任务中存在向第一流处理组件发送数据的流数据源标识对应的流数据源,则流数据源将根据第二数据分配策略将流数据源发送给第一流处理组件的数据发送给第一流组件和第二流处理组件,且第一流处理任务中的第一流处理组件所需要的计算资源根据资源分配策略划分给第二流处理任务中的第一流处理组件和第二流处 理组件,使得本来发送给第一流处理组件的数据可以在第一流处理组件和第二流处理组件之间分配,降低了第一流处理组件所需要的计算资源,能够有效的降低因计算节点分配的流处理组件所需要的计算资源超过该计算节点能提供的计算资源所导致出现系统不稳定及数据处理故障的概率,从而改善系统性能。
请参阅图5,为本发明实施例中流处理系统的结构的示意图,包括:
接收单元501,用于接收第一流处理任务,所述第一流处理任务中包含一个或多个流处理组件、所述流处理组件的数据输入及输出关系、流数据源的标识;
计算单元502,用于在所述接收单元501接收所述第一流处理任务之后,计算所述第一流处理任务包含的所述一个或多个流处理组件中的每一个流处理组件所需要的计算资源;
复制更新单元503,用于在所述计算单元502得到所述每一个流处理组件所需要的计算资源之后,若所述第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制与所述第一流处理组件具有相同计算逻辑的第二流处理组件,第二流处理组件的个数为一个或多个,并将所述第二流处理组件添加到所述第一流处理任务中,得到第二流处理任务;在所述第二流处理任务中,所述第二流处理组件与所述第一流处理组件具有相同的数据输入及输出关系,且,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的第三流处理组件,则所述第三流处理组件根据第一数据分配策略将所述第三流处理组件发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的所述流数据源标识对应的流数据源,则所述流数据源根据第二数据分配策略将所述流数据源发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,且,所述第一流处理任务中的所述第一流处理组件所需要的计算资源根据资源分配策略划分给所述第二流处理任务中的所述第一流处理组件和所述第二流处理组件;
分配单元504,用于在所述复制更新单元503得到所述第二流处理任务之后,将所述第二流处理任务中的流处理组件分配给所述流处理系统中的满足所 述流处理组件所需要的计算资源的计算节点。
在本发明实施例中,接收单元501接收第一流处理任务,所述第一流处理任务中包含一个或多个流处理组件、所述流处理组件的数据输入及输出关系、流数据源的标识;接着,计算单元502计算所述第一流处理任务中的每一个流处理组件所需要的计算资源;且若所述第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制更新单元503复制至少一个与所述第一流处理组件具有相同计算逻辑的第二流处理组件,并将所述第二流处理组件添加到所述第一流处理任务中,得到第二流处理任务;在所述第二流处理任务中,所述第二流处理组件与所述第一流处理组件具有相同的数据输入及输出关系,且,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的第三流处理组件,则所述第三流处理组件根据第一数据分配策略将所述第三流处理组件发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的所述流数据源标识对应的流数据源,则所述流数据源根据第二数据分配策略将所述流数据源发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,且第一流处理任务中的第一流处理组件所需要的计算资源根据资源分配策略划分给第二流处理任务中的第一流处理组件和第二流处理组件;最后分配单元504将所述第二流处理任务中的流处理组件分配给所述流处理系统中的满足所述流处理组件所需要的计算资源的计算节点。
在本发明实施例中,流处理装置接收第一流处理任务,计算该第一流处理任务中的每一个流处理组件所需要的计算资源,若该第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制至少一个与该第一流处理组件有相同计算逻辑的第二流处理组件,并将第二流处理组件添加到第一流处理任务中,得到第二流处理任务,并将该第二流处理任务中的流处理组件分配给流处理装置中的满足该流处理组件所需要的计算资源的计算节点,通过复制至少一个与不满足约束条件的第一流处理组件具有相同计算逻辑的第二流处理组件,且由于第二流处理组件与第一流处理组件具有相同的数据输入及输出关系,且若在第一流处理任务中存在向第一流处理组件发送 数据的第三流处理组件,则第三流处理组件根据第一数据分配策略将第三流处理组件发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,若在第一流处理任务中存在向第一流处理组件发送数据的流数据源标识对应的流数据源,则流数据源根据第二数据分配策略将流数据源发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,且第一流处理任务中的第一流处理组件所需要的计算资源根据资源分配策略划分给第二流处理任务中的第一流处理组件和第二流处理组件,使得可将不满足约束条件的第一流处理组件所需要的计算资源在第一流处理组件及第二流处理组件间分配,降低第一流处理组件所需要的计算资源,能够有效的降低因计算节点分配的流处理组件所需要的计算资源超过该计算节点能提供的计算资源所导致出现系统不稳定及数据处理故障的概率,从而改善系统性能。
请参阅图6,为本发明实施例中流处理系统的结构的实施例,包括如图5所示实施例中描述的接收单元501,计算单元502,复制更新单元503及分配单元504,且与图5所示实施例中描述的方案相似,此处不再赘述。
在本发明实施例中,所述第一流处理任务中还包括流处理组件的算子估计计算量及流处理组件的流传输估计计算量;
则所述计算单元502具体用于根据所述第一流处理任务中的每一个流处理组件对应的所述流处理组件的算子估计计算量及所述流处理组件的流传输估计计算量,计算所述每一个流处理组件所需要的计算资源。
其中,所述计算单元502包括:
第一计算单元601,用于在所述接收单元501接收所述第一流处理任务之后,根据所述每一个流处理组件算子估计计算量及所述流处理组件的源代码的估计计算量,按照预先设置的算子计算量预测函数计算所述每一个流处理组件的算子计算量;
第二计算单元602,用于在所述第一计算单元601计算所述每一个流处理组件的算子计算量之后,根据所述每一个流处理组件的流传输估计计算量,按照预先设置的流传输计算量预测函数计算所述每一个流处理组件的流传输计算量;
第三计算单元603,用于在所述第二计算单元602计算所述每一个流处理组 件的流传输计算量之后,将所述每一个流处理组件的算子计算量与所述每一个流处理组件的流传输计算量的和作为所述每一个流处理组件所需要的计算资源。
在本发明实施例中,所述分配单元504包括:
排序单元604,用于在所述复制更新单元503得到所述第二流处理任务之后,按照所述第二流处理任务中的流处理组件所需要的计算资源从大到小的顺序排序;
组件分配单元605,用于在所述排序单元604进行排序后,将所述流处理组件按照所述排序分配给所述流处理系统中满足所述流处理组件所需要的计算资源的计算节点,所述计算节点为所述流处理组件在各个计算节点上的计算资源比例最小的计算节点,所述计算资源比例为所述流处理组件所需要的计算资源与所述计算资源节点已使用的计算资源的和占所述计算节点总的计算资源的比例。
或者,在本发明实施例中,所述分配单元包括:
确定单元606,用于在所述复制更新单元503得到所述第二流处理任务之后,利用预先设置的分类模型确定所述第二流处理任务的类型;
查找单元607,用于在所述确定单元606确定所述第二流处理任务的类型之后,查找预先设置的任务类型与分配方式的对应关系表,确定与所述第二流处理任务的类型对应的分配方式;
节点分配单元608,用于在所述查找单元607确定所述对应的分配方式之后,按照所述对应的分配方式,为所述第二流处理任务中的流处理组件分配满足所述流处理组件所需要的计算资源的计算节点。
在本发明实施例中,接收单元501接收第一流处理任务,所述第一流处理任务中包含一个或多个流处理组件、所述流处理组件的数据输入及输出关系;接着,计算单元502计算所述第一流处理任务中的每一个流处理组件所需要的计算资源,具体的,计算单元502中的第一计算单元601根据所述每一个流处理组件算子估计计算量及所述流处理组件的源代码的估计计算量,按照预先设置的算子计算量预测函数计算所述每一个流处理组件的算子计算量;接着第二计算单元602根据所述每一个流处理组件的流传输估计计算量,按照预先设置的 流传输计算量预测函数计算所述每一个流处理组件的流传输计算量;并由第三计算单元603根据所述每一个流处理组件的算子计算量及所述每一个流处理组件的流传输计算量得到所述每一个流处理组件所需要的计算资源。
且若所述第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制更新单元503复制至少一个与所述第一流处理组件具有相同计算逻辑的第二流处理组件,将所述第二流处理组件添加到第一流处理任务中,得到第二流处理任务在所述第二流处理任务中,所述第二流处理组件与所述第一流处理组件具有相同的数据输入及输出关系,且,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的第三流处理组件,则所述第三流处理组件根据第一数据分配策略将所述第三流处理组件发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的所述流数据源标识对应的流数据源,则所述流数据源根据第二数据分配策略将所述流数据源发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,且第一流处理任务中的第一流处理组件所需要的计算资源根据资源分配策略划分给第二流处理任务中的第一流处理组件和第二流处理组件;最后分配单元504将所述第二流处理任务中的流处理组件分配给所述流处理系统中的满足所述流处理组件所需要的计算资源的计算节点。
其中分配单元504可按照如下方式分配流处理组件:
排序单元604按照所述第二流处理任务中的流处理组件所需要的计算资源从大到小的顺序排序,并由组件分配单元605将所述流处理组件按照所述排序分配给所述流处理系统中满足所述流处理组件所需要的计算资源的计算节点,所述计算节点为所述流处理组件在各个计算节点上的计算资源比例最小的计算节点,所述计算资源比例为所述流处理组件所需要的计算资源与所述计算资源节点已使用的计算资源的和占所述计算节点总的计算资源的比例。
或者,分配单元504可以按照如下方式分配流处理组件:
确定单元606利用预先设置的分类模型确定所述第二流处理任务的类型;接着,查找单元607查找预先设置的任务类型与分配方式的对应关系表,确定与所述第二流处理任务的类型对应的分配方式;并由节点分配单元608按照所 述对应的分配方式,为所述第二流处理任务中的流处理组件分配满足所述流处理组件所需要的计算资源的计算节点。
在本发明实施例中,流处理装置在接收第一流处理任务之后,利用该第一流处理任务中包含的每一个流处理组件的算子估计计算量及流传输估计计算量,计算每一个流处理组件所需要的计算资源,且若第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制至少一个与该第一流处理组件具有相同计算逻辑的第二流处理组件,并将第二流处理组件添加到第一流处理任务中,得到第二流处理任务,由于在第二流处理任务中,第二流处理组件的数据输入及输出关系与第一流处理组件相同,且若在第一流处理任务中存在向第一流处理组件发送数据的第三流处理组件,则第三流处理组件根据第一数据分配策略将第三流处理组件发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,若在第一流处理任务中存在向第一流处理组件发送数据的流数据源标识对应的流数据源,则流数据源根据第二数据分配策略将流数据源发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,且第一流处理任务中的第一流处理组件所需要的计算资源根据资源分配策略划分给第二流处理任务中的第一流处理组件和第二流处理组件,使得本来发送给第一流处理组件的数据可以在第一流处理组件和第二流处理组件之间分配,降低了第一流处理组件所需要的计算资源,能够有效的降低因计算节点分配的流处理组件所需要的计算资源超过该计算节点能提供的计算资源所导致出现系统不稳定及数据处理故障的概率,从而改善系统性能。
请参阅图7、为本发明实施例中流处理系统的结构图,包括:
流处理装置701和多个计算节点702。本发明实施例提供的计算节点可以是云计算中心的云服务器,或者普通数据处理中心的数据处理服务器,或者大数据处理中心的数据处理服务器等,本发明实施例对此不做限定。
所述流处理装置701用于:接收第一流处理任务,所述第一流处理任务中包含一个或多个流处理组件、所述流处理组件的数据输入及输出关系、流数据源的标识;计算所述第一流处理任务中的每一个流处理组件所需要的计算资源;若所述第一流处理任务中包含所需要的计算资源不满足预先设置的约束条 件的第一流处理组件,则复制至少一个与所述第一流处理组件具有相同计算逻辑的第二流处理组件,并将所述第二流处理组件添加到所述第一流处理任务中,得到第二流处理任务;在所述第二流处理任务中,所述第二流处理组件与所述第一流处理组件具有相同的数据输入及输出关系,且,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的第三流处理组件,则所述第三流处理组件根据第一数据分配策略将所述第三流处理组件发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的所述流数据源标识对应的流数据源,则所述流数据源根据第二数据分配策略将流数据源发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,且第一流处理任务中的第一流处理组件所需要的计算资源根据资源分配策略划分给第二流处理任务中的第一流处理组件和第二流处理组件;将所述第二流处理任务中的流处理组件分配给满足所述流处理组件所需要的计算资源的计算节点;
所述计算节点702用于:接受所述流处理装置701分配的流处理组件,按照所述流处理组件的计算逻辑对发送给所述流处理组件的数据进行处理。
在本发明实施例中,第一流处理任务中还包括流处理组件的算子估计计算量及流传输估计计算量。则流处理组件具体用于:根据所述每一个流处理组件的算子估计计算量及所述流处理组件的源代码的估计计算量,按照预先设置的算子计算量预测函数计算所述每一个流处理组件的算子计算量;根据所述每一个流处理组件的流传输估计计算量,按照预先设置的流传输计算量预测函数计算所述每一个流处理组件的流传输计算量;将所述每一个流处理组件的算子计算量与所述每一个流处理组件的流传输计算量的和作为所述每一个流处理组件所需要的计算资源。
在本发明实施中,所述第一数据分配策略可以为平均分配策略,所述第二数据分配策略可以为平均分配策略,且所述资源分配策略可以为平均分配策略。在其他实施例中可采取其它分配策略,这里不再赘述。
在本发明实施例中,约束条件为流处理组件所需要的计算资源小于或等于预先设置的数值,或者,流处理组件所需要的计算资源小于各个计算节点能提供的最大的空闲计算资源,或者,流处理组件所需要的计算资源小于各个计算 节点的空闲计算资源的平均值。
需要说明的是,流处理装置701的其它功能、功能的具体实现、或模块划分等可参考前述实施例所述。
可见,在本发明实施例中,流处理装置701接收第一流处理任务,计算该第一流处理任务中的每一个流处理组件所需要的计算资源,若该第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制至少一个与该第一流处理组件有相同计算逻辑的第二流处理组件,并将第二流处理组件添加到第一流处理任务中,得到第二流处理任务,并将该第二流处理任务中的流处理组件分配给流处理装置中的满足该流处理组件所需要的计算资源的计算节点,通过复制至少一个与不满足约束条件的第一流处理组件具有相同计算逻辑的第二流处理组件,且由于第二流处理组件与第一流处理组件具有相同的数据输入及输出关系,且若在第一流处理任务中存在向第一流处理组件发送数据的第三流处理组件,则第三流处理组件根据第一数据分配策略将第三流处理组件发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,若在第一流处理任务中存在向第一流处理组件发送数据的流数据源标识对应的流数据源,则流数据源根据第二数据分配策略将流数据源发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,且第一流处理任务中的第一流处理组件所需要的计算资源根据资源分配策略划分给第二流处理任务中的第一流处理组件和第二流处理组件,使得第一流处理组件所需要处理的数据减少且所需要的计算资源也减少,能够有效的降低因计算节点分配的流处理组件所需要的计算资源超过该计算节点能提供的计算资源所导致出现系统不稳定及数据处理故障的概率,从而改善系统性能。
请参阅图8,为本发明实施例中流处理器的结构的实施例,包括:
处理器801、接收装置802、发送装置803、存储器804;
其中,接收装置802用于接收第一流处理任务,所述第一流处理任务中包含一个或多个流处理组件、所述流处理组件的数据输入及输出关系、流数据源的标识;
存储器804用于存储计算机程序,处理器801用于读取存储器中存储的计算机程序并执行如下处理:计算所述第一流处理任务中的每一个流处理组件所需 要的计算资源;若所述第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制至少一个与所述第一流处理组件具有相同计算逻辑的第二流处理组件,并将所述第二流处理组件添加到所述第一流处理任务中,得到第二流处理任务;在所述第二流处理任务中,所述第二流处理组件与所述第一流处理组件具有相同的数据输入及输出关系,且,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的第三流处理组件,则所述第三流处理组件根据第一数据分配策略将所述第三流处理组件发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的所述流数据源标识对应的流数据源,则流数据源根据第二数据分配策略将流数据源发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,且第一流处理任务中的第一流处理组件所需要的计算资源根据资源分配策略划分给第二流处理任务中的第一流处理组件和第二流处理组件;将所述第二流处理任务中的流处理组件分配给满足所述流处理组件所需要的计算资源的计算节点。
具体的,作为一个实施例,处理器801具体用于根据预先设置的预测函数计算所述第一流处理任务中的每一个流处理组件所需要的计算资源。其中预测函数可以根据实际情况设定,本发明实施例对此不做限定。
可选的,作为一个实施例,处理器801还用于:若所述第一流处理任务中还包括流处理组件的算子计算量及流传输估计计算量;则根据所述第一流处理任务中的每一个流处理组件的算子估计计算量及流传输估计计算量,计算所述每一个流处理组件所需要的计算资源。
进一步的,处理器801还用于根据所述每一个流处理组件的算子估计计算量及所述流处理组件的源代码的估计计算量,按照预先设置的算子计算量预测函数计算所述每一个流处理组件的算子计算量;
根据所述每一个流处理组件的流传输估计计算量,按照预先设置的流传输计算量预测函数计算所述每一个流处理组件的流传输计算量;
将所述每一个流处理组件的算子计算量与所述每一个流处理组件的流传输计算量的和作为所述每一个流处理组件所需要的计算资源。
作为一个可选实施例,处理器801还用于按照所述第二流处理任务中的流 处理组件所需要的计算资源从大到小的顺序将所述流处理组件排序;
将所述流处理组件按照所述排序分配给所述流处理系统中的满足所述流处理组件所需要的计算资源的计算节点,其中所述计算节点为所述流处理组件在各个计算节点上的计算资源比例最小的计算节点,所述计算资源比例为所述流处理组件所需要的计算资源与所述计算节点已使用的计算资源的和占所述计算节点总的计算资源的比例。
作为一个可选实施例,处理器801还用于根据预先设置的分类模型确定所述第二流处理任务的类型;查找预先设置的任务类型与分配方式的对应关系表,确定与所述第二流处理任务的类型对应的分配方式;按照所述分配方式,为所述第二流处理任务中的流处理组件分配满足所述流处理组件所需要的计算资源的计算节点。
进一步的,上述的约束条件为所述流处理组件所需要的计算资源小于或等于预先设置的数值,或者,所述流处理组件所需要的计算资源小于各个计算节点能提供的最大的空闲计算资源,或者,所述流处理组件所需要的计算资源小于各个计算节点的空闲计算资源的平均值。
在本发明实施中,所述第一数据分配策略可以为平均分配策略,所述第二数据分配策略可以为平均分配策略,且所述资源分配策略可以为平均分配策略。在其他实施例中可采取其它分配策略,这里不再赘述。
其中,发送装置803用于将所述第二流处理任务中的流处理组件发送给该流处理组件分配的计算节点;
其中,存储器804还可以用于存储第一流处理任务、第二流处理任务、计算节点的标识及计算节点已用的计算资源及总的计算资源。
可见,采用上述方案后,流处理器在接收第一流处理任务之后,利用该第一流处理任务中包含的每一个流处理组件的算子估计计算量及流传输估计计算量,计算每一个流处理组件所需要的计算资源,且若第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制至少一个与该第一流处理组件具有相同计算逻辑的第二流处理组件,并将第二流处理组件添加到第一流处理任务中,得到第二流处理任务,由于在第二流处理任务中,第二流处理组件的数据输入及输出关系与第一流处理组件相同,且若在 第一流处理任务中存在向第一流处理组件发送数据的第三流处理组件,则第三流处理组件根据第一数据分配策略将第三流处理组件发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,若在第一流处理任务中存在向第一流处理组件发送数据的流数据源标识对应的流数据源,则流数据源根据第二数据分配策略将流数据源发送给第一流处理组件的数据发送给第一流处理组件和第二流处理组件,且第一流处理任务中的第一流处理组件所需要的计算资源根据资源分配策略划分给第二流处理任务中的第一流处理组件和第二流处理组件,使得本来发送给第一流处理组件的数据可以在第一流处理组件和第二流处理组件之间分配,降低了第一流处理组件所需要的计算资源,能够一定程度上降低因分配给计算节点的流处理组件所需要的计算资源超过该计算节点能提供的计算资源所导致出现系统不稳定及数据处理故障的概率,从而改善系统性能。
需要说明的是,上述的流处理器可以应用于个人电脑或者服务器中。
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于装置实施例而言,由于其基本相似于方法实施例,所以描述得比较简单,相关之处参见方法实施例的部分说明即可。
需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本发明提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本发明可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路 或专用电路等。但是,对本发明而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘,U盘、移动硬盘、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述的方法。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以所述权利要求的保护范围为准。

Claims (18)

  1. 一种流处理方法,其特征在于,所述方法包括:
    接收第一流处理任务,所述第一流处理任务中包含一个或多个流处理组件、所述流处理组件的数据输入及输出关系、流数据源的标识;
    计算所述第一流处理任务包含的所述一个或多个流处理组件中的每一个流处理组件所需要的计算资源;
    若所述第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制与所述第一流处理组件具有相同计算逻辑的第二流处理组件,所述第二流处理组件的个数为一个或多个,并将所述第二流处理组件添加到所述第一流处理任务中,得到第二流处理任务;在所述第二流处理任务中,所述第二流处理组件与所述第一流处理组件具有相同的数据输入及输出关系,且,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的第三流处理组件,则所述第三流处理组件根据第一数据分配策略将所述第三流处理组件发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的所述流数据源标识对应的流数据源,则所述流数据源根据第二数据分配策略将所述流数据源发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,且所述第一流处理任务中的所述第一流处理组件所需要的计算资源根据资源分配策略划分给所述第二流处理任务中的所述第一流处理组件和所述第二流处理组件;
    将所述第二流处理任务中的流处理组件分配给流处理系统中的满足所述流处理组件所需要的计算资源的计算节点。
  2. 根据权利要求1所述的方法,其特征在于,所述第一流处理任务中还包括流处理组件的算子估计计算量及流传输估计计算量;
    所述计算所述第一流处理任务中的每一个流处理组件所需要的计算资源包括:
    根据所述第一流处理任务中的每一个流处理组件的算子估计计算量及流传输估计计算量,计算所述每一个流处理组件所需要的计算资源。
  3. 根据权利要求2所述的方法,其特征在于,所述根据所述第一流处理任务中的每一个流处理组件的算子估计计算量及流传输估计计算量,计算所述每一个流处理组件所需要的计算资源,包括:
    根据所述每一个流处理组件的算子估计计算量及所述流处理组件的源代码的估计计算量,按照预先设置的算子计算量预测函数计算所述每一个流处理组件的算子计算量;
    根据所述每一个流处理组件的流传输估计计算量,按照预先设置的流传输计算量预测函数计算所述每一个流处理组件的流传输计算量;
    将所述每一个流处理组件的算子计算量与所述每一个流处理组件的流传输计算量的和作为所述每一个流处理组件所需要的计算资源。
  4. 根据权利要求1至3任一项所述的方法,其特征在于,所述将所述第二流处理任务中的流处理组件分配给所述流处理系统中的满足所述流处理组件所需要的计算资源的计算节点包括:
    按照所述第二流处理任务中的流处理组件所需要的计算资源从大到小的顺序将所述流处理组件排序;
    将所述流处理组件按照所述排序分配给所述流处理系统中的满足所述流处理组件所需要的计算资源的计算节点,其中所述计算节点为所述流处理组件在各个计算节点上的计算资源比例最小的计算节点,所述计算资源比例为所述流处理组件所需要的计算资源与所述计算节点已使用的计算资源的和占所述计算节点总的计算资源的比例。
  5. 根据权利要求1至3任一项所述的方法,其特征在于,所述将所述第二流处理任务中的流处理组件分配给所述流处理系统中的满足所述流处理组件所需要的计算资源的计算节点包括:
    根据预先设置的分类模型确定所述第二流处理任务的类型;
    查找预先设置的任务类型与分配方式的对应关系表,确定与所述第二流处理任务的类型对应的分配方式;
    按照所述分配方式,为所述第二流处理任务中的流处理组件分配满足所述流处理组件所需要的计算资源的计算节点。
  6. 根据权利要求1至5任一项所述的方法,其特征在于,所述约束条件为 所述流处理组件所需要的计算资源小于或等于预先设置的数值,或者,所述流处理组件所需要的计算资源小于各个计算节点能提供的最大的空闲计算资源,或者,所述流处理组件所需要的计算资源小于各个计算节点的空闲计算资源的平均值。
  7. 根据权利要求1至5所述的方法,其特征在于,所述第一数据分配策略为平均分配策略,所述第二数据分配策略为平均分配策略,所述资源分配策略为平均分配策略。
  8. 一种流处理装置,其特征在于,包括:
    接收单元,用于接收第一流处理任务,所述第一流处理任务中包含一个或多个流处理组件、所述流处理组件的数据输入及输出关系、流数据源的标识;
    计算单元,用于在所述接收单元接收所述第一流处理任务之后,计算所述第一流处理任务中包含的所述一个或多个流处理组件中的每一个流处理组件所需要的计算资源;
    复制更新单元,用于在所述计算单元得到所述每一个流处理组件所需要的计算资源之后,若所述第一流处理任务中包含所需要的计算资源不满足预先设置的约束条件的第一流处理组件,则复制与所述第一流处理组件具有相同计算逻辑的第二流处理组件,所述第二流处理组件的个数为一个或多个,并将所述第二流处理组件添加到所述第一流处理任务中,得到第二流处理任务;在所述第二流处理任务中,所述第二流处理组件与所述第一流处理组件具有相同的数据输入及输出关系,且,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的第三流处理组件,则所述第三流处理组件根据第一数据分配策略将所述第三流处理组件发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,若在所述第一流处理任务中存在向所述第一流处理组件发送数据的所述流数据源标识对应的流数据源,则所述流数据源根据第二数据分配策略将所述流数据源发送给所述第一流处理组件的数据发送给所述第一流处理组件和所述第二流处理组件,且所述第一流处理任务中的所述第一流处理组件所需要的计算资源根据资源分配策略划分给所述第二流处理任务中的所述第一流处理组件和所述第二流处理组件;
    分配单元,用于在所述复制更新单元得到所述第二流处理任务之后,将所 述第二流处理任务中的流处理组件分配给所述流处理系统中的满足所述流处理组件所需要的计算资源的计算节点。
  9. 根据权利要求8所述的装置,其特征在于,所述第一流处理任务中还包括流处理组件的算子估计计算量及流处理组件的流传输估计计算量;
    则所述计算单元具体用于根据所述第一流处理任务中的每一个流处理组件对应的所述流处理组件的算子估计计算量及所述流处理组件的流传输估计计算量,计算所述每一个流处理组件所需要的计算资源。
  10. 根据权利要求9所述的装置,其特征在于,所述计算单元包括:
    第一计算单元,用于在所述接收单元接收所述第一流处理任务之后,根据所述每一个流处理组件的算子估计计算量及所述流处理组件的源代码的估计计算量,按照预先设置的算子计算量预测函数计算所述每一个流处理组件的算子计算量;
    第二计算单元,用于在所述第一计算单元计算所述每一个流处理组件的算子计算量之后,根据所述每一个流处理组件的流传输估计计算量,按照预先设置的流传输计算量预测函数计算所述每一个流处理组件的流传输计算量;
    第三计算单元,用于在所述第二计算单元计算所述每一个流处理组件的流传输计算量之后,将所述每一个流处理组件的算子计算量与所述每一个流处理组件的流传输计算量的和作为所述每一个流处理组件所需要的计算资源。
  11. 根据权利要求8至10任一项所述的装置,其特征在于,所述分配单元包括:
    排序单元,用于在所述复制更新单元得到所述第二流处理任务之后,按照所述第二流处理任务中的流处理组件所需要的计算资源从大到小的顺序将所述流处理组件排序;
    组件分配单元,用于在所述排序单元进行排序后,将所述流处理组件按照所述排序分配给所述流处理系统中满足所述流处理组件所需要的计算资源的计算节点,所述计算节点为所述流处理组件在各个计算节点上的计算资源比例最小的计算节点,所述计算资源比例为所述流处理组件所需要的计算资源与所述计算资源节点已使用的计算资源的和占所述计算节点总的计算资源的比例。
  12. 根据权利要求8至10任一项所述的装置,其特征在于,所述分配单元 包括:
    确定单元,用于在所述复制更新单元得到所述第二流处理任务之后,根据预先设置的分类模型确定所述第二流处理任务的类型;
    查找单元,用于在所述确定单元确定所述第二流处理任务的类型之后,查找预先设置的任务类型与分配方式的对应关系表,确定与所述第二流处理任务的类型对应的分配方式;
    节点分配单元,用于在所述查找单元确定所述对应的分配方式之后,按照所述对应的分配方式,为所述第二流处理任务中的流处理组件分配满足所述流处理组件所需要的计算资源的计算节点。
  13. 根据权利要求8至12任一项所述的装置,其特征在于,所述约束条件为流处理组件所需要的计算资源小于或等于预先设置的数值,或者,流处理组件所需要的计算资源小于所有计算节点中剩余计算资源最大的计算节点所能提供的计算资源,或者流处理组件所需要的计算资源小于计算节点的剩余计算资源的平均值。
  14. The apparatus according to any one of claims 8 to 12, wherein the first data distribution policy is an even distribution policy, the second data distribution policy is an even distribution policy, and the resource allocation policy is an even distribution policy.
  15. A stream processing system, comprising a stream processing apparatus and a plurality of computing nodes, wherein:
    the stream processing apparatus is configured to: receive a first stream processing task, the first stream processing task including one or more stream processing components, data input and output relationships of the stream processing components, and an identifier of a stream data source; calculate the computing resources required by each of the one or more stream processing components included in the first stream processing task; and, if the first stream processing task includes a first stream processing component whose required computing resources do not satisfy a preset constraint condition, replicate one or more second stream processing components that have the same computation logic as the first stream processing component and add the second stream processing components to the first stream processing task to obtain a second stream processing task; wherein, in the second stream processing task, the second stream processing component has the same data input and output relationships as the first stream processing component, and, if the first stream processing task includes a third stream processing component that sends data to the first stream processing component, the third stream processing component sends, according to a first data distribution policy, the data that it sent to the first stream processing component to both the first stream processing component and the second stream processing component; if the first stream processing task includes a stream data source, corresponding to the stream data source identifier, that sends data to the first stream processing component, the stream data source sends, according to a second data distribution policy, the data that it sent to the first stream processing component to both the first stream processing component and the second stream processing component; and the computing resources required by the first stream processing component in the first stream processing task are divided, according to a resource allocation policy, between the first stream processing component and the second stream processing component in the second stream processing task;
    the computing nodes are configured to accept the stream processing components allocated by the stream processing apparatus and to process the data sent to those stream processing components according to the computation logic of the stream processing components.
  16. The system according to claim 15, wherein the first stream processing task further includes an estimated operator computation amount and an estimated stream transmission computation amount for each stream processing component;
    the stream processing apparatus is specifically configured to: calculate an operator computation amount for each stream processing component by using a preset operator computation amount prediction function, according to the estimated operator computation amount of the stream processing component and the estimated computation amount of the source code of the stream processing component; calculate a stream transmission computation amount for each stream processing component by using a preset stream transmission computation amount prediction function, according to the estimated stream transmission computation amount of the stream processing component; and take the sum of the operator computation amount and the stream transmission computation amount of each stream processing component as the computing resources required by that stream processing component.
  17. The system according to claim 15 or 16, wherein the constraint condition is that the computing resources required by the stream processing component are less than or equal to a preset value, or that the computing resources required by the stream processing component are less than the largest idle computing resources that any computing node can provide, or that the computing resources required by the stream processing component are less than the average idle computing resources of the computing nodes.
  18. The system according to claim 15 or 16, wherein the first data distribution policy is an even distribution policy, the second data distribution policy is an even distribution policy, and the resource allocation policy is an even distribution policy.
PCT/CN2015/081533 2014-06-23 2015-06-16 Stream processing method, apparatus and system WO2015196940A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410284343.5 2014-06-23
CN201410284343.5A CN105335376B (zh) 2014-06-23 2014-06-23 Stream processing method, apparatus and system

Publications (1)

Publication Number Publication Date
WO2015196940A1 (zh) 2015-12-30

Family

ID=53502443


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120167103A1 * 2010-12-23 2012-06-28 Electronics And Telecommunications Research Institute Apparatus for parallel processing continuous processing task in distributed data stream processing system and method thereof
CN102904919A * 2011-07-29 2013-01-30 International Business Machines Corporation Stream processing method and distributed system for implementing stream processing
WO2013145310A1 * 2012-03-30 2013-10-03 Fujitsu Limited Data stream parallel processing program, method, and system
CN103595651A * 2013-10-15 2014-02-19 Beihang University Distributed data stream processing method and system
CN103782270A * 2013-10-28 2014-05-07 Huawei Technologies Co., Ltd. Management method for stream processing system, and related device and system



Also Published As

Publication number Publication date
CN105335376A (zh) 2016-02-17
US9692667B2 (en) 2017-06-27
CN105335376B (zh) 2018-12-07
US20150372882A1 (en) 2015-12-24
EP2966568A1 (en) 2016-01-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 15812346; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 15812346; Country of ref document: EP; Kind code of ref document: A1)