WO2015131721A1 - Data processing method in stream computing system, control node, and stream computing system - Google Patents

Data processing method in stream computing system, control node, and stream computing system

Info

Publication number
WO2015131721A1
WO2015131721A1 (PCT/CN2015/071645, CN2015071645W)
Authority
WO
WIPO (PCT)
Prior art keywords
working node
node
working
concurrency
data
Prior art date
Application number
PCT/CN2015/071645
Other languages
English (en)
French (fr)
Inventor
史云龙
夏命榛
何诚
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to KR1020167027194A priority Critical patent/KR101858041B1/ko
Priority to EP15759274.2A priority patent/EP3115896A4/en
Priority to JP2016555667A priority patent/JP6436594B2/ja
Publication of WO2015131721A1 publication Critical patent/WO2015131721A1/zh
Priority to US15/257,722 priority patent/US10097595B2/en
Priority to US16/112,236 priority patent/US10630737B2/en


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5083Techniques for rebalancing the load in a distributed system
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40Support for services or applications
    • H04L65/401Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time sensitive sessions, e.g. white board sharing or spawning of a subconference
    • H04L65/4015Support for services or applications wherein the services involve a main real-time session and one or more additional parallel real-time or time sensitive sessions, e.g. white board sharing or spawning of a subconference where at least one of the additional parallel sessions is real time or time sensitive, e.g. white board sharing, collaboration or spawning of a subconference
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00Arrangements for monitoring or testing data switching networks
    • H04L43/04Processing captured monitoring data, e.g. for logfile generation
    • H04L43/045Processing captured monitoring data, e.g. for logfile generation for graphical visualisation of monitoring data
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00Traffic control in data switching networks
    • H04L47/70Admission control; Resource allocation
    • H04L47/82Miscellaneous aspects
    • H04L47/828Allocation of resources per group of connections, e.g. per group of users

Definitions

  • the present invention relates to the field of computer technologies, and in particular, to a data processing method in a stream computing system, a control node, and a stream computing system.
  • Data-intensive services have been widely used. Typical data-intensive services include financial services, network monitoring, telecom data management, and web applications.
  • In data-intensive services, the data is large in volume, fast, and time-varying. Such data is therefore not suited to modeling with persistent, stable relations, but rather with transient data streams, which has given rise to the study of data stream computing.
  • Data stream computing is a pipeline-like data processing model. It is based on the idea that the value of data decreases over time, so data must be processed as soon as possible after it is generated by a triggering event: ideally a data item is processed the moment the event occurs, rather than being buffered for batch processing.
  • In a stream computing system, data stream computing is performed based on a streaming data processing model: as shown in FIG. 1, the business data processing logic usually needs to be converted into the data processing pattern represented by a Directed Acyclic Graph (DAG, also called a flow graph), in which the operators carry the actual data processing operations, the streams represent the data transfers between operators, and all operators may be executed in a distributed manner.
  • In the prior art, one scheme for setting up the streaming data processing model for data stream computing is that physical execution units (PEs, also called execution units) and logical units (generally labeled Operator in the DAG graph, also called working nodes) are in a many-to-one relationship. The scheme supports statically configuring the concurrency of each operator: according to the concurrency statically configured by the user, each operator calls the corresponding number of execution units during business execution to process the data stream generated by the service.
  • Because a stream computing system is usually a distributed real-time stream processing system, the processing situation of the various tasks in the system changes in real time. The concurrency initially set by the user is therefore often not optimal, so a streaming data processing model generated from the initially configured concurrency cannot adapt to the real-time changes in the system, which leads to wasted resources in the stream computing system and severely limited data processing capability.
  • The present invention provides a data processing method in a stream computing system, a control node, and a stream computing system, which can adjust the concurrency of the working nodes in the stream computing system in real time according to the business processing situation, thereby improving the data processing capability and resource utilization of the stream computing system.
  • In a first aspect, the present invention provides a data processing method in a stream computing system, the stream computing system including a control node and a plurality of working nodes, the method comprising:
  • the control node invokes one or more working nodes of the plurality of working nodes to process the data stream according to the configured concurrency of each working node;
  • the control node collects data traffic information between each of the one or more working nodes and other working nodes, and processing speed information of each of the one or more working nodes;
  • the control node determines the optimized concurrency of each of the one or more working nodes according to the collected data traffic information and processing speed information;
  • the control node determines, for each of the one or more working nodes, whether the optimized concurrency of the working node is the same as the current concurrency of the working node, and if not, adjusts the concurrency of the working node according to its optimized concurrency.
  • each working node includes one or more execution units, and when a working node is invoked to process the data stream, the data stream is actually processed by the execution units included in the working node; the concurrency of a working node indicates the number of execution units it includes. Adjusting the concurrency of the working node according to its optimized concurrency then includes:
  • the control node adds at least one new execution unit to the working node, or deletes at least one execution unit of the working node, according to the optimized concurrency of the working node, so that the concurrency represented by the number of execution units the working node currently includes is the same as the optimized concurrency of the working node.
  • the control node adding at least one execution unit to the working node, or deleting at least one execution unit of the working node, according to the optimized concurrency of the working node includes:
  • when the optimized concurrency of the working node is greater than the current concurrency of the working node: the control node generates a first control instruction for creating new execution units and sends it to the working node, so that after receiving the first control instruction the working node creates at least one new execution unit and creates data channels between the new execution units and the other execution units, where the concurrency represented by the total number of execution units the working node currently includes is the same as the optimized concurrency of the working node;
  • when the optimized concurrency of the working node is less than the current concurrency of the working node: the control node generates a second control instruction for deleting execution units of the working node and sends it to the working node, so that after receiving the second control instruction the working node deletes at least one of its execution units and deletes the data channels connected to the deleted execution units, where the concurrency represented by the total number of execution units the working node currently includes is the same as the optimized concurrency of the working node.
  • After at least one execution unit is added to the working node, or at least one execution unit of the working node is deleted, according to the optimized concurrency of the working node, the method further includes:
  • the control node adjusts, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node, where the data distribution policy indicates, when a working node distributes data, which devices receive the data and how much data each receiving device receives;
  • the control node sends the adjusted data distribution policy to the upstream working node, so that the upstream working node, according to the adjusted data distribution policy, determines the target execution unit corresponding to the downstream target working node and then dispatches data packets to that target execution unit.
  • In another possible implementation, the method further includes:
  • the control node adjusts, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node;
  • the control node sends the adjusted data distribution policy to the upstream working node, so that the upstream working node, according to the adjusted data distribution policy, determines the working node group to which the target working node belongs, where the working node group includes at least one working node; determines the downstream target working node from the working node group; and, after determining the target execution unit corresponding to the target working node, dispatches data packets to that target execution unit.
  • In a second aspect, the present invention provides a control node in a stream computing system, the stream computing system including the control node and a plurality of working nodes, the control node comprising:
  • a calling unit configured to invoke one or more working nodes of the plurality of working nodes to process the data stream according to the configured concurrency of each working node;
  • an information collecting unit configured to collect data traffic information between each of the one or more working nodes and other working nodes, and processing speed information of each of the one or more working nodes;
  • a calculating unit configured to determine, according to the data traffic information and the processing speed information collected by the information collecting unit, the optimized concurrency of each of the one or more working nodes;
  • an adjusting unit configured to determine, for each of the one or more working nodes, whether the optimized concurrency of the working node is the same as the current concurrency of the working node, and if not, adjust the concurrency of the working node according to its optimized concurrency.
  • each working node includes one or more execution units, and each working node processes the data stream by calling its own execution units; the concurrency of a working node represents the number of execution units the working node includes. In the aspect of adjusting the concurrency of the working node according to its optimized concurrency, the adjusting unit is specifically configured to add at least one new execution unit to the working node, or delete at least one execution unit of the working node, according to the optimized concurrency of the working node, so that the concurrency represented by the number of execution units the working node currently includes is the same as its optimized concurrency.
  • in the aspect of adding or deleting at least one execution unit for the working node according to its optimized concurrency, the adjusting unit includes:
  • a first adjustment module configured to, when the optimized concurrency of the working node is greater than the current concurrency of the working node, generate a first control instruction for adding new execution units and send it to the working node, so that after receiving the first control instruction the working node creates at least one new execution unit and creates data channels between the new execution units and the other execution units; where the concurrency represented by the total number of execution units the working node currently includes is the same as the optimized concurrency of the working node;
  • a second adjustment module configured to, when the optimized concurrency of the working node is less than the current concurrency of the working node, generate a second control instruction for deleting execution units of the working node and send it to the working node, so that after receiving the second control instruction the working node deletes at least one of its execution units and deletes the data channels connected to the deleted execution units; where the concurrency represented by the total number of execution units the working node currently includes is the same as the optimized concurrency of the working node.
  • the control node further includes:
  • a first dispatching policy adjusting unit configured to adjust, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node, and send the adjusted data distribution policy to the upstream working node, so that the upstream working node, according to the adjusted data distribution policy, determines the target execution unit corresponding to the downstream target working node and then dispatches data packets to that target execution unit, where the data distribution policy indicates, when a working node distributes data, which devices receive the data and how much data each receiving device receives.
  • In another possible implementation, the control node further includes:
  • a second dispatching policy adjusting unit configured to adjust, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node, and send the adjusted data distribution policy to the upstream working node, so that the upstream working node, according to the adjusted data distribution policy, determines the working node group to which the target working node belongs, where the working node group includes at least one working node, determines the downstream target working node from the working node group, and, after determining the target execution unit corresponding to the target working node, dispatches data packets to that target execution unit.
  • In a third aspect, the present invention provides a stream computing system, where the stream computing system includes a control node and a plurality of working nodes;
  • the control node is configured to invoke one or more working nodes of the multiple working nodes to process the data stream according to the concurrency of each working node configured in the stream computing system;
  • the working nodes are configured to process the data stream when invoked by the control node;
  • the control node is further configured to collect data traffic information between each of the one or more working nodes and other working nodes and processing speed information of each of the one or more working nodes; determine, according to the collected data traffic information and processing speed information, the optimized concurrency of each of the one or more working nodes; and determine, for each of the one or more working nodes, whether its optimized concurrency is the same as its current concurrency, and if not, adjust the concurrency of the working node according to its optimized concurrency.
  • Each working node includes one or more execution units, and when a working node is invoked to process the data stream, the data stream is actually processed by the execution units included in the working node; the concurrency of a working node indicates the number of execution units it includes. In the aspect of adjusting the concurrency of the working node according to its optimized concurrency, the control node is specifically configured to add at least one new execution unit to the working node, or delete at least one execution unit of the working node, according to the optimized concurrency of the working node, so that the concurrency represented by the number of execution units the working node currently includes is the same as the optimized concurrency of the working node.
  • the control node is further configured to adjust, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node, and send the adjusted data distribution policy to the upstream working node, so that the upstream working node, according to the adjusted data distribution policy, determines the target execution unit corresponding to the downstream target working node and then dispatches data packets to that target execution unit, where the data distribution policy indicates, when a working node distributes data, which devices receive the data and how much data each receiving device receives.
  • In another possible implementation, the control node is further configured to adjust, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node, and send the adjusted data distribution policy to the upstream working node, so that the upstream working node, according to the adjusted data distribution policy, determines the working node group to which the target working node belongs, where the working node group includes at least one working node, determines the downstream target working node from the working node group, and, after determining the target execution unit corresponding to the target working node, dispatches data packets to that target execution unit.
  • In the technical solutions provided by the embodiments of the present invention, the processing speed information of each working node and the traffic information between working nodes are collected in real time while the system is running, and the concurrency of the working nodes is adjusted according to the information collected in real time, so that the processing capability of each working node can meet the real-time requirements of the business processing, thereby dynamically improving the data processing capability and resource utilization of the stream computing system.
  • FIG. 1 is a schematic diagram of a DAG graph in the prior art;
  • FIG. 2 is a flowchart of a data processing method in a stream computing system according to an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of the correspondence between tuple processing time and tuple arrival time according to an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of calculating the optimized concurrency in an embodiment of the present invention;
  • FIG. 5 is a schematic diagram of a DAG graph segment according to an embodiment of the present invention;
  • FIG. 6 is a flowchart of a data processing method in another stream computing system according to an embodiment of the present invention;
  • FIG. 7 is a schematic structural diagram of a control node in a stream computing system according to an embodiment of the present invention;
  • FIG. 8 is a schematic diagram of a stream computing system according to an embodiment of the present invention;
  • FIG. 9 is a schematic structural diagram of a control node in another stream computing system according to an embodiment of the present invention.
  • As shown in FIG. 2, an embodiment of the present invention provides a data processing method in a stream computing system. The method provided by this embodiment can be used in a stream computing system, which typically includes a control node and multiple working nodes (also called operators); the control node can send corresponding control instructions to its subordinate working nodes, so that the working nodes call execution units to process the data stream generated by the service according to the control instructions.
  • Step 201: The control node invokes one or more working nodes of the multiple working nodes to process the data stream according to the configured concurrency of each working node.
  • It should be noted that a working node in the present invention is also generally referred to as an operator in the stream computing system, and the embodiments of the present invention make no particular distinction between the two. The concurrency of each working node in the stream computing system is configured in the flow graph (also referred to as a directed acyclic graph) that describes the business processing logic; when a service is deployed in the stream computing system, an initial value of the concurrency of each working node (also called the initial concurrency) is usually configured according to the service requirements. The control node then invokes one or more working nodes to process the data stream generated by the service according to the configured concurrency of each working node. The flow graph is a general representation of business data processing logic in a stream computing system; for a detailed description of the flow graph, refer to the description in the background section, and details are not described here again.
  • It should also be noted that each working node includes one or more execution units, and when a working node is invoked to process the data stream, the data stream is actually processed by the execution units included in the working node; an execution unit may specifically be a thread or a process. The concurrency of a working node is used to represent the correspondence between the working node and its execution units; specifically, the concurrency of a working node indicates the number of execution units the working node includes. For example, if the concurrency of working node A is 5, the working node can call 5 execution units to process the data stream. The concurrency of a working node in this step refers to the initially configured concurrency of the working node. A minimal sketch of this relationship between working nodes, execution units, and concurrency is given below.
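  • As an illustration only (not part of the patent), the following Python sketch models the relationship just described: a working node owns a number of execution units (here threads), and its concurrency is simply the count of those units. All class, method, and variable names are assumptions made for this sketch.

```python
import queue
import threading


class ExecutionUnit(threading.Thread):
    """One execution unit of a working node, modeled here as a thread."""

    def __init__(self, name, process_fn):
        super().__init__(name=name, daemon=True)
        self.inbox = queue.Queue()      # incoming data channel of this unit
        self.targets = []               # inboxes of downstream execution units
        self.process_fn = process_fn    # business processing logic

    def run(self):
        while True:
            tup = self.inbox.get()
            if tup is None:             # sentinel: stop this unit
                break
            result = self.process_fn(tup)
            for target in self.targets:     # forward along downstream channels
                target.put(result)


class WorkingNode:
    """A working node (operator); its concurrency == number of execution units."""

    def __init__(self, name, process_fn, concurrency):
        self.name = name
        self.process_fn = process_fn
        self.units = [ExecutionUnit(f"{name}-pe{i}", process_fn)
                      for i in range(concurrency)]
        for unit in self.units:
            unit.start()

    @property
    def concurrency(self):
        return len(self.units)


# Working node A configured with an initial concurrency of 5:
node_a = WorkingNode("A", process_fn=lambda t: t, concurrency=5)
print(node_a.concurrency)   # -> 5, i.e. 5 execution units can process the stream
```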
  • Step 202: The control node collects data traffic information between each of the one or more working nodes and other working nodes, and processing speed information of each of the one or more working nodes.
  • It should be noted that the traffic information between working nodes refers to the data traffic information between working nodes that have a logical upstream-downstream relationship in the flow graph; the speed information of a working node represents the speed at which the working node processes data, which is determined by factors such as the concurrency of the working node and the data traffic.
  • Step 203: The control node determines, from the collected data traffic information and processing speed information, the optimized concurrency of each of the one or more working nodes.
  • The optimized concurrency of a working node is the concurrency that matches the current load of the working node. In the prior art, the execution units are called to process the data stream according to the initially configured concurrency, but the actual processing situation cannot be estimated before the data is actually processed, so in many cases the initial concurrency does not achieve the optimal effect. To address this problem, the method provided by the embodiment of the present invention collects the actual load of each working node in the stream computing system (that is, the collected traffic information and processing speed information) and then calculates a matching optimized concurrency from the collected load. The optimized concurrency matches the data processing situation of the working node, thereby avoiding both the waste of resources and the problem that the execution units cannot meet the data processing requirements.
  • Step 204: The control node determines, for each of the one or more working nodes, whether the optimized concurrency of the working node is the same as the concurrency of the working node, and if not, adjusts the concurrency of the working node according to its optimized concurrency.
  • It can be understood that if the optimized concurrency of a working node is the same as its current concurrency, there is no need to adjust it, and its current concurrency is maintained. In addition, the specific algorithm for determining the optimized concurrency of each working node from the collected real-time data traffic information and processing speed information may take various forms. A specific example follows, but it should not be understood as the only way of implementing the present invention.
  • In a stream computing system, when the time at which tuples (or data packets) in the data stream arrive at a working node matches the time the execution units take to process them, the resource utilization of the system is optimal, which helps the system make full use of its processing capability. If the tuple arrival time is less than the tuple processing time, the execution unit is overloaded and tuples will accumulate in the system.
  • As shown in FIG. 3, for a single execution unit: if tuple arrival time 1 is greater than the tuple service processing time, the execution unit is relatively idle (see the relationship between tuple arrival time 1 and the tuple processing time in FIG. 3); if tuple arrival time 2 is less than the tuple service processing time, the execution unit is overloaded (see the relationship between tuple arrival time 2 and the tuple processing time in FIG. 3).
  • The tuple arrival time is the average time interval between tuples arriving at an execution unit, and the tuple service processing time is the average time the execution unit needs to process one tuple. The tuple arrival time and the tuple service processing time are calculated from the collected traffic information between working nodes and the processing speed data of the execution units.
  • In both situations shown in FIG. 3, the concurrency of the working node needs to be adjusted so that the tuple service processing time becomes consistent with the tuple arrival time. The relationship between the tuple arrival time and the concurrency of the working node is: the greater the concurrency, the greater the tuple arrival time at each execution unit.
  • Taking the flow graph shown in FIG. 4 as an example, the calculation of the optimized concurrency is further described: starting from the source of the flow graph (that is, the DAG graph), the concurrency of each operator is calculated layer by layer, in the order A, B, C, D, E, F.
  • Dop represents the concurrency value, and is calculated with the formula dop ≈ tuple processing time / tuple arrival time.
  • The tuple processing time and the tuple emission time are obtained from the statistics reported by the actual business processing.
  • The tuple arrival time of a node is obtained from the tuple emission times of its upstream nodes.
  • For example, working node A (also called operator A) in FIG. 4 is a source node, so its tuple arrival time is obtained from the throughput, that is, 1 s / 67 ≈ 15 ms. The tuple arrival time of node C is calculated from the tuple emission times of A and B: since the concurrency of A is 2, A emits 2 tuples every 15 ms while B emits 1 tuple every 15 ms, so the tuple arrival time of C is 15 ms / (1 + 2) = 5 ms, and with a tuple processing time of 15 ms the optimized dop of node C ≈ 15 / 5 = 3. A small script reproducing this layer-by-layer calculation follows.
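  • The following Python script is a hedged reconstruction of the layer-by-layer calculation above. Only node A's throughput (67 tuples per second) and the resulting numbers for node C come from the worked example; the DAG shape and the processing times assigned to the other nodes are illustrative assumptions, chosen so that A's concurrency works out to 2 and B's to 1 as in the example.

```python
# Layer-by-layer optimized-concurrency calculation: dop ≈ processing time / arrival time.
# The arrival time of a downstream node is derived from the emission rates of its
# upstream nodes; each downstream node is assumed to receive the full output of
# every upstream node it is connected to.

upstream = {"A": [], "B": [], "C": ["A", "B"],         # assumed DAG shape
            "D": ["C"], "E": ["C"], "F": ["D", "E"]}

processing_time_ms = {"A": 30, "B": 15, "C": 15,       # B..F values are assumed;
                      "D": 10, "E": 20, "F": 5}        # they yield dop(A)=2, dop(B)=1
source_arrival_ms = {"A": 1000 / 67, "B": 15}          # source nodes: from throughput

dop, arrival_ms, emit_rate = {}, {}, {}                # emit_rate: tuples per ms

for node in ["A", "B", "C", "D", "E", "F"]:            # layer-by-layer order
    if not upstream[node]:                             # source node
        arrival_ms[node] = source_arrival_ms[node]
    else:                                              # 1 / (sum of upstream rates)
        arrival_ms[node] = 1.0 / sum(emit_rate[u] for u in upstream[node])
    dop[node] = max(1, round(processing_time_ms[node] / arrival_ms[node]))
    emit_rate[node] = dop[node] / arrival_ms[node]     # dop tuples per arrival interval

print(round(arrival_ms["C"], 1))   # ~5.0 ms: A emits 2 and B emits 1 tuple per ~15 ms
print(dop["C"])                    # 3: 15 ms processing time / 5 ms arrival time
```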
  • The above way of calculating the optimized concurrency is an optional implementation of the embodiment of the present invention; the way of calculating the optimized concurrency in the present invention is not limited to it. In specific application environments, different stream computing systems have different requirements and different device performance, so the way the optimized concurrency is calculated varies. The method provided by the embodiment of the present invention is applicable to any scenario in which an optimized concurrency is calculated from the real-time situation and the processing logic in the stream computing system is then adjusted according to that optimized concurrency.
  • When each working node includes one or more execution units and a working node is invoked to process the data stream, once the optimized concurrency of each working node has been calculated as described above, the calculated optimized concurrency can be used to determine whether the initially set concurrency of each working node matches the current processing situation; if it does not match, the concurrency of the working node can be adjusted. Adjusting the concurrency of a working node according to its optimized concurrency then includes adding at least one execution unit to the working node, or deleting at least one execution unit of the working node, so that the adjusted concurrency of the working node is the same as its optimized concurrency.
  • In the embodiments of the present invention, making the adjusted concurrency equal to the optimized concurrency is the optimal and most ideal implementation; in a specific implementation process, however, other objective constraints may mean that the concurrency can only be adjusted so that the adjusted concurrency is close to the optimized concurrency. The specific effect of the adjustment is that the data processing capability of the working node is better suited to the current data processing needs of the system.
  • The initially set concurrency may be either too low or too high relative to the concurrency actually needed; optional implementations for adding or deleting execution units in these two cases are described in detail below.
  • Case 1: when the optimized concurrency of the working node is greater than its initial concurrency, the control node generates a first control instruction for creating new execution units and sends it to the working node, so that after receiving the first control instruction the working node creates at least one new execution unit and creates the data channels between the new execution units and the other execution units; after the adjustment, the concurrency represented by the total number of execution units the working node currently includes is the same as the optimized concurrency of the working node. Creating these data channels may specifically include: determining, according to the logical position of the working node in the flow graph, the first upstream working node and the first downstream working node logically connected to the new execution unit, where the first upstream working node corresponds to at least one first upstream execution unit and the first downstream working node corresponds to at least one first downstream execution unit, and then creating the data channels between the new execution unit and the first upstream and first downstream execution units.
  • In this embodiment, when the data channels between a new execution unit and the upstream and downstream execution units are established, in order to avoid losing data, the data channel between the new execution unit and the downstream execution units is generally established first, and the corresponding upstream data channel is established afterwards, as sketched below.
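  • As a sketch of this scale-up case under the assumptions of the earlier WorkingNode/ExecutionUnit example (none of these names come from the patent), the following function adds execution units and wires each new unit's data channels downstream-first, as described above.

```python
def scale_up(node, optimized_dop, upstream_units, downstream_units):
    """Add execution units to `node` until its concurrency equals `optimized_dop`.

    Each new unit's data channels are wired downstream-first, so the unit can
    already forward its output before any upstream unit starts dispatching
    tuples to it (the ordering described above).
    """
    while node.concurrency < optimized_dop:
        new_unit = ExecutionUnit(f"{node.name}-pe{node.concurrency}",
                                 node.process_fn)
        # 1) establish channels to the downstream execution units first
        new_unit.targets.extend(d.inbox for d in downstream_units)
        new_unit.start()
        node.units.append(new_unit)
        # 2) only then let the upstream execution units dispatch to the new unit
        for u in upstream_units:
            u.targets.append(new_unit.inbox)
```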
  • Case 2: when the optimized concurrency of the working node is less than its initial concurrency, the control node generates a second control instruction for deleting execution units of the working node and sends it to the working node, so that after receiving the second control instruction the working node deletes at least one of its execution units and deletes the data channels connected to the deleted execution units; after the adjustment, the concurrency represented by the total number of execution units the working node currently includes is the same as the optimized concurrency of the working node.
  • The specific steps for the working node to delete an execution unit may be: determining, according to the logical position of the working node in the flow graph, the second upstream working node and the second downstream working node logically connected to the execution unit to be deleted, where the second upstream working node corresponds to at least one second upstream execution unit and the second downstream working node corresponds to at least one second downstream execution unit; deleting the data channels between the second upstream and second downstream execution units and the execution unit to be deleted; and then deleting the execution unit to be deleted.
  • In this embodiment, one or more execution units need to be deleted. To ensure that data is still processed correctly, the specific operations when deleting an execution unit may be: first disconnect the data channel between the execution unit to be deleted and the upstream execution units; then let the execution unit to be deleted finish processing its unprocessed data; after the data has been processed, delete the data channel between the execution unit to be deleted and the downstream execution units; and finally delete the execution unit to be deleted. A minimal sketch of this drain-then-delete order is given below.
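  • The scale-down case can be sketched in the same illustrative style, reusing the hypothetical ExecutionUnit/WorkingNode classes from the earlier sketch: cut the upstream channel, let the unit drain its backlog, then remove its downstream channels and the unit itself.

```python
def scale_down(node, optimized_dop, upstream_units):
    """Remove execution units from `node` until its concurrency equals
    `optimized_dop`, in the order described above: cut the upstream channel,
    drain the unit's unprocessed tuples, then drop its downstream channels."""
    while node.concurrency > optimized_dop:
        victim = node.units.pop()
        # 1) disconnect the upstream data channels so no new tuples arrive
        for u in upstream_units:
            if victim.inbox in u.targets:
                u.targets.remove(victim.inbox)
        # 2) let the unit finish its queued tuples; the sentinel is handled last
        victim.inbox.put(None)
        victim.join()
        # 3) finally delete the downstream channels along with the unit itself
        victim.targets.clear()
```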
  • Since execution units are added or deleted, and the upstream working node of the working node whose concurrency has been adjusted is distributing data to it, the data distribution policy needs to be adjusted correspondingly. For example, if an execution unit is newly added, data needs to be dispatched to the newly added execution unit for processing. Therefore, after step 204 is performed, the method provided by the embodiment of the present invention further includes: correspondingly adjusting the data distribution policy of the upstream working node corresponding to the working node.
  • Adjusting the concurrency of a working node means that a certain number of execution units are added to or deleted from the original working node. After the execution units have been adjusted, data processing problems will arise if the data distribution policy of the upstream working node is not adjusted as well. The data distribution policy needs to be generated according to the number of downstream execution units and the processing capability of each execution unit, so a data distribution policy typically specifies the paths along which data is distributed and the specific execution units to which the data is distributed.
  • Specifically, the control node adjusts, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node, where the data distribution policy indicates, when a working node distributes data, which devices receive the data and how much data each receiving device receives.
  • For example, consider a DAG fragment in which n0 represents the upstream operator and n1, n2, and n3 respectively represent downstream operators of n0. n0 emits two streams, s1 and s2, where n1 and n2 subscribe to stream s1 and n3 subscribes to s2. The concurrency of n1 is 1 and it is executed by one PE, that is, n1 is executed by pe1; the concurrency of n2 is 2 and it is executed by two PEs, that is, n2 is executed by pe2 and pe3; the concurrency of n3 is 3 and it is executed by three PEs.
  • When n0 dispatches a tuple, for example tuple0 on stream s1, first-level dispatching is performed first, that is, the target operators are selected; the selectable target operators in this example are n1 and n2. Second-level dispatching is then performed for n1 and n2 respectively: when second-level dispatching is performed for n1, since the concurrency of n1 is 1, it is directly determined that tuple0 is dispatched to pe1; when second-level dispatching is performed for n2, since the concurrency of n2 is 2, the data must be distributed according to the distribution policy configured for n2.
  • For example, n2 may be configured with hash distribution, that is, some attribute fields of tuple0 are first hashed to obtain a hash value, the hash value is then taken modulo the concurrency, and the result is used as an index to select the PE of n2 that receives the tuple.
  • A corresponding data distribution policy therefore needs to be set for each downstream operator, and the distribution policies may be extended in specific embodiments; for example, corresponding embodiments may support random distribution (the stream is randomly sent to one PE of the downstream operator), all-distribution (the stream is sent to all PEs of the downstream operator), and hash distribution (the PE that receives the stream is determined by the hash-modulo method described above). A hedged sketch of this two-level dispatching follows.
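  • The following Python sketch illustrates the two-level dispatching described above for the n0/n1/n2/n3 example. The stream subscriptions and pe1-pe3 follow the example; the PE names for n3, the chosen policies, and the hashed attribute field are assumptions made for illustration (a real system would also use a stable hash rather than Python's built-in, process-randomized one).

```python
import random

# First-level routing table: stream -> downstream operators subscribing to it.
subscriptions = {"s1": ["n1", "n2"], "s2": ["n3"]}

# Second-level configuration: each operator's PEs and its distribution policy.
operators = {
    "n1": {"pes": ["pe1"],               "policy": "hash"},
    "n2": {"pes": ["pe2", "pe3"],        "policy": "hash"},
    "n3": {"pes": ["pe4", "pe5", "pe6"], "policy": "random"},  # PE names assumed
}


def dispatch(stream, tup, key_field="key"):
    """Two-level dispatch: pick the target operators for the stream, then pick
    PE(s) per operator according to that operator's distribution policy."""
    targets = []
    for op in subscriptions[stream]:              # first-level dispatch
        pes = operators[op]["pes"]
        policy = operators[op]["policy"]
        if len(pes) == 1:                         # concurrency 1: no choice needed
            targets.append((op, pes[0]))
        elif policy == "hash":                    # hash attribute field, modulo dop
            targets.append((op, pes[hash(tup[key_field]) % len(pes)]))
        elif policy == "random":                  # send to one PE at random
            targets.append((op, random.choice(pes)))
        else:                                     # "all": every PE gets the tuple
            targets.extend((op, pe) for pe in pes)
    return targets


tuple0 = {"key": "user-42", "value": 1}
# tuple0 travels on s1, so it reaches n1 (always pe1) and one PE of n2 (pe2 or pe3).
print(dispatch("s1", tuple0))
```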
  • In another optional implementation, the control node adjusts, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node;
  • the control node sends the adjusted data distribution policy to the upstream working node, so that the upstream working node, according to the adjusted data distribution policy, determines the working node group to which the target working node belongs, where the working node group includes at least one working node; determines the downstream target working node from the working node group; and, after determining the target execution unit corresponding to the target working node, dispatches data packets to that target execution unit.
  • In this way, in addition to adjusting the concurrency of operators, the solution provided by the embodiment of the present invention provides a multi-level data dispatching scheme, so that the concurrency of an operator can be adjusted while the correct distribution of the data is still guaranteed.
  • the control node of the flow computing system instructs one or more working nodes to process the data flow generated by the service according to the configured flow graph, where the flow graph includes initial concurrency set in advance for each working node, and the specific steps include:
  • Step 601: Each invoked working node collects the processing speed information of its processing of the data stream and the data traffic information between itself and other working nodes, generates processing capability description information from the collected processing speed information and data traffic information, and sends it to the corresponding control node;
  • Step 602: The control node collects the processing capability description information of each invoked working node;
  • Step 603: The control node determines the optimized concurrency of each working node from the collected real-time processing capability description information and the flow graph;
  • Step 604: The control node determines whether the optimized concurrency of each working node is the same as the initial concurrency of the working node in the flow graph, and if not, generates a control instruction according to the optimized concurrency and sends the control instruction to the working node;
  • Step 605: After receiving the control instruction, the working node adjusts its own concurrency according to the control instruction.
  • In this way, the method provided by the embodiment of the present invention collects the processing status of each working node in real time while the system is running, and then adjusts the concurrency of the working nodes according to the real-time processing situation, so that the processing capability of each working node can meet the real-time requirements of the business processing, thereby dynamically improving the data processing capability and resource utilization of the stream computing system. A hedged sketch of the control loop formed by steps 601 to 605 is given below.
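  • As an illustration only, the sketch below strings steps 601 to 605 together into a periodic control loop on the control node. The report format, the collect/send helper methods on the control node, and the placeholder compute_optimized_dop function are assumptions; a real system would plug in its own concurrency-estimation rule, such as the dop ≈ processing time / arrival time rule illustrated earlier.

```python
import time


def compute_optimized_dop(report):
    """Placeholder for the concurrency-estimation rule, e.g. the
    dop ≈ tuple processing time / tuple arrival time rule shown earlier."""
    return max(1, round(report["processing_time_ms"] / report["arrival_time_ms"]))


def control_loop(control_node, flow_graph, interval_s=10):
    """Periodic control loop corresponding to steps 601-605 above (illustrative)."""
    while True:
        # Steps 601-602: gather each working node's processing-capability report
        # (processing speed + inter-node traffic); collect_report is an assumed API.
        reports = {node: control_node.collect_report(node)
                   for node in flow_graph.working_nodes}

        # Step 603: derive an optimized concurrency for every working node.
        optimized = {node: compute_optimized_dop(reports[node])
                     for node in flow_graph.working_nodes}

        # Steps 604-605: where the optimized value differs from the current one,
        # send a control instruction so the node adds or removes execution units,
        # and adjust the upstream dispatch policy accordingly (assumed APIs).
        for node, dop in optimized.items():
            if dop != node.concurrency:
                control_node.send_adjust_instruction(node, dop)
                control_node.update_upstream_dispatch_policy(node)

        time.sleep(interval_s)
```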
  • the present invention further provides a control node 700 in a flow computing system, where the flow computing system includes a control node and a plurality of working nodes, and the control node includes:
  • the calling unit 701 is configured to invoke one or more working nodes of the plurality of working nodes to process the data stream according to the concurrency of the configured working nodes;
  • the information collection unit 702 is configured to collect data traffic information between each of the one or more working nodes and other working nodes, and processing speed information of each of the one or more working nodes;
  • the calculating unit 703 is configured to determine, according to the data traffic information and the processing speed information collected by the information collecting unit 702, an optimized concurrency of each of the one or more working nodes;
  • the adjusting unit 704 is configured to determine, for each of the one or more working nodes, whether the optimized concurrency of the working node is the same as the initial concurrency of the working node, and if not, adjust the concurrency of the working node according to its optimized concurrency.
  • Each working node includes one or more execution units, and each working node processes the data stream by calling its own execution units; the concurrency of a working node indicates the number of execution units the working node includes. In the aspect of adjusting the concurrency of the working node according to its optimized concurrency, the adjusting unit 704 is specifically configured to add at least one new execution unit to the working node, or delete at least one execution unit of the working node, according to the optimized concurrency of the working node, so that the concurrency represented by the number of execution units the working node currently includes is the same as its optimized concurrency.
  • the adjusting unit 704 includes:
  • a first adjustment module configured to, when the optimized concurrency of the working node is greater than the initial concurrency of the working node, generate a first control instruction for creating new execution units and send it to the working node, so that after receiving the first control instruction the working node creates at least one new execution unit and creates the data channels between the new execution units and the other execution units;
  • a second adjustment module configured to, when the optimized concurrency of the working node is less than the initial concurrency of the working node, generate a second control instruction for deleting execution units of the working node and send it to the working node, so that after receiving the second control instruction the working node deletes at least one of its execution units and deletes the data channels connected to the deleted execution units.
  • the control node 700 further includes:
  • a first dispatching policy adjusting unit 705 configured to adjust, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node, and send the adjusted data distribution policy to the upstream working node, so that the upstream working node, according to the adjusted data distribution policy, determines the target execution unit corresponding to the downstream target working node and then dispatches data packets to that target execution unit, where the data distribution policy indicates, when a working node distributes data, which devices receive the data and how much data each receiving device receives.
  • In another possible implementation, the control node 700 further includes a second dispatching policy adjusting unit 706, configured to adjust, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node, and send the adjusted data distribution policy to the upstream working node, so that the upstream working node, according to the adjusted data distribution policy, determines the working node group to which the target working node belongs, where the working node group includes at least one working node, determines the downstream target working node from the working node group, and, after determining the target execution unit corresponding to the target working node, dispatches data packets to that target execution unit.
  • The control node 700 in the stream computing system provided by this embodiment of the present invention is used to implement the foregoing data processing methods.
  • The control node in the stream computing system collects the processing status of each working node in real time while the stream computing system is running, and then adjusts the concurrency of the working nodes according to the real-time processing situation, so that the processing capability of each working node can meet the real-time requirements of the business processing, thereby dynamically improving the data processing capability and resource utilization of the stream computing system.
  • the embodiment of the present invention further provides a flow computing system 800, the flow computing system 800 includes: a control node 801 and a plurality of working nodes 802;
  • the control node 801 is configured to, according to the concurrency of each working node 802 configured in the flow computing system, invoke one or more working nodes of the multiple working nodes to process a data flow generated by the service;
  • the working node 802 is configured to process, when invoked by the control node 801, the data stream generated by the service;
  • the control node 801 is further configured to collect data traffic information between each of the one or more working nodes and other working nodes, and processing speed information of each of the one or more working nodes; determine, according to the collected data traffic information and processing speed information, the optimized concurrency of each of the one or more working nodes; and determine, for each of the one or more working nodes, whether its optimized concurrency is the same as its current concurrency, and if not, adjust the concurrency of the working node according to its optimized concurrency.
  • The working node includes one or more execution units, and when the working node is invoked to process the data stream, the execution units included in the working node process the data stream; the concurrency of the working node indicates the number of execution units of the working node. In the aspect of adjusting the concurrency of the working node according to its optimized concurrency, the control node 801 is specifically configured to generate a corresponding control instruction according to the optimized concurrency of the working node 802 and send the control instruction to the working node 802.
  • The working node 802 is further configured to add at least one execution unit, or delete at least one execution unit of the working node 802, according to the control instruction, so that the concurrency represented by the number of execution units the working node 802 currently includes is the same as the optimized concurrency of the working node 802.
  • The control node 801 is further configured to adjust, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node 802, and send the adjusted data distribution policy to the upstream working node, so that the upstream working node, according to the adjusted data distribution policy, determines the target execution unit corresponding to the downstream target working node and then dispatches data packets to that target execution unit, where the data distribution policy indicates, when a working node distributes data, which devices receive the data and how much data each receiving device receives.
  • In another implementation, the control node 801 is further configured to adjust, according to the at least one added or deleted execution unit, the data distribution policy of the upstream working node corresponding to the working node, and send the adjusted data distribution policy to the upstream working node, so that the upstream working node, according to the adjusted data distribution policy, determines the working node group to which the target working node belongs, where the working node group includes at least one working node, determines the downstream target working node from the working node group, and, after determining the target execution unit corresponding to the target working node, dispatches data packets to that target execution unit.
  • The stream computing system collects the processing speed information of each working node and the traffic information between working nodes in real time while the system is running, and adjusts the concurrency of the working nodes according to the information collected in real time, so that the processing capability of each working node can meet the real-time requirements of the business processing, thereby dynamically improving the data processing capability and resource utilization of the stream computing system.
  • As shown in FIG. 9, the present invention further provides a control node for performing the data processing methods in the foregoing embodiments, the control node including at least one processor 901 (for example, a CPU), at least one network interface 902 or other communication interface, a memory 903, and at least one communication bus 904 used to implement connection and communication between these components.
  • the processor 901 is configured to execute an executable module, such as a computer program, stored in the memory 903.
  • The memory 903 may include a high-speed random access memory (RAM), and may also include a non-volatile memory, such as at least one disk memory.
  • The communication connection between the system gateway and at least one other network element is implemented through the at least one network interface 902 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, or the like.
  • The memory 903 stores a program 9031, and the program may be executed by the processor 901 to: invoke one or more working nodes of the plurality of working nodes to process the data stream according to the configured concurrency of each working node; collect data traffic information between each of the one or more working nodes and other working nodes and processing speed information of each of the one or more working nodes; determine, according to the collected data traffic information and processing speed information, the optimized concurrency of each of the one or more working nodes; and determine, for each of the one or more working nodes, whether its optimized concurrency is the same as its current concurrency, and if not, adjust the concurrency of the working node according to its optimized concurrency.
  • The method provided by this embodiment of the present invention collects the processing status of each working node in real time while the system is running, and then adjusts the concurrency of the working nodes according to the real-time processing situation, so that the processing capability of each working node can meet the real-time requirements of the business processing, thereby dynamically improving the data processing capability and resource utilization of the stream computing system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a data processing method in a stream computing system, a control node, and a stream computing system. The method includes: collecting data traffic information between each working node and other working nodes and processing speed information of each working node; determining the optimized concurrency of each working node according to the collected data traffic information and processing speed information; and adjusting the concurrency of a working node according to its optimized concurrency. The method and apparatus disclosed by the present invention adjust the concurrency of the working nodes in real time according to real-time data in the stream computing system, so that the processing capability of the working nodes can meet the real-time requirements of business processing.

Description

流计算系统中的数据处理方法、控制节点及流计算系统 技术领域
本发明涉及计算机技术领域,尤其涉及一种流计算系统中的数据处理方法、控制节点及流计算系统。
背景技术
当前,数据密集型业务已经得到了广泛的应用,典型的数据密集型业务包括金融服务、网络监控、电信数据管理、Web应用等等。在数据密集型业务中,数据具有量大、快速、时变的特点,因此数据不宜用持久稳定关系建模,而适宜用瞬态数据流建模,由此产生了数据流计算的研究。数据流计算是一种类似流水线式的数据处理模式,数据流计算来自于一个理念:数据的价值随着时间的流逝而降低,所以由事件触发生成数据之后必须尽快地对其进行处理,最好数据产生时便立刻对其进行处理,即发生一个事件就立即进行一次数据处理,而不是将数据缓存起来批量处理。
在流计算系统中,数据流计算是基于流式数据处理模型进行的,如图1所示,业务数据处理逻辑通常需要转化为无回路有向图(Directed Acyclic Graph,DAG;或称为流图)所示的数据处理模式,图中的算子(Operator)承载实际的数据处理操作,数据流(stream)代表各Operator间的数据传输,其中所有Operator可以被分布化执行。
现有技术中,一种设置数据流计算的流式数据处理模型的方案是:物理单元(physical equipment,PE,或者称为执行单元)与逻辑单元(在DAG图中一般用Operator标示,或称为工作节点)为多对一的关系,该方案支持静态配置Operator的并发度,即根据用户静态配置的Operator的并发度,每个Operator在业务执行过程中,根据该并发度调用对应个数的执行单元对业务产生的数据流进行处理。
因为流计算系统通常是分布式的实时流处理系统,所以系统中各种任务 的处理情况都是实时变化的。针对实时变化的情况,最初用户设置的并发度很多时候都不是最优的,所以根据最初用户设置的并发度生成的流式数据处理模型并不能适应系统的实时变化,从而会导致流计算系统的资源浪费,数据处理能力受到较大限制。
发明内容
本发明提供一种流计算系统中的数据处理方法、控制节点及流计算系统,以根据业务处理情况实时调整流计算系统中工作节点的并发度,从而提升流计算系统的数据处理能力及资源利用率。
第一方面,本发明提供一种流计算系统中的数据处理方法,所述流计算系统包括控制节点和多个工作节点,该方法包括:
所述控制节点根据配置的各个工作节点的并发度,调用所述多个工作节点中的一个或多个工作节点对数据流进行处理;
所述控制节点收集所述一个或多个工作节点中每个工作节点与其它工作节点之间的数据流量信息和所述一个或多个工作节点中每个工作节点的处理速度信息;
所述控制节点根据收集到的数据流量信息和处理速度信息确定所述一个或多个工作节点中的每个工作节点的优化并发度;
所述控制节点分别确定所述一个或多个工作节点中的每个工作节点的优化并发度与该工作节点的并发度是否相同,如果不相同,则按照该工作节点的优化并发度对该工作节点的并发度进行调整。
结合第一方面,在第一种可能的实现方式中,每个工作节点包含一个或多个执行单元,且当工作节点被调用对数据流进行处理时,具体由工作节点所包含的执行单元处理数据流;工作节点的并发度表示工作节点包含的执行单元的个数;则所述控制节点按照该工作节点的优化并发度对该工作节点的并发度进行调整,包括:
所述控制节点根据该工作节点的优化并发度为该工作节点新增至少一个 执行单元,或删除该工作节点的至少一个执行单元,使得该工作节点当前包含的执行单元个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
结合第一方面的第一种可能的实现方式,在第二种可能的实现方式中,所述控制节点根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元,包括:
当该工作节点的优化并发度大于该工作节点的并发度时:所述控制节点生成用于创建新的执行单元的第一控制指令并发送给该工作节点,使得该工作节点接收到所述第一控制指令后创建至少一个新执行单元,并创建所述新执行单元与其它执行单元的数据通道,其中该工作节点当前包含的执行单元的总个数所表征的该工作节点的并发度与该工作节点的优化并发度相同;
当所述工作节点的优化并发度小于该工作节点的并发度时:所述控制节点生成用于删除所述工作节点的执行单元的第二控制指令并发送给该工作节点,使得该工作节点接收到所述第二控制指令后删除该工作节点的至少一个执行单元,并删除与所述删除的执行单元连接的数据通道,其中该工作节点当前包含的执行单元的总个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
结合第一方面的第一或二种可能的实现方式,在第三种可能的实现方式中,在根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元之后,该方法还包括:
所述控制节点根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,所述数据派发策略用于表示工作节点在派发数据时,接收数据的设备以及接收数据的设备在接收数据时的数据量;
所述控制节点向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,在确定下游的目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元。
结合第一方面的第一或二种可能的实现方式,在第四种可能的实现方式 中,在根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元之后,该方法还包括:
控制节点根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略;
控制节点向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,确定所述目标工作节点所属的工作节点组,所述工作节点组包括至少一个工作节点;并从所述工作节点组中确定下游的目标工作节点,以及在确定所述目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元。
第二方面,本发明提供一种流计算系统中的控制节点,所述流计算系统包括所述控制节点和多个工作节点,该控制节点包括:
调用单元,用于根据配置的各个工作节点的并发度,调用所述多个工作节点中的一个或多个工作节点对数据流进行处理;
信息收集单元,用于收集所述一个或多个工作节点中每个工作节点与其它工作节点之间的数据流量信息和所述一个或多个工作节点中每个工作节点的处理速度信息;
计算单元,用于根据所述信息收集单元收集到的数据流量信息和处理速度信息确定所述一个或多个工作节点中的每个工作节点的优化并发度;
调整单元,用于分别确定所述一个或多个工作节点中的每个工作节点的优化并发度与该工作节点的并发度是否相同,如果不相同,则按照该工作节点的优化并发度对该工作节点的并发度进行调整。
结合第二方面,在第一种可能的实现方式中,每个工作节点包含一个或多个执行单元,且每个工作节点通过调用自身的执行单元处理数据流;工作节点的并发度表示工作节点包含的执行单元的个数;在所述按照该工作节点的优化并发度对该工作节点的并发度进行调整的方面,所述调整单元具体用于,
根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或 删除该工作节点的至少一个执行单元,使得该工作节点当前包含的执行单元个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
结合第二方面的第一种可能的实现方式,在第二种可能的实现方式中,在所述根据该工作节点的优化并发度为该工作节点新增或删除至少一个执行单元的方面,所述调整单元包括:
第一调整模块,用于当该工作节点的优化并发度大于该工作节点的并发度时:生成用于新增执行单元的第一控制指令并发送给该工作节点,使得该工作节点接收到所述第一控制指令后创建至少一个新的执行单元,并创建所述新的执行单元与其它执行单元的数据通道;其中,该工作节点当前包含的执行单元的总个数所表征的该工作节点的并发度与该工作节点的优化并发度相同;
第二调整模块,当所述工作节点的优化并发度小于该工作节点的并发度时:生成用于删除与所述工作节点的执行单元的第二控制指令并发送给该工作节点,使得该工作节点接收到所述第二控制指令后删除该工作节点的至少一个执行单元,并删除与所述删除的执行单元连接的数据通道;其中,该工作节点当前包含的执行单元的总个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
结合第二方面的第一或二种可能的实现方式,在第三种可能的实现方式中,该控制节点还包括:
第一派发策略调整单元,用于根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,在确定下游的目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元,其中,所述数据派发策略用于表示工作节点在派发数据时,接收数据的设备以及接收数据的设备在接收数据时的数据量。
结合第二方面的第一至二种可能的实现方式,在第四种可能的实现方式中,该控制节点还包括:
第二派发策略调整单元,用于根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,确定所述目标工作节点所属的工作节点组,所述工作节点组包括至少一个工作节点;并从所述工作节点组中确定下游的目标工作节点,以及在确定所述目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元。
第三方面,本发明提供一种流计算系统,所述流计算系统包括:控制节点和多个工作节点;
所述控制节点用于,根据所述流计算系统中配置的各个工作节点的并发度,调用所述多个工作节点中的一个或多个工作节点对数据流进行处理;
所述工作节点,用于在所述控制节点的调用下,对所述数据流进行处理;
所述控制节点还用于,收集所述一个或多个工作节点中每个工作节点与其他工作节点之间的数据流量信息和所述一个或多个工作节点中每个工作节点的处理速度信息;根据收集到的数据流量信息和处理速度信息确定所述一个或多个工作节点中的每个工作节点的优化并发度;并分别确定所述一个或多个工作节点中的每个工作节点的优化并发度与该工作节点的并发度是否相同,如果不相同,则按照该工作节点的优化并发度对该工作节点的并发度进行调整。
结合第三方面,在第一种可能的实现方式中,所述工作节点包含一个或多个执行单元,且当工作节点被调用对数据流进行处理时,具体由工作节点所包含的执行单元处理数据流;工作节点的并发度表示工作节点包含的执行单元的个数;则在所述按照该工作节点的优化并发度对该工作节点的并发度进行调整的方面,所述控制节点具体用于:
根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元,使得该工作节点当前包含的执行单元个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
结合第三方面的第一种可能的实现方式,在第二种可能的实现方式中,所述控制节点还用于根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,在确定下游的目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元,其中,所述数据派发策略用于表示工作节点在派发数据时,接收数据的设备以及接收数据的设备在接收数据时的数据量。
结合第三方面的第二种可能的实现方式,在第三种可能的实现方式中,所述控制节点还用于控制节点根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,确定所述目标工作节点所属的工作节点组,所述工作节点组包括至少一个工作节点;并从所述工作节点组中确定下游的目标工作节点,以及在确定所述目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元。
本发明实施例提供的技术方案,通过在系统运行的过程中实时收集各个工作节点的处理速度信息和工作节点间的流量信息,并根据实时收集的信息对工作节点的并发度进行调整,使得工作节点的处理能力能够符合业务处理的实时需求,从而达到动态提升流计算系统的数据处理能力和资源利用率。
附图说明
为了更清楚地说明本发明实施例或现有技术中的技术方案,下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本发明的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为现有技术中一个DAG图的示意图;
图2为本发明实施例提供的一种流计算系统中的数据处理方法的流程图;
图3为本发明实施例元组处理时间和元组到达时间的对应关系示意图;
图4为本发明实施例中优化并发度计算示意图;
图5为本发明实施例中一个DAG图片段的示意图;
图6为本发明实施例提供的另一种流计算系统中的数据处理方法的流程图;
图7为本发明实施例提供的一种流计算系统中的控制节点的结构示意图;
图8为本发明实施例提供的一种流计算系统的示意图;
图9为本发明实施例提供的另一种流计算系统中的控制节点结构示意图。
具体实施方式
为使本发明的目的、技术方案、及优点更加清楚明白,下面结合附图并举实施例,对本发明提供的技术方案进一步详细描述。显然,所描述的实施例仅仅是本发明一部分实施例,而不是全部的实施例。基于本发明中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本发明保护的范围。
下面结合附图对本申请实施例技术方案的主要实现原理、具体实施方式及其对应能够达到的有益效果进行详细的阐述。
如图2所示,本发明实施例提供一种流计算系统中的数据处理方法,本发明实施例所提供的方法可以使用在流计算系统中,通常流计算系统包括控制节点和多个工作节点(也称为算子,Operator),控制节点可以发送对应的控制指令给下属的工作节点,从而使得工作节点根据控制指令调用执行单元对业务产生的数据流进行处理。
本发明实施例提供的数据处理方法包括:
步骤201,所述控制节点根据配置的各个工作节点的并发度,调用所述多个工作节点中的一个或多个工作节点对数据流进行处理;
需要说明的是,本发明所说的工作节点,在流计算系统中一般也称为算子(Operator),本发明实施例不对两者作特别区分;流计算系统中各个工作节点的并发度是被配置在描述业务处理逻辑的流图(也称为无回路有向图)中的,在流计算系统中部署业务时,通常会根据业务需求,配置各个工作节点的并发度的初始值(也称为初始并发度),然后,控制节点根据配置的各个工作节点的并发度调用一个或多个工作节点对业务产生的数据流进行处理,其中,流图是流计算系统中业务数据处理逻辑的一种通用表示形式,关于流图的具体介绍,请参见背景技术部分的描述,此处不再赘述。
还需要说明的是,每个工作节点包含一个或多个执行单元,当工作节点被调用对数据流进行处理时,具体由工作节点所包含的执行单元来处理数据流;执行单元具体可以为线程或进程;工作节点的并发度用于表征工作节点与执行单元之间的对应关系,具体而言,工作节点的并发度就表示工作节点包含的执行单元的个数,例如工作节点A的并发度为5,就表示工作节点A可以调用5个执行单元来对数据流进行处理。本步骤中工作节点的并发度,是指为工作节点初始配置的并发度。
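为直观说明工作节点、执行单元与并发度三者的关系,下面给出一段最小化的示意性代码(仅为示意性草图,其中的类名、方法名均为本说明为便于理解而假设,并非对本发明实现方式的限定):工作节点按配置的并发度创建相应数量的执行单元(此处以线程模拟),各执行单元从同一输入队列中取元组进行处理。

    import queue
    import threading

    class WorkerNode:
        """示意性的工作节点:并发度 dop 即其包含的执行单元(此处以线程模拟)的个数。"""
        def __init__(self, name, dop, process_fn):
            self.name = name
            self.dop = dop                    # 并发度:执行单元个数
            self.process_fn = process_fn      # 对单个元组的业务处理逻辑
            self.input_queue = queue.Queue()  # 元组输入队列
            self.executors = []

        def start(self):
            # 按并发度创建执行单元,每个执行单元循环地从输入队列取元组并处理
            for i in range(self.dop):
                t = threading.Thread(target=self._run, name=self.name + "-pe" + str(i), daemon=True)
                t.start()
                self.executors.append(t)

        def _run(self):
            while True:
                tuple_ = self.input_queue.get()
                self.process_fn(tuple_)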
步骤202,所述控制节点收集所述一个或多个工作节点中每个工作节点与其他工作节点之间的数据流量信息和所述一个或多个工作节点中每个工作节点的处理速度信息;
需要说明的是,工作节点之间的数据流量信息是指:在流图中具有逻辑上下游关系的工作节点间的数据流量信息;工作节点的处理速度信息表征工作节点处理数据的速度,工作节点处理数据的速度是由工作节点的并发度、数据流量等因素决定的。
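下面用一个示意性的数据结构说明控制节点在本步骤中可能收集到的两类信息(仅为示意,字段名称均为本说明的假设):即每个工作节点的处理速度信息,以及该工作节点与各下游工作节点之间的数据流量信息。

    from dataclasses import dataclass, field

    @dataclass
    class WorkerStats:
        """示意:某一统计周期内单个工作节点上报的处理能力描述信息(字段名为假设)。"""
        node_name: str
        dop: int                        # 当前并发度(执行单元个数)
        tuples_processed: int           # 周期内处理的元组个数
        avg_process_time_ms: float      # 平均处理一个元组所需时间(元组业务处理时间)
        # 数据流量信息:下游工作节点名 -> 周期内发往该节点的元组个数
        traffic_to_downstream: dict = field(default_factory=dict)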
步骤203,所述控制节点通过收集到的数据流量信息和处理速度信息确定所述一个或多个工作节点中的每个工作节点的优化并发度;
工作节点的优化并发度,是指与工作节点当前的负载情况相匹配的并发度。现有技术中,只是依据初始配置的并发度调用执行单元进行数据流处理,但由于在进行实际的数据处理之前无法估算具体的处理情况,很多时候初始并发度并不能达到最优化的效果。针对现有技术中的上述问题,本发明实施例所提供的方法会对流计算系统中各工作节点的实际负载情况进行采集(即收集到的流量信息和处理速度信息),然后根据采集的负载情况计算出相匹配的优化并发度。该优化并发度能够与工作节点的数据处理情况匹配,从而避免资源浪费以及执行单元不能满足数据处理需求的问题。
步骤204,所述控制节点分别确定所述一个或多个工作节点中的每个工作节点的优化并发度与该工作节点的并发度是否相同,如果不相同,则按照该工作节点的优化并发度对该工作节点的并发度进行调整。
可以理解的是,如果工作节点的优化并发度与其并发度相同,则不需要对其调整,维持其当前的并发度即可。另外,通过收集到的实时数据流量信息和处理速度信息确定每个工作节点的优化并发度的具体算法可以采用多种实现形式,下面以一个具体的实例来阐述,但不应将其理解为实现本发明的唯一方式。
在流计算系统中,如果数据流中的元组(或称作数据包)到达工作节点的时间与执行单元处理元组的时间相匹配,则系统的资源利用率最优,这有利于充分发挥系统的处理能力;如果元组到达时间小于元组处理时间,则代表执行单元过载,系统会出现元组堆积。
如图3所示,对于一个执行单元而言:如果元组到达执行单元的时间大于元组业务处理时间,则代表所述执行单元较空闲(如图3中元组到达时间1与元组处理时间的关系);如果元组到达时间小于元组业务处理时间,则代表执行单元负载过大(如图3中元组到达时间2与元组处理时间的关系)。其中,元组到达时间为元组到达执行单元的平均时间间隔,元组业务处理时间为执行单元平均处理一个元组所需的时间,元组到达时间和元组业务处理时间根据收集的工作节点间流量信息、执行单元的处理速度信息计算获得。
针对图3所示的两种情况都需要对工作节点的并发度进行调整,从而实现元组业务处理时间与元组到达时间一致。元组到达时间与工作节点并发度的运算关系为:并发度越大,元组到达时间越大。
以图4所示的流图为例,对本发明实施例中优化并发度的计算方法作进一步详细的说明:
从流图(即DAG图)的源头开始逐层计算每个算子的并发度,计算顺序为A、B、C、D、E、F。dop代表并发度值,计算公式为dop≈元组处理时间/元组到达时间。元组处理时间和元组发出时间是根据实际业务处理的统计信息上报获得,元组到达时间则根据上游节点的元组发出时间计算获得。例如图4中工作节点A(或称为算子A)为源头节点,其元组到达时间根据吞吐量获得,即1s/67≈15ms;图中节点C的元组到达时间根据A、B的元组发出时间计算,由于A的并发度为2,相当于A在15ms内发出2个元组,B在15ms内发出1个元组,所以C的元组到达时间为15ms/(1+2)=5ms,节点C的优化并发度dop≈15ms/5ms=3。
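为便于理解上述逐层计算过程,下面给出一段示意性代码(仅为示意性草图:其中的拓扑结构、吞吐量与元组处理时间均为与图4对应的假设示例数据,实际系统中这些数值来自实时上报的统计信息;函数名亦为本说明假设):

    import math

    def compute_optimized_dop(topology, emit_interval_ms, process_time_ms, current_dop):
        """按 dop ≈ 元组处理时间 / 元组到达时间,从源头开始逐层计算各算子的优化并发度。
        topology: {节点: [上游节点列表]},假设已按从源头到下游的顺序排列,源头节点的上游列表为空;
        emit_interval_ms: 各节点单个执行单元发出一个元组的平均时间间隔;
        process_time_ms: 各节点单个执行单元处理一个元组的平均时间;
        current_dop: 各节点当前的并发度。"""
        arrival_ms, optimized = {}, {}
        for node, upstreams in topology.items():
            if not upstreams:
                # 源头节点:元组到达时间直接由吞吐量(发出间隔)得到
                arrival_ms[node] = emit_interval_ms[node]
            else:
                # 各上游节点在其发出间隔内共发出 dop 个元组,叠加得到本节点的元组到达间隔
                rate = sum(current_dop[u] / emit_interval_ms[u] for u in upstreams)
                arrival_ms[node] = 1.0 / rate
            optimized[node] = max(1, math.ceil(process_time_ms[node] / arrival_ms[node]))
        return optimized

    # 与图4中节点 A、B、C 对应的假设示例数据:
    topo = {"A": [], "B": [], "C": ["A", "B"]}
    emit = {"A": 15.0, "B": 15.0, "C": 15.0}   # A 约每 1s/67 ≈ 15ms 发出一个元组
    proc = {"A": 30.0, "B": 15.0, "C": 15.0}   # 元组处理时间(假设值)
    dops = {"A": 2, "B": 1, "C": 1}
    print(compute_optimized_dop(topo, emit, proc, dops))  # 节点C的优化并发度 ≈ 15/5 = 3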
以上优化并发度的计算方式是本发明实施例的一种可选实现方式,本发明的优化并发度计算方式并不局限于上述方式。因为在具体的应用环境中,不同流计算系统的需求不同、具体设备性能不同等各种因素的影响,使得计算优化并发度的方式各不相同,本发明实施例所提供的方法适用于任何一种根据实时情况计算出优化并发度后,再根据优化并发度对流计算系统中的处理逻辑进行调整的场景。
每个工作节点包含一个或多个执行单元,当工作节点被调用对数据流进行处理时,具体由工作节点所包含的执行单元处理数据流。根据上述方法算出每个工作节点的优化并发度后,就可以利用算出来的优化并发度确定最初设置的每个工作节点的并发度是否符合当前的处理情况,如果不匹配则可以对工作节点的并发度进行调整,则所述按照所述优化并发度对所述任一工作节点的并发度进行调整包括:
根据需要调整的工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元,使得调整后的该工作节点的并发度与对应的优化并发度相同。
在本发明实施例中,调整后的并发度与所述优化并发度相同是最优,也是最理想的实现方式,但在具体的实施过程中,由于受到其它客观条件的限制,可以根据实际情况调整并发度,使得调整后的并发度与所述优化并发度相同或接近。调整后的具体效果是,所述工作节点的数据处理能力更能适应当前系统数据处理的需求。
最初设置的并发度相对于实际需要的并发度而言存在两种情况,即过高或者过低。针对这两种情况,以下对新增或删除执行单元的可选实现方式进行详细的说明,具体包括:
一、当需要调整执行单元的工作节点的优化并发度大于该工作节点的初始并发度时:
控制节点生成用于创建新的执行单元的第一控制指令并发送给该工作节点,使得该工作节点接收到所述第一控制指令后创建至少一个新执行单元,并创建所述新执行单元与其它执行单元的数据通道;经过调整后,该工作节点当前包含的执行单元的总个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
其中:创建所述新执行单元与其他执行单元的数据通道的具体实现可以是:
根据所述工作节点在所述流图中的逻辑位置确定与所述新的执行单元逻辑相连的第一上游工作节点和第一下游工作节点;所述第一上游工作节点对应至少一个第一上游执行单元,所述第一下游工作节点对应至少一个第一下游执行单元;
根据所述第一控制指令创建所述新的执行单元与所述第一上游执行单元以及第一下游执行单元间的数据通道。
在该实施例中,建立新的执行单元与上下游执行单元的数据通道时,为了避免数据遗漏,一般先建立所述新执行单元与下游执行单元的数据通道,再对应建立与上游的数据通道。
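下面给出新增执行单元并建立数据通道的一段示意性代码(仅为示意性草图,create_pe、connect 等均为假设的平台接口,并非本发明限定的实现):按上述顺序,先建立新执行单元与下游执行单元的数据通道,再建立上游执行单元到新执行单元的数据通道,以避免数据遗漏。

    def add_execution_unit(worker, upstream_pes, downstream_pes, create_pe, connect):
        """示意:为工作节点 worker 新增一个执行单元并建立数据通道(接口均为假设)。"""
        new_pe = create_pe(worker)
        for d in downstream_pes:      # 1. 先建立新执行单元与下游执行单元的数据通道
            connect(new_pe, d)
        for u in upstream_pes:        # 2. 再建立上游执行单元与新执行单元的数据通道
            connect(u, new_pe)
        worker.executors.append(new_pe)
        return new_pe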
二、当所述工作节点的优化并发度小于该工作节点的初始并发度时:
控制节点生成用于删除所述工作节点的执行单元的第二控制指令并发送给该工作节点,使得该工作节点接收到所述第二控制指令后删除该工作节点的至少一个执行单元,以及与删除的执行单元连接的数据通道;经过调整后,该工作节点当前包含的执行单元的总个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
其中,工作节点删除执行单元的具体实现步骤可以是:
根据所述工作节点在所述流图中的逻辑位置确定与所述待删除的执行单元逻辑相连的第二上游工作节点和第二下游工作节点;所述第二上游工作节点对应至少一个第二上游执行单元,所述第二下游工作节点对应至少一个第二下游执行单元;
删除所述第二上游执行单元和第二下游执行单元与所述待删除的执行单元间的数据通道;
删除所述待删除的执行单元。
在该实施例中,需要删除一个或一些执行单元,为了保证数据的顺利处理,在删除执行单元时的具体操作可以是:
先断开待删除执行单元与上游执行单元的数据通道;然后等待待删除执行单元对未处理完的数据进行处理,处理完数据后,删除待删除执行单元与下游执行单元之间的数据通道;最后删除待删除的执行单元。
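与新增执行单元相对应,下面给出按上述顺序删除执行单元的一段示意性代码(仅为示意性草图,disconnect、destroy、pending_count 等均为假设的平台接口):

    import time

    def remove_execution_unit(worker, pe, upstream_pes, downstream_pes, disconnect, destroy):
        """示意:删除工作节点 worker 的一个执行单元 pe(接口均为假设)。"""
        for u in upstream_pes:              # 1. 先断开上游执行单元到 pe 的数据通道
            disconnect(u, pe)
        while pe.pending_count() > 0:       # 2. 等待 pe 处理完未处理完的数据
            time.sleep(0.01)
        for d in downstream_pes:            # 3. 再删除 pe 与下游执行单元之间的数据通道
            disconnect(pe, d)
        destroy(pe)                         # 4. 最后删除该执行单元
        worker.executors.remove(pe)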
在本发明实施例中,因为新增或删除了执行单元,被调整并发度的工作节点的上游工作节点在派发数据时,需要对数据派发策略作对应的调整。例如:新加入了执行单元,则需要将数据派发到新加入的执行单元进行处理。所以本发明实施例所提供的方法在执行步骤204之后,还进一步包括:
若一工作节点对应新增或删除了至少一个执行单元,则对应调整所述工作节点对应的上游工作节点的数据派发策略。
在本发明实施例所提供的方法中,调整了工作节点的并发度,即相对于原有的工作节点而言,需要新增加或删除一定数量的执行单元。在调整执行单元的时候,如果不调整上游工作节点的数据派发策略则会造成数据处理出现问题。
在具体的应用场景中,需要根据下游执行单元的个数以及每个执行单元的处理能力对应生成所述数据派发策略,所以所述数据派发策略的具体实例中包括数据派发的路径以及数据派发所对应的具体执行部件。
在本发明实施例中,为了提高数据派发的效率,提供以下两种可选的数据派发策略规划方案,具体包括:
方式一、二级数据派发:
控制节点根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,所述数据派发策略用于表示工作节点在派发数据时,接收数据的设备以及接收数据的设备在接收数据时的数据量;
所述控制节点向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,在将数据包派发到下游的目标工作节点时,确定所述目标工作节点对应的目标执行单元,并将所述数据包对应派发到所述目标执行单元。
如图5中以一个DAG图片段为例,其中n0代表上游operator,n1、n2、n3分别代表n0的下游operator,n0发射两条流s1和s2,其中n1和n2订阅s1流,n3订阅s2流。n1的并发度为1,由一个PE来执行,即由pe1执行n1;n2并发度为2,由两个PE来执行,即由pe2和pe3执行n2;n3并发度为3,由三个PE来执行,即由pe4、pe5、pe6执行n3。
当n0发射s1流的tuple0(元组)时,进行第一级派发,即选择目标operator,如图5所示,该实例中可选择的目标operator是n1和n2,然后针对n1和n2分别作第二级派发;当针对n1进行二级派发时,由于n1的并发度为1,所以直接确定将tuple0派发至pe1;当针对n2进行二级派发时,由于n2的并发度为2,需要根据n2配置的派发策略进行数据派发。在该实施例中可以配置为哈希派发,即首先对tuple0的一些属性字段进行哈希计算得到对应哈希值,然后按并发度求模,结果可作为索引值选取n2对应的PE。对于下游operator,如果设定了多并发,并且该下游operator订阅了上游某operator发射的流,则需设定相应的数据派发策略,该派发策略可以根据具体的实施例进行扩展,如对应的实施例可以支持随机数据派发(即将流随机发给下游operator对应的某个PE)、全派发(即将流发给下游operator对应的所有PE)、哈希派发(即根据哈希求模的方式确定流发往某个PE)等。
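结合图5的例子,下面给出二级数据派发的一段示意性代码(仅为示意性草图,数据结构与策略名称均为本说明假设;哈希派发此处以CRC32求模代替具体实现中可能采用的其它哈希算法):第一级选择订阅该流的目标operator,第二级按该operator配置的派发策略选取其对应的PE。

    import random
    import zlib

    def dispatch(tuple_key, tuple_data, subscribers):
        """示意性的二级数据派发。
        subscribers: 订阅该流的下游 operator 列表,形如
                     [{"pes": [pe, ...], "policy": "hash" | "random" | "all"}, ...]
        返回 (目标PE, 数据) 的列表。"""
        targets = []
        for op in subscribers:                     # 第一级派发:逐个目标 operator
            pes, policy = op["pes"], op["policy"]
            if len(pes) == 1:                      # 并发度为1,直接派发
                targets.append((pes[0], tuple_data))
            elif policy == "hash":                 # 哈希派发:属性字段哈希后按并发度求模
                idx = zlib.crc32(str(tuple_key).encode("utf-8")) % len(pes)
                targets.append((pes[idx], tuple_data))
            elif policy == "random":               # 随机派发:随机发给某个 PE
                targets.append((random.choice(pes), tuple_data))
            elif policy == "all":                  # 全派发:发给该 operator 对应的所有 PE
                targets.extend((pe, tuple_data) for pe in pes)
        return targets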
方式二、多级数据派发:
控制节点根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略;
控制节点向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,确定所述目标工作节点所属的工作节点组,所述工作节点组包括至少一个工作节点;并从所述工作节点组中确定下游的目标工作节点,以及在确定所述目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元。
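作为对照,下面给出多级数据派发的一段示意性代码(仅为示意性草图,分组结构与各级选取规则均为本说明假设):先确定目标工作节点所属的工作节点组,再从组中确定目标工作节点,最后确定该工作节点对应的目标执行单元。

    def multi_level_dispatch(tuple_key, groups):
        """示意性的多级数据派发。
        groups: [{"nodes": [{"pes": [pe, ...]}, ...]}, ...],即 工作节点组 -> 工作节点 -> 执行单元。"""
        group = groups[hash((tuple_key, "group")) % len(groups)]                 # 第一级:确定工作节点组
        node = group["nodes"][hash((tuple_key, "node")) % len(group["nodes"])]   # 第二级:确定目标工作节点
        pe = node["pes"][hash((tuple_key, "pe")) % len(node["pes"])]             # 第三级:确定目标执行单元
        return pe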
本发明实施例所提供的方案在调整算子并发度的基础上还提供一种多级数据派发方案,所以能够在提升算子并发度的同时,保证数据的正确派发。
如图6所示,为了更详细的说明本发明所提供的方法,以下结合本发明方法的适用环境对本发明所提供的方法做进一步的说明,本发明实施例所提供的方法适用于流计算系统,该流计算系统的控制节点根据配置的流图调用一个或多个工作节点对业务产生的数据流进行处理,所述流图中包含预先为各个工作节点设置的初始并发度,具体步骤包括:
步骤601,工作节点收集自身处理数据流的处理速度信息和与其他工作节点的数据流量信息,将收集到的处理速度信息和数据流量信息组合生成处理能力描述信息发送到对应的控制节点;
步骤602,所述控制节点收集调用的每个工作节点的处理能力描述信息;
步骤603,所述控制节点通过收集到的实时处理能力描述信息和所述流图确定每个工作节点的优化并发度;
步骤604,所述控制节点确定每个工作节点的优化并发度与该工作节点在所述流图中的初始并发度是否相同,如果不相同,根据所述优化并发度生成控制指令,并发送所述控制指令到该工作节点;
步骤605,该工作节点接收到所述控制指令后,根据所述控制指令对自身的并发度进行调整。
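上述步骤601至605构成控制节点的一轮动态调整流程,可用如下示意性代码概括(仅为示意性草图,其中 collect_stats、compute_optimized_dop、build_instruction、send 等方法名均为本说明假设):

    def control_loop(control_node, workers):
        """示意:控制节点的一轮动态并发度调整流程。"""
        # 步骤601/602:收集各工作节点上报的处理能力描述信息
        stats = {w.name: control_node.collect_stats(w) for w in workers}
        # 步骤603:根据处理能力描述信息与流图计算每个工作节点的优化并发度
        optimized = control_node.compute_optimized_dop(stats)
        for w in workers:
            # 步骤604:优化并发度与初始并发度不同时,生成并下发控制指令
            if optimized[w.name] != w.dop:
                instruction = control_node.build_instruction(w, optimized[w.name])
                control_node.send(w, instruction)
                # 步骤605:工作节点收到控制指令后,据此调整自身的并发度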
针对分布式的实时流处理系统中各种任务的处理情况实时变化的特点,本发明实施例提供的方法是在系统运行的过程中实时的采集各个工作节点的处理情况,然后根据实时的处理情况对工作节点的并发度进行调整,使得工作节点的处理能力能够符合业务处理的实时需求,从而达到动态提升流计算系统的数据处理能力和资源利用率的效果。
如图7所示,根据上述方法本发明还提供一种流计算系统中的控制节点700,所述流计算系统包括控制节点和多个工作节点,该控制节点包括:
调用单元701,用于根据配置的各个工作节点的并发度,调用所述多个工作节点中的一个或多个工作节点对数据流进行处理;
信息收集单元702,用于收集所述一个或多个工作节点中每个工作节点与其他工作节点之间的数据流量信息和所述一个或多个工作节点中每个工作节点的处理速度信息;
计算单元703,用于根据信息收集单元702收集到的数据流量信息和处理速度信息确定所述一个或多个工作节点中的每个工作节点的优化并发度;
调整单元704,用于分别确定所述一个或多个工作节点中的每个工作节点的优化并发度与该工作节点的初始并发度是否相同,如果不相同,则按照该工作节点的优化并发度对该工作节点的并发度进行调整。
可选的,每个工作节点包含一个或多个执行单元,且每个工作节点通过调用自身的执行单元处理数据流;工作节点的并发度表示工作节点包含的执行单元的个数;在所述按照该工作节点的优化并发度对该工作节点的并发度进行调整的方面;所述调整单元704具体用于:
根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元,使得调整后的该工作节点的并发度与对应的优化并发度一致。
最初设置的并发度相对于实际需要的并发度而言存在两种情况:过高或者是过低。则在所述根据该工作节点的优化并发度为该工作节点新增或删除至少一个执行单元的方面,所述调整单元704包括:
第一调整模块,用于当该工作节点的优化并发度大于该工作节点的初始并发度时:用于生成用于新增执行单元的第一控制指令并发送给该工作节点,使得该工作节点接收到所述第一控制指令后创建至少一个新的执行单元,并创建所述新的执行单元与其它执行单元的数据通道;
第二调整模块,当所述工作节点的优化并发度小于该工作节点的初始并发度时:用于生成用于删除所述工作节点的执行单元的第二控制指令并发送给该工作节点,使得该工作节点接收到所述第二控制指令后删除该工作节点的至少一个执行单元,并删除与所述删除的执行单元连接的数据通道。
可选的,本发明实施例所提供的方案在提升算子并发度的同时,还需要保证数据的正确派发,所以该装置还包括:
第一派发策略调整单元705,用于根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,在确定下游的目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元,其中,所述数据派发策略用于表示工作节点在派发数据时,接收数据的设备以及接收数据的设备在接收数据时的数据量。
第二派发策略调整单元706,用于根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,确定所述目标工作节点所属的工作节点组,所述工作节点组包括至少一个工作节点;并从所述工作节点组中确定下游的目标工作节点,以及在确定所述目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元。
本发明实施例提供的流计算系统中的控制节点700用于实现前述方法实施例中描述的数据处理方法,其具体的实现细节可以参照前述方法实施例,此处不再赘述。
本发明实施例提供的流计算系统中的控制节点,在流计算系统运行的过程中实时的采集各个工作节点的处理情况,然后根据实时的处理情况对工作节点的并发度进行调整,使得工作节点的处理能力能够符合业务处理的实时需求,从而达到动态提升流计算系统的数据处理能力和资源利用率的效果。
如图8所示,基于上述实施例所提供的数据处理方法,本发明实施例还提供一种流计算系统800,所述流计算系统800包括:控制节点801和多个工作节点802;
所述控制节点801用于,根据所述流计算系统中配置的各个工作节点802的并发度,调用所述多个工作节点中的一个或多个工作节点对业务产生的数据流进行处理;
工作节点802,用于在所述控制节点801的调用下,对业务产生的数据流进行处理;
控制节点801还用于,收集所述一个或多个工作节点中每个工作节点与其他工作节点间的数据流量信息,以及所述一个或多个工作节点中每个工作节点的处理速度信息;通过收集到的数据流量信息和处理速度信息确定所述一个或多个工作节点中的每个工作节点的优化并发度;并分别确定所述一个或多个工作节点中的每个工作节点的优化并发度与该工作节点的并发度是否相同,如果不相同,则按照该工作节点的优化并发度对该工作节点的并发度进行调整。
优选地,工作节点包含一个或多个执行单元,且当工作节点被调用对数据流进行处理时,具体由工作节点所包含的执行单元处理数据流;工作节点的并发度表示工作节点包含的执行单元的个数;则在所述按照工作节点的优化并发度对工作节点的并发度进行调整的方面;所述控制节点801具体用于:
向工作节点802发送控制指令,所述控制指令根据工作节点802的优化并发度生成;
相应地,工作节点802还用于,根据所述控制指令新增至少一个执行单元,或删除工作节点802的至少一个执行单元,使得工作节点802当前包含的执行单元个数所表征的工作节点802的并发度与工作节点802的优化并发度相同。
优选地,所述控制节点801还用于根据新增或删除的至少一个执行单元,调整工作节点802对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,在确定下游的目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元,其中,所述数据派发策略用于表示工作节点在派发数据时,接收数据的设备以及接收数据的设备在接收数据时的数据量。
优选地,控制节点801还用于根据新增或删除的至少一个执行单元,调整工作节点对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,确定所述目标工作节点所属的工作节点组,所述工作节点组包括至少一个工作节点;并从所述工作节点组中确定下游的目标工作节点,以及在确定所述目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元。
本发明实施例提供的流计算系统,通过在系统运行的过程中实时收集各个工作节点的处理速度信息和工作节点间的流量信息,并根据实时收集的信息对工作节点的并发度进行调整,使得工作节点的处理能力能够符合业务处理的实时需求,从而达到动态提升流计算系统的数据处理能力和资源利用率的效果。
如图9所示,本发明还提供一种控制节点,用于执行前述各个实施例中的数据处理方法,该控制节点包括至少一个处理器901(例如CPU),至少一个网络接口902或者其他通信接口,存储器903,和至少一个通信总线904,用于实现这些装置之间的连接通信。处理器901用于执行存储器903中存储的可执行模块,例如计算机程序。存储器903可能包含高速随机存取存储器(RAM:Random Access Memory),也可能还包括非易失性存储器(non-volatile memory),例如至少一个磁盘存储器。通过至少一个网络接口902(可以是有线或者无线)实现该控制节点与至少一个其他网元之间的通信连接,可以使用互联网、广域网、本地网、城域网等。
在一些实施方式中,存储器存储了程序9031,程序可以被处理器执行,以用于:根据配置的各个工作节点的并发度,调用所述多个工作节点中的一个或多个工作节点对数据流进行处理;收集所述一个或多个工作节点中每个工作节点与其它工作节点之间的数据流量信息和所述一个或多个工作节点中每个工作节点的处理速度信息;根据收集到的数据流量信息和处理速度信息确定所述一个或多个工作节点中的每个工作节点的优化并发度;分别确定所述一个或多个工作节点中的每个工作节点的优化并发度与该工作节点的并发度是否相同,如果不相同,则按照该工作节点的优化并发度对该工作节点的并发度进行调整。
本申请实施例中的上述一个或多个技术方案,至少具有如下的技术效果:
针对分布式的实时流处理系统中各种任务的处理情况实时变化的特点,本发明实施例提供的方法是在系统运行的过程中实时的采集各个工作节点的处理情况,然后根据实时的处理情况对工作节点的并发度进行调整,使得工作节点的处理能力能够符合业务处理的实时需求,从而达到动态提升流计算系统的数据处理能力和资源利用率的效果。
本发明所述的方法并不限于具体实施方式中所述的实施例,本领域技术人员根据本发明的技术方案得出其它的实施方式,同样属于本发明的技术创新范围。
显然,本领域的技术人员可以对本发明进行各种改动和变型而不脱离本发明的精神和范围。这样,倘若本发明的这些修改和变型属于本发明权利要求及其等同技术的范围之内,则本发明也意图包含这些改动和变型在内。

Claims (14)

  1. 一种流计算系统中的数据处理方法,所述流计算系统包括控制节点和多个工作节点,其特征在于,该方法包括:
    所述控制节点根据配置的各个工作节点的并发度,调用所述多个工作节点中的一个或多个工作节点对数据流进行处理;
    所述控制节点收集所述一个或多个工作节点中每个工作节点与其它工作节点之间的数据流量信息和所述一个或多个工作节点中每个工作节点的处理速度信息;
    所述控制节点根据收集到的数据流量信息和处理速度信息确定所述一个或多个工作节点中的每个工作节点的优化并发度;
    所述控制节点分别确定所述一个或多个工作节点中的每个工作节点的优化并发度与该工作节点的并发度是否相同,如果不相同,则按照该工作节点的优化并发度对该工作节点的并发度进行调整。
  2. 如权利要求1所述的方法,其特征在于,每个工作节点包含一个或多个执行单元,且当工作节点被调用对数据流进行处理时,具体由工作节点所包含的执行单元处理数据流;工作节点的并发度表示工作节点包含的执行单元的个数;则所述控制节点按照该工作节点的优化并发度对该工作节点的并发度进行调整,包括:
    所述控制节点根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元,使得该工作节点当前包含的执行单元个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
  3. 如权利要求2所述的方法,其特征在于,所述控制节点根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元,包括:
    当该工作节点的优化并发度大于该工作节点的并发度时:
    所述控制节点生成用于创建新的执行单元的第一控制指令并发送给该工作节点,使得该工作节点接收到所述第一控制指令后创建至少一个新执行单元,并创建所述新执行单元与其它执行单元的数据通道,其中该工作节点当前包含的执行单元的总个数所表征的该工作节点的并发度与该工作节点的优化并发度相同;
    当所述工作节点的优化并发度小于该工作节点的并发度时:
    所述控制节点生成用于删除所述工作节点的执行单元的第二控制指令并发送给该工作节点,使得该工作节点接收到所述第二控制指令后删除该工作节点的至少一个执行单元,并删除与所述删除的执行单元连接的数据通道,其中该工作节点当前包含的执行单元的总个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
  4. 如权利要求2或3所述的方法,其特征在于,在根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元之后,该方法还包括:
    所述控制节点根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,所述数据派发策略用于表示工作节点在派发数据时,接收数据的设备以及接收数据的设备在接收数据时的数据量;
    所述控制节点向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,在确定下游的目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元。
  5. 如权利要求2或3所述的方法,其特征在于,在根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元之后,该方法还包括:
    控制节点根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略;
    控制节点向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,确定所述目标工作节点所属的工作节点组,所述工作节点组包括至少一个工作节点;并从所述工作节点组中确定下游的目标工作节点,以及在确定所述目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元。
  6. 一种流计算系统中的控制节点,所述流计算系统包括所述控制节点和多个工作节点,其特征在于,所述控制节点包括:
    调用单元,用于根据配置的各个工作节点的并发度,调用所述多个工作节点中的一个或多个工作节点对数据流进行处理;
    信息收集单元,用于收集所述一个或多个工作节点中每个工作节点与其它工作节点之间的数据流量信息和所述一个或多个工作节点中每个工作节点的处理速度信息;
    计算单元,用于根据所述信息收集单元收集到的数据流量信息和处理速度信息确定所述一个或多个工作节点中的每个工作节点的优化并发度;
    调整单元,用于分别确定所述一个或多个工作节点中的每个工作节点的优化并发度与该工作节点的并发度是否相同,如果不相同,则按照该工作节点的优化并发度对该工作节点的并发度进行调整。
  7. 如权利要求6所述的控制节点,其特征在于,每个工作节点包含一个或多个执行单元,且当工作节点被调用对数据流进行处理时,具体由工作节点所包含的执行单元处理数据流;工作节点的并发度表示工作节点包含的执行单元的个数;在所述按照该工作节点的优化并发度对该工作节点的并发度进行调整的方面,所述调整单元具体用于,
    根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元,使得该工作节点当前包含的执行单元个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
  8. 如权利要求7所述的控制节点,其特征在于,在所述根据该工作节点的优化并发度为该工作节点新增或删除至少一个执行单元的方面,所述调整单元包括:
    第一调整模块,用于当该工作节点的优化并发度大于该工作节点的并发度时:生成用于新增执行单元的第一控制指令并发送给该工作节点,使得该工作节点接收到所述第一控制指令后创建至少一个新的执行单元,并创建所述新的执行单元与其它执行单元的数据通道;其中,该工作节点当前包含的执行单元的总个数所表征的该工作节点的并发度与该工作节点的优化并发度相同;
    第二调整模块,当所述工作节点的优化并发度小于该工作节点的并发度时:生成用于删除所述工作节点的执行单元的第二控制指令并发送给该工作节点,使得该工作节点接收到所述第二控制指令后删除该工作节点的至少一个执行单元,并删除与所述删除的执行单元连接的数据通道;其中,该工作节点当前包含的执行单元的总个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
  9. 如权利要求7或8所述的控制节点,其特征在于,该控制节点还包括:
    第一派发策略调整单元,用于根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,在确定下游的目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元,其中,所述数据派发策略用于表示工作节点在派发数据时,接收数据的设备以及接收数据的设备在接收数据时的数据量。
  10. 如权利要求7或8所述的控制节点,其特征在于,该控制节点还包括:
    第二派发策略调整单元,用于根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,确定所述目标工作节点所属的工作节点组,所述工作节点组包括至少一个工作节点;并从所述工作节点组中确定下游的目标工作节点,以及在确定所述目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元。
  11. 一种流计算系统,其特征在于,所述流计算系统包括:控制节点和多个工作节点;
    所述控制节点用于,根据所述流计算系统中配置的各个工作节点的并发度,调用所述多个工作节点中的一个或多个工作节点对数据流进行处理;
    所述工作节点,用于在所述控制节点的调用下,对所述数据流进行处理;
    所述控制节点还用于,收集所述一个或多个工作节点中每个工作节点与其他工作节点之间的数据流量信息和所述一个或多个工作节点中每个工作节点的处理速度信息;根据收集到的数据流量信息和处理速度信息确定所述一个或多个工作节点中的每个工作节点的优化并发度;并分别确定所述一个或多个工作节点中的每个工作节点的优化并发度与该工作节点的并发度是否相同,如果不相同,则按照该工作节点的优化并发度对该工作节点的并发度进行调整。
  12. 如权利要求11所述的流计算系统,其特征在于,所述工作节点包含一个或多个执行单元,且当工作节点被调用对数据流进行处理时,具体由工作节点所包含的执行单元处理数据流;工作节点的并发度表示工作节点包含的执行单元的个数;则在所述按照该工作节点的优化并发度对该工作节点的并发度进行调整的方面,所述控制节点具体用于:
    根据该工作节点的优化并发度为该工作节点新增至少一个执行单元,或删除该工作节点的至少一个执行单元,使得该工作节点当前包含的执行单元个数所表征的该工作节点的并发度与该工作节点的优化并发度相同。
  13. 如权利要求12所述的流计算系统,其特征在于,所述控制节点还用于根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,在确定下游的目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元,其中,所述数据派发策略用于表示工作节点在派发数据时,接收数据的设备以及接收数据的设备在接收数据时的数据量。
  14. 如权利要求12所述的流计算系统,其特征在于,所述控制节点还用于根据新增或删除的至少一个执行单元,调整该工作节点对应的上游工作节点的数据派发策略,并向所述上游工作节点发送调整后的数据派发策略,使得所述上游工作节点根据调整后的数据派发策略,确定所述目标工作节点所属的工作节点组,所述工作节点组包括至少一个工作节点;并从所述工作节点组中确定下游的目标工作节点,以及在确定所述目标工作节点对应的目标执行单元后,将数据包对应派发到所述目标执行单元。