CN110795151A

CN110795151A - Operator concurrency degree adjusting method, device and equipment

Info

Publication number: CN110795151A
Application number: CN201910948545.8A
Authority: CN
Inventors: 方丰斌; 王东旭; 周家英
Original assignee: Alipay Hangzhou Information Technology Co Ltd
Current assignee: Alipay Hangzhou Information Technology Co Ltd
Priority date: 2019-10-08
Filing date: 2019-10-08
Publication date: 2020-02-14

Abstract

The embodiment of the specification discloses a method, a device and equipment for adjusting operator concurrency. In one embodiment, the method is applied to a control node in a stream computing system, the stream computing system includes the control node and at least one working node, and at least one operator is distributed on the working node, and the method includes: receiving an operation index of an operator on a working node, wherein the operation index is used for representing the data processing capacity of the operator; determining a concurrency degree adjustment strategy of a target operator needing concurrency degree adjustment based on the operation index of the operator; and sending a concurrency degree adjustment strategy to the target working node so that the target working node adjusts the concurrency degree of the target operator, wherein the target working node comprises at least one of a working node distributed by the target operator to be newly added and a working node distributed by the target operator to be stopped.

Description

Operator concurrency degree adjusting method, device and equipment

Technical Field

One or more embodiments of the present disclosure relate to the field of data processing, and in particular, to a method, an apparatus, and a device for adjusting operator concurrency.

Background

Stream computing is a pipeline-like data processing mode: data to be processed continuously enters the stream computing system as if flowing through a pipeline. The stream computing system acquires massive amounts of data from different data sources in real time and extracts valuable information through real-time analysis and processing.

A stream computing system generally includes a control node and at least one working node, with at least one operator distributed on each working node. The operator is the smallest unit that executes computational logic and carries the actual data processing operations. Business data processing logic in a stream computing system is typically represented as a Directed Acyclic Graph (DAG) that indicates the direction of the data streams among multiple operators. A data stream represents the data transfer between operators. In a stream computing system, the operator concurrency, which describes the degree of parallelism when operators with the same computing logic operate cooperatively, can be configured.

In many service data processing scenarios, it is desirable to improve the real-time performance of data processing. For example, in a financial service scenario, as traffic volume increases, the initial operator concurrency configuration may no longer be sufficient to support traffic processing, and in such a case the operator concurrency configuration needs to be adjusted. At present, the concurrency configuration of operators is adjusted manually, which is inefficient.

Disclosure of Invention

One or more embodiments of the present specification provide a method, an apparatus, and a device for adjusting operator concurrency, which can implement automatic adjustment of operator concurrency, and improve efficiency of adjusting operator concurrency.

The technical scheme provided by one or more embodiments of the specification is as follows:

In a first aspect, an embodiment of the present specification provides an operator concurrency adjustment method, which is applied to a control node in a stream computing system, where the stream computing system includes the control node and at least one working node, and at least one operator is distributed on the working node, and the method includes:

receiving an operation index of an operator on a working node, wherein the operation index is used for representing the data processing capacity of the operator;

determining a concurrency degree adjustment strategy of a target operator needing concurrency degree adjustment based on the operation index of the operator;

and sending the concurrency degree adjustment strategy to a target working node to enable the target working node to adjust the concurrency degree of the target operator, wherein the target working node comprises at least one of a working node distributed by the target operator to be newly added and a working node distributed by the target operator to be stopped.

In a second aspect, an embodiment of the present specification provides an operator concurrency adjustment apparatus, which is applied to a control node in a stream computing system, where the stream computing system includes the control node and at least one working node, and at least one operator is distributed on the working node, and includes:

the operation index receiving module is used for receiving operation indexes of operators on the working nodes, and the operation indexes are used for representing the data processing capacity of the operators;

the concurrency degree adjustment strategy determining module is used for determining a concurrency degree adjustment strategy of a target operator needing concurrency degree adjustment based on the operation indexes of the operators;

and the sending module is used for sending the concurrency degree adjustment strategy to a target working node so as to enable the target working node to adjust the concurrency degree of the target operator, wherein the target working node comprises at least one of a working node distributed by the target operator to be newly added and a working node distributed by the target operator to be stopped.

In a third aspect, an embodiment of the present specification provides an operator concurrency adjustment device, including: a processor and a memory storing computer program instructions;

the processor, when executing the computer program instructions, implements the operator concurrency adjustment method provided by the embodiments of the specification.

In the embodiment of the present specification, by receiving the operation index of the operator reported by the working node, a target operator that needs to be subjected to concurrency adjustment and a concurrency adjustment policy of the target operator can be automatically determined. The control node sends a concurrency degree adjusting strategy to the target working node, so that the target working node adjusts the concurrency degree of the target operator without manually configuring the operator concurrency degree, and the adjusting efficiency of the operator concurrency degree is improved.

Drawings

In order to more clearly illustrate the technical solutions of one or more embodiments of the present disclosure, the drawings needed to be used in one or more embodiments of the present disclosure will be briefly described below, and other drawings can be obtained by those skilled in the art without creative efforts.

FIG. 1 is an architectural diagram of a stream computing system provided in one embodiment of the present description;

FIG. 2 is a schematic diagram of a DAG, according to an embodiment of the present disclosure;

FIG. 3 is a schematic flow chart of an operator concurrency adjustment method according to an embodiment of the present disclosure;

FIG. 4 is a schematic structural diagram of an operator concurrency adjustment apparatus according to an embodiment of the present disclosure;

FIG. 5 is a schematic structural diagram of an operator concurrency adjustment device according to an embodiment of the present disclosure.

Detailed Description

Features and exemplary embodiments of various aspects of the present specification will be described in detail below, and in order to make objects, technical solutions and advantages of the specification more apparent, the specification will be further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely configured to explain the present specification and are not configured to limit the present specification. It will be apparent to one skilled in the art that the present description may be practiced without some of these specific details. The following description of the embodiments is merely intended to provide a better understanding of the present specification by illustrating examples thereof.

It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

FIG. 1 illustrates an architectural diagram of a stream computing system provided by one embodiment of the present description. As shown in FIG. 1, the stream computing system includes a control node and a plurality of working nodes. In a cluster deployment, there may be one or more control nodes and one or more working nodes, and the control nodes may be physical nodes separate from the working nodes. In a standalone deployment, the control node and the working node may be logical units deployed on the same physical node. The physical node may specifically be a computer or a server.

The control node is used for controlling the working node to create an operator on the working node and for managing the operators on the working node. The control node may send a corresponding control instruction to the working node, so that the working node invokes an operator to process a data stream generated by the service according to the control instruction.

Generally, one physical node corresponds to one working node. In some cases, one physical node may correspond to a plurality of working nodes, and the number of working nodes corresponding to a physical node depends on the hardware resources of that physical node. A working node may be understood as a share of hardware resources. Working nodes corresponding to the same physical node communicate with each other through inter-process communication, and working nodes corresponding to different physical nodes communicate with each other over the network.

In FIG. 2, the operators are represented by circles and the direction of data flow is represented by arrows. In the DAG shown in FIG. 2, operator 1, the first operator to receive input data, is the source operator; operator 6, which outputs the processing result, is the output operator; and the operators other than the source operator and the output operator are intermediate operators. The DAG shown in FIG. 2 indicates that the stream computing system contains 6 operators (operator 1, operator 2, operator 3, operator 4, operator 5, and operator 6) and indicates the direction of the data streams among these 6 operators. The direction of the data streams among the operators also reflects the dependency relationship between the input data streams and the output data streams of the operators.

In FIG. 2, the output data stream of operator 1 flows to operator 2 and operator 3; operator 2 processes the output data stream of operator 1, and operator 3 also processes the output data stream of operator 1. That is, the output data streams of operator 2 and operator 3 depend on their input streams from operator 1 (i.e., the output streams of operator 1). Operator 2 and operator 3 are therefore generally referred to as downstream operators of operator 1, and operator 1 is an upstream operator of operator 2 and operator 3. It will be appreciated that the upstream operator and the downstream operator are determined according to the data flow direction between the operators.
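As an illustration, the upstream and downstream relationships can be derived directly from the edges of the DAG. The following minimal Python sketch assumes the DAG is available as a list of (source, destination) pairs. Only the two edges leaving operator 1 that are described above are included, and the names edges, build_neighbors, upstream_of and downstream_of are hypothetical.

```python
from collections import defaultdict

# Only the edges described in the text for FIG. 2:
# the output of operator 1 flows to operator 2 and operator 3.
edges = [("operator 1", "operator 2"), ("operator 1", "operator 3")]


def build_neighbors(edge_list):
    """Return (upstream_of, downstream_of) maps derived from directed edges."""
    upstream_of = defaultdict(set)
    downstream_of = defaultdict(set)
    for src, dst in edge_list:
        downstream_of[src].add(dst)  # dst consumes the output of src
        upstream_of[dst].add(src)    # src produces the input of dst
    return dict(upstream_of), dict(downstream_of)


upstream_of, downstream_of = build_neighbors(edges)
```

With these two edges, operator 1 is reported as the upstream operator of both operator 2 and operator 3.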

As shown in FIG. 1, the stream computing system includes a control node, a working node 1, a working node 2, and a working node 3. The control node controls the working nodes (i.e., working node 1, working node 2, and working node 3) to configure the 6 operators on the working nodes according to the DAG shown in FIG. 2 to process the input data. Specifically, according to the DAG, the control node controls working node 1 to configure operator 1 and operator 2 on working node 1, controls working node 2 to configure operator 3 and operator 4 on working node 2, and controls working node 3 to configure operator 5 and operator 6 on working node 3. It can be seen that after configuration, the direction of the data streams between the operators configured on working node 1, working node 2, and working node 3 matches the DAG shown in FIG. 2.

The concurrency of each operator in the stream computing system is configured in the DAG (also called a flow graph). When a service is deployed in the stream computing system, an initial value of the concurrency of each operator is generally configured according to the service requirement. The control node then calls one or more operators to process the data stream generated by the service according to the configured concurrency of each operator.

The concurrency of an operator refers to the total number of all running operators having the same operator identification with the operator. As an example, the concurrency of operator 1 is the total number of all running operators 1.

In order to automatically adjust the concurrency of each operator and improve the efficiency of adjusting operator concurrency, embodiments of the present specification provide an operator concurrency adjustment method, which is applied to the control node. FIG. 3 is a schematic flow chart of an operator concurrency adjustment method 300 provided in an embodiment of the present specification, the method including the following steps:

S310, receiving an operation index of an operator on the working node, wherein the operation index is used for representing the data processing capacity of the operator.

S320, determining a concurrency degree adjustment strategy of a target operator needing concurrency degree adjustment based on the operation indexes of the operators.

S330, sending a concurrency degree adjusting strategy to the target working node to enable the target working node to adjust the concurrency degree of the target operator, wherein the target working node comprises at least one of a working node distributed by the target operator to be newly added and a working node distributed by the target operator to be stopped.

In the embodiment of the present specification, before determining the concurrency adjustment policy of the target operator, it is necessary to determine which operators are target operators that need concurrency adjustment. Therefore, in step S320, the total operation index of the operators corresponding to each operator identifier is first obtained based on the operation indexes of all operators. Then, for the operator corresponding to each operator identifier (referred to here as the first operator), whether the first operator is a target operator needing concurrency adjustment is determined based on the total operation index of the first operator and the total operation index of the first upstream operator, i.e., the upstream operator of the first operator.

It should be noted that, for any operator, the total operation index of the operator is the sum of the operation indexes of all the operating operators having the same operator identification as the operator.

That is, for each worker node, the worker node reports the operation index of each operator distributed on the worker node. And after the control node acquires the operation indexes of all operators, summarizing the operation indexes of the operators of each operator identification to obtain the total operation index of the operator corresponding to each operator identification.

As one example, the operation index includes the operator's input transactions per second (TPS) and output TPS.

The input TPS of an operator represents the amount of data input to the operator per second, and the output TPS of an operator represents the amount of data output by the operator per second.

In this case, the total operation index of the first operator is a first sum, namely the sum of the input TPS of every running first operator, and the total operation index of the first upstream operator is a second sum, namely the sum of the output TPS of every running first upstream operator.
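The aggregation performed by the control node can be sketched as follows. This is a minimal Python illustration under assumed data shapes: the OperatorMetric record and the total_metrics function are hypothetical names, and each report is assumed to carry the operator identification, the reporting working node, and the input and output TPS.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorMetric:
    operator_id: str   # identification shared by all concurrent instances
    worker_id: str     # working node that reported this metric
    input_tps: float   # data input per second for this instance
    output_tps: float  # data output per second for this instance


def total_metrics(reports):
    """Sum input and output TPS over all running instances per operator id."""
    totals = defaultdict(lambda: {"input_tps": 0.0, "output_tps": 0.0})
    for metric in reports:
        totals[metric.operator_id]["input_tps"] += metric.input_tps
        totals[metric.operator_id]["output_tps"] += metric.output_tps
    return dict(totals)
```

For a first operator, the first sum is then the "input_tps" entry under its own identification, and the second sum is the "output_tps" entry under the identification of its upstream operator.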

In step S320, if the difference obtained by subtracting the first sum from the second sum is greater than a first preset TPS threshold, the first operator is determined to be a target operator whose concurrency needs to be increased. If the difference obtained by subtracting the second sum from the first sum is greater than a second preset TPS threshold, the first operator is determined to be a target operator whose concurrency needs to be reduced.

The first preset TPS threshold and the second preset TPS threshold may be equal to or unequal to each other, and are not limited herein.
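The two threshold comparisons can be sketched as below. This is a minimal Python illustration, not the claimed implementation. The names Adjustment and classify are hypothetical, and both thresholds are assumed to be non-negative so that at most one branch can apply.

```python
from enum import Enum


class Adjustment(Enum):
    INCREASE = "increase"
    DECREASE = "decrease"
    NONE = "none"


def classify(first_sum, second_sum, first_threshold, second_threshold):
    """Decide whether the first operator is a target operator.

    first_sum:  total input TPS of the first operator.
    second_sum: total output TPS of its upstream operator.
    """
    if second_sum - first_sum > first_threshold:
        # Upstream produces more than the operator takes in.
        return Adjustment.INCREASE
    if first_sum - second_sum > second_threshold:
        # The comparison in the opposite direction.
        return Adjustment.DECREASE
    return Adjustment.NONE
```

An operator for which neither difference exceeds its threshold is left unchanged in this sketch.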

After determining the target operator that needs to be adjusted in concurrency, in step S320, the total operation index of the target operator and the total operation index of the upstream operator of the target operator are obtained. And finally, determining a concurrency degree adjustment strategy of the target operator based on the total operation index of the target operator and the total operation index of the upstream operator of the target operator.

And the total operation index of the target operator is the sum of the operation indexes of all the operating operators with the same operator identification as the target operator. The total operation index of the upstream operator of the target operator is the sum of the operation indexes of all the operating operators with the same operator identification as the upstream operator of the target operator.

As an example, the total operation index of the target operators is the sum of the input TPS of each target operator, i.e. the total input TPS. The total operation index of the upstream operator of the target operator is the sum of the output TPS of the upstream operator of each target operator, namely the total output TPS.

As an example, if the target operator is an operator whose concurrency needs to be increased, the number of target operators to be newly added can be obtained based on the input TPS of a single target operator and the difference between the total output TPS of the upstream operator of the target operator and the total input TPS of the target operator.

As an example, if the target operator is an operator whose concurrency needs to be reduced, the number of target operators to be stopped can be obtained based on the input TPS of a single target operator and the difference between the total input TPS of the target operator and the total output TPS of the upstream operator of the target operator.
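The text does not give an exact formula for these counts. One plausible reading is to divide the TPS difference by the input TPS of a single target operator. The Python sketch below makes that assumption explicit: rounding up when adding and rounding down when stopping are choices made here for illustration, and the function names are hypothetical.

```python
import math


def operators_to_add(upstream_total_output_tps, total_input_tps, single_input_tps):
    """Assumed rule: cover the whole TPS gap, so round up."""
    if single_input_tps <= 0:
        raise ValueError("single_input_tps must be positive")
    gap = upstream_total_output_tps - total_input_tps
    return max(0, math.ceil(gap / single_input_tps))


def operators_to_stop(total_input_tps, upstream_total_output_tps, single_input_tps):
    """Assumed rule: never remove more capacity than the gap, so round down."""
    if single_input_tps <= 0:
        raise ValueError("single_input_tps must be positive")
    gap = total_input_tps - upstream_total_output_tps
    return max(0, math.floor(gap / single_input_tps))
```

For example, with an upstream total output of 1000 TPS, a total input of 600 TPS and 150 TPS per single target operator, operators_to_add returns 3 under these assumptions.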

In the embodiment of the present specification, since there may be two cases of increasing or decreasing the concurrency of the target operator, the concurrency adjustment policy of the target operator may be a concurrency increase policy or a concurrency decrease policy.

In some embodiments of the present specification, the concurrency increase policy includes the number of target operators to be newly added and identification information of target working nodes distributed by each target operator to be newly added.

As described above, the control node may determine the number of target operators to be newly added based on the total operation index of the target operator and the total operation index of the upstream operator of the target operator.

In the embodiment of the present specification, the concurrent original target operators may be distributed on different working nodes. If the concurrency of the target operator needs to be increased, the control node can determine the target working node on which each target operator to be newly added is distributed, based on the resource utilization rate of the machine (such as a server) hosting the working node on which each original target operator runs.

If the resource utilization rate of the machine hosting the working node of every original target operator is high, a new working node can be used as the working node on which the target operator to be newly added is distributed. That is, the target working node may be a newly created working node.

If the resource utilization rate of the machine hosting the working node of a certain original target operator is low, the operator to be newly added can be deployed on the working node where that original target operator is located. That is, the target working node may be a working node where an original target operator is located.

It should be noted that the target working nodes where the target operators to be newly added are located may be the same or different. The target working node is determined by the control node according to the resource utilization rate of the machine hosting the working node of each original target operator, and the specific manner is not limited herein.
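Since the specific selection manner is left open, the following Python sketch shows only one possible policy. The threshold high_watermark, the estimated cost per_operator_cost, the function name place_new_operators and the generated node names are all assumptions introduced for illustration.

```python
def place_new_operators(count, utilization, high_watermark=0.8, per_operator_cost=0.1):
    """Pick a target working node for each operator to be newly added.

    utilization maps a working node id to the resource utilization (0..1)
    of the machine hosting it. Hypothetical policy: reuse the least-loaded
    existing node while it stays under the watermark, else use a new node.
    """
    estimated = dict(utilization)  # local copy; updated as operators are placed
    placements = []
    new_nodes = 0
    for _ in range(count):
        candidates = [
            node for node, used in estimated.items()
            if used + per_operator_cost <= high_watermark
        ]
        if candidates:
            node = min(candidates, key=lambda n: estimated[n])
        else:
            new_nodes += 1
            node = f"new-worker-{new_nodes}"  # stands for a newly created working node
            estimated[node] = 0.0
        estimated[node] += per_operator_cost
        placements.append(node)
    return placements
```

In this sketch several new operators may land on the same working node, which matches the note that the target working nodes may be the same or different.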

In some embodiments of the present specification, the concurrency reduction policy includes the number of target operators to be stopped and identification information of target working nodes distributed by each operator to be stopped.

As above, the control node may determine the number of target operators to be stopped based on the total operation index of the target operator and the total operation index of the upstream operator of the target operator.

In an embodiment of the present specification, if the concurrency of the target operator needs to be reduced, the control node may determine the target working node on which each target operator to be stopped is distributed, based on the resource utilization of the machine hosting the working node on which each original target operator runs.

If the resource utilization rate of the machine hosting the working node of a certain original target operator is high, some target operators on that working node can be selected as target operators to be stopped, and that working node is the target working node of the target operators to be stopped.

It should be noted that the target working nodes where the target operators to be stopped are located may be the same or different. The target working node is determined by the control node according to the resource utilization rate of the machine hosting the working node of each original target operator, and the specific manner is not limited herein.

In the embodiment of the present specification, after the control node determines the concurrency adjustment policy of the target operator, the DAG is updated. The control node may determine upstream operators and downstream operators of the target operator from the DAG. And the control node determines target working nodes distributed by the target operators to be adjusted according to the resource utilization condition of the machine in which the working node where each original operator is located. And the target operator to be adjusted is a target operator to be increased or a target operator to be stopped.

In some embodiments of the present description, the concurrency adjustment strategies of the individual operators may be decoupled, i.e., independent of each other. The control node sends, to the target working node on which each target operator to be adjusted is distributed, the concurrency adjustment strategy of that target operator, so that the target working node adjusts the target operator to be adjusted.

If the target operator to be adjusted is a target operator to be added, the target working node on which it is distributed creates the target operator to be added. If the target operator to be adjusted is a target operator to be stopped, the target working node on which it is distributed deletes the target operator to be stopped.

In the embodiment of the specification, the overall adjustment is broken down into concurrency adjustment flows for single operators, so that the concurrency adjustment flows of the operators are decoupled. If the adjustment of one operator fails, the operator concurrency adjustments that have already succeeded do not need to be rolled back. Therefore, no matter how large the operator concurrency is, the influence of dynamic adjustment on the online service can be controlled at a fixed granularity, and service availability is greatly improved.
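The decoupling described above can be sketched as a loop in which each operator's adjustment is applied on its own and a failure is recorded without undoing earlier successes. This is a minimal Python illustration; the function apply_independently and the callback apply_one are hypothetical names.

```python
def apply_independently(policies, apply_one):
    """Apply each operator's concurrency adjustment policy on its own.

    policies maps an operator id to its policy; apply_one(operator_id, policy)
    performs a single adjustment and may raise. A failure is recorded and the
    loop continues; adjustments that already succeeded are not rolled back.
    """
    results = {}
    for operator_id, policy in policies.items():
        try:
            apply_one(operator_id, policy)
            results[operator_id] = "ok"
        except Exception as exc:  # keep going; no rollback of earlier successes
            results[operator_id] = f"failed: {exc}"
    return results
```

The returned map gives the control node a per-operator outcome, so a failed adjustment can be retried for that operator alone.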

In a stream computing scenario, stateful and stateless jobs may be supported. And under different processing scenes, the concurrent adjustment strategy of the target operator is different. The following describes specific implementation manners of adjusting operator concurrency under stateful operation and stateless operation, respectively.

(I) Stateless operation

(1) Increased degree of concurrency

In a scenario of stateless operation, if the target operator needs to increase the concurrency, step S330 includes the following steps:

A11, based on the identification information of each target working node on which a target operator to be newly added is distributed, sending to that target working node a creation instruction for the corresponding target operator to be newly added, so that the target working node creates the target operator to be newly added based on the creation instruction, creates a data channel between the target operator to be newly added and its upstream operator, and creates a data channel between the target operator to be newly added and its downstream operator.

That is to say, after the control node determines the number of target operators to be added and the identification information of the target working node to which each target operator to be added is distributed, the control node may send, to the corresponding target working node, a creation instruction of the target operators to be added, which are distributed on the target working node, according to the identification information of each target working node.

For any target working node, after the target working node receives the creation instruction, the target operator to be newly added is created based on the creation instruction. The creating instruction comprises identification information of an upstream operator of the target operator to be newly added and identification information of a downstream operator of the target operator to be newly added. Therefore, the target working node also creates a data channel between the target operator to be newly added and the upstream operator thereof and a data channel between the target operator to be newly added and the downstream operator thereof based on the creation instruction.

It should be noted that the control node may determine, based on the DAG and the data distribution policy of the upstream operator of the original target operator, the identification information of the upstream operator of the target operator to be newly added and the identification information of the downstream operator of the target operator to be newly added.

A21, sending, to each first working node, the adjusted first data distribution strategy of the upstream operator of the created new target operator, so that the first working node controls the upstream operator of the new target operator to distribute data to the new target operator.

The first working node is a working node on which an upstream operator of the created new target operator is distributed.

In the embodiment of the specification, since the target operator is newly added, data needs to be dispatched to the newly added target operator for processing. Therefore, after the created new target operator establishes a data channel between the created new target operator and the upstream operator and the downstream operator of the new target operator, the control node sends a first data distribution strategy to the first working node where the upstream operator of the new target operator is located. And the first working node controls the upstream operator of the newly-added target operator to distribute data to the newly-added target operator based on the first data distribution strategy, so that the concurrency of the target operator is increased.
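The ordering of steps A11 and A21 can be sketched with a small in-memory model. This Python sketch is illustrative only. The Topology class, its fields and the instance names are hypothetical, and real working nodes would perform these steps in response to instructions from the control node.

```python
from dataclasses import dataclass, field


@dataclass
class Topology:
    instances: dict = field(default_factory=dict)  # operator instance -> working node
    channels: set = field(default_factory=set)     # (source instance, destination instance)
    routes: dict = field(default_factory=dict)     # upstream instance -> instances it sends to


def scale_up_stateless(topo, new_instance, worker, upstream, downstream):
    # A11: create the new operator and its data channels first.
    topo.instances[new_instance] = worker
    for up in upstream:
        topo.channels.add((up, new_instance))
    for down in downstream:
        topo.channels.add((new_instance, down))
    # A21: only after the channels exist, adjust the upstream distribution
    # strategy so that data starts flowing to the new operator.
    for up in upstream:
        topo.routes.setdefault(up, []).append(new_instance)
```

Updating the routes last means no data is distributed to the new operator before its channels are in place.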

The data distribution strategy specifies, for an operator that distributes data, which downstream operators receive the data and how much data each downstream operator receives. The data distribution strategy may support random distribution, full distribution, and hash distribution, and is not limited herein.
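The three strategies named above can be sketched as a single selection function. This Python sketch rests on assumptions: "full" is read as sending each record to every downstream operator, "hash" uses a CRC32 of the record key as a stable hash, and the names dispatch, record_key and downstream are hypothetical.

```python
import random
import zlib


def dispatch(record_key, downstream, strategy, rng=random):
    """Return the downstream operators that should receive a record."""
    if not downstream:
        raise ValueError("no downstream operators")
    if strategy == "full":
        # Every downstream operator receives the record.
        return list(downstream)
    if strategy == "random":
        # One downstream operator chosen uniformly at random.
        return [rng.choice(downstream)]
    if strategy == "hash":
        # Records with the same key always reach the same downstream operator.
        index = zlib.crc32(record_key.encode("utf-8")) % len(downstream)
        return [downstream[index]]
    raise ValueError(f"unknown strategy: {strategy}")
```

Note that with a plain modulo hash, changing the number of downstream operators remaps many keys, which is one reason stateful jobs need the additional state handling described later.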

(2) Concurrency reduction

In a scenario of stateless operation, if the target operator needs to reduce the concurrency, step S330 includes:

B11, sending, to each third working node, the adjusted second data distribution strategy of the corresponding second upstream operator, so that the third working node controls the second upstream operator to stop sending data to the target operator to be stopped.

The second upstream operator is an upstream operator of the target operator to be stopped, and the third working node is a working node on which the second upstream operator is distributed.

In the embodiment of the present specification, if the concurrency of an operator needs to be reduced, that is, an original target operator needs to be deleted, the distribution of data to the target operator to be stopped must first be stopped. Therefore, the control node needs to adjust the data distribution strategy of the second upstream operator of each target operator to be stopped, so that the upstream operator of the target operator to be stopped no longer sends data to it.

B21, for each target operator to be stopped, when the target operator to be stopped has finished processing its data, sending a deletion instruction to the target working node on which it is distributed, based on the identification information of that target working node, so that the target working node deletes the data channel between the target operator to be stopped and its upstream operator, deletes the data channel between the target operator to be stopped and its downstream operator, and deletes the target operator to be stopped.

In some embodiments of the present specification, when the second upstream operator stops sending data to the target operator to be stopped, the target operator to be stopped may have some previously received data that has not been processed, and therefore, the control node will send a delete instruction to the target work node where the target operator to be stopped is located when the target operator to be stopped completes data processing.

And when the target working node where the target operator to be stopped is located deletes the target operator to be stopped, the concurrency of the target operator is reduced. And deleting the target operator to be stopped, namely exiting the target operator to be stopped.
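The order of steps B11 and B21, including the wait for unprocessed data, can be sketched as follows. This Python sketch uses a hypothetical in-memory model: OperatorInstance, its pending queue and the function scale_down_stateless are illustrative names, and processing a record is reduced to moving it to a processed list.

```python
from collections import deque


class OperatorInstance:
    def __init__(self, name):
        self.name = name
        self.pending = deque()  # data received but not yet processed
        self.processed = []

    def drain(self):
        """Finish processing everything that was already received."""
        while self.pending:
            self.processed.append(self.pending.popleft())


def scale_down_stateless(routes, channels, instances, victim):
    # B11: adjust the upstream distribution strategy so that no upstream
    # operator sends further data to the operator to be stopped.
    for upstream in list(routes):
        routes[upstream] = [t for t in routes[upstream] if t != victim]
    # Wait until the operator to be stopped has processed its remaining data.
    instances[victim].drain()
    # B21: delete its upstream and downstream channels, then the operator itself.
    channels.difference_update({c for c in channels if victim in c})
    return instances.pop(victim)
```

Stopping the upstream flow before deletion ensures that data already received by the operator to be stopped is processed rather than dropped.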

(II) stateful operation

(1) Increased degree of concurrency

In the scenario of stateful operation, if the concurrency of the target operator needs to be increased, step S330 further includes, before step A11:

A10, reallocating all key field groups corresponding to the original target operators, and allocating a corresponding first key field group (key group) to each target operator to be newly added.

In the embodiments of the present specification, the State refers to an intermediate calculation result of an operator in the stream computing process. If the concurrency of a certain operator needs to be adjusted, the states of all its concurrent operators need to be re-divided. When the State is re-divided, the Key Group is used as the minimum granularity of division. A Key Group is a set of key fields (keys) and is the atomic unit of State assignment. Each operator may correspond to at least one key group.
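As an illustration of the key group concept, the sketch below maps each key to a key group and each key group to an owning operator. The hash-modulo mapping, the constant `NUM_KEY_GROUPS`, and the function names are assumptions made for this example; the specification does not say how keys are assigned to key groups.

```python
import zlib

NUM_KEY_GROUPS = 8  # assumed to stay fixed while concurrency changes


def key_group_of(key, num_key_groups=NUM_KEY_GROUPS):
    """Map a key to its key group with a stable hash (assumed scheme)."""
    return zlib.crc32(key.encode("utf-8")) % num_key_groups


def operator_for_key(key, key_group_owner):
    """Look up the operator that currently owns the key's key group."""
    return key_group_owner[key_group_of(key)]


# Two original operators, four key groups each.
key_group_owner = {kg: ("op-0" if kg < 4 else "op-1") for kg in range(NUM_KEY_GROUPS)}
```

Because a key never changes its key group, changing concurrency only requires reassigning whole key groups to operators, which is why the key group can serve as the minimum granularity of State division.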

Because target operators need to be newly added, the key groups corresponding to all the original target operators need to be re-divided, and part of the key groups belonging to the original target operators are assigned to the target operators to be newly added.

Because the data volume corresponding to each key group is different, the control node can re-divide the key groups of all the original target operators according to the data volume of the key groups held by each original target operator, so that the data volume of the key groups held by each original target operator and by each target operator to be newly added is approximately balanced.
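One possible way to perform the re-division in step A10 is a greedy heuristic that moves key groups from the most heavily loaded original operator to each new operator until the new operator approaches the average load. The function name `rebalance_for_scale_up`, the data layout, and the greedy rule are assumptions for illustration; the specification only requires that the resulting data volumes be approximately balanced.

```python
def rebalance_for_scale_up(assignment, volumes, new_ops):
    """Return a new operator -> key groups mapping that includes new_ops.

    assignment: original operator -> list of key groups
    volumes:    key group -> data volume of that key group
    """
    result = {op: list(kgs) for op, kgs in assignment.items()}
    for op in new_ops:
        result[op] = []

    def load(op):
        return sum(volumes[kg] for kg in result[op])

    originals = list(assignment)
    target = sum(volumes.values()) / len(result)  # average load per operator

    for new_op in new_ops:
        while True:
            donor = max(originals, key=load)
            # Key groups that fit without pushing the new operator past the average.
            movable = [kg for kg in result[donor]
                       if load(new_op) + volumes[kg] <= target]
            if not movable or load(donor) <= target:
                break
            kg = max(movable, key=lambda k: volumes[k])
            result[donor].remove(kg)
            result[new_op].append(kg)
    return result
```

Each key group moved by this sketch plays the role of a first key field group, and the operator it was taken from is the corresponding first original target operator.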

Between step A11 and step A21, step S330 further includes:

and executing the following steps for each newly added target operator:

A13, sending a first data sending stopping instruction to a second working node on which an upstream operator of the first original target operator is distributed, so that the second working node controls the upstream operator of the first original target operator to stop sending data corresponding to the first key field group to the first original target operator, wherein the first original target operator is the original target operator to which the first key field group originally belonged.

In the embodiment of the present specification, since the first key field group of the first original target operator is already allocated to the newly added target operator, the control node needs to control the upstream operator of the first original target operator to no longer send data corresponding to the first key field group to the first original target operator, and temporarily cache the data belonging to the first key field group.

A15, sending a first obtaining instruction to the target working node on which the newly added target operator is distributed, so that the target working node controls the newly added target operator to obtain the state information of the first key field group.

In the embodiment of the present specification, in a scenario of stateful operation, a newly added target operator needs to acquire the state information of the first key field group so as to continue processing data. Therefore, the control node needs to send the first obtaining instruction to the target working node on which the newly added target operator is distributed.

In some examples, the state information of the first key field group is sent to the distributed storage system by the first original target operator and obtained from the distributed storage system by the newly added target operator. That is to say, the first original target operator sends the state information of the first key field group to a third-party distributed storage system, and the newly added target operator then acquires it from that distributed storage system.

By utilizing the distributed storage system to store the state information of the key fields, the occupied space of the memory of the machine where the working node is located can be saved.

After the newly added target operator acquires the state information corresponding to the first key field group, the control node sends the adjusted first data distribution strategy to the first working node. The first working node can then control the upstream operator of the newly added target operator to send data corresponding to the first key field group to the newly added target operator. That is, the data distributed to the newly added target operator by its upstream operator is the data corresponding to the first key field group.

The first key field group corresponding to the newly added target operator is from the first original target operator, and the upstream operator of the newly added target operator is the upstream operator which issues data corresponding to the first key field group to the first original target operator.
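The hand-over of one key group during a stateful concurrency increase (A13: pause and buffer upstream; snapshot to storage; A15: load state; then apply the adjusted dispatch policy) can be sketched in memory as follows. All names here (`CountingOperator`, `UpstreamOperator`, `migrate_key_group`, the `arriving` parameter) are hypothetical, a plain dict stands in for the distributed storage system, and a per-key counter stands in for arbitrary operator state.

```python
from collections import defaultdict


class CountingOperator:
    """Stateful operator that counts records per key, grouped by key group."""

    def __init__(self, name):
        self.name = name
        self.state = defaultdict(dict)  # key group -> {key: count}

    def process(self, key_group, key):
        counts = self.state[key_group]
        counts[key] = counts.get(key, 0) + 1


class UpstreamOperator:
    def __init__(self, routing):
        self.routing = dict(routing)  # key group -> owning operator
        self.buffers = {}             # key group -> records held back

    def send(self, key_group, key):
        if key_group in self.buffers:
            self.buffers[key_group].append(key)  # temporarily cached
        else:
            self.routing[key_group].process(key_group, key)

    def pause(self, key_group):
        self.buffers[key_group] = []

    def reroute_and_resume(self, key_group, new_target):
        self.routing[key_group] = new_target
        for key in self.buffers.pop(key_group):
            new_target.process(key_group, key)


def migrate_key_group(upstream, source, target, key_group, storage, arriving=()):
    upstream.pause(key_group)                             # A13
    storage[key_group] = source.state.pop(key_group, {})  # snapshot to storage
    for kg, key in arriving:                              # records arriving meanwhile
        upstream.send(kg, key)
    target.state[key_group] = dict(storage[key_group])    # A15: load the state
    upstream.reroute_and_resume(key_group, target)        # adjusted dispatch policy
```

In this sketch records for the migrating key group are buffered rather than dropped, and records for other key groups keep flowing to the original operator, so only the affected key group is paused.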

It should be noted that the concurrency increase strategies for the individual target operators to be newly added may be independent of each other. That is, after step A10, for N target operators to be newly added, steps A13 to A15 may be performed for the 1st target operator to be newly added, and the adjusted first data distribution strategy of its upstream operator may then be sent to the first working node where that upstream operator is located. After the 1st target operator to be newly added performs data processing normally, steps A13 to A15 are performed for the 2nd target operator to be newly added, and the adjusted first data distribution strategy of its upstream operator is then sent to the first working node where the upstream operator of the 2nd target operator to be newly added is located. Each target operator to be newly added is processed in turn until the Nth newly added target operator performs data processing normally. N is an integer greater than or equal to 1.

That is to say, the concurrency adjustment strategies of each target operator to be newly added are independent from each other, that is, the concurrency adjustment flows of each operator are not affected by each other.

In other embodiments of the present description, all target operators to be newly added may be created together, and the first original target operators corresponding to them may then take a distributed snapshot together, that is, send the state information of the first key field group corresponding to each target operator to be newly added to the distributed storage system. The newly added target operators then load the state information of their respective first key field groups from the distributed storage system together.

(2) Concurrency reduction

In the scenario of stateful operation, if the concurrency of the target operator needs to be reduced, step S330 further includes, before step B11:

B10, allocating the second key field group corresponding to each target operator to be stopped to other target operators which do not need to be stopped.

In the embodiment of the present specification, since each target operator to be stopped needs to be deleted, the key field group corresponding to each target operator to be stopped needs to be first allocated to other target operators that do not need to be stopped.

For any target operator to be stopped, all of its key field groups may be allocated to a single original target operator that does not need to be stopped, or they may be distributed among a plurality of original target operators that do not need to be stopped.
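Step B10 can be sketched as handing each key group of the operators to be stopped to whichever remaining operator currently carries the least data. The function name `reassign_for_scale_down` and the least-loaded-first rule are assumptions for illustration; the specification permits either a single receiver or several receivers.

```python
def reassign_for_scale_down(assignment, volumes, to_stop):
    """Return operator -> key groups for the operators that keep running.

    assignment: operator -> list of key groups
    volumes:    key group -> data volume of that key group
    to_stop:    operators that will be deleted
    """
    survivors = {op: list(kgs) for op, kgs in assignment.items()
                 if op not in to_stop}
    if not survivors:
        raise ValueError("at least one operator must keep running")
    load = {op: sum(volumes[kg] for kg in kgs) for op, kgs in survivors.items()}

    orphaned = [kg for op in to_stop for kg in assignment[op]]
    # Place the largest key groups first; each goes to the least-loaded survivor.
    for kg in sorted(orphaned, key=lambda k: volumes[k], reverse=True):
        receiver = min(load, key=load.get)
        survivors[receiver].append(kg)
        load[receiver] += volumes[kg]
    return survivors
```

With the inputs `{"op-0": [0, 1], "op-1": [2, 3], "op-2": [4, 5]}`, volumes `{0: 10, 1: 10, 2: 5, 3: 5, 4: 12, 5: 4}` and `to_stop=["op-2"]`, the two key groups of `op-2` end up on different survivors, which corresponds to the case where the key field groups are distributed among a plurality of operators.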

In the scenario of stateful operation, the purpose of step B11, in which the control node sends to each third working node the adjusted second data dispatch policy of the corresponding second upstream operator, is to make the third working node control the second upstream operator to stop sending the data corresponding to the second key field group to the target operator to be stopped.

In step S330, after step B11 and before step B21, the method further includes:

for each operator to be stopped, performing the following steps:

B13, sending a second obtaining instruction to a fourth working node on which a second original target operator is distributed, so that the fourth working node controls the second original target operator to obtain the state information of the second key field group corresponding to the target operator to be stopped, wherein the second original target operator is the target operator to which the second key field group is allocated.

In the embodiment of the present specification, since the second key field group of the target operator to be stopped is allocated to the second original target operator, the second original target operator needs to acquire the state information of the second key field group.

As an example, the state information of the second key field group is sent to the distributed storage system by the target operator to be stopped, and is obtained from the distributed storage system by the second original target operator.

B15, sending a second data sending instruction to the third working node, so that the third working node controls the upstream operator of the target operator to be stopped to send the data corresponding to the second key field group to the second original target operator.

While the second original target operator is acquiring the state information corresponding to the second key field group, the data corresponding to the second key field group is temporarily buffered in the upstream operator of the target operator to be stopped. Therefore, after the second original target operator acquires the state information corresponding to the second key field group, the upstream operator of the target operator to be stopped needs to send the data corresponding to the second key field group to the second original target operator.

It should be noted that the concurrency reduction strategies for the individual target operators to be stopped are independent of each other. That is, after step B10, for M target operators to be stopped, steps B11 to B21 may be performed for the 1st target operator to be stopped, in which the control node sends a delete instruction to the target working node on which the 1st target operator to be stopped is distributed, so as to delete the 1st target operator to be stopped. Steps B11 to B21 are then performed for the 2nd target operator to be stopped, in which the control node sends a delete instruction to the target working node on which the 2nd target operator to be stopped is distributed, so as to delete the 2nd target operator to be stopped. Each target operator to be stopped is processed in turn until the Mth target operator to be stopped is deleted. M is an integer greater than or equal to 1.

That is to say, the concurrency adjustment strategies of each target operator to be stopped are independent from each other, that is, the concurrency adjustment flows of each operator are not affected by each other.

In other embodiments of the present description, M target operators to be stopped may also be deleted at the same time.
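The order in which the control node issues instructions during a sequential stateful reduction can be sketched as a simple plan generator. The `Instruction` tuple, the function `scale_down_plan`, and the wording of the actions are hypothetical; step B10 (reassigning key groups) is assumed to have already produced the `reassignment` input.

```python
from typing import NamedTuple


class Instruction(NamedTuple):
    step: str       # step label used in the description
    recipient: str  # who the control node addresses (illustrative label)
    action: str


def scale_down_plan(reassignment):
    """Build the instruction sequence for operators stopped one after another.

    reassignment: operator to stop -> {key group: receiving operator},
    i.e. the outcome of step B10.
    """
    plan = []
    for victim, moves in reassignment.items():
        plan.append(Instruction("B11", f"upstream of {victim}",
                                f"stop sending to {victim} and buffer"))
        for kg, receiver in moves.items():
            plan.append(Instruction("B13", receiver,
                                    f"load state of key group {kg}"))
            plan.append(Instruction("B15", f"upstream of {victim}",
                                    f"send key group {kg} to {receiver}"))
        # Sent only once the victim has finished processing its data.
        plan.append(Instruction("B21", victim,
                                "delete operator and its data channels"))
    return plan
```

For example, `scale_down_plan({"op-2": {4: "op-1", 5: "op-0"}})` yields the steps B11, B13, B15, B13, B15, B21 in that order. Deleting the M operators at the same time, as in the alternative embodiment, would instead interleave the per-operator sequences.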

Fig. 4 is a schematic structural diagram illustrating an operator concurrency adjustment apparatus 400 provided according to an embodiment of the present disclosure. As shown in fig. 4, the operator concurrency adjustment apparatus 400 is applied to a control node in a stream computing system, and includes:

An operation index receiving module 410, configured to receive an operation index of an operator on the working node, where the operation index is used to characterize the data processing capability of the operator.

A concurrency adjustment strategy determining module 420, configured to determine, based on the operation index of the operator, a concurrency adjustment strategy of a target operator whose concurrency needs to be adjusted.

The sending module 430 is configured to send a concurrency adjustment policy to the target working node, so that the target working node adjusts the concurrency of the target operator, where the target working node includes at least one of a working node to which the target operator is to be newly added and a working node to which the target operator is to be stopped.

In an embodiment of the present specification, the concurrency adjustment policy determining module 420 includes:

the first acquisition unit is used for acquiring the total operation index of the target operator and the total operation index of the upstream operator of the target operator.

And the first determination unit is used for determining the concurrency degree adjustment strategy of the target operator based on the total operation index of the target operator and the total operation index of the upstream operator of the target operator.

And the total operation index of the target operator is the sum of the operation indexes of all the operating operators with the same operator identification as the target operator.

The total operation index of the upstream operator of the target operator is the sum of the operation indexes of all the operating operators with the same operator identification as the upstream operator of the target operator.

In an embodiment of the present specification, the concurrency adjustment policy determining module 420 further includes:

and the second determining unit is used for obtaining the total operation index of the operator corresponding to each operator identifier based on the operation indexes of all the operators.

A third determining unit, configured to determine, for each operator identifier, whether the first operator corresponding to that operator identifier is a target operator needing concurrency adjustment, based on the total operation index of the first operator and the total operation index of the first upstream operator of the first operator.

In an embodiment of the present specification, the operation index includes the input transactions per second (TPS) and the output TPS of the operator; the total operation index of the first operator is a first sum of the input TPS of each first operator, and the total operation index of the first upstream operator is a second sum of the output TPS of each first upstream operator.

Wherein the third determination unit is configured to:

if the difference value obtained by subtracting the first sum from the second sum is larger than a first preset TPS threshold value, determining the first operator as a target operator needing concurrency increase;

and if the difference value obtained by subtracting the second sum from the first sum is larger than a second preset TPS threshold value, determining the first operator as a target operator needing concurrency reduction.
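The aggregation and the two threshold comparisons can be sketched as follows. The metric record layout (`operator_id`, `input_tps`, `output_tps`), the function names, and the numeric thresholds in the usage note are assumptions for illustration; only the two comparisons themselves come from the description.

```python
from collections import defaultdict


def total_by_operator_id(metrics, field):
    """Sum one metric field over all running instances sharing an operator id."""
    totals = defaultdict(float)
    for m in metrics:
        totals[m["operator_id"]] += m[field]
    return dict(totals)


def decide(second_sum, first_sum, first_threshold, second_threshold):
    """Classify an operator from the two sums.

    first_sum:  total input TPS of the operator's instances
    second_sum: total output TPS of its upstream operator's instances
    """
    if second_sum - first_sum > first_threshold:
        return "increase"   # upstream produces more than the operator takes in
    if first_sum - second_sum > second_threshold:
        return "decrease"
    return "keep"
```

For instance, if two `map` instances report input TPS of 400 and 500 and the single upstream `source` instance reports an output TPS of 1200, the first sum is 900 and the second sum is 1200; with an assumed first threshold of 200, `decide(1200, 900, 200, 200)` returns `"increase"`.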

In an embodiment of the present specification, the concurrency adjustment policy includes concurrency adjustment policies of respective mutually independent target operators to be adjusted.

The sending module 430 is configured to send the concurrency adjustment policy of the corresponding target operator to be adjusted to the target working node to which each target operator to be adjusted is distributed, so that the target working node to which the target operator to be adjusted is distributed adjusts the target operator to be adjusted.

In an embodiment of the present specification, if the concurrency adjustment policy is a concurrency increase policy, the concurrency increase policy includes the number of target operators to be newly added and identification information of target working nodes distributed by each target operator to be newly added.

In an embodiment of the present specification, the sending module 430 includes:

and the creating instruction sending unit is used for sending a creating instruction of the target operator to be newly added corresponding to each target working node distributed by the target operator to be newly added based on the identification information of each target working node, so that the target working nodes create the target operator to be newly added based on the creating instruction, create a data channel between the target operator to be newly added and the upstream operator of the target operator to be newly added and create a data channel between the target operator to be newly added and the downstream operator of the target operator.

And the first data dispatching strategy sending unit is used for sending the adjusted first data dispatching strategy of the established upstream operator of the newly added target operator to each first working node so that the first working node controls the upstream operator of the newly added target operator to dispatch data to the newly added target operator.

In an embodiment of the present specification, the sending module 430 further includes:

and the first allocation unit is used for reallocating all the key field groups corresponding to the original target operators and allocating a corresponding first key field group to each target operator to be newly added.

A first state information obtaining unit, configured to perform the following steps for each newly added target operator:

sending a first data sending stopping instruction to a second working node distributed by an upstream operator of a first original target operator, so that the second working node controls the upstream operator of the first original target operator to stop sending data corresponding to the first key field group to the first original target operator, wherein the first original target operator is a target operator to which the first key field group belongs;

and sending a first acquisition instruction to the target working nodes distributed by the newly added target operator, so that the target working nodes distributed by the newly added target operator control the newly added target operator to acquire the state information of the first key field group.

And the data distributed to the newly added target operator by the upstream operator of the newly added target operator is the data corresponding to the first key field group.

In an embodiment of the present specification, the state information of the first key field group is sent to the distributed storage system by the first original target operator, and is obtained from the distributed storage system by the newly added target operator.

In an embodiment of the present specification, the concurrency adjustment policy is a concurrency reduction policy, and the concurrency reduction policy includes the number of target operators to be stopped and identification information of target working nodes distributed by each operator to be stopped.

In an embodiment of the present specification, the sending module 430 includes:

and the second data dispatching strategy sending unit is used for sending the adjusted second data dispatching strategy of the corresponding second upstream operator to each third working node so that the third working node controls the second upstream operator to stop sending data to the target operator to be stopped. The second upstream operator is the upstream operator of the target operator to be stopped. The third working node is a working node distributed by the second upstream operator.

And the deleting instruction sending unit is used for sending a deleting instruction to the target working nodes distributed by the target operators to be stopped based on the identification information of the target working nodes distributed by the target operators to be stopped under the condition that the target operators to be stopped finish data processing, so that the target working nodes distributed by the target operators to be stopped delete the data channel between the target operators to be stopped and the upstream operator, delete the data channel between the target operators to be stopped and the downstream operator and delete the target operators to be stopped.

A second allocation unit, configured to allocate the second key field group corresponding to each target operator to be stopped to other target operators which do not need to be stopped.

A second state information obtaining unit configured to, for each operator to be stopped, perform the following steps:

sending a second obtaining instruction to a fourth working node distributed by a second original target operator, so that the fourth working node controls the second original target operator to obtain state information of a second key field group corresponding to the target operator to be stopped, wherein the second original target operator is the target operator to which the second key field group is distributed;

and sending a second data issuing instruction to the third working node, so that the third working node controls the upstream operator of the target operator to be stopped to send the data corresponding to the second key field group to the second original target operator.

In an embodiment of the present specification, the state information of the second key field group is sent to the distributed storage system by the target operator to be stopped, and is acquired from the distributed storage system by the second original target operator.

Other details of the operator concurrency adjustment apparatus according to the embodiment of the present disclosure are similar to the operator concurrency adjustment method according to the embodiment of the present disclosure described above with reference to fig. 3, and are not described again here.

The operator concurrency adjustment method and apparatus according to the embodiments of the present specification described in conjunction with fig. 3 to 4 may be implemented by an operator concurrency adjustment device. Fig. 5 is a schematic diagram illustrating a hardware structure 500 of an operator concurrency adjustment device according to an embodiment of the specification.

As shown in fig. 5, the operator concurrency adjustment device 500 includes an input device 501, an input interface 502, a central processing unit 503, a memory 504, an output interface 505, and an output device 506. The input interface 502, the central processing unit 503, the memory 504, and the output interface 505 are connected to each other through a bus 510, and the input device 501 and the output device 506 are connected to the bus 510 through the input interface 502 and the output interface 505, respectively, and further connected to other components of the operator concurrency adjustment device 500.

Specifically, the input device 501 receives input information from the outside and transmits the input information to the central processor 503 through the input interface 502; the central processor 503 processes input information based on computer-executable instructions stored in the memory 504 to generate output information, temporarily or permanently stores the output information in the memory 504, and then transmits the output information to the output device 506 through the output interface 505; the output device 506 outputs the output information to the outside of the operator concurrency adjustment device 500 for use by the user.

That is, the operator concurrency adjustment apparatus shown in fig. 5 may also be implemented to include: a memory storing computer-executable instructions; and a processor which, when executing computer executable instructions, may implement the operator concurrency adjustment method and apparatus described in connection with fig. 3-4.

It should also be noted that the exemplary embodiments mentioned in this specification describe some methods or systems based on a series of steps or devices. However, the present specification is not limited to the order of the above-described steps, that is, the steps may be performed in the order mentioned in the embodiments, may be performed in an order different from the order in the embodiments, or may be performed simultaneously.

The above are only specific implementations of the present specification. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the system, modules and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described here again. It should be understood that the scope of the present disclosure is not limited thereto; any person skilled in the art can easily conceive of various equivalent modifications or substitutions within the technical scope of the present disclosure, and these modifications or substitutions should be covered within the scope of the present disclosure.

Claims

1. An operator concurrency adjustment method is applied to a control node in a stream computing system, wherein the stream computing system comprises the control node and at least one working node, and at least one operator is distributed on the working node, and the method is characterized by comprising the following steps:

2. The method according to claim 1, wherein determining a concurrency adjustment strategy for a target operator requiring concurrency adjustment based on the operation index of the operator comprises:

acquiring a total operation index of the target operator and a total operation index of an upstream operator of the target operator;

determining a concurrency degree adjustment strategy of the target operator based on the total operation index of the target operator and the total operation index of an upstream operator of the target operator;

the total operation index of the target operator is the sum of the operation indexes of all operating operators with the same operator identification as the target operator;

the total operation index of the upstream operator of the target operator is the sum of the operation indexes of all operating operators with the same operator identification as the upstream operator of the target operator.

3. The method of claim 2, wherein prior to said obtaining the total operating indicators of the target operator and upstream operators of the target operator, the method further comprises:

obtaining the total operation index of the operator corresponding to each operator identification based on the operation indexes of all the operators;

and for the operator corresponding to each operator identification, determining whether the first operator is a target operator needing concurrency adjustment or not based on the total operation index of the first operator corresponding to the operator identification and the total operation index of the first upstream operator of the first operator.

4. The method of claim 3, wherein the operation index includes an operator's input transactions per second (TPS) and output TPS; the total operation index of the first operator is a first sum of input TPS of each first operator, and the total operation index of the first upstream operator is a second sum of output TPS of each first upstream operator;

determining whether the first operator is a target operator needing concurrency adjustment based on a total operation index of the first operator corresponding to the operator identifier and a total operation index of a first upstream operator of the first operator, including:

and if the difference value obtained by subtracting the second sum from the first sum is larger than a second preset TPS threshold value, determining that the first operator is a target operator needing concurrency reduction.

5. The method according to claim 1, wherein the concurrency adjustment strategy comprises concurrency adjustment strategies of respective mutually independent target operators to be adjusted;

wherein the sending the concurrency degree adjustment strategy to the target working node to enable the target working node to adjust the concurrency degree of the target operator includes:

and respectively sending the concurrency degree adjustment strategy of the target operator to be adjusted corresponding to each target working node distributed by the target operator to be adjusted, so that the target working nodes distributed by the target operator to be adjusted adjust the target operator to be adjusted.

6. The method according to claim 1, wherein if the concurrency adjustment policy is a concurrency increase policy, the concurrency increase policy includes the number of target operators to be newly added and identification information of target working nodes distributed by each target operator to be newly added;

based on the identification information of each target working node, sending a creation instruction of the target operator to be newly added corresponding to each target working node distributed by the target operator to be newly added, so that the target working node creates the target operator to be newly added based on the creation instruction, creates a data channel between the target operator to be newly added and the upstream operator of the target operator to be newly added and creates a data channel between the target operator to be newly added and the downstream operator of the target operator;

sending the adjusted first data dispatching strategy of the upstream operator of the created newly-added target operator corresponding to each first working node, so that the first working node controls the upstream operator of the newly-added target operator to dispatch data to the newly-added target operator;

7. The method according to claim 6, wherein before sending the creation instruction of the target operator to be added corresponding to each target working node to which the target operator to be added is distributed, the method further comprises:

reallocating all key field groups corresponding to original target operators, and allocating a corresponding first key field group to each target operator to be newly added;

before the sending, to each first working node, the adjusted first data dispatch policy of the created upstream operator of the newly added target operator corresponding to the first working node, so that the first working node controls the upstream operator of the newly added target operator to dispatch data to the newly added target operator, the method further includes:

and executing the following steps for each newly added target operator:

sending a first acquisition instruction to a target working node distributed by the newly added target operator, so that the target working node distributed by the newly added target operator controls the newly added target operator to acquire state information of the first key field group;

8. The method of claim 7, wherein the state information of the first key field group is sent to the distributed storage system by the first original target operator and obtained from the distributed storage system by the newly added target operator.

9. The method according to claim 1, wherein the concurrency adjustment strategy is a concurrency reduction strategy, and the concurrency reduction strategy includes the number of target operators to be stopped and identification information of target working nodes distributed by each operator to be stopped;

sending the adjusted second data dispatching strategy of the corresponding second upstream operator to each third working node, so that the third working node controls the second upstream operator to stop sending data to the target operator to be stopped, wherein the second upstream operator is the upstream operator of the target operator to be stopped; the third working node is a working node distributed by the second upstream operator;

for each target operator to be stopped, under the condition that the target operator to be stopped completes data processing, sending a deleting instruction to the target working nodes distributed by the target operator to be stopped based on identification information of the target working nodes distributed by the target operator to be stopped, so that the target working nodes distributed by the target operator to be stopped delete a data channel between the target operator to be stopped and an upstream operator of the target operator to be stopped, delete a data channel between the target operator to be stopped and a downstream operator of the target operator to be stopped, and delete the target operator to be stopped.

10. The method of claim 9, wherein prior to said sending each third worker node the adjusted second data dispatch policy for its corresponding second upstream operator, the method further comprises:

distributing the second key field group corresponding to each target operator to be stopped to other target operators which do not need to be stopped;

after sending the adjusted second data dispatching strategy of the corresponding second upstream operator to each third working node and before sending a deletion instruction to the target working node distributed by the target operator to be stopped, the method further includes:

for each operator to be stopped, performing the following steps:

11. The method of claim 10, wherein the state information of the second key field group is sent to the distributed storage system by the target operator to be stopped and is obtained from the distributed storage system by the second original target operator.

12. An operator concurrency adjusting device applied to a control node in a stream computing system, wherein the stream computing system comprises the control node and at least one working node, and at least one operator is distributed on the working node, the device is characterized by comprising:

13. An operator concurrency adjustment device, comprising: a processor and a memory storing computer program instructions;

the processor, when executing the computer program instructions, implements the operator concurrency adjustment method of any one of claims 1-11.