CN113391907A - Task placement method, device, equipment and medium - Google Patents
- Publication number
- CN113391907A (application number CN202110714071.8A)
- Authority
- CN
- China
- Prior art keywords
- task
- node
- graph
- resource
- slot
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F9/4881 — Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/5016 — Allocation of resources to service a request, the resource being the memory
- G06F9/5027 — Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
Abstract
The embodiments of the invention disclose a task placement method, device, equipment, and medium. The method comprises the following steps: generating a task graph corresponding to a stream processing job according to the parallelism of each operator in the job and the connections among the operators; determining, based on a graph neural network in a preset task placement model, the task embedding vectors corresponding to the task nodes in the task graph and the resource embedding vectors corresponding to the slot nodes in a resource graph, where the resource graph is a fully-connected undirected graph; and determining, based on a recurrent neural network in the preset task placement model, the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors. The task placement model provided by the embodiments is suitable for heterogeneous resources, and with this model both the throughput attribute and the delay attribute meet preset requirements during actual stream processing jobs.
Description
Technical Field
The embodiments of the invention relate to the technical field of data stream processing, and in particular to a task placement method, device, equipment, and medium.
Background
In many industrial fields, numerous tasks require large amounts of data of different types to make data-intensive decisions. Such data is generated by streaming events, e.g., financial transactions and sensor measurements. To extract valuable information from these huge volumes of data in a timely manner, data stream processing frameworks and applications, which can continuously process unbounded data streams of arbitrary size in near real time, are becoming popular.
The computational process of a data stream processing application is generally described by a DAG (Directed Acyclic Graph). Each node in the DAG represents an operator that performs some specific operation (e.g., mapping, filtering). Continuously arriving data is processed by the operators and transmitted from the source nodes through the directed edges of the DAG to the sink nodes. To fully exploit the parallelism in DAGs, stream processing applications are typically deployed into distributed clusters, and in this scenario a key issue is deciding on which compute node to place and process each operator of the application while optimizing some relevant quality attributes. This is called the operator placement problem. It is a long-standing problem because stream processing applications typically do not stop running after deployment, and runtime adjustments are difficult to make without affecting performance. Moreover, obtaining an optimal operator placement is NP-hard (Non-deterministic Polynomial-time hard). Therefore, a number of heuristics have been devised that can solve the operator placement problem in an acceptable time; generally, such heuristics are designed manually based on the characteristics of a particular problem.
At present, using deep reinforcement learning to train heuristics has become a research focus. Current deep-reinforcement-learning-based methods assume that the CPUs (Central Processing Units), memory, network, and other resources are all homogeneous; however, because stream processing applications are deployed continuously, the actually available resources are heterogeneous, and the amount of available resources changes constantly. In this case, the task placement schemes obtained by such methods are not applicable to heterogeneous resources.
Disclosure of Invention
Embodiments of the present invention provide a task placement method, device, apparatus, and medium, so that a task placement scheme is applicable to heterogeneous resources, and when the scheme is deployed in a real cluster, a throughput value with higher accuracy can be obtained.
In a first aspect, the present invention provides a task placement method, applied to a stream processing task, including:
generating a task graph corresponding to the stream processing job according to the parallelism of each operator in the stream processing job and the connection mode among the operators;
determining task embedding vectors corresponding to all task nodes in the task graph based on a graph neural network in a preset task placement model, and determining resource embedding vectors corresponding to all slot nodes of a processing unit in a resource graph; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
based on a recurrent neural network in the preset task placement model, determining the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine the deployment of the stream processing job;
the preset task placement model enables each task node in the task graph to be associated with a slot in the resource graph, and enables the throughput attribute and the delay attribute in the running process of the stream processing job to meet preset requirements.
In a second aspect, an embodiment of the present invention further provides a task placement device, which is applied to a stream processing task, and includes:
the task graph generating module is configured to generate a task graph corresponding to the stream processing job according to the parallelism of each operator in the stream processing job and the connection mode among the operators;
the encoding module is configured to determine task embedding vectors corresponding to task nodes in the task graph and determine resource embedding vectors corresponding to slot nodes of processing units in a resource graph based on a graph neural network in a preset task placement model; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
- the decoding module is configured to determine, based on a recurrent neural network in the preset task placement model, the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine the deployment of the stream processing job;
the preset task placement model enables each task node in the task graph to be associated with a slot in the resource graph, and enables the throughput attribute and the delay attribute in the running process of the stream processing job to meet preset requirements.
In a third aspect, an embodiment of the present invention further provides a computing device, including:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program codes stored in the memory to execute the task placement method provided by any embodiment of the invention.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the task placement method provided in any embodiment of the present invention.
According to the technical scheme provided by the embodiments of the invention, a task graph corresponding to a stream processing job can be generated according to the parallelism of each operator in the job and the connections among the operators. The task graph and the resource graph are encoded by the graph neural network in the preset task placement model, which determines the task embedding vector of each task node in the task graph and the resource embedding vector of each processing-unit slot node in the resource graph. The resource graph in the embodiments of the invention is a fully-connected undirected graph, and during training of the preset task placement model the CPU computing power attribute and the available memory attribute of each slot node are considered when iterating the embedding vectors, so the model is suitable for heterogeneous resources. Through the recurrent neural network in the preset task placement model, the slot node corresponding to each task node can be determined according to the task embedding vectors and the resource embedding vectors, thereby determining the deployment of the stream processing job. In addition, because throughput and delay are considered during training, the throughput attribute and the delay attribute of the model meet the preset requirements during actual stream processing jobs.
The innovation points of the embodiment of the invention comprise:
1. when the embedding vector is generated by modeling the resource graph by using the graph neural network, the CPU computing power attribute and the available memory attribute of each slot node are considered, so that the trained task placement model is suitable for heterogeneous resources, and the method is one of the innovation points of the embodiment of the invention.
2. When estimating throughput during operation, the scheme calculates throughput using iterative recovery, resource distribution, and an applied backpressure mechanism, realizing offline estimation of throughput without deploying the actual application. This not only accelerates the model training process but also allows the throughput and delay of the trained task placement model to meet the respective preset requirements during actual operation, and is one of the innovation points of the embodiments of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a flowchart of a training method for a data stream processing task placement model according to an embodiment of the present invention;
FIG. 1b is a block diagram of an encoding-decoding model according to an embodiment of the present invention;
FIG. 1c is a flowchart of an off-line throughput estimation algorithm according to an embodiment of the present invention;
FIG. 1d is a screenshot of the relationship between the estimated and actual values of throughput during the experiment;
FIG. 1e is a screenshot of the error between the estimated throughput and the actual value during the experiment;
fig. 1f is a schematic structural diagram of a throughput testing topology according to an embodiment of the present invention;
fig. 1g is a screenshot of an experimental effect of a throughput test corresponding to a throughput test topology according to an embodiment of the present invention;
fig. 1h is a schematic structural diagram of a word counting topology according to an embodiment of the present invention;
fig. 1i is a screenshot of an experimental effect of a throughput test corresponding to a word count topology according to an embodiment of the present invention;
fig. 1j is a schematic structural diagram of a log stream processing topology according to an embodiment of the present invention;
fig. 1k is an experimental effect screenshot of a throughput test corresponding to a log stream processing topology according to an embodiment of the present invention;
fig. 2a is a flowchart of a data stream processing task placement method according to a second embodiment of the present invention;
FIG. 2b is a flowchart of a stream processing job and the corresponding generated task graph according to the second embodiment of the present invention;
fig. 3 is a block diagram of a data stream processing task placement device according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computing device according to a fourth embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The embodiments of the invention disclose a method, device, equipment, and medium for placing data stream processing tasks. For the purpose of describing the content of the embodiments clearly and completely, the working principle of the invention is briefly introduced below:
1. stream processing model
A stream processing job may be represented as a Directed Acyclic Graph (DAG), denoted G_job = (V_job, E_job), where V_job is the set of all operators and E_job is the set of all edges. Each job node j ∈ V_job is an operator that performs a specific operation (e.g., mapping, filtering, aggregation). For each job node j ∈ V_job there is a user-defined parallelism j.p, which is the number of subtasks of job node j. All parallel subtasks t_{j,k} (k = 1, 2, ..., j.p) are created and executed.
Each edge (j_u, j_v) ∈ E_job connects job node j_u and job node j_v, indicating that streaming data flows from j_u to j_v, where u and v are node numbers. A tuple may be used to describe a data item flowing in the DAG. For an edge (j_u, j_v) ∈ E_job, the tasks of job node j_u communicate with the tasks of job node j_v. There are two ways of connection: direct connections and broadcast connections. A direct connection is used only when j_u has the same parallelism as j_v, i.e., j_u.p = j_v.p; in this case each task node t_{u,i} ∈ j_u is connected only with the single task node t_{v,i} ∈ j_v. A broadcast connection means that each task node t_{u,i} ∈ j_u is connected with all task nodes of j_v. It is assumed that each task sends tuples evenly to its downstream tasks. According to the parallelism and the connection modes, a corresponding task-level graph can be constructed from a given job graph, denoted G_task = (V_task, E_task), where V_task is the set of all task nodes and E_task is the set of all edges. Task placement operates on the task-level graph G_task. Each task t ∈ V_task is the smallest unit that can be placed on a resource node and can be described by two parameters: t.cpu (CPU utilization) and t.mem (required memory). The true CPU utilization is very difficult to estimate, so it is described here using a relative value. t.mem represents the maximum memory that task t may consume, which is a true value.
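The expansion of a job graph into a task-level graph described above can be sketched as follows (a minimal illustration; the function name and data layout are assumptions for illustration, not part of the patent):

```python
from itertools import product

def build_task_graph(job_nodes, job_edges, connection):
    """Expand a job graph into a task-level graph.

    job_nodes:  {operator_name: parallelism j.p}
    job_edges:  list of (upstream_op, downstream_op)
    connection: {(u, v): "direct" | "broadcast"}
    """
    # Each operator j with parallelism j.p yields subtasks t_{j,0..p-1}.
    tasks = {j: [f"{j}_{k}" for k in range(p)] for j, p in job_nodes.items()}
    task_edges = []
    for (u, v) in job_edges:
        if connection[(u, v)] == "direct":
            # Direct connection requires equal parallelism: t_{u,i} -> t_{v,i}.
            assert job_nodes[u] == job_nodes[v], "direct connection needs u.p == v.p"
            task_edges += list(zip(tasks[u], tasks[v]))
        else:
            # Broadcast: every upstream task feeds every downstream task.
            task_edges += list(product(tasks[u], tasks[v]))
    task_nodes = [t for ts in tasks.values() for t in ts]
    return task_nodes, task_edges
```

For example, a source (parallelism 2) directly connected to a map (parallelism 2), which broadcasts to a sink (parallelism 1), yields five task nodes and four task edges.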
2. Resource model
A resource can be represented as a fully-connected undirected graph G_res = (V_res, E_res), where V_res is the set of slots, the smallest units that can be used to place a task, and each edge in E_res represents a logical connection between two slots. Each slot has two attributes: s.cpu (CPU computing power) and s.mem (available memory).
3. Task placement problem
Given a task-level graph G_task and a resource graph G_res, the DSP (Data Stream Processing) scheduler needs a placement solution P: V_task → V_res, i.e., a suitable mapping between V_task and V_res. Specifically, each task t_i ∈ V_task must be placed onto a specific slot node s_j ∈ V_res while optimizing certain quality attributes, such as the throughput attribute and the delay attribute. In the embodiments of the invention, throughput is the quality attribute of most interest. In other words, the embodiments need to find a placement scheme that can fully utilize all available resources and support DSP applications with high throughput requirements.
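As a small illustrative sketch of the mapping P: V_task → V_res (names hypothetical; the patent optimizes throughput with a learned model rather than this simple check), a placement can be represented as a dictionary from tasks to slots and checked against each slot's available memory s.mem:

```python
def placement_feasible(placement, tasks_mem, slots_mem):
    """Check that a placement P: task -> slot respects available slot memory.

    placement: {task: slot}       — the mapping P
    tasks_mem: {task: t.mem}      — required memory per task
    slots_mem: {slot: s.mem}      — available memory per slot
    """
    used = {}
    for task, slot in placement.items():
        used[slot] = used.get(slot, 0.0) + tasks_mem[task]
    return all(used[s] <= slots_mem[s] for s in used)
```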
Since the task placement problem is NP-hard, no known algorithm can find the optimal solution in polynomial time (unless P = NP). To deploy DSP applications efficiently, a heuristic approach is needed that can find a good solution in feasible time.
In fact, the tasks in the task graph have different CPU utilizations and complex dependencies. If the scheduler places the tasks on a small subset of the slots, the communication delay becomes small, but throughput suffers because a single slot has to execute many tasks. Conversely, if the scheduler distributes the tasks evenly across all slots, each task has sufficient resources, but the communication delay becomes significant; given resource heterogeneity, even allocation is not a good choice either. Therefore, an ideal scheduler should consider the information of both the task graph and the resource graph, make a good trade-off between different quality attributes, and efficiently produce a suitable allocation scheme. The stream processing task placement method provided by the embodiments of the invention fully considers the information of the task graph and the resource graph; the resulting placement scheme, when deployed into a real cluster, can fully utilize all available resources and support DSP applications with high throughput requirements.
The following describes in detail a specific implementation process of the stream processing task placement method provided by the embodiment of the present invention from a training phase and an application phase of a model, respectively.
Example one
Fig. 1a is a flowchart of a training method for a data stream processing task placement model according to an embodiment of the present invention, where the task placement model is applicable to a stream processing task placement process to obtain a placement scheme in which a throughput attribute and a delay attribute meet preset requirements. As shown in fig. 1a, the training method includes:
s110, acquiring an initial task embedding vector corresponding to each task node in the task graph, an initial resource embedding vector corresponding to each slot node of the processing unit in the resource graph, and a randomly generated task placement array.
The length of the task placement array equals the number of tasks in the task graph, and each value in the array is the index of a processing unit (slot).
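The randomly generated task placement array described above can be sketched as follows (a hypothetical helper; the seed parameter is an assumption added for reproducibility):

```python
import random

def random_placement_array(num_tasks, num_slots, seed=None):
    """Randomly generated task placement array: its length equals the number
    of tasks in the task graph, and each value is the index of a slot."""
    rng = random.Random(seed)
    return [rng.randrange(num_slots) for _ in range(num_tasks)]
```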
In this embodiment, a graph neural network is used to perceive graphs with different structures, namely the task graph and the resource graph. The graph neural network encodes the graph information into a set of initial embedding vectors, and may be, for example, a Graph Convolutional Network (GCN).
S120, iteratively updating the initial task embedding vectors and the initial resource embedding vectors based on the GraphSAGE algorithm to obtain the sample task embedding vectors and the sample resource embedding vectors.
In this embodiment, the GraphSAGE algorithm (Graph SAmple and aggreGatE) is adopted to iteratively update the embedding vector of each node v. The initial feature of node v is defined as f_v, and its embedding vector at step k is denoted e_v^(k); when k = 0, e_v^(0) = f_v. Because the graph types differ, the processes of encoding the task graph G_task and the resource graph G_res may be different.
In this embodiment, for the task graph G_task, since the upstream and downstream neighbor nodes of the current node v have different influences on the generated placement scheme, they are aggregated separately. The sets of upstream and downstream neighbors of v are denoted N_u(v) and N_d(v), respectively. Taking N_u(v) as an example: for each upstream node u ∈ N_u(v) with embedding vector e_u^(k) at step k, an intermediate vector is calculated as

    h_u^(k) = ReLU(W_1 · e_u^(k))

where h_u^(k) is the intermediate variable and W_1 is a parameter matrix whose values are all model parameters; that is, the parameter matrix is multiplied by the input vector and the result is passed through the activation function ReLU.
After all h_u^(k) are calculated, the upstream-view embedding of v is updated using

    e_v^(k+1,up) = ReLU(W_2 · [e_v^(k) : Σ_{u ∈ N_u(v)} h_u^(k)])

where W_2 is a parameter matrix whose values are all model parameters, and [:] denotes the concatenation of two vectors. Similarly, the downstream view e_v^(k+1,down) of v is calculated using N_d(v) with its own parameter matrices. The embedding of v at step k+1 is then the concatenation of the upstream-view and downstream-view embeddings:

    e_v^(k+1) = [e_v^(k+1,up) : e_v^(k+1,down)]
in this embodiment, the resource map is an undirected graph G with edge attribute information (communication delay)res. For the resource graph, in order to sense the edge attribute of the resource graph, embedding of a resource node is connected with the edge attribute during the first transformation, which is embodied by the following formula:
because the resource graph is a fully-connected undirected graph, there is no difference between upstream and downstream neighbors, all other node vectors not including v need to be averaged during aggregation, and the formula after the colon means averaging, specifically, the aggregation formula can be adjusted as follows:
after the K iterations, the sample task embedding vectors corresponding to each task node in the task graph and the sample resource embedding vectors corresponding to all resource nodes in the resource graph are obtained through calculation. The information of the whole task graph can be obtained by calculating after entering embedding of each task into a full connection layer and a maximum pooling layer. Fig. 1b is a block diagram of an encoding-decoding model according to an embodiment of the present invention. As shown in fig. 1b, the encoded results of the task map and the resource map are input into a recurrent neural network for decoding. Hereinafter, each task node t in the task graph isiExpressed as an embedding vector ofEach slot node s in the resource mapiExpressed as an embedding vector of
S130, sorting the task nodes by a topological sorting method.
In this embodiment, the goal of task placement is to place each task t_i ∈ V_task onto a specific slot s_i ∈ V_res. For a task t_i, the slots to which its upstream tasks are assigned have a large impact on the placement of t_i. Therefore, this embodiment topologically orders all tasks to guarantee that all upstream tasks of t_i are placed before t_i. The decoder then decides the placement for each task in order according to the topological ordering.
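The topological ordering step can be implemented with Kahn's algorithm; a minimal sketch (function name assumed):

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: order tasks so that every task appears after all of
    its upstream tasks, as required before sequential placement decoding."""
    indeg = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in adj[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return order
```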
S140, for each current task node in the sorted order, calculating the current context vector corresponding to the current task node according to the sample task embedding vector of the current task node and the sample resource embedding vectors corresponding to its upstream nodes.
In this embodiment, when calculating the current context vector corresponding to the current task node, an integration operation may be performed on the resource embedding vectors corresponding to the upstream nodes of the current task node, so that all upstream nodes correspond to one integrated resource embedding vector. The integration operation takes either the element-wise maximum over the upstream nodes' resource embedding vectors or the element-wise average.

The integrated resource embedding vector is then added to the task embedding vector of the current task node, and the result of the addition is input into an attention unit of the recurrent neural network to obtain the current context vector corresponding to the current task node.
Specifically, as shown in FIG. 1b, after a max operation is applied to the embedding vectors of all upstream slots S^(up)(t_i), the result is added to the embedding vector of t_i; the sum is input into the attention layer to obtain the current context vector corresponding to the current task node.
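The integrate-and-add step described above might look like the following sketch, assuming NumPy vectors for the embeddings (all names are illustrative):

```python
import numpy as np

def decoder_input(task_embedding, upstream_slot_embeddings, mode="max"):
    """Integrate upstream slot embeddings and add them to the task embedding.

    Hypothetical sketch of the integration step: element-wise max (or mean)
    over the embeddings of the slots hosting the upstream tasks, then an
    element-wise addition with the current task's embedding. The result is
    what the embodiment feeds into the attention layer.
    """
    stacked = np.stack(upstream_slot_embeddings)
    pooled = stacked.max(axis=0) if mode == "max" else stacked.mean(axis=0)
    return task_embedding + pooled
```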
In this embodiment, the placement solution finally generated by the decoder is denoted as P, and the task placement problem is described formally by the following formula:
where S^(up)(t_i) denotes the set of slots to which the upstream tasks of t_i have been placed, P represents the probability value that a task node is assigned to a slot node, and s_{t_i} represents the slot node corresponding to task t_i.
In this embodiment, a GRU (Gated Recurrent Unit) can be used to learn a state representation h_{t_i}, thereby memorizing the dependencies between tasks. h_{t_i} encodes the information of S^(up)(t_i) and G_task. As shown in FIG. 1b, at each step the input vector is added to the embedding vectors of the slots in S^(up)(t_i) to strengthen the understanding of the relative placement of t_i. Unlike the prior art, the present embodiment considers only the upstream tasks of t_i. The output of the GRU is expressed as:
To predict on which slot t_i is placed, this embodiment first uses the attention layer of the recurrent neural network to obtain an intermediate vector c_i. Specifically, the intermediate vector c_i can be calculated by the following formula:
where e_ij denotes the attention score obtained when the embedding vector of the task node at the i-th step passes through the attention layer, and a_ij denotes the value obtained by passing e_ij through the softmax layer.
By means of the intermediate vector, the context vector can be obtained. Specifically, it can be calculated by the following formula:
where W_Q is a coefficient matrix whose elements are parameters of the task placement model, determined after the task placement model is trained.
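Since the attention formulas appear in the patent only as images, the following is a plausible reconstruction of the step under common conventions (dot-product scores e_ij, softmax weights a_ij, weighted sum c_i, then a learned projection W_Q); it is a sketch under those assumptions, not the patent's exact formula:

```python
import numpy as np

def attention_context(h_i, slot_embeddings, W_Q):
    """Score each slot embedding against the decoder state h_i, softmax the
    scores, form the weighted sum c_i, and project it with the learned
    coefficient matrix W_Q to get the context vector.
    """
    K = np.stack(slot_embeddings)          # one row per slot
    e = K @ h_i                            # e_ij: raw attention scores
    a = np.exp(e - e.max())
    a /= a.sum()                           # a_ij: softmax of e_ij
    c = a @ K                              # intermediate vector c_i
    return W_Q @ c                         # context vector
```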
S150, determining the probability value that the current task node is assigned to each slot node according to the current context vector, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node.
Specifically, step S150 includes:
inputting the current context vector and the sample resource embedding vectors corresponding to all slot nodes into a softmax unit of the recurrent neural network to obtain the probability value p that the current task node is assigned to each slot node, which can be obtained by the following formula:
where the intermediate parameter d_k is the dimension of the target slot embedding vector, and k_j is a generalized representation, referred to in this embodiment as the slot embedding vector.
Among the probability values of the current task node being assigned to each slot node, the slot node with the maximum probability value is taken as the target slot node corresponding to the current task node. After all task nodes have determined their corresponding target slot nodes, a mapping relation between each task node and its corresponding target slot node is established.
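A minimal sketch of the slot-assignment step, assuming scaled dot-product scoring (suggested by the d_k term in the description) followed by softmax; the exact score function is an assumption:

```python
import numpy as np

def slot_probabilities(context, slot_embeddings):
    """Score every slot embedding against the context vector, scale by
    sqrt(d_k), and normalise with softmax to get assignment probabilities.
    """
    K = np.stack(slot_embeddings)
    d_k = K.shape[1]
    u = K @ context / np.sqrt(d_k)   # scaled dot-product scores
    p = np.exp(u - u.max())
    return p / p.sum()
```

The target slot node for the current task is then the index of the maximum probability, e.g. `int(slot_probabilities(ctx, slots).argmax())`, and the task-to-slot pair is recorded in the mapping.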
And S160, calculating the throughput and delay in the running process of the flow processing operation based on the mapping relation between each task node and the corresponding target slot node.
Here, for a tuple, the delay is defined as the sum of all inter-slot communication delays incurred on the way from source to sink. Since there may be many different paths from the source node to the sink node, in this embodiment the final delay is calculated as the average of the delays over all paths from the source node to the sink node.
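The path-averaged delay can be computed by enumerating every source-to-sink path; this DFS sketch assumes a small DAG and uses illustrative names:

```python
def average_path_delay(edges, delay, source, sink):
    """Average, over every source-to-sink path, of the summed inter-slot
    communication delays along that path.

    `edges` maps a node to its downstream nodes; `delay[(u, v)]` is the
    communication delay between the slots hosting u and v (0 when the two
    tasks are co-located on one slot). Names are assumptions.
    """
    totals = []

    def walk(node, acc):
        if node == sink:
            totals.append(acc)
            return
        for nxt in edges.get(node, ()):
            walk(nxt, acc + delay.get((node, nxt), 0))

    walk(source, 0)
    return sum(totals) / len(totals)
```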
Throughput represents the number of tuples that a particular DSP job can handle per second. For most DSP applications, throughput is the most important quality attribute. But unlike latency, throughput is very difficult to estimate without deploying the job. The present embodiment therefore provides an offline throughput estimation algorithm that does not require deploying the stream processing job; please refer to fig. 1c. The algorithm specifically includes:
s161, when a source node in the task nodes sends a tuple according to the maximum sending speed, distributing the residual CPU computing power of each slot node to the corresponding task node;
and S162, calculating the current throughput of each task node according to the sequence of topological sorting.
And S163, traversing each task node according to the sequence of the reverse topology sorting, and recovering the computing power corresponding to the target slot corresponding to each task node based on a back pressure mechanism.
And S164, reallocating the computing power corresponding to each target slot to the corresponding task node until the throughput of the source node converges, and taking the throughput corresponding to the source node as the throughput during operation.
The computing power corresponding to each target slot can be allocated according to CPU utilization.
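Steps S161-S164 can be sketched as the following simplified fixed-point iteration. The equal CPU-sharing rule and all names are assumptions (the patent allocates by CPU utilization), so this is a sketch of the idea, not the patent's exact policy:

```python
def estimate_throughput(order, upstream, placement, slot_cpu, cpu_cost, source_rate):
    """Simplified offline throughput estimator.

    A forward pass in topological order caps each task's rate by its CPU
    share and by its upstream input; a reverse pass models backpressure by
    capping upstream rates; the loop repeats until the source rate converges.
    """
    rate = {t: 0.0 for t in order}
    src = order[0]
    rate[src] = source_rate          # source sends at its maximum speed
    prev_source = -1.0
    while abs(rate[src] - prev_source) > 1e-6:
        prev_source = rate[src]
        # allocate each slot's CPU among the tasks placed on it (equal split)
        share = {}
        for slot in set(placement.values()):
            hosted = [t for t in order if placement[t] == slot]
            for t in hosted:
                share[t] = slot_cpu[slot] / len(hosted)
        # forward pass (topological order): CPU cap and input cap
        for t in order:
            cpu_limited = share[t] / cpu_cost[t]
            inp = min((rate[u] for u in upstream.get(t, ())), default=source_rate)
            rate[t] = min(cpu_limited, inp)
        # reverse pass: backpressure recovers over-allocated capacity upstream
        for t in reversed(order):
            for u in upstream.get(t, ()):
                rate[u] = min(rate[u], rate[t])
    return rate[src]
```

With a two-task chain where the downstream slot is the bottleneck, the estimate converges to the downstream task's CPU-limited rate, which is the behaviour backpressure is meant to capture.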
The throughput and delay estimation algorithms provided by this embodiment can not only accelerate the training process of the task placement model, but also guide the actual deployment of stream processing jobs. The main computational logic can accommodate different deployment scenarios. Multiple different placement schemes are generated by the task placement model, and the estimation algorithm provided by this embodiment can then be used to select an optimal placement scheme for final deployment.
Further, in order to verify that the throughput estimation algorithm provided by the present embodiment is applicable to heterogeneous DSP applications and resources, the present embodiment generates 100 job graphs as verification data, where the number of job nodes ranges from 1 to 10, the parallelism of each job node ranges from 1 to 6, and the job graphs are deployed into a random subset of all slots in the cluster.
The number of cycles can be a measure of the CPU utilization of different tasks, which can be controlled in a user-defined function through Flink's DataStream API.
Fig. 1d is a screenshot of the relationship between the estimated value and the actual value of the throughput during the experiment, and fig. 1e is a screenshot of the error between the estimated throughput and the actual value during the experiment. As shown in fig. 1d, a clear linear relationship is shown between the estimated value and the actual value, and the relative throughput value can be converted into the actual value by using a linear regression method, and then the error is calculated. As shown in fig. 1e, the absolute value error does not exceed 10% for 78% of the test cases. The maximum absolute value error is 19%. The average absolute value error is 6.7% for all test cases, which means that the throughput estimation tool provided by the present embodiment is feasible for estimating the quality of different placement solutions without deploying an application.
And S170, determining a target reward value according to the calculation results of the throughput and the delay.
Wherein the target bonus value is a linear combination of the throughput bonus value and the delay bonus value, and can be represented by the following formula:
where P is the placement scheme, i.e. the determined mapping relation between each task node and its corresponding target slot node, and λ is a reward coefficient configured according to quality-attribute requirements. The setting of the reward value is very flexible and scalable, and can easily be extended to other quality attributes. r_delay(P) represents the reward value of the delay calculated using the placement scheme provided by the embodiment of the present invention; r_throughput(P) represents the reward value of the throughput calculated using the placement scheme provided by the embodiment of the present invention. Specifically, the reward value of the delay and the reward value of the throughput can be calculated by the following formulas, respectively:
where delay_P represents the first delay value calculated using the placement scheme provided by the embodiment of the present invention; throughput_P represents the first throughput value calculated using the placement scheme provided by the embodiment of the present invention; delay_Q is the second delay average obtained using a heuristic method from the prior art; and throughput_Q is the second throughput average obtained using the prior-art heuristic. In this embodiment, delay_Q and throughput_Q are used to avoid the drawback of the task placement model merely improving on a single heuristic approach rather than optimizing the set target based on actual quality attributes (such as throughput and latency).
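Because the reward formulas appear only as images, the following is a hedged reconstruction: a linear combination of a throughput reward and a delay reward, each normalised against the heuristic baseline values delay_Q and throughput_Q. The exact ratios below are plausible guesses, not the patent's formulas:

```python
def target_reward(delay_p, throughput_p, delay_q, throughput_q, lam=0.5):
    """Linear combination of throughput and delay rewards, normalised
    against a prior-art heuristic baseline (delay_q, throughput_q).
    """
    r_throughput = throughput_p / throughput_q  # higher throughput -> larger reward
    r_delay = delay_q / delay_p                 # lower delay -> larger reward
    return lam * r_throughput + (1.0 - lam) * r_delay
```

Normalising against a baseline keeps the two quality attributes on comparable scales, so λ directly trades one off against the other.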
And S180, obtaining a preset task placement model when the target reward value reaches convergence.
The preset task placement model associates each task node in the task graph with a slot node in the resource graph.
In the practical application process, the task placement model provided by the embodiment can find a relatively suitable placement scheme for the graph with the complex structure. Specifically, the model provided by the present embodiment can be tested using three different topologies.
Fig. 1f is a schematic structural diagram of a throughput testing topology according to an embodiment of the present invention, and fig. 1g is a screenshot of an experimental effect of a throughput test corresponding to the throughput testing topology according to the embodiment of the present invention. As shown in fig. 1f, the throughput testing topology is a topology having a source node, an equivalent node and a sink node. The source node will continue to generate random strings of fixed 10K size as tuples. The equivalent node will send the string to the sink node intact. The sink node increments the counter by 1 each time it receives a tuple. As shown in FIG. 1g, the task placement model provided by this embodiment improves the throughput by at least 8.9%, 10% and 47% over the prior art schedulers I-Storm, Flink-even and Flink, respectively. Therefore, the task placement model provided by the present embodiment has significant advantages.
Fig. 1h is a schematic structural diagram of a word counting topology according to a first embodiment of the present invention, and fig. 1i is a screenshot of an experimental effect of a throughput test corresponding to the word counting topology according to the first embodiment of the present invention; as shown in FIG. 1h, the word count topology includes a source node, a split node, a count node, and a sink node. Which is used to count the number of occurrences of each word in one or more files. The source node will send a list of words of random length, one line at a time (randomly generated between 1 and 1000 in length). The splitting node will split each line into words and the counting node will increment the counter based on the input word and send the result to an empty sink node.
As shown in fig. 1i, for the word count topology, the curve fluctuates slightly due to the random length of the input data. The throughput difference between Flink (2.6K) and Flink-even (2.3K) is small, and I-Storm (3.9K) outperforms both. The task placement model provided by this embodiment sustains a throughput of 4.3K, an improvement of at least 10%, 63% and 87% over I-Storm, Flink and Flink-even, respectively. The task placement model provided by this embodiment therefore has significant advantages.
Fig. 1j is a schematic structural diagram of a log stream processing topology according to a first embodiment of the present invention, and fig. 1k is a screenshot of an experimental effect of a throughput test corresponding to the log stream processing topology according to the first embodiment of the present invention; as shown in fig. 1j, the source node sends one row of log records at a time. The rule application node performs a rule-based analysis and sends a log entry. The log entry is sent to two operators, which perform indexing and counting operations, respectively.
The log stream processing topology is more complex, and it is not easy to find a suitable solution for it. As shown in fig. 1k, the throughput performance of the different methods varies greatly. The task placement model provided by this embodiment achieves the highest throughput (66K), an improvement of at least 31% relative to I-Storm (50K), 75% relative to Flink-even (37K) and 143% relative to Flink (27K). The task placement model provided by this embodiment has significant advantages.
According to the technical scheme provided by this embodiment, when the graph neural network models the resource graph to generate embedding vectors, the CPU computing power attribute and the available memory attribute of each slot node are considered, so that the trained task placement model is applicable to heterogeneous resources. During training of the task placement model, the throughput and delay at runtime are taken into account. When estimating runtime throughput, the scheme of iteratively recovering and reallocating resources under a backpressure mechanism realizes offline estimation of throughput without deploying the actual application. This setting not only accelerates the training of the model, but also enables the trained task placement model to make the throughput and the delay each meet the preset requirements, so as to guide the actual deployment of subsequent stream processing jobs.
Example two
Fig. 2a is a flowchart of a data stream processing task placement method according to a second embodiment of the present invention, which can be applied in a process of deploying a stream processing job. The method may be performed by a task placement device, which may be implemented by means of software and/or hardware, as shown in fig. 2a, and comprises:
and S210, generating a task graph corresponding to the flow processing job according to the parallelism of each operator in the flow processing job and the connection mode among the operators.
Specifically, fig. 2b shows a stream processing job graph and the correspondingly generated task graph according to the second embodiment of the present invention. As shown in FIG. 2b, the stream processing job graph includes a source operator, a map operator, an aggregate operator, a filter operator, and a sink operator. According to the parallelism of each operator (represented by p in fig. 2b), the nodes in the resulting task graph include corresponding numbers of source nodes, mapping nodes, aggregation nodes, filtering nodes and sink nodes. The connection mode between the nodes is determined by the connection mode between the operators; as shown in fig. 2b, solid arrows represent broadcast connections and dotted arrows represent direct connections.
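The operator-to-task expansion can be sketched as follows. This all-to-all expansion ignores the broadcast/direct distinction described in fig. 2b and uses illustrative names:

```python
def build_task_graph(operators, edges):
    """Expand a stream-processing job graph into a task graph: each operator
    with parallelism p becomes p task nodes, and every edge between two
    operators becomes edges between all of their task instances.
    """
    tasks = {op: [f"{op}_{i}" for i in range(p)] for op, p in operators.items()}
    task_edges = []
    for up, down in edges:
        for u in tasks[up]:
            for d in tasks[down]:
                task_edges.append((u, d))
    return tasks, task_edges
```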
S220, determining task embedding vectors corresponding to all task nodes in the task graph based on a graph neural network in a preset task placement model, and determining resource embedding vectors corresponding to all slot nodes in the resource graph.
The resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute.
In this embodiment, the preset task placement model includes an encoding part and a decoding part, wherein the encoding part includes a graph neural network, for which a GCN may preferably be adopted. Based on the graph neural network, the initial embedding vectors of the task graph and the resource graph are iteratively updated to obtain the task embedding vectors and the resource embedding vectors.
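One iteration of this embedding update can be sketched as a GraphSAGE-style layer with a mean aggregator (the training procedure later in this document names the GraphSAGE algorithm); the weight names and the ReLU are assumptions:

```python
import numpy as np

def graphsage_layer(embeddings, neighbors, W_self, W_neigh):
    """One GraphSAGE-style update: each node's new embedding combines its
    own embedding with the mean of its neighbours' embeddings, followed by
    a ReLU nonlinearity.
    """
    new = {}
    for node, h in embeddings.items():
        neigh = [embeddings[n] for n in neighbors.get(node, [])]
        agg = np.mean(neigh, axis=0) if neigh else np.zeros_like(h)
        new[node] = np.maximum(0.0, W_self @ h + W_neigh @ agg)
    return new
```

Stacking K such layers corresponds to the K iterations mentioned in the training description, after which each node's embedding reflects its K-hop neighbourhood.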
And S230, based on a recurrent neural network in the preset task placement model, determining the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine the deployment mode of the stream processing job.
In this embodiment, the preset task placement model further includes a decoding portion, where the decoding portion may be implemented by a recurrent neural network, and the recurrent neural network may take an embedding vector corresponding to the task graph and the resource graph as an input, and may establish a mapping relationship between each task node in the task graph and a slot in the resource graph after passing through the GRU unit, the attention layer, and the Softmax layer. Please refer to the contents of the above embodiments, which will not be described herein again. After the preset task placement model is trained, each task node in the task graph can be associated with a slot in the resource graph, and the throughput attribute and the delay attribute in the running process of the stream processing job can meet the preset requirements.
In this embodiment, based on a recurrent neural network in a preset task placement model, according to a task embedding vector and a resource embedding vector, a slot node corresponding to each task node is determined, which may specifically include:
sorting the task nodes by a topological sorting method; for any current task node in the sorted order, calculating a current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors corresponding to the upstream nodes of the current task node; and determining the probability value that the current task node is assigned to each slot node according to the current context vector, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node.
Specifically, according to the task embedding vector corresponding to the current task node and the resource embedding vector corresponding to each upstream node of the current task node, calculating the current context vector corresponding to the current task node, including:
integrating the resource embedding vectors corresponding to the upstream nodes of the current task node, so that all the upstream nodes correspond to one integrated resource embedding vector; adding the integrated resource embedding vector to the task embedding vector of the current task node; and inputting the result of the addition into an attention unit of the recurrent neural network to obtain the current context vector corresponding to the current task node. The calculation formula of the context vector is the same as in the model training process of the above embodiment, and is not repeated here.
Wherein the integrating operation comprises: and taking the maximum value of each dimension in the resource embedding vector corresponding to each upstream node, or carrying out averaging processing on elements of each dimension.
Specifically, determining a probability value of each slot node allocated to the current task node according to the current context vector includes:
And inputting the current context vector and the resource embedding vectors corresponding to all the slot nodes into a softmax unit of the recurrent neural network to obtain the probability value that the current task node is assigned to each slot node; the calculation formula of the probability value is the same as in the model training process of the above embodiment, and is not repeated here.
According to the technical scheme provided by this embodiment, a task graph corresponding to a stream processing job can be generated according to the parallelism of each operator in the stream processing job and the connection mode among the operators. The task graph and the resource graph can be encoded by the graph neural network in the preset task placement model, determining the task embedding vectors corresponding to the task nodes in the task graph and the resource embedding vectors corresponding to the processing-unit slot nodes in the resource graph. The resource graph in the embodiment of the present invention is a fully-connected undirected graph, and during training of the preset task placement model, the CPU computing power attribute and the available memory attribute of each slot node are considered when iterating the embedding vectors, so that the preset task placement model provided by this embodiment is applicable to heterogeneous resources. Through the recurrent neural network in the preset task placement model, the slot node corresponding to each task node can be determined according to the task embedding vectors and the resource embedding vectors, thereby determining the deployment mode of the stream processing job. In addition, because throughput and delay attributes are considered during training of the preset task placement model, the throughput attribute and the delay attribute can meet the preset requirements when the preset task placement model provided by this embodiment handles actual stream processing jobs.
EXAMPLE III
Fig. 3 is a block diagram of a data stream processing task placing device according to a third embodiment of the present invention, where the device includes: a task graph generation module 310, an encoding module 320, and a decoding module 330; wherein the content of the first and second substances,
the task graph generating module 310 is configured to generate a task graph corresponding to a stream processing job according to the parallelism of each operator in the stream processing job and the connection mode between the operators;
the encoding module 320 is configured to determine task embedding vectors corresponding to task nodes in the task graph and determine resource embedding vectors corresponding to slot nodes of processing units in a resource graph based on a graph neural network in a preset task placement model; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
a decoding module 330 configured to determine, based on a recurrent neural network in a preset task placement model, a slot node corresponding to each task node according to the task embedding vector and the resource embedding vector, so as to determine a deployment manner of the stream processing job;
the preset task placement model enables each task node in the task graph to be associated with a slot in the resource graph, and enables the throughput attribute and the delay attribute in the running process of the stream processing job to meet preset requirements.
Optionally, the decoding module specifically includes:
a sorting unit configured to: sort the task nodes by a topological sorting method;
a context vector calculation unit configured to: for any current task node in the sorted order, calculate a current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors corresponding to the upstream nodes of the current task node;
a target slot node determination unit configured to: determine the probability value that the current task node is assigned to each slot node according to the current context vector, and take the slot node with the maximum probability value as the target slot node corresponding to the current task node.
Optionally, the context vector calculating unit is specifically configured to:
integrating the resource embedding vectors respectively corresponding to each upstream node of the current task node, so that all the upstream nodes correspond to one integrated resource embedding vector;
adding the integrated resource embedding vector to the task embedding vector of the current task node;
inputting the result of the addition operation into an attention unit of the recurrent neural network to obtain a current context vector corresponding to the current task node;
wherein the integrating operation comprises: and taking the maximum value of each dimension in the resource embedding vector corresponding to each upstream node, or carrying out averaging processing on elements of each dimension.
Optionally, the target slot node determining unit is specifically configured to:
and inputting the current context vector and the resource embedding vectors corresponding to all slot nodes into a softmax unit of the recurrent neural network to obtain the probability value that the current task node is assigned to each slot node.
Optionally, the preset task placement model is obtained by training in the following manner:
acquiring an initial task embedding vector corresponding to each task node in a task graph, an initial resource embedding vector corresponding to each slot node of a processing unit in a resource graph, and a randomly generated task placement array; the length of the task placement array represents the number of tasks in the task graph, and the value in the array is the slot number;
respectively carrying out iterative updating on the initial task embedding vector and the initial resource embedding vector based on a GraphSAGE algorithm to obtain a sample task embedding vector and a sample resource embedding vector;
sorting the task nodes by a topological sorting method;
for any current task node in the sorted order, calculating a current context vector corresponding to the current task node according to the sample task embedding vector corresponding to the current task node and the sample resource embedding vectors corresponding to the upstream nodes of the current task node;
determining the probability value that the current task node is assigned to each slot node according to the current context vector, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node;
calculating the throughput and delay in the running process of the flow processing operation based on the mapping relation between each task node and the corresponding target slot node;
determining a target reward value according to the calculation results of the throughput and the delay respectively; wherein the target reward value is a linear combination of a throughput reward value and a delay reward value;
and when the target reward value reaches convergence, obtaining the preset task placement model, wherein the preset task placement model enables each task node in the task graph to be associated with the slot node in the resource graph.
Optionally, the throughput is calculated as follows:
when a source node in the task nodes sends a tuple according to the maximum sending speed, distributing the residual CPU computing power of each slot node to a corresponding task;
calculating the current throughput of each task node according to the sequence of topological sorting;
traversing each task node according to the sequence of reverse topological sorting, and recovering the computing power corresponding to the target slot node corresponding to each task node based on a back pressure mechanism;
and reallocating the computing power corresponding to each target slot node to the corresponding task node until the throughput of the source node converges, and taking the throughput corresponding to the source node as the throughput during operation.
Optionally, the delay is calculated in the following manner: the average of the delays on all paths from the source node to the sink node in the task node.
The task placement device provided by the embodiment of the invention can execute the task placement method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. Technical details that are not described in detail in the above embodiments may be referred to a task placement method provided in any embodiment of the present invention.
Example four
Referring to fig. 4, fig. 4 is a schematic structural diagram of a computing device according to a fourth embodiment of the present invention. As shown in fig. 4, the computing device may include:
a memory 701 in which executable program code is stored;
a processor 702 coupled to the memory 701;
wherein, the processor 702 calls the executable program code stored in the memory 701 to execute the task placement method provided by any embodiment of the present invention.
The embodiment of the invention discloses a computer-readable storage medium which stores a computer program, wherein the computer program enables a computer to execute a task placement method provided by any embodiment of the invention.
In various embodiments of the present invention, it should be understood that the sequence numbers of the above-mentioned processes do not imply an inevitable order of execution, and the execution order of the processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A, and that B can be determined from A. It should also be understood, however, that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated units, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a computer-accessible memory. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, can be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute part or all of the steps of the above-described method of each embodiment of the present invention.
It will be understood by those of ordinary skill in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, tape storage, or any other medium readable by a computer that can be used to carry or store data.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
Those of ordinary skill in the art will understand that: modules in the devices in the embodiments may be distributed in the devices in the embodiments according to the description of the embodiments, or may be located in one or more devices different from the embodiments with corresponding changes. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A task placement method, applied to stream processing jobs, characterized by comprising the following steps:
generating a task graph corresponding to a stream processing job according to the parallelism of each operator in the stream processing job and the connections among the operators;
determining, based on a graph neural network in a preset task placement model, task embedding vectors corresponding to the task nodes in the task graph, and determining resource embedding vectors corresponding to the slot nodes of processing units in a resource graph; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
determining, based on a recurrent neural network in the preset task placement model, a slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine a deployment mode of the stream processing job;
wherein the preset task placement model associates each task node in the task graph with a slot node in the resource graph, such that the throughput attribute and the delay attribute of the stream processing job during running meet preset requirements.
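The first step of claim 1 — expanding each operator into its parallel task instances and wiring them along the operator-level topology — can be sketched as follows. This is a minimal illustration, not the patented construction; the all-to-all (shuffle-style) wiring between parallel instances and the operator names are assumptions.

```python
def build_task_graph(operators, parallelism, op_edges):
    """Expand each operator into `parallelism[op]` task nodes and
    connect the tasks along the operator-level edges (all-to-all
    between parallel instances, as in a shuffle connection)."""
    tasks = {op: [f"{op}_{i}" for i in range(parallelism[op])]
             for op in operators}
    task_edges = [(s, d)
                  for (src, dst) in op_edges
                  for s in tasks[src]
                  for d in tasks[dst]]
    task_nodes = [t for op in operators for t in tasks[op]]
    return task_nodes, task_edges

# Example: source -> map -> sink with parallelism 1, 2, 1
nodes, edges = build_task_graph(
    ["source", "map", "sink"],
    {"source": 1, "map": 2, "sink": 1},
    [("source", "map"), ("map", "sink")],
)
```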
2. The method of claim 1, wherein determining, based on the recurrent neural network in the preset task placement model, a slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors comprises:
sorting the task nodes using a topological sorting method;
for any sorted current task node, calculating a current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors respectively corresponding to the upstream nodes of the current task node;
and determining, according to the current context vector, the probability values of the current task node being assigned to the slot nodes, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node.
3. The method of claim 2, wherein calculating the current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors respectively corresponding to the upstream nodes of the current task node comprises:
integrating the resource embedding vectors respectively corresponding to the upstream nodes of the current task node, so that all the upstream nodes correspond to one integrated resource embedding vector;
adding the integrated resource embedding vector and the task embedding vector of the current task node;
inputting the result of the addition into an attention unit of the recurrent neural network to obtain the current context vector corresponding to the current task node;
wherein the integrating operation comprises: taking the maximum value of each dimension across the resource embedding vectors corresponding to the upstream nodes, or averaging the elements of each dimension.
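The integration step of claim 3 can be sketched as below. The identity `attention` stand-in and the vector values are illustrative assumptions; the patent's actual attention unit inside the recurrent network is not reproduced here.

```python
import numpy as np

def context_vector(task_emb, upstream_embs, attention, mode="max"):
    """Integrate the upstream resource embedding vectors element-wise
    (maximum or average per dimension), add the result to the current
    task embedding, and pass the sum through an attention unit."""
    stacked = np.stack(upstream_embs)        # shape (num_upstream, dim)
    if mode == "max":
        aggregated = stacked.max(axis=0)     # per-dimension maximum
    else:
        aggregated = stacked.mean(axis=0)    # per-dimension average
    return attention(task_emb + aggregated)

# Stand-in attention unit (identity) just to exercise the data flow.
ctx = context_vector(
    np.array([1.0, 2.0]),
    [np.array([0.0, 3.0]), np.array([2.0, 1.0])],
    attention=lambda x: x,
    mode="max",
)
```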
4. The method of claim 2, wherein determining, according to the current context vector, the probability values of the current task node being assigned to the slot nodes comprises:
inputting the current context vector and the resource embedding vectors corresponding to all slot nodes into a softmax unit of the recurrent neural network to obtain the probability values of the current task node being assigned to the slot nodes.
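A minimal sketch of the softmax scoring in claim 4, assuming dot-product compatibility between the context vector and each slot's resource embedding (the scoring function is an assumption; the claim only specifies a softmax unit):

```python
import numpy as np

def slot_probabilities(context, slot_embs):
    """Score each slot node by the dot product of the context vector
    with the slot's resource embedding, then normalise via softmax."""
    scores = np.array([context @ e for e in slot_embs])
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()

probs = slot_probabilities(
    np.array([1.0, 0.0]),
    [np.array([2.0, 0.0]), np.array([0.0, 2.0])],
)
target_slot = int(probs.argmax())  # slot node with the maximum probability
```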
5. The method of claim 1, wherein the preset task placement model is trained by:
acquiring an initial task embedding vector corresponding to each task node in the task graph, an initial resource embedding vector corresponding to each slot node of a processing unit in the resource graph, and a randomly generated task placement array; the length of the task placement array represents the number of tasks in the task graph, and the values in the array are slot numbers;
respectively carrying out iterative updating on the initial task embedding vector and the initial resource embedding vector based on a GraphSAGE algorithm to obtain a sample task embedding vector and a sample resource embedding vector;
sorting the task nodes using a topological sorting method;
for any sorted current task node, calculating a current context vector corresponding to the current task node according to the sample task embedding vector corresponding to the current task node and the sample resource embedding vectors respectively corresponding to the upstream nodes of the current task node;
determining, according to the current context vector, the probability values of the current task node being assigned to the slot nodes, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node;
calculating the throughput and delay of the stream processing job during running based on the mapping relation between each task node and its corresponding target slot node;
determining a target reward value according to the calculated throughput and delay, wherein the target reward value is a linear combination of a throughput reward value and a delay reward value;
and obtaining the preset task placement model when the target reward value converges, wherein the preset task placement model associates each task node in the task graph with a slot node in the resource graph.
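Claim 5 states only that the target reward is a linear combination of a throughput reward and a delay reward; one plausible form (the weights `alpha`, `beta` and the sign convention are assumptions) is:

```python
def target_reward(throughput, delay, alpha=1.0, beta=1.0):
    """Linear combination of a throughput reward and a delay penalty:
    higher throughput raises the reward, higher delay lowers it."""
    return alpha * throughput - beta * delay

# Equal weighting: throughput 10 tuples/s, delay 2 s
r = target_reward(10.0, 2.0)
```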
6. The method of claim 5, wherein the throughput is calculated by:
when a source node among the task nodes sends tuples at the maximum sending rate, distributing the remaining CPU computing power of each slot node to its corresponding task nodes;
calculating the current throughput of each task node in topological order;
traversing the task nodes in reverse topological order and, based on a back pressure mechanism, recovering the computing power of the target slot node corresponding to each task node;
and redistributing the computing power of each target slot node to its corresponding task nodes until the throughput of the source node converges, and taking the throughput of the source node as the throughput during running.
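Heavily simplified to a linear task chain, the fixed point that this forward/back-pressure iteration converges to is the bottleneck rate: the minimum of the source's send rate and every task's allotted computing power. The sketch below models only that converged outcome, not the claim's iterative redistribution of CPU computing power across slot nodes.

```python
def pipeline_throughput(send_rate, capacities):
    """For a linear chain of tasks, a forward pass in topological order
    caps each task's rate by its capacity; back pressure then throttles
    the source down to the chain's bottleneck, so the converged source
    throughput is min(send_rate, min(capacities))."""
    rate = send_rate
    for cap in capacities:        # forward pass in topological order
        rate = min(rate, cap)     # a task cannot outrun its slot's capacity
    return rate                   # back pressure makes this the source rate
```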
7. The method of claim 6, wherein the delay is calculated as the average of the delays on all paths from the source node to the sink node among the task nodes.
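Claim 7's delay metric can be sketched by enumerating source-to-sink paths in the task graph; treating a path's delay as the sum of per-node delays is an assumption, since the claim does not define the per-path delay.

```python
def average_path_delay(edges, node_delay, source, sink):
    """Average, over every source-to-sink path in a DAG, the sum of
    the per-node delays along that path."""
    def paths(node):
        if node == sink:
            return [[node]]
        return [[node] + p
                for nxt in edges.get(node, [])
                for p in paths(nxt)]
    totals = [sum(node_delay[n] for n in p) for p in paths(source)]
    return sum(totals) / len(totals)

# Diamond-shaped task graph: src -> {a, b} -> snk
avg = average_path_delay(
    {"src": ["a", "b"], "a": ["snk"], "b": ["snk"]},
    {"src": 1.0, "a": 2.0, "b": 4.0, "snk": 1.0},
    "src", "snk",
)
```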
8. A task placement device, applied to stream processing jobs, comprising:
the task graph generating module is configured to generate a task graph corresponding to the stream processing job according to the parallelism of each operator in the stream processing job and the connection mode among the operators;
the encoding module is configured to determine task embedding vectors corresponding to task nodes in the task graph and determine resource embedding vectors corresponding to slot nodes of processing units in a resource graph based on a graph neural network in a preset task placement model; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
the decoding module is configured to determine, based on a recurrent neural network in the preset task placement model, a slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine a deployment mode of the stream processing job;
wherein the preset task placement model associates each task node in the task graph with a slot node in the resource graph, such that the throughput attribute and the delay attribute of the stream processing job during running meet preset requirements.
9. A computing device, comprising:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to perform the task placement method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the task placement method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110714071.8A CN113391907A (en) | 2021-06-25 | 2021-06-25 | Task placement method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113391907A true CN113391907A (en) | 2021-09-14 |
Family
ID=77624004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110714071.8A Pending CN113391907A (en) | 2021-06-25 | 2021-06-25 | Task placement method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113391907A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170017521A1 (en) * | 2015-07-13 | 2017-01-19 | Palo Alto Research Center Incorporated | Dynamically adaptive, resource aware system and method for scheduling |
US20190236444A1 (en) * | 2018-01-30 | 2019-08-01 | International Business Machines Corporation | Functional synthesis of networks of neurosynaptic cores on neuromorphic substrates |
US20200136920A1 (en) * | 2019-12-20 | 2020-04-30 | Kshitij Arun Doshi | End-to-end quality of service in edge computing environments |
CN111126668A (en) * | 2019-11-28 | 2020-05-08 | 中国人民解放军国防科技大学 | Spark operation time prediction method and device based on graph convolution network |
CN111309915A (en) * | 2020-03-03 | 2020-06-19 | 爱驰汽车有限公司 | Method, system, device and storage medium for training natural language of joint learning |
CN111444009A (en) * | 2019-11-15 | 2020-07-24 | 北京邮电大学 | Resource allocation method and device based on deep reinforcement learning |
US20200257968A1 (en) * | 2019-02-08 | 2020-08-13 | Adobe Inc. | Self-learning scheduler for application orchestration on shared compute cluster |
US20210117624A1 (en) * | 2019-10-18 | 2021-04-22 | Facebook, Inc. | Semantic Representations Using Structural Ontology for Assistant Systems |
CN112753016A (en) * | 2018-09-30 | 2021-05-04 | 华为技术有限公司 | Management method and device for computing resources in data preprocessing stage in neural network |
- 2021-06-25: CN CN202110714071.8A patent/CN113391907A/en active Pending
Non-Patent Citations (1)
Title |
---|
卢海峰; 顾春华; 罗飞; 丁炜超; 杨婷; 郑帅: "Research on Task Offloading for Mobile Edge Computing Based on Deep Reinforcement Learning" [基于深度强化学习的移动边缘计算任务卸载研究], Journal of Computer Research and Development (计算机研究与发展), no. 07 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023178766A1 (en) * | 2022-03-25 | 2023-09-28 | 北京邮电大学 | Task evaluation method and apparatus based on dynamic expansion of flink engine computing node |
CN116841649A (en) * | 2023-08-28 | 2023-10-03 | 杭州玳数科技有限公司 | Method and device for hot restarting based on flink on yarn |
CN116841649B (en) * | 2023-08-28 | 2023-12-08 | 杭州玳数科技有限公司 | Method and device for hot restarting based on flink on yarn |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bolukbasi et al. | Adaptive neural networks for efficient inference | |
Piscitelli et al. | Design space pruning through hybrid analysis in system-level design space exploration | |
US11681914B2 (en) | Determining multivariate time series data dependencies | |
CN111406264A (en) | Neural architecture search | |
CN113391907A (en) | Task placement method, device, equipment and medium | |
Vakilinia et al. | Analysis and optimization of big-data stream processing | |
Chen et al. | $ d $ d-Simplexed: Adaptive Delaunay Triangulation for Performance Modeling and Prediction on Big Data Analytics | |
Ni et al. | Generalizable resource allocation in stream processing via deep reinforcement learning | |
Garbi et al. | Learning queuing networks by recurrent neural networks | |
Cheng et al. | Tuning configuration of apache spark on public clouds by combining multi-objective optimization and performance prediction model | |
Geyer et al. | Graph-based deep learning for fast and tight network calculus analyses | |
Geyer et al. | Tightening network calculus delay bounds by predicting flow prolongations in the FIFO analysis | |
Hou et al. | A machine learning enabled long-term performance evaluation framework for NoCs | |
Sinclair et al. | Adaptive discretization in online reinforcement learning | |
Guan et al. | Quantifying the impact of uncertainty in embedded systems mapping for NoC based architectures | |
Daradkeh et al. | Analytical modeling and prediction of cloud workload | |
Tuli et al. | SimTune: bridging the simulator reality gap for resource management in edge-cloud computing | |
CN106874215B (en) | Serialized storage optimization method based on Spark operator | |
Johnston et al. | Performance tuning of MapReduce jobs using surrogate-based modeling | |
Park et al. | Gemma: reinforcement learning-based graph embedding and mapping for virtual network applications | |
Sirocchi et al. | Topological network features determine convergence rate of distributed average algorithms | |
Chen et al. | Dynamically predicting the quality of service: batch, online, and hybrid algorithms | |
Tribastone | Efficient optimization of software performance models via parameter-space pruning | |
Grohmann | Reliable Resource Demand Estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||