CN113391907A - Task placement method, device, equipment and medium

Info

Publication number: CN113391907A
Application number: CN202110714071.8A
Authority: CN (China)
Prior art keywords: task, node, graph, resource, slot
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 戴晓, 赵曦滨, 万海, 黄潇, 刘悦
Current Assignee: China Bond Jinke Information Technology Co., Ltd.; Tsinghua University
Original Assignee: China Bond Jinke Information Technology Co., Ltd.; Tsinghua University
Application filed by China Bond Jinke Information Technology Co., Ltd. and Tsinghua University
Priority: CN202110714071.8A
Publication: CN113391907A


Classifications

    • G06F 9/4881: Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G06F 9/5016: Allocation of resources to service a request, the resource being the memory
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Abstract

The embodiments of the invention disclose a task placement method, device, equipment, and medium. The method comprises the following steps: generating a task graph corresponding to the stream processing job according to the parallelism of each operator in the stream processing job and the connection modes among the operators; determining the task embedding vector corresponding to each task node in the task graph and the resource embedding vector corresponding to each slot node in a resource graph, based on a graph neural network in a preset task placement model, the resource graph being a fully-connected undirected graph; and determining, based on a recurrent neural network in the preset task placement model, the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors. The task placement model provided by the embodiments of the invention is applicable to heterogeneous resources, and with this model both the throughput attribute and the delay attribute meet the preset requirements during actual stream processing jobs.

Description

Task placement method, device, equipment and medium
Technical Field
The embodiments of the invention relate to the technical field of data stream processing, and in particular to a task placement method, device, equipment, and medium.
Background
In many different industrial fields, numerous tasks require large amounts of data of different types to make data-intensive decisions. The data is generated from streaming events such as financial transactions, sensor measurements, and the like. To extract valuable information from such huge amounts of data in a timely manner, data stream processing frameworks and applications, which can continuously process unbounded data streams of arbitrary size in near real time, are becoming popular.
The computational process of a data stream processing application is generally described by a DAG (Directed Acyclic Graph). Each node in the DAG represents an operator that performs some specific operation (e.g., mapping, filtering). Continuously arriving data is processed by the operators and transmitted from the source nodes through the directed edges of the DAG to the sink nodes. To fully exploit the parallelism in the DAG, stream processing applications are typically deployed into distributed clusters. In this scenario, a key issue is deciding on which compute node to place and process each operator of the stream processing application while optimizing relevant quality attributes. This is called the operator placement problem. It is a long-standing problem because stream processing applications typically do not stop running after deployment, and it is difficult to make runtime adjustments without impacting performance. Moreover, obtaining an optimal operator placement is NP-hard (NP: Non-deterministic Polynomial). Therefore, a number of heuristics have been devised that can solve the operator placement problem in acceptable time. Generally, heuristics are designed manually based on the characteristics of a particular problem.
At present, training heuristic methods with deep reinforcement learning has become a research focus. Existing methods based on deep reinforcement learning assume that the CPUs (Central Processing Units), memory, network, and other resources are homogeneous; however, due to the continuous deployment of stream processing applications, the actually available resources are heterogeneous, and the amount of available resources also changes continuously. In this case, the task placement schemes obtained by deep-reinforcement-learning-based methods are not applicable to heterogeneous resources.
Disclosure of Invention
Embodiments of the present invention provide a task placement method, device, equipment, and medium, so that the task placement scheme is applicable to heterogeneous resources and a throughput value of higher accuracy can be obtained when the scheme is deployed in a real cluster.
In a first aspect, an embodiment of the present invention provides a task placement method, applied to a stream processing task, including:

generating a task graph corresponding to the stream processing job according to the parallelism of each operator in the stream processing job and the connection modes among the operators;

determining the task embedding vector corresponding to each task node in the task graph and the resource embedding vector corresponding to each processing-unit (slot) node in a resource graph, based on a graph neural network in a preset task placement model; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;

determining, based on a recurrent neural network in the preset task placement model, the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine the deployment of the stream processing job;

wherein the preset task placement model associates each task node in the task graph with a slot in the resource graph, and makes the throughput attribute and the delay attribute during the running of the stream processing job meet preset requirements.
In a second aspect, an embodiment of the present invention further provides a task placement device, applied to a stream processing task, including:

a task graph generating module configured to generate a task graph corresponding to the stream processing job according to the parallelism of each operator in the stream processing job and the connection modes among the operators;

an encoding module configured to determine, based on a graph neural network in a preset task placement model, the task embedding vector corresponding to each task node in the task graph and the resource embedding vector corresponding to each processing-unit (slot) node in a resource graph; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;

a decoding module configured to determine, based on a recurrent neural network in the preset task placement model, the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine the deployment of the stream processing job;

wherein the preset task placement model associates each task node in the task graph with a slot in the resource graph, and makes the throughput attribute and the delay attribute during the running of the stream processing job meet preset requirements.
In a third aspect, an embodiment of the present invention further provides a computing device, including:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program codes stored in the memory to execute the task placement method provided by any embodiment of the invention.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the task placement method provided in any embodiment of the present invention.
According to the technical scheme provided by the embodiments of the invention, a task graph corresponding to the stream processing job can be generated according to the parallelism of each operator in the stream processing job and the connection modes among the operators. The task graph and the resource graph can be encoded by the graph neural network in the preset task placement model to determine the task embedding vector corresponding to each task node in the task graph and the resource embedding vector corresponding to each processing-unit (slot) node in the resource graph. The resource graph in the embodiments of the invention is a fully-connected undirected graph, and during the training of the preset task placement model the CPU computing power attribute and the available memory attribute of each slot node are considered when iterating the embedding vectors, so the preset task placement model provided by this embodiment is applicable to heterogeneous resources. Through the recurrent neural network in the preset task placement model, the slot node corresponding to each task node can be determined according to the task embedding vectors and the resource embedding vectors, thereby determining the deployment of the stream processing job. In addition, because the throughput and delay attributes are considered during the training of the preset task placement model, the throughput attribute and the delay attribute meet the preset requirements when the preset task placement model provided by this embodiment is used for actual stream processing jobs.
The innovation points of the embodiment of the invention comprise:
1. When the graph neural network is used to model the resource graph and generate the embedding vectors, the CPU computing power attribute and the available memory attribute of each slot node are considered, so that the trained task placement model is applicable to heterogeneous resources. This is one of the innovations of the embodiments of the invention.

2. When estimating the runtime throughput, the scheme of computing throughput by iteratively reclaiming and distributing resources and applying a backpressure mechanism realizes offline throughput estimation without deploying the actual application. This setting not only accelerates the training of the model but also enables the throughput and delay of the trained task placement model to meet the respective preset requirements during actual running. This is one of the innovations of the embodiments of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a flowchart of a training method for a data stream processing task placement model according to an embodiment of the present invention;
FIG. 1b is a block diagram of an encoding-decoding model according to an embodiment of the present invention;
FIG. 1c is a flowchart of an off-line throughput estimation algorithm according to an embodiment of the present invention;
FIG. 1d is a screenshot of the relationship between the estimated and actual values of throughput during the experiment;
FIG. 1e is a screenshot of the error between the estimated throughput and the actual value during the experiment;
fig. 1f is a schematic structural diagram of a throughput testing topology according to an embodiment of the present invention;
fig. 1g is a screenshot of an experimental effect of a throughput test corresponding to a throughput test topology according to an embodiment of the present invention;
fig. 1h is a schematic structural diagram of a word counting topology according to an embodiment of the present invention;
fig. 1i is a screenshot of an experimental effect of a throughput test corresponding to a word count topology according to an embodiment of the present invention;
fig. 1j is a schematic structural diagram of a log stream processing topology according to an embodiment of the present invention;
fig. 1k is an experimental effect screenshot of a throughput test corresponding to a log stream processing topology according to an embodiment of the present invention;
fig. 2a is a flowchart of a data stream processing task placement method according to a second embodiment of the present invention;
FIG. 2b shows a stream processing job graph and the correspondingly generated task graph according to the second embodiment of the present invention;
fig. 3 is a block diagram of a data flow processing task placing device according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computing device according to a fourth embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The embodiments of the invention disclose a method, device, equipment, and medium for placing data stream processing tasks. To describe the content of the embodiments clearly and completely, the working principle of the invention is briefly introduced first:
1. stream processing model
A stream processing job may be represented as a Directed Acyclic Graph (DAG), denoted $G_{job}=(V_{job},E_{job})$, where $V_{job}$ is the set of all operators and $E_{job}$ is the set of all edges. Each job node $j \in V_{job}$ is an operator that performs a specific operation (e.g., mapping, filtering, aggregation). Each job node $j \in V_{job}$ has a user-defined parallelism $j.p$, which is the number of subtasks of job node $j$. All parallel subtasks $t_{j,k}$ ($k = 1, 2, \ldots, j.p$) are created and executed.

Each edge $(j_u, j_v) \in E_{job}$ connects job node $j_u$ and job node $j_v$, indicating that streaming data flows from job node $j_u$ to job node $j_v$, where $u$ and $v$ are node numbers. A tuple may be used to describe a data item flowing in the DAG. For an edge $(j_u, j_v) \in E_{job}$, the tasks in job node $j_u$ communicate with the tasks in job node $j_v$. There are two ways of connection: direct connections and broadcast connections. A direct connection is used only when job node $j_u$ has the same parallelism as job node $j_v$, i.e., $j_u.p = j_v.p$; each task node $t_{u,i} \in j_u$ is then connected to exactly one task node $t_{v,i} \in j_v$. A broadcast connection means that each task node $t_{u,i} \in j_u$ is connected to all task nodes of job node $j_v$. It is assumed that each task sends tuples evenly to its downstream tasks. Given the parallelism and the connection modes, a corresponding task-level graph can be constructed from a given job graph, denoted $G_{task}=(V_{task},E_{task})$, where $V_{task}$ is the set of all task nodes and $E_{task}$ is the set of all edges. Task placement operates on the task-level graph $G_{task}$. Each task $t \in V_{task}$ is the smallest unit that can be placed on a resource node and can be described by two parameters: $t.cpu$ (CPU utilization) and $t.mem$ (required memory). The true CPU utilization is very difficult to estimate, so it is described here using a relative value. $t.mem$ represents the maximum memory that task $t$ may consume, which is a true value.
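As a minimal sketch of this expansion (the class and function names are mine, not the patent's), a job graph with per-operator parallelism and per-edge connection modes can be unrolled into the task-level graph like this:

```python
from dataclasses import dataclass

@dataclass
class JobNode:
    name: str
    parallelism: int   # j.p: number of parallel subtasks
    cpu: float = 1.0   # t.cpu: relative CPU utilization per subtask
    mem: float = 128.0 # t.mem: maximum memory per subtask (MB)

def build_task_graph(job_nodes, job_edges):
    """Expand a job DAG into a task-level DAG.

    job_edges: list of (upstream_name, downstream_name, mode), where mode is
    "direct" (requires equal parallelism) or "broadcast".
    Returns (task_nodes, task_edges) with task names "op_k".
    """
    nodes = {j.name: j for j in job_nodes}
    task_nodes = [f"{j.name}_{k}" for j in job_nodes for k in range(j.parallelism)]
    task_edges = []
    for u, v, mode in job_edges:
        ju, jv = nodes[u], nodes[v]
        if mode == "direct":
            assert ju.parallelism == jv.parallelism, "direct needs j_u.p == j_v.p"
            task_edges += [(f"{u}_{i}", f"{v}_{i}") for i in range(ju.parallelism)]
        else:  # broadcast: every upstream task feeds every downstream task
            task_edges += [(f"{u}_{i}", f"{v}_{k}")
                           for i in range(ju.parallelism)
                           for k in range(jv.parallelism)]
    return task_nodes, task_edges
```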
2. Resource model
A resource can be represented as a fully-connected undirected graph $G_{res}=(V_{res},E_{res})$, where $V_{res}$ is the set of slots, each slot being the smallest unit that can be used to place a task, and each edge in $E_{res}$ represents a logical connection between two slots. Each slot has two attributes: $s.cpu$ (CPU computing power) and $s.mem$ (available memory).
3. Task placement problem
Given a task-level graph $G_{task}$ and a resource graph $G_{res}$, the DSP (Data Stream Processing) scheduler needs a placement solution $P: V_{task} \to V_{res}$, a suitable mapping between $V_{task}$ and $V_{res}$. Specifically, each task $t_i \in V_{task}$ must be placed onto a specific slot node $s_j \in V_{res}$ while optimizing certain quality attributes, such as the throughput attribute and the delay attribute. In the embodiments of the invention, throughput is the quality attribute of most interest. In other words, the embodiments of the invention need to find a placement scheme that can fully utilize all available resources and support DSP applications with high throughput requirements.
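A placement solution can be represented directly as a mapping from tasks to slots; the following sketch of the mapping plus a simple memory-feasibility check is my own illustration, not part of the patent:

```python
def is_feasible(placement, task_mem, slot_mem):
    """placement: dict task -> slot. Check placed memory fits each slot."""
    used = {}
    for task, slot in placement.items():
        used[slot] = used.get(slot, 0.0) + task_mem[task]
    return all(used.get(s, 0.0) <= cap for s, cap in slot_mem.items())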
Since the task placement problem is NP-hard, no known algorithm can find the optimal solution in polynomial time. To deploy DSP applications efficiently, a heuristic approach is needed that can find a good solution in feasible time.
In fact, the tasks in the task graph have different CPU utilization rates and complex dependencies. If the scheduler places tasks on a small subset of the set of all slots, the communication delay becomes small, but this adversely affects throughput since a single slot needs to perform many tasks. Conversely, if the scheduler distributes the tasks evenly across all slots, each task will have sufficient resources, but the communication delay will become significant. Even allocation is not a good choice in view of resource heterogeneity. Therefore, an ideal scheduler should consider the information of both task and resource maps, and make a good trade-off between different quality attributes, and efficiently give a suitable allocation scheme. The stream processing task placement method provided by the embodiment of the invention fully considers the information of the task graph and the resource graph. The task placement scheme, when deployed into a real cluster, can fully utilize all available resources and support DSP applications with high throughput requirements.
The following describes in detail a specific implementation process of the stream processing task placement method provided by the embodiment of the present invention from a training phase and an application phase of a model, respectively.
Example one
Fig. 1a is a flowchart of a training method for a data stream processing task placement model according to an embodiment of the present invention, where the task placement model is applicable to a stream processing task placement process to obtain a placement scheme in which a throughput attribute and a delay attribute meet preset requirements. As shown in fig. 1a, the training method includes:
s110, acquiring an initial task embedding vector corresponding to each task node in the task graph, an initial resource embedding vector corresponding to each slot node of the processing unit in the resource graph, and a randomly generated task placement array.
The length of the task placement array represents the number of tasks in the task graph, and each value in the array is a processing-unit (slot) number.
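For illustration only (names are mine), a randomly generated placement array over n tasks and m slots is simply:

```python
import random

def random_placement(num_tasks: int, num_slots: int) -> list[int]:
    # index = task id (in topological order), value = slot number
    return [random.randrange(num_slots) for _ in range(num_tasks)]
```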
In this embodiment, a graph neural network is used to perceive graphs with different structures, namely the task graph and the resource graph; it encodes the graph information into a set of initial embedding vectors. The graph neural network may be a Graph Convolutional Network (GCN).
And S120, iteratively updating the initial task embedding vectors and the initial resource embedding vectors based on the GraphSAGE algorithm to obtain the sample task embedding vectors and the sample resource embedding vectors.

In this embodiment, the GraphSAGE (Graph SAmple and aggreGatE) algorithm is used to iteratively update the embedding vector of each node $v$. The initial feature of node $v$ is defined as $f_v$, its embedding vector at step $k$ is $h_v^{(k)}$, and when $k = 0$, $h_v^{(0)} = f_v$.

Because the graph types differ, the processes of encoding the task graph $G_{task}$ and the resource graph $G_{res}$ may differ.

In this embodiment, for the task graph $G_{task}$, the upstream and downstream neighbor nodes of the current node $v$ have different influences on the generated placement scheme, so the upstream and downstream neighbors of $v$ are aggregated separately. The sets of upstream and downstream neighbors of node $v$ are denoted $N_u(v)$ and $N_d(v)$, respectively. Taking $N_u(v)$ as an example, for each upstream neighbor $u \in N_u(v)$, its embedding vector at step $k$, $h_u^{(k)}$, is transformed into an intermediate variable $g_u^{(k)}$ by the following formula:

$$g_u^{(k)} = \mathrm{ReLU}\big(W_1^{(k)} h_u^{(k)}\big)$$

where $g_u^{(k)}$ is the intermediate variable and $W_1^{(k)}$ is a parameter matrix whose values are all model parameters; the formula multiplies the parameter matrix by the input vector $h_u^{(k)}$ and then computes the target value through the ReLU activation function.

After all $g_u^{(k)}$ have been calculated, the upstream-view embedding of $v$ is updated using the following equation (element-wise max aggregation):

$$h_{v,up}^{(k+1)} = \mathrm{ReLU}\Big(W_2^{(k)} \big[h_v^{(k)} : \max_{u \in N_u(v)} g_u^{(k)}\big]\Big)$$

where $W_2^{(k)}$ is a parameter matrix whose values are all model parameters, and $[\,\cdot : \cdot\,]$ indicates the concatenation of two vectors.

Similarly, the downstream-view embedding $h_{v,down}^{(k+1)}$ of $v$ is calculated using $N_d(v)$ and $h_v^{(k)}$. Subsequently, the embedding of $v$ at step $k+1$ is the concatenation of the upstream-view and downstream-view embeddings:

$$h_v^{(k+1)} = \big[h_{v,up}^{(k+1)} : h_{v,down}^{(k+1)}\big]$$
In this embodiment, the resource graph is an undirected graph $G_{res}$ with edge attribute information (communication delay). For the resource graph, in order to perceive the edge attributes, the embedding of a resource node is concatenated with the edge attribute $e_{uv}$ during the first transformation, embodied by the following formula:

$$g_u^{(k)} = \mathrm{ReLU}\big(W_1^{(k)} \big[h_u^{(k)} : e_{uv}\big]\big)$$

Because the resource graph is a fully-connected undirected graph, there is no distinction between upstream and downstream neighbors; all node vectors other than that of $v$ are averaged during aggregation (the term after the colon denotes this average), so the aggregation formula is adjusted as follows:

$$h_v^{(k+1)} = \mathrm{ReLU}\Big(W_2^{(k)} \big[h_v^{(k)} : \tfrac{1}{|V_{res}| - 1} \textstyle\sum_{u \neq v} g_u^{(k)}\big]\Big)$$
After K iterations, the sample task embedding vector corresponding to each task node in the task graph and the sample resource embedding vectors corresponding to all resource nodes in the resource graph are obtained. The information of the whole task graph is obtained by feeding the embedding of each task through a fully-connected layer and a max-pooling layer. Fig. 1b is a block diagram of the encoding-decoding model according to an embodiment of the present invention. As shown in fig. 1b, the encoded results of the task graph and the resource graph are input into a recurrent neural network for decoding. Hereinafter, the embedding vector of each task node $t_i$ in the task graph is denoted $h^{(t_i)}$, and the embedding vector of each slot node $s_i$ in the resource graph is denoted $h^{(s_i)}$.
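The two-view aggregation above can be sketched compactly in PyTorch. Everything here (layer names, sizes, the element-wise max aggregator, the per-node Python loop) is an illustrative assumption rather than the patent's implementation; note that each layer doubles the embedding width because of the final concatenation:

```python
import torch
import torch.nn as nn

class TwoViewSAGELayer(nn.Module):
    """One GraphSAGE-style step that aggregates upstream and downstream
    neighbors separately and concatenates the two views."""
    def __init__(self, dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, dim)           # per-neighbor transform W1
        self.w2_up = nn.Linear(2 * dim, dim)    # W2 over [h_v : agg(up)]
        self.w2_down = nn.Linear(2 * dim, dim)  # W2 over [h_v : agg(down)]

    def forward(self, h, up_nbrs, down_nbrs):
        # h: (N, dim); up_nbrs/down_nbrs: list of neighbor-index lists per node
        def view(nbrs, w2):
            out = []
            for v in range(h.size(0)):
                if nbrs[v]:
                    g = torch.relu(self.w1(h[nbrs[v]]))  # g_u = ReLU(W1 h_u)
                    agg = g.max(dim=0).values            # element-wise max
                else:
                    agg = torch.zeros_like(h[v])
                out.append(torch.relu(w2(torch.cat([h[v], agg]))))
            return torch.stack(out)
        # concatenate upstream-view and downstream-view embeddings
        return torch.cat([view(up_nbrs, self.w2_up),
                          view(down_nbrs, self.w2_down)], dim=-1)
```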
And S130, sorting the task nodes using a topological sorting method.

In this embodiment, the goal of task placement is to place each task $t_i \in V_{task}$ onto a specific slot $s_i \in V_{res}$. For a task $t_i$, the slots to which its upstream tasks have been assigned have a large influence on the placement of $t_i$. Therefore, this embodiment topologically sorts all tasks to guarantee that all upstream tasks of $t_i$ are placed before $t_i$. The decoder decides the placement of each task in order according to the result of the topological sorting, as sketched below.
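The topological sort itself is standard; a Kahn-style sketch over the task edges (generic code, not from the patent) is:

```python
from collections import deque

def topological_order(task_nodes, task_edges):
    indeg = {t: 0 for t in task_nodes}
    succ = {t: [] for t in task_nodes}
    for u, v in task_edges:
        succ[u].append(v)
        indeg[v] += 1
    queue = deque(t for t in task_nodes if indeg[t] == 0)  # source tasks first
    order = []
    while queue:
        t = queue.popleft()
        order.append(t)
        for v in succ[t]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order
```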
And S140, for any current task node in the sorted order, calculating the current context vector corresponding to the current task node according to the sample task embedding vector corresponding to the current task node and the sample resource embedding vectors corresponding to the slots of its upstream nodes.

In this embodiment, when calculating the current context vector corresponding to the current task node, an integrating operation may be performed on the resource embedding vectors corresponding to the upstream nodes of the current task node, so that all upstream nodes correspond to one integrated resource embedding vector; the integrating operation comprises taking the maximum value of each dimension of the resource embedding vectors corresponding to the upstream nodes, or averaging the elements of each dimension.

The integrated resource embedding vector is added to the task embedding vector of the current task node, and the result of the addition is input into the attention unit of the recurrent neural network to obtain the current context vector corresponding to the current task node.

Specifically, as shown in fig. 1b, a max operation is applied to $S^{(up)}(t_i)$, and the result is added to $h^{(t_i)}$; the sum is input into the attention layer, yielding the current context vector $q_i$ corresponding to the current task node.

In this embodiment, the placement solution finally generated by the decoder is denoted $P$, and the task placement problem is described formally by the following formula:

$$P = \prod_{i=1}^{|V_{task}|} p\big(s^{(t_i)} \mid S^{(up)}(t_i), G_{task}\big)$$

where $S^{(up)}(t_i)$ denotes the set of slots to which the upstream tasks of task $t_i$ are placed, $p$ represents the probability that a task node is assigned to a slot node, and $s^{(t_i)}$ represents the slot node corresponding to task $t_i$.
In this embodiment, a GRU (Gated Recurrent Unit) can be used to learn a state representation $h^{(i)}$, thereby memorizing the dependencies between tasks. $h^{(i)}$ encodes the information of $S^{(up)}(t_i)$ and $G_{task}$. As shown in fig. 1b, at each step the input vector $h^{(t_i)}$ is added to the embedding vectors of the slots in $S^{(up)}(t_i)$ to enhance the understanding of where $t_i$ should be placed relative to its upstream tasks. Unlike the prior art, this embodiment considers only the upstream tasks of $t_i$. The result through the GRU is expressed as:

$$h^{(i)} = \mathrm{GRU}\Big(h^{(i-1)},\; h^{(t_i)} + \max_{s \in S^{(up)}(t_i)} h^{(s)}\Big)$$

To predict on which slot $t_i$ is placed, this embodiment first uses the attention layer of the recurrent neural network to obtain an intermediate vector $c_i$. Specifically, the intermediate vector $c_i$ can be calculated by the following formula:

$$c_i = \sum_j a_{ij}\, h^{(t_j)}$$

where $e_{ij}$ denotes the attention score obtained when the embedding vector of the task node at step $i$ passes through the attention layer, and $a_{ij}$ denotes the fractional value of $e_{ij}$ obtained through the softmax layer.

By means of the intermediate vector, the context vector $q_i$ can be obtained. Specifically, it can be calculated by the following formula:

$$q_i = W_Q\big[h^{(i)} : c_i\big]$$

where $W_Q$ is a coefficient matrix whose elements are parameters of the task placement model, determined after the task placement model is trained.
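One decoder step under these formulas can be sketched as follows. The additive attention form, the dimensions, and all names are assumptions of mine; fig. 1b defines the exact wiring:

```python
import torch
import torch.nn as nn

class DecoderStep(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gru = nn.GRUCell(dim, dim)
        self.att = nn.Linear(2 * dim, 1)   # scores e_ij from [state : task_emb]
        self.wq = nn.Linear(2 * dim, dim)  # context q_i from [state : c_i]

    def forward(self, state, task_emb, up_slot_embs, all_task_embs):
        # Input: current task embedding + max over upstream slots' embeddings
        x = task_emb + (up_slot_embs.max(dim=0).values
                        if up_slot_embs.numel() else torch.zeros_like(task_emb))
        state = self.gru(x.unsqueeze(0), state.unsqueeze(0)).squeeze(0)
        # Attention over all task embeddings -> intermediate vector c_i
        pairs = torch.cat([state.expand(all_task_embs.size(0), -1),
                           all_task_embs], dim=-1)
        a = torch.softmax(self.att(pairs).squeeze(-1), dim=0)  # a_ij
        c = (a.unsqueeze(-1) * all_task_embs).sum(dim=0)       # c_i
        return state, self.wq(torch.cat([state, c]))  # new state, context q_i
```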
S150, determining the probability of the current task node being assigned to each slot node according to the current context vector, and taking the slot node with the maximum probability as the target slot node corresponding to the current task node.

Specifically, step S150 includes:

inputting the current context vector and the sample resource embedding vectors corresponding to all slot nodes into the softmax unit of the recurrent neural network model to obtain the probability $p$ that the current task node is assigned to each slot node, which can be obtained by the following formula:

$$p(s_j \mid t_i) = \mathrm{softmax}_j(u_j), \qquad u_j = \frac{q_i^{T} k_j}{\sqrt{d_k}}$$

where $u_j$ is the intermediate parameter, $d_k$ is the dimension of the target slot embedding vector, and $k_j$ is a generalized representation, referred to in this embodiment as the slot embedding vector.

Among the probabilities of the current task node being assigned to each slot node, the slot node with the maximum probability is taken as the target slot node corresponding to the current task node. After all task nodes have determined their corresponding target slot nodes, the mapping relation between each task node and its corresponding target slot node is established.
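The slot-selection step then reduces to a scaled dot product between the context vector and each slot embedding; a sketch (greedy argmax shown, though training would typically sample from the distribution):

```python
import torch

def select_slot(q, slot_embs):
    """q: (dim,) context vector; slot_embs: (num_slots, dim)."""
    d_k = slot_embs.size(-1)
    u = slot_embs @ q / d_k ** 0.5  # u_j = q . k_j / sqrt(d_k)
    p = torch.softmax(u, dim=0)     # probability per slot
    return int(p.argmax()), p       # greedy target slot + distribution
```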
And S160, calculating the throughput and delay during the running of the stream processing job based on the mapping relation between each task node and its corresponding target slot node.

For a tuple, the delay is defined as the sum of all inter-slot communication delays it passes through from source to sink. There may be many different paths from the source node to the sink node, so in this embodiment the final delay is calculated as the average of the delays over all paths from the source node to the sink node.
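A direct sketch of this definition, enumerating source-to-sink paths by depth-first search (my own code; fine for the small task graphs considered here):

```python
def average_path_delay(succ, sources, sinks, placement, slot_delay):
    """succ: task -> list of downstream tasks; slot_delay[(s1, s2)]: delay
    between two slots (0 when s1 == s2). Averages delay over all paths."""
    total, count = 0.0, 0

    def walk(task, acc):
        nonlocal total, count
        if task in sinks:
            total += acc
            count += 1
            return
        for nxt in succ.get(task, []):
            hop = slot_delay.get((placement[task], placement[nxt]), 0.0)
            walk(nxt, acc + hop)

    for src in sources:
        walk(src, 0.0)
    return total / count if count else 0.0
```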
Throughput represents the number of tuples a particular DSP job can handle per second. For most DSP applications, throughput is the most important quality attribute. But unlike delay, it is very difficult to estimate throughput without deploying the job. This embodiment therefore provides an offline throughput estimation algorithm that does not deploy the stream processing job; referring to fig. 1c, it specifically includes:

S161, when the source nodes among the task nodes send tuples at the maximum sending speed, distributing the remaining CPU computing power of each slot node to its corresponding task nodes;

S162, calculating the current throughput of each task node in topological order;

S163, traversing the task nodes in reverse topological order and, based on a backpressure mechanism, reclaiming the computing power of the target slot corresponding to each task node;

S164, redistributing the computing power corresponding to each target slot to its corresponding task nodes until the throughput of the source nodes converges, and taking the throughput corresponding to the source nodes as the throughput during running.

The computing power corresponding to each target slot can be distributed according to the CPU utilization of the tasks.
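A simplified fixed-point sketch of the offline estimate follows. The real algorithm's S161 to S164 bookkeeping is richer; the function and parameter names here are mine, and the convergence loop is a plausible reading of the steps, not the patent's code:

```python
def estimate_throughput(order, succ, pred, placement, slot_cpu, task_cpu,
                        iters=50):
    """order: topological task list; succ/pred: adjacency dicts (every task
    present); placement: task -> slot. Iterate until source rates converge."""
    # S161: distribute each slot's computing power in proportion to task_cpu
    share = {}
    for s, cap in slot_cpu.items():
        tasks = [t for t in order if placement[t] == s]
        total = sum(task_cpu[t] for t in tasks) or 1.0
        for t in tasks:
            share[t] = cap * task_cpu[t] / total
    rate = {t: 0.0 for t in order}
    for _ in range(iters):
        prev = dict(rate)
        for t in order:                        # S162: forward, topological order
            capacity = share[t] / task_cpu[t]  # tuples/sec the share supports
            if pred[t]:                        # non-source: limited by inputs
                rate[t] = min(capacity, sum(rate[u] for u in pred[t]))
            else:
                rate[t] = capacity             # source sends at max speed
        for t in reversed(order):              # S163: backpressure slows producers
            if succ[t]:
                rate[t] = min(rate[t], sum(rate[v] for v in succ[t]))
        if all(abs(rate[t] - prev[t]) < 1e-6 for t in order):  # S164: converged
            break
    return sum(rate[t] for t in order if not pred[t])  # source throughput
```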
The throughput and delay estimation algorithms provided by this embodiment not only accelerate the training of the task placement model but can also guide the actual deployment of stream processing tasks; the main computational logic accommodates different deployment scenarios. Multiple placement schemes are generated by the task placement model, and the estimation algorithm provided by this embodiment can then select the optimal scheme for final deployment.

Further, to verify that the throughput estimation algorithm provided by this embodiment is applicable to heterogeneous DSP applications and resources, this embodiment generates 100 job graphs as verification data, where the number of job nodes ranges from 1 to 10 and the parallelism of each job node ranges from 1 to 6; the job graphs are deployed onto random subsets of all slots in the cluster.

The number of loop cycles can serve as a measure of the CPU utilization of different tasks and can be controlled in a user-defined function through Flink's DataStream API.
Fig. 1d is a screenshot of the relationship between the estimated value and the actual value of the throughput during the experiment, and fig. 1e is a screenshot of the error between the estimated throughput and the actual value during the experiment. As shown in fig. 1d, a clear linear relationship is shown between the estimated value and the actual value, and the relative throughput value can be converted into the actual value by using a linear regression method, and then the error is calculated. As shown in fig. 1e, the absolute value error does not exceed 10% for 78% of the test cases. The maximum absolute value error is 19%. The average absolute value error is 6.7% for all test cases, which means that the throughput estimation tool provided by the present embodiment is feasible for estimating the quality of different placement solutions without deploying an application.
And S170, determining a target reward value according to the calculation results of the throughput and the delay.
The target reward value is a linear combination of the throughput reward value and the delay reward value, and can be represented by the following formula:

$$r(P) = \lambda\, r_{throughput}(P) + (1 - \lambda)\, r_{delay}(P)$$

where $P$ is the placement scheme, i.e., the determined mapping relation between each task node and its corresponding target slot node, and $\lambda$ is a reward coefficient configured according to the quality attribute requirements. The setting of the reward value is very flexible and scalable and can easily be extended to other quality attributes. $r_{delay}(P)$ represents the reward value of the delay calculated using the placement scheme provided by the embodiment of the invention; $r_{throughput}(P)$ represents the reward value of the throughput calculated using the placement scheme provided by the embodiment of the invention. Specifically, the reward values of the delay and the throughput can be calculated by the following formulas, respectively:

$$r_{delay}(P) = \frac{delay_Q - delay_P}{delay_Q}, \qquad r_{throughput}(P) = \frac{throughput_P - throughput_Q}{throughput_Q}$$

where $delay_P$ represents the first delay value calculated using the placement scheme provided by the embodiment of the invention; $throughput_P$ represents the first throughput value calculated using the placement scheme provided by the embodiment of the invention; $delay_Q$ is the second delay average obtained using a heuristic method provided by the prior art; and $throughput_Q$ is the second throughput average obtained using the heuristic provided by the prior art. In this example, $delay_Q$ and $throughput_Q$ are adopted to avoid the drawback of the task placement model merely improving on a single heuristic approach rather than optimizing the set target based on actual quality attributes (such as throughput and delay).
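The reward formulas above are reconstructed as relative improvements over the heuristic baseline (the originals are lost to image placeholders, so this sketch reflects my reading of the surrounding text, not the patent verbatim):

```python
def reward(delay_p, throughput_p, delay_q, throughput_q, lam=0.5):
    """Linear combination of relative improvements over a heuristic baseline.
    lam weights throughput vs. delay per the quality-attribute requirements."""
    r_delay = (delay_q - delay_p) / delay_q  # positive when faster than baseline
    r_throughput = (throughput_p - throughput_q) / throughput_q
    return lam * r_throughput + (1.0 - lam) * r_delay
```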
And S180, obtaining a preset task placement model when the target reward value reaches convergence.
The preset task placement model associates each task node in the task graph with a slot node in the resource graph.
In the practical application process, the task placement model provided by the embodiment can find a relatively suitable placement scheme for the graph with the complex structure. Specifically, the model provided by the present embodiment can be tested using three different topologies.
Fig. 1f is a schematic structural diagram of a throughput testing topology according to an embodiment of the present invention, and fig. 1g is a screenshot of an experimental effect of a throughput test corresponding to the throughput testing topology according to the embodiment of the present invention. As shown in fig. 1f, the throughput testing topology is a topology having a source node, an equivalent node and a sink node. The source node will continue to generate random strings of fixed 10K size as tuples. The equivalent node will send the string to the sink node intact. The sink node increments the counter by 1 each time it receives a tuple. As shown in FIG. 1g, the task placement model provided by this embodiment improves the throughput by at least 8.9%, 10% and 47% over the prior art schedulers I-Storm, Flink-even and Flink, respectively. Therefore, the task placement model provided by the present embodiment has significant advantages.
Fig. 1h is a schematic structural diagram of a word counting topology according to a first embodiment of the present invention, and fig. 1i is a screenshot of an experimental effect of a throughput test corresponding to the word counting topology according to the first embodiment of the present invention; as shown in FIG. 1h, the word count topology includes a source node, a split node, a count node, and a sink node. Which is used to count the number of occurrences of each word in one or more files. The source node will send a list of words of random length, one line at a time (randomly generated between 1 and 1000 in length). The splitting node will split each line into words and the counting node will increment the counter based on the input word and send the result to an empty sink node.
As shown in fig. 1i, for the word count topology, the curve fluctuates slightly due to the random length of the input data. The throughput difference between Flink (2.6K) and Flink-even (2.3K) is not large, and I-Storm (3.9K) is better than both. The task placement model provided by this embodiment can support 4.3K throughput, improving on I-Storm, Flink, and Flink-even by at least 10%, 63%, and 87%, respectively. Therefore, the task placement model provided by this embodiment has significant advantages.
Fig. 1j is a schematic structural diagram of a log stream processing topology according to a first embodiment of the present invention, and fig. 1k is a screenshot of an experimental effect of a throughput test corresponding to the log stream processing topology according to the first embodiment of the present invention; as shown in fig. 1j, the source node sends one row of log records at a time. The rule application node performs a rule-based analysis and sends a log entry. The log entry is sent to two operators, which perform indexing and counting operations, respectively.
For the log stream processing topology, the topology is more complex and it is not easy to find a suitable solution. As shown in fig. 1k, the throughput performance of the different methods varies greatly. The task placement model provided by this embodiment has the highest throughput (66K), which is at least 31% higher than I-Storm (50K), 75% higher than Flink-even (37K), and 143% higher than Flink (27K). The task placement model provided by this embodiment therefore has significant advantages.
According to the technical scheme provided by this embodiment, when the graph neural network is used to model the resource graph and generate the embedding vectors, the CPU computing power attribute and the available memory attribute of each slot node are considered, so that the trained task placement model is applicable to heterogeneous resources. The throughput and delay during running are also considered in the training of the task placement model. When estimating the runtime throughput, the scheme of computing throughput by iteratively reclaiming and distributing resources and applying a backpressure mechanism realizes offline throughput estimation without deploying the actual application. This setting not only accelerates the training of the model but also enables the trained task placement model to make the throughput and delay meet the respective preset requirements, so as to guide the actual deployment of subsequent stream processing jobs.
Example two
Fig. 2a is a flowchart of a data stream processing task placement method according to a second embodiment of the present invention, which can be applied when deploying a stream processing job. The method may be performed by a task placement device, which may be implemented by means of software and/or hardware. As shown in fig. 2a, the method includes:
and S210, generating a task graph corresponding to the flow processing job according to the parallelism of each operator in the flow processing job and the connection mode among the operators.
Specifically, fig. 2b shows a stream processing job graph and the correspondingly generated task graph according to the second embodiment of the present invention. As shown in fig. 2b, the stream processing job graph includes a source operator, a map operator, an aggregate operator, a filter operator, and a sink operator. According to the parallelism of each operator (denoted by p in fig. 2b), the resulting task graph contains the corresponding numbers of source nodes, mapping nodes, aggregation nodes, filtering nodes, and sink nodes. The connections between task nodes are determined by the connections between the operators; as shown in fig. 2b, solid arrows represent broadcast connections and dotted arrows represent direct connections. A usage sketch based on the task-graph builder from the model description above follows.
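Using the build_task_graph sketch from the model description earlier (the parallelism values below are illustrative, not read off fig. 2b):

```python
job_nodes = [
    JobNode("source", parallelism=2),
    JobNode("map", parallelism=2),
    JobNode("aggregate", parallelism=3),
    JobNode("filter", parallelism=3),
    JobNode("sink", parallelism=1),
]
job_edges = [
    ("source", "map", "direct"),        # equal parallelism: one-to-one
    ("map", "aggregate", "broadcast"),  # every map task feeds every aggregator
    ("aggregate", "filter", "direct"),
    ("filter", "sink", "broadcast"),
]
tasks, edges = build_task_graph(job_nodes, job_edges)
print(len(tasks), len(edges))  # 11 task nodes, 14 task edges
```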
S220, determining task embedding vectors corresponding to all task nodes in the task graph based on a graph neural network in a preset task placement model, and determining resource embedding vectors corresponding to all slot nodes in the resource graph.
The resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute.
In this embodiment, the preset task placement model includes an encoding portion and a decoding portion. The encoding portion includes a graph neural network, for which the GCN may preferably be adopted. Based on the graph neural network, the task embedding vectors and the resource embedding vectors are obtained after iteratively updating the initial embedding vectors of the task graph and the resource graph.

And S230, determining, based on the recurrent neural network in the preset task placement model, the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine the deployment of the stream processing job.

In this embodiment, the decoding portion of the preset task placement model may be implemented by a recurrent neural network, which takes the embedding vectors corresponding to the task graph and the resource graph as input and, after passing through the GRU unit, the attention layer, and the softmax layer, establishes a mapping relation between each task node in the task graph and a slot in the resource graph. For details, please refer to the above embodiments, which are not repeated here. After the preset task placement model has been trained, each task node in the task graph can be associated with a slot in the resource graph, and the throughput attribute and the delay attribute during the running of the stream processing job can meet the preset requirements.
In this embodiment, determining the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, based on the recurrent neural network in the preset task placement model, may specifically include:

sorting the task nodes using a topological sorting method; for any current task node in the sorted order, calculating the current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors corresponding to its upstream nodes; and determining the probability of the current task node being assigned to each slot node according to the current context vector, and taking the slot node with the maximum probability as the target slot node corresponding to the current task node.

Specifically, calculating the current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors corresponding to its upstream nodes includes:

integrating the resource embedding vectors corresponding to the upstream nodes of the current task node, so that all upstream nodes correspond to one integrated resource embedding vector; adding the integrated resource embedding vector to the task embedding vector of the current task node; and inputting the result of the addition into the attention unit of the recurrent neural network to obtain the current context vector corresponding to the current task node. The specific formula for the context vector is the same as in the model training process of the above embodiment and is not repeated here.

The integrating operation comprises: taking the maximum value of each dimension of the resource embedding vectors corresponding to the upstream nodes, or averaging the elements of each dimension.

Specifically, determining the probability of the current task node being assigned to each slot node according to the current context vector includes:

inputting the current context vector and the resource embedding vectors corresponding to all slot nodes into the softmax unit of the recurrent neural network model to obtain the probability of the current task node being assigned to each slot node. The specific formula for the probability is the same as in the model training process of the above embodiment and is not repeated here.
According to the technical scheme provided by this embodiment, a task graph corresponding to the stream processing job can be generated according to the parallelism of each operator in the stream processing job and the connection modes among the operators. The task graph and the resource graph can be encoded by the graph neural network in the preset task placement model to determine the task embedding vector corresponding to each task node in the task graph and the resource embedding vector corresponding to each processing-unit (slot) node in the resource graph. The resource graph in the embodiment of the invention is a fully-connected undirected graph, and during the training of the preset task placement model the CPU computing power attribute and the available memory attribute of each slot node are considered when iterating the embedding vectors, so the preset task placement model provided by this embodiment is applicable to heterogeneous resources. Through the recurrent neural network in the preset task placement model, the slot node corresponding to each task node can be determined according to the task embedding vectors and the resource embedding vectors, thereby determining the deployment of the stream processing job. In addition, because the throughput and delay attributes are considered during the training of the preset task placement model, the throughput attribute and the delay attribute meet the preset requirements when the preset task placement model provided by this embodiment is used for actual stream processing jobs.
EXAMPLE III
Fig. 3 is a block diagram of a data stream processing task placement device according to a third embodiment of the present invention. The device includes: a task graph generation module 310, an encoding module 320, and a decoding module 330; wherein:
the task graph generating module 310 is configured to generate a task graph corresponding to a stream processing job according to the parallelism of each operator in the stream processing job and the connection mode between the operators;
the encoding module 320 is configured to determine task embedding vectors corresponding to task nodes in the task graph and determine resource embedding vectors corresponding to slot nodes of processing units in a resource graph based on a graph neural network in a preset task placement model; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
a decoding module 330 configured to determine, based on a recurrent neural network in a preset task placement model, a slot node corresponding to each task node according to the task embedding vector and the resource embedding vector, so as to determine a deployment manner of the stream processing job;
the preset task placement model enables each task node in the task graph to be associated with a slot in the resource graph, and enables the throughput attribute and the delay attribute in the running process of the stream processing job to meet preset requirements.
Optionally, the encoding module specifically includes:
a sorting unit configured to: sort the task nodes using a topological sorting method;
a context vector calculation unit configured to: for any one sequenced current task node, calculating a current context vector corresponding to the current task node according to a task embedding vector corresponding to the current task node and resource embedding vectors corresponding to upstream nodes of the current task node respectively;
a target slot node determination unit configured to: and determining the probability value of the current task node distributed to each slot node according to the current context vector, and taking the slot node with the maximum probability value as a target slot node corresponding to the current task node.
Optionally, the context vector calculating unit is specifically configured to:
integrating the resource embedding vectors respectively corresponding to each upstream node of the current task node, so that all the upstream nodes correspond to one integrated resource embedding vector;
adding the integrated resource embedding vector and the task embedding vector of the current task child node;
inputting the result of the addition operation into an attention unit of the recurrent neural network to obtain a current context vector corresponding to the current task node;
wherein the integrating operation comprises: and taking the maximum value of each dimension in the resource embedding vector corresponding to each upstream node, or carrying out averaging processing on elements of each dimension.
Optionally, the target slot node determining unit is specifically configured to:
and inputting the current context vector and the resource embedding vectors corresponding to all slot nodes into a softmax unit of the recurrent neural network model to obtain the probability of the current task node being assigned to each slot node.
Optionally, the preset task placement model is obtained by training in the following manner:
acquiring an initial task embedding vector corresponding to each task node in a task graph, an initial resource embedding vector corresponding to each slot node of a processing unit in a resource graph, and a randomly generated task placement array; the length of the task placement array represents the number of tasks in the task graph, and the value in the array is the slot number;
respectively carrying out iterative updating on the initial task embedding vector and the initial resource embedding vector based on a GraphSAGE algorithm to obtain a sample task embedding vector and a sample resource embedding vector;
sorting the task nodes using topological sorting;
for any current task node in the sorted order, calculating a current context vector corresponding to the current task node according to the sample task embedding vector corresponding to the current task node and the sample resource embedding vectors respectively corresponding to the upstream nodes of the current task node;
determining, according to the current context vector, the probability value of the current task node being assigned to each slot node, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node;
calculating the throughput and the delay during the running of the stream processing job based on the mapping relationship between each task node and its corresponding target slot node;
determining a target reward value according to the respective calculation results of the throughput and the delay, wherein the target reward value is a linear combination of a throughput reward value and a delay reward value (see the reward sketch after this procedure);
when the target reward value converges, the preset task placement model is obtained, wherein the preset task placement model associates each task node in the task graph with a slot node in the resource graph.
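One GraphSAGE-style update round, referenced in the procedure above, could look like the following sketch; mean aggregation, the ReLU nonlinearity, and the weight matrices W_self and W_neigh are illustrative, as the disclosure names the algorithm but not its hyperparameters:

```python
import numpy as np

def graphsage_round(graph, emb, W_self, W_neigh):
    """One iterative update: aggregate neighbor embeddings, combine, L2-normalize."""
    new_emb = {}
    for v in graph.nodes:
        neigh = [emb[u] for u in graph.neighbors(v)]
        agg = np.mean(neigh, axis=0) if neigh else np.zeros_like(emb[v])
        h = np.maximum(W_self @ emb[v] + W_neigh @ agg, 0.0)  # combine + ReLU
        new_emb[v] = h / (np.linalg.norm(h) + 1e-12)          # L2 normalization
    return new_emb
```

Running several such rounds, once over the task graph and once over the resource graph, would yield the sample task and resource embedding vectors.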
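The linear combination of the two reward terms could be as simple as the sketch below; the weights and the convention that delay is penalized are assumptions, since the disclosure only states that the target reward is a linear combination of a throughput reward value and a delay reward value:

```python
def target_reward(throughput, delay, alpha=1.0, beta=1.0):
    """Linear combination: reward throughput, penalize delay (weights illustrative)."""
    return alpha * throughput - beta * delay
```

Training would then repeat placement, throughput and delay evaluation, and model updates until this value converges.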
Optionally, the throughput is calculated as follows:
when a source node among the task nodes sends tuples at the maximum sending rate, distributing the residual CPU computing power of each slot node to the corresponding task nodes;
calculating the current throughput of each task node in topological order;
traversing the task nodes in reverse topological order, and reclaiming the computing power of the target slot node corresponding to each task node based on a backpressure mechanism;
redistributing the computing power of each target slot node to the corresponding task nodes until the throughput of the source node converges, and taking the throughput of the source node as the throughput during job execution (a simplified sketch follows).
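A heavily simplified sketch of this fixed-point estimate follows. It assumes a single source, broadcast semantics on fan-out, an even split of each slot's computing power among the tasks placed on it, and a per-task unit cost cost[t]; memory constraints and tuple selectivity are omitted, and none of these modeling choices are fixed by the disclosure:

```python
import networkx as nx

def estimate_source_throughput(task_graph, placement, slot_cpu, cost,
                               source, max_rate, iters=100, tol=1e-6):
    """Fixed-point throughput: a forward pass caps each task's rate by its power
    share and inflow; a backward pass emulates backpressure from downstream tasks."""
    order = list(nx.topological_sort(task_graph))
    rate = {v: 0.0 for v in order}
    tasks_on = {}
    for t, s in placement.items():                    # tasks sharing each slot
        tasks_on.setdefault(s, []).append(t)
    cap = {t: slot_cpu[s] / (len(tasks_on[s]) * cost[t])   # even power split
           for t, s in placement.items()}
    src_limit = max_rate
    for _ in range(iters):
        prev = rate[source]
        for v in order:                               # forward pass, topological order
            inflow = sum(rate[u] for u in task_graph.predecessors(v))
            rate[v] = min(cap[v], src_limit if v == source else inflow)
        for v in reversed(order):                     # backpressure, reverse order
            succ = list(task_graph.successors(v))
            if succ:
                rate[v] = min(rate[v], min(rate[s] for s in succ))
        src_limit = rate[source]                      # reclaimed power feeds back
        if abs(rate[source] - prev) < tol:            # source throughput converged
            break
    return rate[source]
```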
Optionally, the delay is calculated as the average of the delays on all paths from the source node to the sink node in the task graph.
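This average could be computed by enumerating all source-to-sink paths in the task graph (a DAG), as in this sketch; the per-task delays task_delay would come from the underlying performance model, which is not detailed here:

```python
import networkx as nx

def average_path_delay(task_graph, task_delay, sources, sinks):
    """Average of the summed per-node delays over all source-to-sink paths."""
    path_delays = []
    for src in sources:
        for dst in sinks:
            for path in nx.all_simple_paths(task_graph, src, dst):
                path_delays.append(sum(task_delay[v] for v in path))
    return sum(path_delays) / len(path_delays) if path_delays else 0.0
```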
The task placement device provided by the embodiment of the present invention can execute the task placement method provided by any embodiment of the present invention, and has the corresponding functional modules and beneficial effects. For technical details not described in detail in the above embodiment, reference may be made to the task placement method provided by any embodiment of the present invention.
Example four
Referring to fig. 4, fig. 4 is a schematic structural diagram of a computing device according to a fourth embodiment of the present invention. As shown in fig. 4, the computing device may include:
a memory 701 in which executable program code is stored;
a processor 702 coupled to the memory 701;
wherein the processor 702 calls the executable program code stored in the memory 701 to execute the task placement method provided by any embodiment of the present invention.
An embodiment of the present invention also discloses a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute the task placement method provided by any embodiment of the present invention.
In the various embodiments of the present invention, it should be understood that the sequence numbers of the above processes do not imply a required order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not limit the implementation of the embodiments of the present invention in any way.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A, from which B can be determined. It should also be understood, however, that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-accessible memory. Based on such understanding, the technical solution of the present invention, in essence the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute all or part of the steps of the methods of the embodiments of the present invention.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-Time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, tape storage, or any other computer-readable medium that can be used to carry or store data.
Those of ordinary skill in the art will understand that the figures are merely schematic representations of one embodiment, and that the blocks or flows shown in the figures are not necessarily required to practice the present invention.
Those of ordinary skill in the art will understand that the modules in the devices of the embodiments may be distributed in the devices as described, or may be located, with corresponding changes, in one or more devices different from those of the embodiments. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A task placement method applied to a stream processing task, characterized by comprising the following steps:
generating a task graph corresponding to the stream processing job according to the parallelism of each operator in the stream processing job and the connection mode among the operators;
determining task embedding vectors corresponding to all task nodes in the task graph based on a graph neural network in a preset task placement model, and determining resource embedding vectors corresponding to all slot nodes of a processing unit in a resource graph; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
based on a recurrent neural network in the preset task placement model, determining the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine a deployment manner of the stream processing job;
the preset task placement model enables each task node in the task graph to be associated with a slot node in the resource graph, and enables the throughput attribute and the delay attribute in the running process of the stream processing job to meet preset requirements.
2. The method of claim 1, wherein determining a slot node corresponding to each task node according to a task embedding vector and a resource embedding vector based on a recurrent neural network in a preset task placement model comprises:
sorting the task nodes using topological sorting;
for any current task node in the sorted order, calculating a current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors respectively corresponding to the upstream nodes of the current task node;
determining, according to the current context vector, the probability value of the current task node being assigned to each slot node, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node.
3. The method of claim 2, wherein calculating the current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors corresponding to the upstream nodes of the current task node respectively comprises:
integrating the resource embedding vectors respectively corresponding to the upstream nodes of the current task node, so that all the upstream nodes correspond to a single integrated resource embedding vector;
adding the integrated resource embedding vector and the task embedding vector of the current task node;
inputting the result of the addition operation into an attention unit of the recurrent neural network to obtain the current context vector corresponding to the current task node;
wherein the integrating operation comprises: taking the maximum value of each dimension across the resource embedding vectors corresponding to the upstream nodes, or averaging the elements of each dimension.
4. The method of claim 2, wherein determining, according to the current context vector, the probability value of the current task node being assigned to each slot node comprises:
inputting the current context vector and the resource embedding vectors corresponding to all slot nodes into a softmax unit of the recurrent neural network to obtain the probability value of the current task node being assigned to each slot node.
5. The method of claim 1, wherein the preset task placement model is trained by:
acquiring an initial task embedding vector corresponding to each task node in a task graph, an initial resource embedding vector corresponding to each slot node of a processing unit in a resource graph, and a randomly generated task placement array; the length of the task placement array represents the number of tasks in the task graph, and each value in the array is a slot number;
respectively carrying out iterative updating on the initial task embedding vector and the initial resource embedding vector based on a GraphSAGE algorithm to obtain a sample task embedding vector and a sample resource embedding vector;
sorting the task nodes using topological sorting;
for any current task node in the sorted order, calculating a current context vector corresponding to the current task node according to the sample task embedding vector corresponding to the current task node and the sample resource embedding vectors respectively corresponding to the upstream nodes of the current task node;
determining, according to the current context vector, the probability value of the current task node being assigned to each slot node, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node;
calculating the throughput and the delay during the running of the stream processing job based on the mapping relationship between each task node and its corresponding target slot node;
determining a target reward value according to the calculation results of the throughput and the delay; wherein the target reward value is a linear combination of a throughput reward value and a delay reward value;
when the target reward value converges, obtaining the preset task placement model, wherein the preset task placement model associates each task node in the task graph with a slot node in the resource graph.
6. The method of claim 5, wherein the throughput is calculated by:
when a source node among the task nodes sends tuples at the maximum sending rate, distributing the residual CPU computing power of each slot node to the corresponding task nodes;
calculating the current throughput of each task node in topological order;
traversing the task nodes in reverse topological order, and reclaiming the computing power of the target slot node corresponding to each task node based on a backpressure mechanism;
redistributing the computing power of each target slot node to the corresponding task nodes until the throughput of the source node converges, and taking the throughput of the source node as the throughput during job execution.
7. The method of claim 6, wherein the delay is calculated as the average of the delays on all paths from the source node to the sink node in the task graph.
8. A task placement device applied to a stream processing task, comprising:
the task graph generating module is configured to generate a task graph corresponding to the stream processing job according to the parallelism of each operator in the stream processing job and the connection mode among the operators;
the encoding module is configured to determine task embedding vectors corresponding to task nodes in the task graph and determine resource embedding vectors corresponding to slot nodes of processing units in a resource graph based on a graph neural network in a preset task placement model; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
the decoding module is configured to determine, based on a recurrent neural network in the preset task placement model, a slot node corresponding to each task node according to the task embedding vector and the resource embedding vector, so as to determine a deployment manner of the stream processing job;
the preset task placement model enables each task node in the task graph to be associated with a slot node in the resource graph, and enables the throughput attribute and the delay attribute in the running process of the stream processing job to meet preset requirements.
9. A computing device, comprising:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to perform the task placement method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the task placement method according to any one of claims 1 to 7.
CN202110714071.8A 2021-06-25 2021-06-25 Task placement method, device, equipment and medium Pending CN113391907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110714071.8A CN113391907A (en) 2021-06-25 2021-06-25 Task placement method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN113391907A true CN113391907A (en) 2021-09-14

Family

ID=77624004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110714071.8A Pending CN113391907A (en) 2021-06-25 2021-06-25 Task placement method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113391907A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170017521A1 (en) * 2015-07-13 2017-01-19 Palo Alto Research Center Incorporated Dynamically adaptive, resource aware system and method for scheduling
US20190236444A1 (en) * 2018-01-30 2019-08-01 International Business Machines Corporation Functional synthesis of networks of neurosynaptic cores on neuromorphic substrates
CN112753016A (en) * 2018-09-30 2021-05-04 华为技术有限公司 Management method and device for computing resources in data preprocessing stage in neural network
US20200257968A1 (en) * 2019-02-08 2020-08-13 Adobe Inc. Self-learning scheduler for application orchestration on shared compute cluster
US20210117624A1 (en) * 2019-10-18 2021-04-22 Facebook, Inc. Semantic Representations Using Structural Ontology for Assistant Systems
CN111444009A (en) * 2019-11-15 2020-07-24 北京邮电大学 Resource allocation method and device based on deep reinforcement learning
CN111126668A (en) * 2019-11-28 2020-05-08 中国人民解放军国防科技大学 Spark operation time prediction method and device based on graph convolution network
US20200136920A1 (en) * 2019-12-20 2020-04-30 Kshitij Arun Doshi End-to-end quality of service in edge computing environments
CN111309915A (en) * 2020-03-03 2020-06-19 爱驰汽车有限公司 Method, system, device and storage medium for training natural language of joint learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Lu Haifeng; Gu Chunhua; Luo Fei; Ding Weichao; Yang Ting; Zheng Shuai: "Research on Task Offloading for Mobile Edge Computing Based on Deep Reinforcement Learning", Journal of Computer Research and Development (计算机研究与发展), no. 07 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023178766A1 (en) * 2022-03-25 2023-09-28 北京邮电大学 Task evaluation method and apparatus based on dynamic expansion of flink engine computing node
CN116841649A * 2023-08-28 2023-10-03 杭州玳数科技有限公司 Method and device for hot restarting based on flink on yarn
CN116841649B * 2023-08-28 2023-12-08 杭州玳数科技有限公司 Method and device for hot restarting based on flink on yarn

Similar Documents

Publication Publication Date Title
Bolukbasi et al. Adaptive neural networks for efficient inference
Piscitelli et al. Design space pruning through hybrid analysis in system-level design space exploration
US11681914B2 (en) Determining multivariate time series data dependencies
CN111406264A (en) Neural architecture search
CN113391907A (en) Task placement method, device, equipment and medium
Vakilinia et al. Analysis and optimization of big-data stream processing
Chen et al. $ d $ d-Simplexed: Adaptive Delaunay Triangulation for Performance Modeling and Prediction on Big Data Analytics
Ni et al. Generalizable resource allocation in stream processing via deep reinforcement learning
Garbi et al. Learning queuing networks by recurrent neural networks
Cheng et al. Tuning configuration of apache spark on public clouds by combining multi-objective optimization and performance prediction model
Geyer et al. Graph-based deep learning for fast and tight network calculus analyses
Geyer et al. Tightening network calculus delay bounds by predicting flow prolongations in the FIFO analysis
Hou et al. A machine learning enabled long-term performance evaluation framework for NoCs
Sinclair et al. Adaptive discretization in online reinforcement learning
Guan et al. Quantifying the impact of uncertainty in embedded systems mapping for NoC based architectures
Daradkeh et al. Analytical modeling and prediction of cloud workload
Tuli et al. SimTune: bridging the simulator reality gap for resource management in edge-cloud computing
CN106874215B (en) Serialized storage optimization method based on Spark operator
Johnston et al. Performance tuning of MapReduce jobs using surrogate-based modeling
Park et al. Gemma: reinforcement learning-based graph embedding and mapping for virtual network applications
Sirocchi et al. Topological network features determine convergence rate of distributed average algorithms
Chen et al. Dynamically predicting the quality of service: batch, online, and hybrid algorithms
Tribastone Efficient optimization of software performance models via parameter-space pruning
Grohmann Reliable Resource Demand Estimation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination