CN113391907A - Task placement method, device, equipment and medium - Google Patents
- Publication number
- CN113391907A (application number CN202110714071.8A)
- Authority
- CN
- China
- Prior art keywords
- task
- node
- graph
- resource
- slot
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F9/4881 — Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
- G06F9/5016 — Allocation of resources to service a request, the resource being the memory
- G06F9/5027 — Allocation of resources to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
Abstract
The embodiments of the invention disclose a task placement method, device, equipment, and medium. The method comprises the following steps: generating a task graph corresponding to a stream processing job according to the parallelism of each operator in the job and the connections among the operators; determining, based on a graph neural network in a preset task placement model, the task embedding vectors corresponding to the task nodes in the task graph and the resource embedding vectors corresponding to the slot nodes in a resource graph, where the resource graph is a fully-connected undirected graph; and determining, based on a recurrent neural network in the preset task placement model, the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors. The task placement model provided by the embodiments is suitable for heterogeneous resources, and with this model both the throughput attribute and the delay attribute meet preset requirements during actual stream processing jobs.
Description
Technical Field
The embodiments of the invention relate to the technical field of data stream processing, and in particular to a task placement method, device, equipment, and medium.
Background
In many industrial fields, numerous tasks require large amounts of data of different types to make data-intensive decisions. Such data is generated by streaming events, e.g., financial transactions and sensor measurements. To extract valuable information from these huge volumes of data in a timely manner, data stream processing frameworks and applications, which can continuously process unbounded data streams of arbitrary size in near real time, are becoming popular.
The computational process of a data stream processing application is generally described by a DAG (Directed Acyclic Graph). Each node in the DAG represents an operator that performs some specific operation (e.g., mapping, filtering). Continuously arriving data is processed by the operators and transmitted from the source nodes through the directed edges of the DAG to the sink nodes. To fully exploit the parallelism in DAGs, stream processing applications are typically deployed into distributed clusters, and in this scenario a key issue is deciding on which compute node to place and process each operator of the application while optimizing some relevant quality attributes. This is called the operator placement problem. It is a long-standing problem because stream processing applications typically do not stop running after deployment, and runtime adjustments are difficult to make without affecting performance. Moreover, obtaining an optimal operator placement is NP-hard (Non-deterministic Polynomial-time hard). Therefore, a number of heuristics have been devised that can solve the operator placement problem in an acceptable time; generally, such heuristics are designed manually based on the characteristics of a particular problem.
At present, using deep reinforcement learning to train heuristics has become a research focus. Current deep-reinforcement-learning-based methods assume that the CPUs (Central Processing Units), memory, network, and other resources are all homogeneous; however, because stream processing applications are deployed continuously, the actually available resources are heterogeneous, and the amount of available resources changes constantly. In this case, the task placement schemes obtained by such methods are not applicable to heterogeneous resources.
Disclosure of Invention
Embodiments of the present invention provide a task placement method, device, apparatus, and medium, so that a task placement scheme is applicable to heterogeneous resources, and when the scheme is deployed in a real cluster, a throughput value with higher accuracy can be obtained.
In a first aspect, the present invention provides a task placement method, applied to a stream processing task, including:
generating a task graph corresponding to the stream processing job according to the parallelism of each operator in the stream processing job and the connection mode among the operators;
determining task embedding vectors corresponding to all task nodes in the task graph based on a graph neural network in a preset task placement model, and determining resource embedding vectors corresponding to all slot nodes of a processing unit in a resource graph; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
based on a recurrent neural network in the preset task placement model, determining the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine the deployment of the stream processing job;
the preset task placement model enables each task node in the task graph to be associated with a slot in the resource graph, and enables the throughput attribute and the delay attribute in the running process of the stream processing job to meet preset requirements.
In a second aspect, an embodiment of the present invention further provides a task placement device, which is applied to a stream processing task, and includes:
the task graph generating module is configured to generate a task graph corresponding to the stream processing job according to the parallelism of each operator in the stream processing job and the connection mode among the operators;
the encoding module is configured to determine task embedding vectors corresponding to task nodes in the task graph and determine resource embedding vectors corresponding to slot nodes of processing units in a resource graph based on a graph neural network in a preset task placement model; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
- the decoding module is configured to determine, based on a recurrent neural network in the preset task placement model, the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine the deployment of the stream processing job;
the preset task placement model enables each task node in the task graph to be associated with a slot in the resource graph, and enables the throughput attribute and the delay attribute in the running process of the stream processing job to meet preset requirements.
In a third aspect, an embodiment of the present invention further provides a computing device, including:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program codes stored in the memory to execute the task placement method provided by any embodiment of the invention.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the task placement method provided in any embodiment of the present invention.
According to the technical scheme provided by the embodiments of the invention, a task graph corresponding to a stream processing job can be generated according to the parallelism of each operator in the job and the connections among the operators. The task graph and the resource graph are encoded by the graph neural network in the preset task placement model, which determines the task embedding vector of each task node in the task graph and the resource embedding vector of each processing-unit slot node in the resource graph. The resource graph in the embodiments of the invention is a fully-connected undirected graph, and during training of the preset task placement model the CPU computing power attribute and the available memory attribute of each slot node are considered when iterating the embedding vectors, so the model is suitable for heterogeneous resources. Through the recurrent neural network in the preset task placement model, the slot node corresponding to each task node can be determined according to the task embedding vectors and the resource embedding vectors, thereby determining the deployment of the stream processing job. In addition, because throughput and delay are considered during training, the throughput attribute and the delay attribute of the model meet the preset requirements during actual stream processing jobs.
The innovation points of the embodiment of the invention comprise:
1. when the embedding vector is generated by modeling the resource graph by using the graph neural network, the CPU computing power attribute and the available memory attribute of each slot node are considered, so that the trained task placement model is suitable for heterogeneous resources, and the method is one of the innovation points of the embodiment of the invention.
2. When estimating throughput during operation, the scheme calculates throughput using iterative recovery, resource distribution, and an applied backpressure mechanism, realizing offline estimation of throughput without deploying the actual application. This not only accelerates the model training process but also allows the throughput and delay of the trained task placement model to meet the respective preset requirements during actual operation, and is one of the innovation points of the embodiments of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a is a flowchart of a training method for a data stream processing task placement model according to an embodiment of the present invention;
FIG. 1b is a block diagram of an encoding-decoding model according to an embodiment of the present invention;
FIG. 1c is a flowchart of an off-line throughput estimation algorithm according to an embodiment of the present invention;
FIG. 1d is a screenshot of the relationship between the estimated and actual values of throughput during the experiment;
FIG. 1e is a screenshot of the error between the estimated throughput and the actual value during the experiment;
fig. 1f is a schematic structural diagram of a throughput testing topology according to an embodiment of the present invention;
fig. 1g is a screenshot of an experimental effect of a throughput test corresponding to a throughput test topology according to an embodiment of the present invention;
fig. 1h is a schematic structural diagram of a word counting topology according to an embodiment of the present invention;
fig. 1i is a screenshot of an experimental effect of a throughput test corresponding to a word count topology according to an embodiment of the present invention;
fig. 1j is a schematic structural diagram of a log stream processing topology according to an embodiment of the present invention;
fig. 1k is an experimental effect screenshot of a throughput test corresponding to a log stream processing topology according to an embodiment of the present invention;
fig. 2a is a flowchart of a data stream processing task placement method according to a second embodiment of the present invention;
FIG. 2b is a flowchart of a stream processing job and the corresponding generated task graph according to the second embodiment of the present invention;
fig. 3 is a block diagram of a data stream processing task placement device according to a third embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computing device according to a fourth embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
It is to be noted that the terms "comprises" and "comprising" and any variations thereof in the embodiments and drawings of the present invention are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
The embodiments of the invention disclose a method, device, equipment, and medium for placing data stream processing tasks. For the purpose of describing the content of the embodiments clearly and completely, the working principle of the invention is briefly introduced below:
1. stream processing model
A stream processing job may be represented as a Directed Acyclic Graph (DAG), denoted G_job = (V_job, E_job), where V_job is the set of all operators and E_job is the set of all edges. Each job node j ∈ V_job is an operator that performs a specific operation (e.g., mapping, filtering, aggregation). For each job node j ∈ V_job there is a user-defined parallelism j.p, which is the number of subtasks of job node j. All parallel subtasks t_{j,k} (k = 1, 2, ..., j.p) are created and executed.
Each edge (j_u, j_v) ∈ E_job connects job node j_u and job node j_v, indicating that streaming data flows from j_u to j_v, where u and v are node numbers. A tuple may be used to describe a data item flowing in the DAG. For an edge (j_u, j_v) ∈ E_job, the tasks of job node j_u communicate with the tasks of job node j_v. There are two ways of connection: direct connections and broadcast connections. A direct connection is used only when j_u has the same parallelism as j_v, i.e., j_u.p = j_v.p; in this case each task node t_{u,i} ∈ j_u is connected only with the single task node t_{v,i} ∈ j_v. A broadcast connection means that each task node t_{u,i} ∈ j_u is connected with all task nodes of j_v. It is assumed that each task sends tuples evenly to its downstream tasks. According to the parallelism and the connection modes, a corresponding task-level graph can be constructed from a given job graph, denoted G_task = (V_task, E_task), where V_task is the set of all task nodes and E_task is the set of all edges. Task placement operates on the task-level graph G_task. Each task t ∈ V_task is the smallest unit that can be placed on a resource node and can be described by two parameters: t.cpu (CPU utilization) and t.mem (required memory). The true CPU utilization is very difficult to estimate, so it is described here using a relative value. t.mem represents the maximum memory that task t may consume, which is a true value.
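The expansion of a job graph into a task-level graph described above can be sketched as follows (a minimal illustration; the function name and data layout are assumptions for illustration, not part of the patent):

```python
from itertools import product

def build_task_graph(job_nodes, job_edges, connection):
    """Expand a job graph into a task-level graph.

    job_nodes:  {operator_name: parallelism j.p}
    job_edges:  list of (upstream_op, downstream_op)
    connection: {(u, v): "direct" | "broadcast"}
    """
    # Each operator j with parallelism j.p yields subtasks t_{j,0..p-1}.
    tasks = {j: [f"{j}_{k}" for k in range(p)] for j, p in job_nodes.items()}
    task_edges = []
    for (u, v) in job_edges:
        if connection[(u, v)] == "direct":
            # Direct connection requires equal parallelism: t_{u,i} -> t_{v,i}.
            assert job_nodes[u] == job_nodes[v], "direct connection needs u.p == v.p"
            task_edges += list(zip(tasks[u], tasks[v]))
        else:
            # Broadcast: every upstream task feeds every downstream task.
            task_edges += list(product(tasks[u], tasks[v]))
    task_nodes = [t for ts in tasks.values() for t in ts]
    return task_nodes, task_edges
```

For example, a source (parallelism 2) directly connected to a map (parallelism 2), which broadcasts to a sink (parallelism 1), yields five task nodes and four task edges.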
2. Resource model
A resource can be represented as a fully-connected undirected graph G_res = (V_res, E_res), where V_res is the set of slots, the smallest units that can be used to place a task, and each edge in E_res represents a logical connection between two slots. Each slot has two attributes: s.cpu (CPU computing power) and s.mem (available memory).
3. Task placement problem
Given a task-level graph G_task and a resource graph G_res, the DSP (Data Stream Processing) scheduler needs a placement solution P: V_task → V_res, i.e., a suitable mapping between V_task and V_res. Specifically, each task t_i ∈ V_task must be placed onto a specific slot node s_j ∈ V_res while optimizing certain quality attributes, such as the throughput attribute and the delay attribute. In the embodiments of the invention, throughput is the quality attribute of most interest. In other words, the embodiments need to find a placement scheme that can fully utilize all available resources and support DSP applications with high throughput requirements.
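As a small illustrative sketch of the mapping P: V_task → V_res (names hypothetical; the patent optimizes throughput with a learned model rather than this simple check), a placement can be represented as a dictionary from tasks to slots and checked against each slot's available memory s.mem:

```python
def placement_feasible(placement, tasks_mem, slots_mem):
    """Check that a placement P: task -> slot respects available slot memory.

    placement: {task: slot}       — the mapping P
    tasks_mem: {task: t.mem}      — required memory per task
    slots_mem: {slot: s.mem}      — available memory per slot
    """
    used = {}
    for task, slot in placement.items():
        used[slot] = used.get(slot, 0.0) + tasks_mem[task]
    return all(used[s] <= slots_mem[s] for s in used)
```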
Since the task placement problem is NP-hard, no known algorithm can find the optimal solution in polynomial time (unless P = NP). To deploy DSP applications efficiently, a heuristic approach is needed that can find a good solution in feasible time.
In fact, the tasks in the task graph have different CPU utilizations and complex dependencies. If the scheduler places the tasks on a small subset of the slots, the communication delay becomes small, but throughput suffers because a single slot has to execute many tasks. Conversely, if the scheduler distributes the tasks evenly across all slots, each task has sufficient resources, but the communication delay becomes significant; given resource heterogeneity, even allocation is not a good choice either. Therefore, an ideal scheduler should consider the information of both the task graph and the resource graph, make a good trade-off between different quality attributes, and efficiently produce a suitable allocation scheme. The stream processing task placement method provided by the embodiments of the invention fully considers the information of the task graph and the resource graph; the resulting placement scheme, when deployed into a real cluster, can fully utilize all available resources and support DSP applications with high throughput requirements.
The following describes in detail a specific implementation process of the stream processing task placement method provided by the embodiment of the present invention from a training phase and an application phase of a model, respectively.
Example one
Fig. 1a is a flowchart of a training method for a data stream processing task placement model according to an embodiment of the present invention, where the task placement model is applicable to a stream processing task placement process to obtain a placement scheme in which a throughput attribute and a delay attribute meet preset requirements. As shown in fig. 1a, the training method includes:
s110, acquiring an initial task embedding vector corresponding to each task node in the task graph, an initial resource embedding vector corresponding to each slot node of the processing unit in the resource graph, and a randomly generated task placement array.
The length of the task placement array equals the number of tasks in the task graph, and each value in the array is the index of a processing unit (slot).
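The randomly generated task placement array described above can be sketched as follows (a hypothetical helper; the seed parameter is an assumption added for reproducibility):

```python
import random

def random_placement_array(num_tasks, num_slots, seed=None):
    """Randomly generated task placement array: its length equals the number
    of tasks in the task graph, and each value is the index of a slot."""
    rng = random.Random(seed)
    return [rng.randrange(num_slots) for _ in range(num_tasks)]
```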
In this embodiment, a graph neural network is used to perceive graphs with different structures, namely the task graph and the resource graph. The graph neural network encodes the graph information into a set of initial embedding vectors, and may be, for example, a Graph Convolutional Network (GCN).
S120, iteratively updating the initial task embedding vectors and the initial resource embedding vectors based on the GraphSAGE algorithm to obtain the sample task embedding vectors and the sample resource embedding vectors.
In this embodiment, the GraphSAGE algorithm (Graph SAmple and aggreGatE) is adopted to iteratively update the embedding vector of each node v. The initial feature of node v is defined as f_v, and its embedding vector at step k is denoted e_v^(k); when k = 0, e_v^(0) = f_v. Because the graph types differ, the processes of encoding the task graph G_task and the resource graph G_res may be different.
In this embodiment, for the task graph G_task, since the upstream and downstream neighbor nodes of the current node v have different influences on the generated placement scheme, they are aggregated separately. The sets of upstream and downstream neighbors of v are denoted N_u(v) and N_d(v), respectively. Taking N_u(v) as an example: for each upstream node u ∈ N_u(v) with embedding vector e_u^(k) at step k, an intermediate vector is calculated as

    h_u^(k) = ReLU(W_1 · e_u^(k))

where h_u^(k) is the intermediate variable and W_1 is a parameter matrix whose values are all model parameters; that is, the parameter matrix is multiplied by the input vector and the result is passed through the activation function ReLU.
After all h_u^(k) are calculated, the upstream-view embedding of v is updated using

    e_v^(k+1,up) = ReLU(W_2 · [e_v^(k) : Σ_{u ∈ N_u(v)} h_u^(k)])

where W_2 is a parameter matrix whose values are all model parameters, and [:] denotes the concatenation of two vectors. Similarly, the downstream view e_v^(k+1,down) of v is calculated using N_d(v) with its own parameter matrices. The embedding of v at step k+1 is then the concatenation of the upstream-view and downstream-view embeddings:

    e_v^(k+1) = [e_v^(k+1,up) : e_v^(k+1,down)]
in this embodiment, the resource map is an undirected graph G with edge attribute information (communication delay)res. For the resource graph, in order to sense the edge attribute of the resource graph, embedding of a resource node is connected with the edge attribute during the first transformation, which is embodied by the following formula:
because the resource graph is a fully-connected undirected graph, there is no difference between upstream and downstream neighbors, all other node vectors not including v need to be averaged during aggregation, and the formula after the colon means averaging, specifically, the aggregation formula can be adjusted as follows:
after the K iterations, the sample task embedding vectors corresponding to each task node in the task graph and the sample resource embedding vectors corresponding to all resource nodes in the resource graph are obtained through calculation. The information of the whole task graph can be obtained by calculating after entering embedding of each task into a full connection layer and a maximum pooling layer. Fig. 1b is a block diagram of an encoding-decoding model according to an embodiment of the present invention. As shown in fig. 1b, the encoded results of the task map and the resource map are input into a recurrent neural network for decoding. Hereinafter, each task node t in the task graph isiExpressed as an embedding vector ofEach slot node s in the resource mapiExpressed as an embedding vector of
S130, sorting the task nodes by a topological sorting method.
In this embodiment, the goal of task placement is to place each task t_i ∈ V_task onto a specific slot s_i ∈ V_res. For a task t_i, the slots to which its upstream tasks are assigned have a large impact on the placement of t_i. Therefore, this embodiment topologically orders all tasks to guarantee that all upstream tasks of t_i are placed before t_i. The decoder then decides the placement for each task in order according to the topological ordering.
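The topological ordering step can be implemented with Kahn's algorithm; a minimal sketch (function name assumed):

```python
from collections import deque

def topological_order(nodes, edges):
    """Kahn's algorithm: order tasks so that every task appears after all of
    its upstream tasks, as required before sequential placement decoding."""
    indeg = {n: 0 for n in nodes}
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
    queue = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while queue:
        n = queue.popleft()
        order.append(n)
        for m in adj[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                queue.append(m)
    return order
```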
S140, for each current task node in the sorted order, calculating the current context vector corresponding to the current task node according to the sample task embedding vector of the current task node and the sample resource embedding vectors corresponding to its upstream nodes.
In this embodiment, when calculating the current context vector corresponding to the current task node, an integration operation may be performed on the resource embedding vectors corresponding to the upstream nodes of the current task node, so that all upstream nodes correspond to one integrated resource embedding vector. The integration operation takes either the element-wise maximum over the upstream nodes' resource embedding vectors or the element-wise average.

The integrated resource embedding vector is then added to the task embedding vector of the current task node, and the result of the addition is input into an attention unit of the recurrent neural network to obtain the current context vector corresponding to the current task node.
Specifically, as shown in FIG. 1b, after a max operation is applied to the embedding vectors of all upstream slots S^(up)(t_i), the result is added to the embedding vector of t_i; the sum is input into the attention layer to obtain the current context vector corresponding to the current task node.
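The integrate-and-add step described above might look like the following sketch, assuming NumPy vectors for the embeddings (all names are illustrative):

```python
import numpy as np

def decoder_input(task_embedding, upstream_slot_embeddings, mode="max"):
    """Integrate upstream slot embeddings and add them to the task embedding.

    Hypothetical sketch of the integration step: element-wise max (or mean)
    over the embeddings of the slots hosting the upstream tasks, then an
    element-wise addition with the current task's embedding. The result is
    what the embodiment feeds into the attention layer.
    """
    stacked = np.stack(upstream_slot_embeddings)
    pooled = stacked.max(axis=0) if mode == "max" else stacked.mean(axis=0)
    return task_embedding + pooled
```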
In this embodiment, the placement solution finally generated by the decoder is denoted as P, and the task placement problem is described formally by the following formula:
where S^(up)(t_i) denotes the set of slots to which the upstream tasks of t_i have been placed, P represents the probability value that a task node is assigned to a slot node, and s_{t_i} represents the slot node corresponding to task t_i.
In this embodiment, a GRU (Gated Recurrent Unit) can be used to learn a state representation h_{t_i}, thereby memorizing the dependencies between tasks. h_{t_i} encodes the information of S^(up)(t_i) and G_task. As shown in FIG. 1b, at each step the input vector is added to the embedding vectors of the slots in S^(up)(t_i) to strengthen the understanding of the relative placement of t_i. Unlike the prior art, the present embodiment considers only the upstream tasks of t_i. The output of the GRU is expressed as:
To predict on which slot t_i is placed, this embodiment first uses the attention layer of the recurrent neural network to obtain an intermediate vector c_i. Specifically, the intermediate vector c_i can be calculated by the following formula:
where e_ij denotes the attention score obtained when the embedding vector of the task node at the i-th step passes through the attention layer, and a_ij denotes the value obtained by passing e_ij through the softmax layer.
By means of the intermediate vector, the context vector can be obtained. Specifically, it can be calculated by the following formula:
where W_Q is a coefficient matrix whose elements are parameters of the task placement model, determined after the task placement model is trained.
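Since the attention formulas appear in the patent only as images, the following is a plausible reconstruction of the step under common conventions (dot-product scores e_ij, softmax weights a_ij, weighted sum c_i, then a learned projection W_Q); it is a sketch under those assumptions, not the patent's exact formula:

```python
import numpy as np

def attention_context(h_i, slot_embeddings, W_Q):
    """Score each slot embedding against the decoder state h_i, softmax the
    scores, form the weighted sum c_i, and project it with the learned
    coefficient matrix W_Q to get the context vector.
    """
    K = np.stack(slot_embeddings)          # one row per slot
    e = K @ h_i                            # e_ij: raw attention scores
    a = np.exp(e - e.max())
    a /= a.sum()                           # a_ij: softmax of e_ij
    c = a @ K                              # intermediate vector c_i
    return W_Q @ c                         # context vector
```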
S150, determining the probability value that the current task node is assigned to each slot node according to the current context vector, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node.
Specifically, step S150 includes:
inputting the current context vector and the sample resource embedding vectors corresponding to all slot nodes into a softmax unit of the recurrent neural network to obtain the probability value p that the current task node is assigned to each slot node, which can be obtained by the following formula:
where the intermediate parameter d_k is the dimension of the target slot embedding vector, and k_j is a generalized representation, referred to in this embodiment as the slot embedding vector.
Among the probability values of the current task node being assigned to each slot node, the slot node with the maximum probability value is taken as the target slot node corresponding to the current task node. After all task nodes have determined their corresponding target slot nodes, a mapping relation between each task node and its corresponding target slot node is established.
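A minimal sketch of the slot-assignment step, assuming scaled dot-product scoring (suggested by the d_k term in the description) followed by softmax; the exact score function is an assumption:

```python
import numpy as np

def slot_probabilities(context, slot_embeddings):
    """Score every slot embedding against the context vector, scale by
    sqrt(d_k), and normalise with softmax to get assignment probabilities.
    """
    K = np.stack(slot_embeddings)
    d_k = K.shape[1]
    u = K @ context / np.sqrt(d_k)   # scaled dot-product scores
    p = np.exp(u - u.max())
    return p / p.sum()
```

The target slot node for the current task is then the index of the maximum probability, e.g. `int(slot_probabilities(ctx, slots).argmax())`, and the task-to-slot pair is recorded in the mapping.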
And S160, calculating the throughput and delay in the running process of the flow processing operation based on the mapping relation between each task node and the corresponding target slot node.
Here, for a tuple, the delay is defined as the sum of all inter-slot communication delays incurred on the way from source to sink. Since there may be many different paths from the source node to the sink node, in this embodiment the final delay is calculated as the average of the delays over all paths from the source node to the sink node.
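The path-averaged delay can be computed by enumerating every source-to-sink path; this DFS sketch assumes a small DAG and uses illustrative names:

```python
def average_path_delay(edges, delay, source, sink):
    """Average, over every source-to-sink path, of the summed inter-slot
    communication delays along that path.

    `edges` maps a node to its downstream nodes; `delay[(u, v)]` is the
    communication delay between the slots hosting u and v (0 when the two
    tasks are co-located on one slot). Names are assumptions.
    """
    totals = []

    def walk(node, acc):
        if node == sink:
            totals.append(acc)
            return
        for nxt in edges.get(node, ()):
            walk(nxt, acc + delay.get((node, nxt), 0))

    walk(source, 0)
    return sum(totals) / len(totals)
```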
Throughput represents the number of tuples that a particular DSP job can handle per second. For most DSP applications, throughput is the most important quality attribute. But unlike latency, throughput is very difficult to estimate without deploying the job. The present embodiment therefore provides an offline throughput estimation algorithm that does not require deploying the stream processing job; please refer to fig. 1c. The algorithm specifically includes:
s161, when a source node in the task nodes sends a tuple according to the maximum sending speed, distributing the residual CPU computing power of each slot node to the corresponding task node;
and S162, calculating the current throughput of each task node according to the sequence of topological sorting.
And S163, traversing each task node according to the sequence of the reverse topology sorting, and recovering the computing power corresponding to the target slot corresponding to each task node based on a back pressure mechanism.
And S164, reallocating the computing power corresponding to each target slot to the corresponding task node until the throughput of the source node converges, and taking the throughput corresponding to the source node as the throughput during operation.
The computing power corresponding to each target slot can be allocated according to CPU utilization.
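Steps S161-S164 can be sketched as the following simplified fixed-point iteration. The equal CPU-sharing rule and all names are assumptions (the patent allocates by CPU utilization), so this is a sketch of the idea, not the patent's exact policy:

```python
def estimate_throughput(order, upstream, placement, slot_cpu, cpu_cost, source_rate):
    """Simplified offline throughput estimator.

    A forward pass in topological order caps each task's rate by its CPU
    share and by its upstream input; a reverse pass models backpressure by
    capping upstream rates; the loop repeats until the source rate converges.
    """
    rate = {t: 0.0 for t in order}
    src = order[0]
    rate[src] = source_rate          # source sends at its maximum speed
    prev_source = -1.0
    while abs(rate[src] - prev_source) > 1e-6:
        prev_source = rate[src]
        # allocate each slot's CPU among the tasks placed on it (equal split)
        share = {}
        for slot in set(placement.values()):
            hosted = [t for t in order if placement[t] == slot]
            for t in hosted:
                share[t] = slot_cpu[slot] / len(hosted)
        # forward pass (topological order): CPU cap and input cap
        for t in order:
            cpu_limited = share[t] / cpu_cost[t]
            inp = min((rate[u] for u in upstream.get(t, ())), default=source_rate)
            rate[t] = min(cpu_limited, inp)
        # reverse pass: backpressure recovers over-allocated capacity upstream
        for t in reversed(order):
            for u in upstream.get(t, ()):
                rate[u] = min(rate[u], rate[t])
    return rate[src]
```

With a two-task chain where the downstream slot is the bottleneck, the estimate converges to the downstream task's CPU-limited rate, which is the behaviour backpressure is meant to capture.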
The throughput and delay estimation algorithms provided by this embodiment can not only accelerate the training process of the task placement model, but also guide the actual deployment of stream processing jobs. The main computational logic can accommodate different deployment scenarios. Multiple different placement schemes are generated by the task placement model, and the estimation algorithm provided by this embodiment can then be used to select an optimal placement scheme for final deployment.
Further, in order to verify that the throughput estimation algorithm provided by the present embodiment is applicable to heterogeneous DSP applications and resources, the present embodiment generates 100 job graphs as verification data, where the number of job nodes ranges from 1 to 10, the parallelism of each job node ranges from 1 to 6, and the job graphs are deployed into a random subset of all slots in the cluster.
The number of cycles can be a measure of the CPU utilization of different tasks, which can be controlled in a user-defined function through Flink's DataStream API.
Fig. 1d is a screenshot of the relationship between the estimated value and the actual value of the throughput during the experiment, and fig. 1e is a screenshot of the error between the estimated throughput and the actual value during the experiment. As shown in fig. 1d, a clear linear relationship is shown between the estimated value and the actual value, and the relative throughput value can be converted into the actual value by using a linear regression method, and then the error is calculated. As shown in fig. 1e, the absolute value error does not exceed 10% for 78% of the test cases. The maximum absolute value error is 19%. The average absolute value error is 6.7% for all test cases, which means that the throughput estimation tool provided by the present embodiment is feasible for estimating the quality of different placement solutions without deploying an application.
And S170, determining a target reward value according to the calculation results of the throughput and the delay.
Wherein the target bonus value is a linear combination of the throughput bonus value and the delay bonus value, and can be represented by the following formula:
where P is the placement scheme, i.e. the determined mapping relation between each task node and its corresponding target slot node, and λ is a reward coefficient configured according to quality-attribute requirements. The setting of the reward value is very flexible and scalable, and can easily be extended to other quality attributes. r_delay(P) represents the reward value of the delay calculated using the placement scheme provided by the embodiment of the present invention; r_throughput(P) represents the reward value of the throughput calculated using the placement scheme provided by the embodiment of the present invention. Specifically, the reward value of the delay and the reward value of the throughput can be calculated by the following formulas, respectively:
where delay_P represents the first delay value calculated using the placement scheme provided by the embodiment of the present invention; throughput_P represents the first throughput value calculated using the placement scheme provided by the embodiment of the present invention; delay_Q is the second delay average obtained using a heuristic method from the prior art; and throughput_Q is the second throughput average obtained using the prior-art heuristic. In this embodiment, delay_Q and throughput_Q are used to avoid the drawback of the task placement model merely improving on a single heuristic approach rather than optimizing the set target based on actual quality attributes (such as throughput and latency).
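Because the reward formulas appear only as images, the following is a hedged reconstruction: a linear combination of a throughput reward and a delay reward, each normalised against the heuristic baseline values delay_Q and throughput_Q. The exact ratios below are plausible guesses, not the patent's formulas:

```python
def target_reward(delay_p, throughput_p, delay_q, throughput_q, lam=0.5):
    """Linear combination of throughput and delay rewards, normalised
    against a prior-art heuristic baseline (delay_q, throughput_q).
    """
    r_throughput = throughput_p / throughput_q  # higher throughput -> larger reward
    r_delay = delay_q / delay_p                 # lower delay -> larger reward
    return lam * r_throughput + (1.0 - lam) * r_delay
```

Normalising against a baseline keeps the two quality attributes on comparable scales, so λ directly trades one off against the other.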
And S180, obtaining a preset task placement model when the target reward value reaches convergence.
The preset task placement model associates each task node in the task graph with a slot node in the resource graph.
In the practical application process, the task placement model provided by the embodiment can find a relatively suitable placement scheme for the graph with the complex structure. Specifically, the model provided by the present embodiment can be tested using three different topologies.
Fig. 1f is a schematic structural diagram of a throughput testing topology according to an embodiment of the present invention, and fig. 1g is a screenshot of an experimental effect of a throughput test corresponding to the throughput testing topology according to the embodiment of the present invention. As shown in fig. 1f, the throughput testing topology is a topology having a source node, an equivalent node and a sink node. The source node will continue to generate random strings of fixed 10K size as tuples. The equivalent node will send the string to the sink node intact. The sink node increments the counter by 1 each time it receives a tuple. As shown in FIG. 1g, the task placement model provided by this embodiment improves the throughput by at least 8.9%, 10% and 47% over the prior art schedulers I-Storm, Flink-even and Flink, respectively. Therefore, the task placement model provided by the present embodiment has significant advantages.
Fig. 1h is a schematic structural diagram of a word counting topology according to a first embodiment of the present invention, and fig. 1i is a screenshot of an experimental effect of a throughput test corresponding to the word counting topology according to the first embodiment of the present invention; as shown in FIG. 1h, the word count topology includes a source node, a split node, a count node, and a sink node. Which is used to count the number of occurrences of each word in one or more files. The source node will send a list of words of random length, one line at a time (randomly generated between 1 and 1000 in length). The splitting node will split each line into words and the counting node will increment the counter based on the input word and send the result to an empty sink node.
As shown in fig. 1i, for the word count topology, the curve fluctuates slightly due to the random length of the input data. The throughput difference between Flink (2.6K) and Flink-even (2.3K) is small, and I-Storm (3.9K) outperforms both. The task placement model provided by this embodiment sustains a throughput of 4.3K, an improvement of at least 10%, 63% and 87% over I-Storm, Flink and Flink-even, respectively. The task placement model provided by this embodiment therefore has significant advantages.
Fig. 1j is a schematic structural diagram of a log stream processing topology according to a first embodiment of the present invention, and fig. 1k is a screenshot of an experimental effect of a throughput test corresponding to the log stream processing topology according to the first embodiment of the present invention; as shown in fig. 1j, the source node sends one row of log records at a time. The rule application node performs a rule-based analysis and sends a log entry. The log entry is sent to two operators, which perform indexing and counting operations, respectively.
The log stream processing topology is more complex, and it is not easy to find a suitable solution for it. As shown in fig. 1k, the throughput performance of the different methods varies greatly. The task placement model provided by this embodiment achieves the highest throughput (66K), an improvement of at least 31% relative to I-Storm (50K), 75% relative to Flink-even (37K) and 143% relative to Flink (27K). The task placement model provided by this embodiment has significant advantages.
According to the technical scheme provided by this embodiment, when the graph neural network models the resource graph to generate embedding vectors, the CPU computing power attribute and the available memory attribute of each slot node are considered, so that the trained task placement model is applicable to heterogeneous resources. During training of the task placement model, the throughput and delay at runtime are taken into account. When estimating runtime throughput, the scheme of iteratively recovering and reallocating resources under a backpressure mechanism realizes offline estimation of throughput without deploying the actual application. This setting not only accelerates the training of the model, but also enables the trained task placement model to make the throughput and the delay each meet the preset requirements, so as to guide the actual deployment of subsequent stream processing jobs.
Example two
Fig. 2a is a flowchart of a data stream processing task placement method according to a second embodiment of the present invention, which can be applied in a process of deploying a stream processing job. The method may be performed by a task placement device, which may be implemented by means of software and/or hardware, as shown in fig. 2a, and comprises:
and S210, generating a task graph corresponding to the flow processing job according to the parallelism of each operator in the flow processing job and the connection mode among the operators.
Specifically, fig. 2b shows a stream processing job graph and the correspondingly generated task graph according to the second embodiment of the present invention. As shown in FIG. 2b, the stream processing job graph includes a source operator, a map operator, an aggregate operator, a filter operator, and a sink operator. According to the parallelism of each operator (represented by p in fig. 2b), the nodes in the resulting task graph include corresponding numbers of source nodes, mapping nodes, aggregation nodes, filtering nodes and sink nodes. The connection mode between the nodes is determined by the connection mode between the operators; as shown in fig. 2b, solid arrows represent broadcast connections and dotted arrows represent direct connections.
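The operator-to-task expansion can be sketched as follows. This all-to-all expansion ignores the broadcast/direct distinction described in fig. 2b and uses illustrative names:

```python
def build_task_graph(operators, edges):
    """Expand a stream-processing job graph into a task graph: each operator
    with parallelism p becomes p task nodes, and every edge between two
    operators becomes edges between all of their task instances.
    """
    tasks = {op: [f"{op}_{i}" for i in range(p)] for op, p in operators.items()}
    task_edges = []
    for up, down in edges:
        for u in tasks[up]:
            for d in tasks[down]:
                task_edges.append((u, d))
    return tasks, task_edges
```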
S220, determining task embedding vectors corresponding to all task nodes in the task graph based on a graph neural network in a preset task placement model, and determining resource embedding vectors corresponding to all slot nodes in the resource graph.
The resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute.
In this embodiment, the preset task placement model includes an encoding part and a decoding part, wherein the encoding part includes a graph neural network, for which a GCN may preferably be adopted. Based on the graph neural network, the initial embedding vectors of the task graph and the resource graph are iteratively updated to obtain the task embedding vectors and the resource embedding vectors.
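One iteration of this embedding update can be sketched as a GraphSAGE-style layer with a mean aggregator (the training procedure later in this document names the GraphSAGE algorithm); the weight names and the ReLU are assumptions:

```python
import numpy as np

def graphsage_layer(embeddings, neighbors, W_self, W_neigh):
    """One GraphSAGE-style update: each node's new embedding combines its
    own embedding with the mean of its neighbours' embeddings, followed by
    a ReLU nonlinearity.
    """
    new = {}
    for node, h in embeddings.items():
        neigh = [embeddings[n] for n in neighbors.get(node, [])]
        agg = np.mean(neigh, axis=0) if neigh else np.zeros_like(h)
        new[node] = np.maximum(0.0, W_self @ h + W_neigh @ agg)
    return new
```

Stacking K such layers corresponds to the K iterations mentioned in the training description, after which each node's embedding reflects its K-hop neighbourhood.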
And S230, based on a recurrent neural network in the preset task placement model, determining the slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine the deployment mode of the stream processing job.
In this embodiment, the preset task placement model further includes a decoding portion, where the decoding portion may be implemented by a recurrent neural network, and the recurrent neural network may take an embedding vector corresponding to the task graph and the resource graph as an input, and may establish a mapping relationship between each task node in the task graph and a slot in the resource graph after passing through the GRU unit, the attention layer, and the Softmax layer. Please refer to the contents of the above embodiments, which will not be described herein again. After the preset task placement model is trained, each task node in the task graph can be associated with a slot in the resource graph, and the throughput attribute and the delay attribute in the running process of the stream processing job can meet the preset requirements.
In this embodiment, based on a recurrent neural network in a preset task placement model, according to a task embedding vector and a resource embedding vector, a slot node corresponding to each task node is determined, which may specifically include:
sorting the task nodes by a topological sorting method; for any current task node in the sorted order, calculating a current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors corresponding to the upstream nodes of the current task node; and determining the probability value that the current task node is assigned to each slot node according to the current context vector, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node.
Specifically, according to the task embedding vector corresponding to the current task node and the resource embedding vector corresponding to each upstream node of the current task node, calculating the current context vector corresponding to the current task node, including:
integrating the resource embedding vectors corresponding to the upstream nodes of the current task node, so that all the upstream nodes correspond to one integrated resource embedding vector; adding the integrated resource embedding vector to the task embedding vector of the current task node; and inputting the result of the addition into an attention unit of the recurrent neural network to obtain the current context vector corresponding to the current task node. The calculation formula of the context vector is the same as in the model training process of the above embodiment, and is not repeated here.
Wherein the integrating operation comprises: and taking the maximum value of each dimension in the resource embedding vector corresponding to each upstream node, or carrying out averaging processing on elements of each dimension.
Specifically, determining a probability value of each slot node allocated to the current task node according to the current context vector includes:
And inputting the current context vector and the resource embedding vectors corresponding to all the slot nodes into a softmax unit of the recurrent neural network to obtain the probability value that the current task node is assigned to each slot node; the calculation formula of the probability value is the same as in the model training process of the above embodiment, and is not repeated here.
According to the technical scheme provided by this embodiment, a task graph corresponding to a stream processing job can be generated according to the parallelism of each operator in the stream processing job and the connection mode among the operators. The task graph and the resource graph can be encoded by the graph neural network in the preset task placement model, determining the task embedding vectors corresponding to the task nodes in the task graph and the resource embedding vectors corresponding to the processing-unit slot nodes in the resource graph. The resource graph in the embodiment of the present invention is a fully-connected undirected graph, and during training of the preset task placement model, the CPU computing power attribute and the available memory attribute of each slot node are considered when iterating the embedding vectors, so that the preset task placement model provided by this embodiment is applicable to heterogeneous resources. Through the recurrent neural network in the preset task placement model, the slot node corresponding to each task node can be determined according to the task embedding vectors and the resource embedding vectors, thereby determining the deployment mode of the stream processing job. In addition, because throughput and delay attributes are considered during training of the preset task placement model, the throughput attribute and the delay attribute can meet the preset requirements when the preset task placement model provided by this embodiment handles actual stream processing jobs.
EXAMPLE III
Fig. 3 is a block diagram of a data stream processing task placing device according to a third embodiment of the present invention, where the device includes: a task graph generation module 310, an encoding module 320, and a decoding module 330; wherein the content of the first and second substances,
the task graph generating module 310 is configured to generate a task graph corresponding to a stream processing job according to the parallelism of each operator in the stream processing job and the connection mode between the operators;
the encoding module 320 is configured to determine task embedding vectors corresponding to task nodes in the task graph and determine resource embedding vectors corresponding to slot nodes of processing units in a resource graph based on a graph neural network in a preset task placement model; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
a decoding module 330 configured to determine, based on a recurrent neural network in a preset task placement model, a slot node corresponding to each task node according to the task embedding vector and the resource embedding vector, so as to determine a deployment manner of the stream processing job;
the preset task placement model enables each task node in the task graph to be associated with a slot in the resource graph, and enables the throughput attribute and the delay attribute in the running process of the stream processing job to meet preset requirements.
Optionally, the decoding module specifically includes:
a sorting unit configured to: sort the task nodes by a topological sorting method;
a context vector calculation unit configured to: for any current task node in the sorted order, calculate a current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors corresponding to the upstream nodes of the current task node;
a target slot node determination unit configured to: determine the probability value that the current task node is assigned to each slot node according to the current context vector, and take the slot node with the maximum probability value as the target slot node corresponding to the current task node.
Optionally, the context vector calculating unit is specifically configured to:
integrating the resource embedding vectors respectively corresponding to each upstream node of the current task node, so that all the upstream nodes correspond to one integrated resource embedding vector;
adding the integrated resource embedding vector to the task embedding vector of the current task node;
inputting the result of the addition operation into an attention unit of the recurrent neural network to obtain a current context vector corresponding to the current task node;
wherein the integrating operation comprises: and taking the maximum value of each dimension in the resource embedding vector corresponding to each upstream node, or carrying out averaging processing on elements of each dimension.
Optionally, the target slot node determining unit is specifically configured to:
and inputting the current context vector and the resource embedding vectors corresponding to all slot nodes into a softmax unit of the recurrent neural network to obtain the probability value that the current task node is assigned to each slot node.
Optionally, the preset task placement model is obtained by training in the following manner:
acquiring an initial task embedding vector corresponding to each task node in a task graph, an initial resource embedding vector corresponding to each slot node of a processing unit in a resource graph, and a randomly generated task placement array; the length of the task placement array represents the number of tasks in the task graph, and the value in the array is the slot number;
respectively carrying out iterative updating on the initial task embedding vector and the initial resource embedding vector based on a GraphSAGE algorithm to obtain a sample task embedding vector and a sample resource embedding vector;
sorting the task nodes by a topological sorting method;
for any current task node in the sorted order, calculating a current context vector corresponding to the current task node according to the sample task embedding vector corresponding to the current task node and the sample resource embedding vectors corresponding to the upstream nodes of the current task node;
determining the probability value that the current task node is assigned to each slot node according to the current context vector, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node;
calculating the throughput and delay in the running process of the flow processing operation based on the mapping relation between each task node and the corresponding target slot node;
determining a target reward value according to the calculation results of the throughput and the delay respectively; wherein the target reward value is a linear combination of a throughput reward value and a delay reward value;
and when the target reward value reaches convergence, obtaining the preset task placement model, wherein the preset task placement model enables each task node in the task graph to be associated with the slot node in the resource graph.
Optionally, the throughput is calculated as follows:
when a source node in the task nodes sends a tuple according to the maximum sending speed, distributing the residual CPU computing power of each slot node to a corresponding task;
calculating the current throughput of each task node according to the sequence of topological sorting;
traversing each task node according to the sequence of reverse topological sorting, and recovering the computing power corresponding to the target slot node corresponding to each task node based on a back pressure mechanism;
and reallocating the computing power corresponding to each target slot node to the corresponding task node until the throughput of the source node converges, and taking the throughput corresponding to the source node as the throughput during operation.
Optionally, the delay is calculated in the following manner: the average of the delays on all paths from the source node to the sink node in the task node.
The task placement device provided by the embodiment of the invention can execute the task placement method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method. Technical details that are not described in detail in the above embodiments may be referred to a task placement method provided in any embodiment of the present invention.
Example four
Referring to fig. 4, fig. 4 is a schematic structural diagram of a computing device according to a fourth embodiment of the present invention. As shown in fig. 4, the computing device may include:
a memory 701 in which executable program code is stored;
a processor 702 coupled to the memory 701;
wherein, the processor 702 calls the executable program code stored in the memory 701 to execute the task placement method provided by any embodiment of the present invention.
The embodiment of the invention discloses a computer-readable storage medium which stores a computer program, wherein the computer program enables a computer to execute a task placement method provided by any embodiment of the invention.
In various embodiments of the present invention, it should be understood that the sequence numbers of the above-mentioned processes do not imply an inevitable order of execution, and the execution order of the processes should be determined by their functions and inherent logic, and should not constitute any limitation on the implementation process of the embodiments of the present invention.
In the embodiments provided herein, it should be understood that "B corresponding to A" means that B is associated with A, and that B can be determined from A. It should also be understood, however, that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated units, if implemented as software functional units and sold or used as a stand-alone product, may be stored in a computer-accessible memory. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, can be embodied in the form of a software product, which is stored in a memory and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like, and may specifically be a processor in the computer device) to execute part or all of the steps of the above-described method of each embodiment of the present invention.
It will be understood by those of ordinary skill in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium, where the storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM) or other optical disc storage, magnetic disk storage, tape storage, or any other medium readable by a computer that can be used to carry or store data.
Those of ordinary skill in the art will understand that: the figures are merely schematic representations of one embodiment, and the blocks or flow diagrams in the figures are not necessarily required to practice the present invention.
Those of ordinary skill in the art will understand that: modules in the devices in the embodiments may be distributed in the devices in the embodiments according to the description of the embodiments, or may be located in one or more devices different from the embodiments with corresponding changes. The modules of the above embodiments may be combined into one module, or further split into multiple sub-modules.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solution of the present invention, not to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced, and that such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A task placement method, applied to stream processing jobs, characterized by comprising the following steps:
generating a task graph corresponding to a stream processing job according to the parallelism of each operator in the stream processing job and the connections among the operators;
determining, based on a graph neural network in a preset task placement model, task embedding vectors corresponding to the task nodes in the task graph, and determining resource embedding vectors corresponding to the slot nodes of processing units in a resource graph; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
determining, based on a recurrent neural network in the preset task placement model, a slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine a deployment mode of the stream processing job;
wherein the preset task placement model associates each task node in the task graph with a slot node in the resource graph, such that the throughput attribute and the delay attribute of the stream processing job during running meet preset requirements.
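The first step of claim 1 — expanding each operator into its parallel task instances and wiring them along the operator-level topology — can be sketched as follows. This is a minimal illustration, not the patented construction; the all-to-all (shuffle-style) wiring between parallel instances and the operator names are assumptions.

```python
def build_task_graph(operators, parallelism, op_edges):
    """Expand each operator into `parallelism[op]` task nodes and
    connect the tasks along the operator-level edges (all-to-all
    between parallel instances, as in a shuffle connection)."""
    tasks = {op: [f"{op}_{i}" for i in range(parallelism[op])]
             for op in operators}
    task_edges = [(s, d)
                  for (src, dst) in op_edges
                  for s in tasks[src]
                  for d in tasks[dst]]
    task_nodes = [t for op in operators for t in tasks[op]]
    return task_nodes, task_edges

# Example: source -> map -> sink with parallelism 1, 2, 1
nodes, edges = build_task_graph(
    ["source", "map", "sink"],
    {"source": 1, "map": 2, "sink": 1},
    [("source", "map"), ("map", "sink")],
)
```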
2. The method of claim 1, wherein determining, based on the recurrent neural network in the preset task placement model, a slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors comprises:
sorting the task nodes using a topological sorting method;
for any sorted current task node, calculating a current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors respectively corresponding to the upstream nodes of the current task node;
and determining, according to the current context vector, the probability values of the current task node being assigned to the slot nodes, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node.
3. The method of claim 2, wherein calculating the current context vector corresponding to the current task node according to the task embedding vector corresponding to the current task node and the resource embedding vectors respectively corresponding to the upstream nodes of the current task node comprises:
integrating the resource embedding vectors respectively corresponding to the upstream nodes of the current task node, so that all the upstream nodes correspond to one integrated resource embedding vector;
adding the integrated resource embedding vector and the task embedding vector of the current task node;
inputting the result of the addition into an attention unit of the recurrent neural network to obtain the current context vector corresponding to the current task node;
wherein the integrating operation comprises: taking the maximum value of each dimension across the resource embedding vectors corresponding to the upstream nodes, or averaging the elements of each dimension.
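The integration step of claim 3 can be sketched as below. The identity `attention` stand-in and the vector values are illustrative assumptions; the patent's actual attention unit inside the recurrent network is not reproduced here.

```python
import numpy as np

def context_vector(task_emb, upstream_embs, attention, mode="max"):
    """Integrate the upstream resource embedding vectors element-wise
    (maximum or average per dimension), add the result to the current
    task embedding, and pass the sum through an attention unit."""
    stacked = np.stack(upstream_embs)        # shape (num_upstream, dim)
    if mode == "max":
        aggregated = stacked.max(axis=0)     # per-dimension maximum
    else:
        aggregated = stacked.mean(axis=0)    # per-dimension average
    return attention(task_emb + aggregated)

# Stand-in attention unit (identity) just to exercise the data flow.
ctx = context_vector(
    np.array([1.0, 2.0]),
    [np.array([0.0, 3.0]), np.array([2.0, 1.0])],
    attention=lambda x: x,
    mode="max",
)
```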
4. The method of claim 2, wherein determining, according to the current context vector, the probability values of the current task node being assigned to the slot nodes comprises:
inputting the current context vector and the resource embedding vectors corresponding to all slot nodes into a softmax unit of the recurrent neural network to obtain the probability values of the current task node being assigned to the slot nodes.
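A minimal sketch of the softmax scoring in claim 4, assuming dot-product compatibility between the context vector and each slot's resource embedding (the scoring function is an assumption; the claim only specifies a softmax unit):

```python
import numpy as np

def slot_probabilities(context, slot_embs):
    """Score each slot node by the dot product of the context vector
    with the slot's resource embedding, then normalise via softmax."""
    scores = np.array([context @ e for e in slot_embs])
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()

probs = slot_probabilities(
    np.array([1.0, 0.0]),
    [np.array([2.0, 0.0]), np.array([0.0, 2.0])],
)
target_slot = int(probs.argmax())  # slot node with the maximum probability
```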
5. The method of claim 1, wherein the preset task placement model is trained by:
acquiring an initial task embedding vector corresponding to each task node in the task graph, an initial resource embedding vector corresponding to each slot node of a processing unit in the resource graph, and a randomly generated task placement array; the length of the task placement array represents the number of tasks in the task graph, and the values in the array are slot numbers;
respectively carrying out iterative updating on the initial task embedding vector and the initial resource embedding vector based on a GraphSAGE algorithm to obtain a sample task embedding vector and a sample resource embedding vector;
sorting the task nodes using a topological sorting method;
for any sorted current task node, calculating a current context vector corresponding to the current task node according to the sample task embedding vector corresponding to the current task node and the sample resource embedding vectors respectively corresponding to the upstream nodes of the current task node;
determining, according to the current context vector, the probability values of the current task node being assigned to the slot nodes, and taking the slot node with the maximum probability value as the target slot node corresponding to the current task node;
calculating the throughput and delay of the stream processing job during running based on the mapping relation between each task node and its corresponding target slot node;
determining a target reward value according to the calculated throughput and delay, wherein the target reward value is a linear combination of a throughput reward value and a delay reward value;
and obtaining the preset task placement model when the target reward value converges, wherein the preset task placement model associates each task node in the task graph with a slot node in the resource graph.
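Claim 5 states only that the target reward is a linear combination of a throughput reward and a delay reward; one plausible form (the weights `alpha`, `beta` and the sign convention are assumptions) is:

```python
def target_reward(throughput, delay, alpha=1.0, beta=1.0):
    """Linear combination of a throughput reward and a delay penalty:
    higher throughput raises the reward, higher delay lowers it."""
    return alpha * throughput - beta * delay

# Equal weighting: throughput 10 tuples/s, delay 2 s
r = target_reward(10.0, 2.0)
```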
6. The method of claim 5, wherein the throughput is calculated by:
when a source node among the task nodes sends tuples at the maximum sending rate, distributing the remaining CPU computing power of each slot node to its corresponding task nodes;
calculating the current throughput of each task node in topological order;
traversing the task nodes in reverse topological order and, based on a back pressure mechanism, recovering the computing power of the target slot node corresponding to each task node;
and redistributing the computing power of each target slot node to its corresponding task nodes until the throughput of the source node converges, and taking the throughput of the source node as the throughput during running.
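Heavily simplified to a linear task chain, the fixed point that this forward/back-pressure iteration converges to is the bottleneck rate: the minimum of the source's send rate and every task's allotted computing power. The sketch below models only that converged outcome, not the claim's iterative redistribution of CPU computing power across slot nodes.

```python
def pipeline_throughput(send_rate, capacities):
    """For a linear chain of tasks, a forward pass in topological order
    caps each task's rate by its capacity; back pressure then throttles
    the source down to the chain's bottleneck, so the converged source
    throughput is min(send_rate, min(capacities))."""
    rate = send_rate
    for cap in capacities:        # forward pass in topological order
        rate = min(rate, cap)     # a task cannot outrun its slot's capacity
    return rate                   # back pressure makes this the source rate
```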
7. The method of claim 6, wherein the delay is calculated as the average of the delays on all paths from the source node to the sink node among the task nodes.
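Claim 7's delay metric can be sketched by enumerating source-to-sink paths in the task graph; treating a path's delay as the sum of per-node delays is an assumption, since the claim does not define the per-path delay.

```python
def average_path_delay(edges, node_delay, source, sink):
    """Average, over every source-to-sink path in a DAG, the sum of
    the per-node delays along that path."""
    def paths(node):
        if node == sink:
            return [[node]]
        return [[node] + p
                for nxt in edges.get(node, [])
                for p in paths(nxt)]
    totals = [sum(node_delay[n] for n in p) for p in paths(source)]
    return sum(totals) / len(totals)

# Diamond-shaped task graph: src -> {a, b} -> snk
avg = average_path_delay(
    {"src": ["a", "b"], "a": ["snk"], "b": ["snk"]},
    {"src": 1.0, "a": 2.0, "b": 4.0, "snk": 1.0},
    "src", "snk",
)
```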
8. A task placement device, applied to stream processing jobs, comprising:
the task graph generating module is configured to generate a task graph corresponding to the stream processing job according to the parallelism of each operator in the stream processing job and the connection mode among the operators;
the encoding module is configured to determine task embedding vectors corresponding to task nodes in the task graph and determine resource embedding vectors corresponding to slot nodes of processing units in a resource graph based on a graph neural network in a preset task placement model; the resource graph is a fully-connected undirected graph, and each slot node in the resource graph has a CPU computing power attribute and an available memory attribute;
the decoding module is configured to determine, based on a recurrent neural network in the preset task placement model, a slot node corresponding to each task node according to the task embedding vectors and the resource embedding vectors, so as to determine a deployment mode of the stream processing job;
wherein the preset task placement model associates each task node in the task graph with a slot node in the resource graph, such that the throughput attribute and the delay attribute of the stream processing job during running meet preset requirements.
9. A computing device, comprising:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to perform the task placement method according to any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the task placement method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110714071.8A CN113391907A (en) | 2021-06-25 | 2021-06-25 | Task placement method, device, equipment and medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113391907A true CN113391907A (en) | 2021-09-14 |
Family
ID=77624004
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110714071.8A Pending CN113391907A (en) | 2021-06-25 | 2021-06-25 | Task placement method, device, equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113391907A (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170017521A1 (en) * | 2015-07-13 | 2017-01-19 | Palo Alto Research Center Incorporated | Dynamically adaptive, resource aware system and method for scheduling |
US20190236444A1 (en) * | 2018-01-30 | 2019-08-01 | International Business Machines Corporation | Functional synthesis of networks of neurosynaptic cores on neuromorphic substrates |
US20200136920A1 (en) * | 2019-12-20 | 2020-04-30 | Kshitij Arun Doshi | End-to-end quality of service in edge computing environments |
CN111126668A (en) * | 2019-11-28 | 2020-05-08 | 中国人民解放军国防科技大学 | Spark operation time prediction method and device based on graph convolution network |
CN111309915A (en) * | 2020-03-03 | 2020-06-19 | 爱驰汽车有限公司 | Method, system, device and storage medium for training natural language of joint learning |
CN111444009A (en) * | 2019-11-15 | 2020-07-24 | 北京邮电大学 | Resource allocation method and device based on deep reinforcement learning |
US20200257968A1 (en) * | 2019-02-08 | 2020-08-13 | Adobe Inc. | Self-learning scheduler for application orchestration on shared compute cluster |
US20210117624A1 (en) * | 2019-10-18 | 2021-04-22 | Facebook, Inc. | Semantic Representations Using Structural Ontology for Assistant Systems |
CN112753016A (en) * | 2018-09-30 | 2021-05-04 | 华为技术有限公司 | Management method and device for computing resources in data preprocessing stage in neural network |
- 2021-06-25: CN CN202110714071.8A patent/CN113391907A/en active Pending
Non-Patent Citations (1)
Title |
---|
卢海峰; 顾春华; 罗飞; 丁炜超; 杨婷; 郑帅: "Research on Task Offloading for Mobile Edge Computing Based on Deep Reinforcement Learning" [基于深度强化学习的移动边缘计算任务卸载研究], Journal of Computer Research and Development (计算机研究与发展), no. 07 *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023178766A1 (en) * | 2022-03-25 | 2023-09-28 | 北京邮电大学 | Task evaluation method and apparatus based on dynamic expansion of flink engine computing node |
CN116841649A (en) * | 2023-08-28 | 2023-10-03 | 杭州玳数科技有限公司 | Method and device for hot restarting based on flink on yarn |
CN116841649B (en) * | 2023-08-28 | 2023-12-08 | 杭州玳数科技有限公司 | Method and device for hot restarting based on flink on yarn |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bolukbasi et al. | Adaptive neural networks for efficient inference | |
Piscitelli et al. | Design space pruning through hybrid analysis in system-level design space exploration | |
US11681914B2 (en) | Determining multivariate time series data dependencies | |
CN111406264A (en) | Neural architecture search | |
CN113391907A (en) | Task placement method, device, equipment and medium | |
Vakilinia et al. | Analysis and optimization of big-data stream processing | |
Chen et al. | $ d $ d-Simplexed: Adaptive Delaunay Triangulation for Performance Modeling and Prediction on Big Data Analytics | |
Ni et al. | Generalizable resource allocation in stream processing via deep reinforcement learning | |
Garbi et al. | Learning queuing networks by recurrent neural networks | |
Cheng et al. | Tuning configuration of apache spark on public clouds by combining multi-objective optimization and performance prediction model | |
Geyer et al. | Graph-based deep learning for fast and tight network calculus analyses | |
Geyer et al. | Tightening network calculus delay bounds by predicting flow prolongations in the FIFO analysis | |
Hou et al. | A machine learning enabled long-term performance evaluation framework for NoCs | |
Sinclair et al. | Adaptive discretization in online reinforcement learning | |
Guan et al. | Quantifying the impact of uncertainty in embedded systems mapping for NoC based architectures | |
Daradkeh et al. | Analytical modeling and prediction of cloud workload | |
Tuli et al. | SimTune: bridging the simulator reality gap for resource management in edge-cloud computing | |
CN106874215B (en) | Serialized storage optimization method based on Spark operator | |
Johnston et al. | Performance tuning of MapReduce jobs using surrogate-based modeling | |
Park et al. | Gemma: reinforcement learning-based graph embedding and mapping for virtual network applications | |
Sirocchi et al. | Topological network features determine convergence rate of distributed average algorithms | |
Chen et al. | Dynamically predicting the quality of service: batch, online, and hybrid algorithms | |
Tribastone | Efficient optimization of software performance models via parameter-space pruning | |
Grohmann | Reliable Resource Demand Estimation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||