CN116860419B - Parallel scheduling method and system for server non-perception data analysis - Google Patents


Info

Publication number
CN116860419B
CN116860419B CN202311126413.XA
Authority
CN
China
Prior art keywords
stage
data analysis
parallelism
graph
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311126413.XA
Other languages
Chinese (zh)
Other versions
CN116860419A (en)
Inventor
金鑫 (Jin Xin)
刘譞哲 (Liu Xuanzhe)
金超 (Jin Chao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN202311126413.XA
Publication of CN116860419A
Application granted
Publication of CN116860419B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a parallel scheduling method and system for server-unaware data analysis. The method comprises: determining a target model corresponding to an optimization target according to the optimization target; when the target model is a runtime model, acquiring runtime data information and parallelism data information in the data analysis job, and determining the relation between the runtime and the parallelism of each stage of the data analysis job; fitting the runtime model of each stage according to the relation between its runtime and parallelism; determining a first parallel scheduling scheme according to the DAG of the data analysis job, the runtime model of each stage, and the currently available resources of the computing cluster; and, according to the first parallel scheduling scheme, controlling the function execution module on each server in the computing cluster to execute the tasks of the data analysis job distributed to the server where that module is located. The parallel scheduling performance of server-unaware data analysis is thereby improved.

Description

Parallel scheduling method and system for server non-perception data analysis
Technical Field
The invention relates to the technical field of data analysis job scheduling, in particular to a parallel scheduling method and system for server unaware data analysis.
Background
Data analysis is ubiquitous in web services and applications, and data analysis jobs are an important workload in data centers. The execution of a data analysis job is divided into multiple stages; each stage is executed by multiple tasks in parallel, and data dependencies may exist between different stages. In general, different stages of a data analysis job have different resource requirements, while the traditional server-centric mode requires the user to pre-configure a certain number of servers to execute the job, resulting in resource waste or reduced execution efficiency. Server-unaware (i.e., serverless) computing can automatically deploy user code, elastically scale resources, and charge according to actual usage, so data analysis jobs are widely being ported to server-unaware computing platforms to reduce development difficulty and running cost. The user submits a data analysis job to the server-unaware computing platform, and the job scheduler allocates resources to each task in the job and executes each task in the form of a function.
Currently, data analysis job schedulers for server-unaware computing (serverless analysis schedulers for short) have made some progress in improving job execution efficiency and reducing job running cost, but most serverless analysis schedulers set a fixed number of tasks for each stage (i.e., adopt a fixed degree of parallelism, DoP). Mainstream serverless analysis schedulers determine the parallelism of a stage according to the size of its input data, setting a larger parallelism for stages with more input data. This data-volume-based method of adjusting the parallelism configuration improves execution efficiency and reduces running cost, but it has two limitations in the server-unaware computing mode.
On the one hand, a parallelism configuration method that considers only the data volume is not elastic: it cannot flexibly adjust stage parallelism to adapt to the variable, elastic resource environment of the server-unaware computing mode. An elastic resource environment means that the idle resources of the server-unaware computing platform vary over time, and the available central processing unit (CPU) and memory resources are distributed across different servers.
On the other hand, a parallelism configuration method that considers only the data volume cannot accurately match the optimization target of the serverless analysis scheduler. In the server-unaware computing mode, the user pays only for the resources actually occupied during job execution, so the user cares about the job completion time (JCT) and the running cost of the submitted job. When optimizing JCT, a data-volume-only parallelism configuration ignores the data dependencies between stages and cannot reach optimal performance; when optimizing cost, because the user pays for both the CPU and the memory occupied by the job, the data volume cannot accurately reflect the job's actual occupation of these two resources, so a data-volume-only parallelism configuration cannot achieve the minimum running cost.
Disclosure of Invention
In view of the above, the present invention provides a parallel scheduling method for server-unaware data analysis, which improves the parallel scheduling performance of server-unaware data analysis.
In a first aspect of an embodiment of the present invention, there is provided a parallel scheduling method for server unaware data analysis, applied to a data analysis job scheduler, the method including:
determining a target model corresponding to an optimization target according to the optimization target;
acquiring running time data information and parallelism data information in a data analysis job under the condition that the target model is a running time model, and determining the relation between the running time and parallelism of each stage of the data analysis job;
fitting to obtain respective running time models of the stages according to the relation between the running time and the parallelism of the stages;
determining a first parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective running time model of each stage and the current available resources of the computing cluster;
and controlling a function execution module on each server in the computing cluster to execute tasks in the data analysis job distributed to the server where the function execution module is located according to the first parallel scheduling scheme.
Optionally, the determining a first parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective runtime model of each stage, and the current available resources of the computing cluster includes:
step S4011: obtaining a graph G = (V, E) by weighting the DAG of the data analysis job, wherein each node v_i in the graph G represents a stage s_i of the data analysis job, and each edge e_{i,j} in the graph G represents that stage s_j reads the data generated by stage s_i; the weight of a node represents the computation time of the stage corresponding to the node, and the weight of an edge is the data transmission time between the two stages, including the time for stage s_i to write the data and for stage s_j to read it;
step S4012: sorting all edges in the graph G according to the total weight of the path they belong to and the weight of each edge, to obtain a sorting result;
step S4013: determining whether two stages corresponding to the top-ranked edge in the ranking result can be placed together on available resources of the computing cluster;
step S4014: if the two stages corresponding to the top-ranked edge can be co-placed on the available resources of the computing cluster, setting the weight of that edge to 0 to update the graph G; if the two stages corresponding to the top-ranked edge cannot be co-placed on the available resources of the computing cluster, deleting that edge from the sorting result to update the sorting result, and returning to step S4013; if no edge remains in the sorting result whose two stages can be co-placed on the available resources of the computing cluster, determining a first parallel scheduling scheme for each task in each stage of the data analysis job according to the graph G, and ending the execution of the steps;
step S4015: determining the optimal parallelism ratio between each pair of stages according to the relationships between the stages in the graph G and the runtime model of each stage;
step S4016: determining whether sibling stages or parent-child stages exist at the maximum depth in the graph G;
step S4017: if sibling stages exist at the maximum depth of the graph G, merging the sibling stages to update the graph G, determining the first fitting parameter in the runtime model of the merged stage based on the optimal parallelism ratio of the sibling stages and a first algorithm, and returning to step S4016; if no sibling stages exist at the maximum depth of the graph G but parent-child stages do, merging the two stages of the parent-child pair to update the graph G, determining the first fitting parameter in the runtime model of the merged stage based on the optimal parallelism ratio of the parent-child stages and a second algorithm, and returning to step S4016; if neither sibling stages nor parent-child stages exist at the maximum depth in the graph G, obtaining the graph G comprising a single merged stage, and performing step S4018;
step S4018: determining the total parallelism according to the currently available resources of the computing cluster, and determining the parallelism of each stage in the un-merged graph G according to the total parallelism and the first fitting parameter in the runtime model of each merged stage;
step S4019: determining the latest weights of the stages and edges in the un-merged graph G according to the parallelism of each stage in the un-merged graph G so as to update its weights, and returning the weight-updated, un-merged graph G to step S4012 for execution.
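As a rough illustration, the edge-ordering and co-location part of steps S4012-S4014 might be sketched as follows. This is a simplified single pass, not the patent's exact procedure: the `demand`/`capacity` resource model, the greedy capacity check, and all names are assumptions introduced for illustration.

```python
# Sketch of the co-location step: sort edges by weight (data-transfer
# time), then try to place the two stages of each heavy edge together;
# a successful co-placement zeroes the edge's transfer cost.

def co_locate(edges, demand, capacity):
    """edges: {(src_stage, dst_stage): weight}; demand: per-stage resource
    need; capacity: free resources on one server (illustrative model).
    Returns the updated edge weights."""
    order = sorted(edges, key=lambda e: edges[e], reverse=True)  # S4012: heaviest first
    weights = dict(edges)
    for u, v in order:                                           # S4013: top-ranked edge
        if demand[u] + demand[v] <= capacity:                    # stages fit together?
            weights[(u, v)] = 0.0                                # S4014: zero transfer cost
            capacity -= demand[u] + demand[v]
        # otherwise the edge is simply dropped from consideration
    return weights
```

For example, with `capacity=4`, stages A and B (demand 2 each) can be co-placed, zeroing their edge, while a heavier pair that no longer fits keeps its transfer cost.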
Optionally, the step S4015 includes:
determining the relationship between each pair of stages according to the graph G;
when the two stages are sibling stages, determining the optimal parallelism ratio of the two stages through a first proportional relation according to their respective runtime models;
and when the two stages are parent-child stages, determining the optimal parallelism ratio of the two stages through a second proportional relation according to their respective runtime models.
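The patent does not spell out the first and second proportional relations. Under a runtime model of the form t = a/d + b, one plausible instantiation (a standard optimization result, not necessarily the patented one) is: sibling stages run concurrently, so their finish times should be equalized, giving d1 : d2 = a1 : a2 when the constant terms are comparable; parent-child stages run sequentially, so minimizing a1/d1 + a2/d2 under a fixed total parallelism gives d1 : d2 = sqrt(a1) : sqrt(a2).

```python
import math

# Hypothetical instantiations of the two proportional relations for the
# runtime model t = a/d + b; both names and formulas are assumptions.

def sibling_ratio(a1, a2):
    # Concurrent (sibling) stages: equalize a1/d1 = a2/d2, ignoring the
    # constant terms b, so d1 : d2 = a1 : a2.
    return a1 / a2

def parent_child_ratio(a1, a2):
    # Sequential (parent-child) stages: minimize a1/d1 + a2/d2 subject to
    # d1 + d2 = D; the Lagrange condition a1/d1^2 = a2/d2^2 gives
    # d1 : d2 = sqrt(a1) : sqrt(a2).
    return math.sqrt(a1 / a2)
```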
Optionally, acquiring the runtime data information and the parallelism data information in the data analysis job and determining the relation between the runtime and the parallelism of each stage of the data analysis job includes:
when the data information of the data analysis job includes historical run information, determining the relation between the runtime and the parallelism of each stage of the data analysis job by analyzing the historical run information;
and when the data information of the data analysis job does not include historical run information, running the data analysis job multiple times with a plurality of preset, different parallelism configurations to obtain the relation between the runtime and the parallelism of each stage of the data analysis job.
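The cold-start path (no historical run information) amounts to profiling: run the job under several preset parallelism configurations and record a (parallelism, runtime) sample per run. A minimal sketch, where `simulated_run` is a stand-in for actually launching a stage on the platform:

```python
import time

# Profiling sketch: collect (parallelism, runtime) samples for later
# fitting. simulated_run is an illustrative stand-in, not a real launcher.

def simulated_run(parallelism, work=1_000_000):
    start = time.perf_counter()
    per_task = int(work / parallelism)   # less data per task as d grows
    sum(range(per_task))                 # stand-in for the stage's computation
    return time.perf_counter() - start

def profile_stage(configs):
    """Return a list of (parallelism, runtime) samples for one stage."""
    return [(d, simulated_run(d)) for d in configs]
```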
Optionally, the method further comprises: when the data analysis job completes a new run, updating the runtime model of each stage of the data analysis job according to the run data of the new run.
Optionally, the method further comprises:
receiving an execution result of the task executed by the function execution module;
and when the execution result indicates an execution abnormality, controlling the function execution module to terminate the task or re-execute it.
Optionally, the method further comprises:
acquiring the runtime data information, parallelism data information, data-volume data information and memory occupation information in the data analysis job when the target model is a running cost model, and determining the relation between the running cost and the parallelism of each stage of the data analysis job;
fitting the running cost model of each stage according to the relation between the running cost and the parallelism of each stage;
determining a second parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective running cost model of each stage and the current available resources of the computing cluster;
and controlling a function execution module on each server in the computing cluster to execute tasks in the data analysis job distributed to the server where the function execution module is located according to the second parallel scheduling scheme.
Optionally, the determining a second parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the running cost model of each stage, and the current available resources of the computing cluster includes:
step S4021: obtaining a graph G = (V, E) by weighting the DAG of the data analysis job, wherein each node v_i in the graph G represents a stage s_i of the data analysis job, and each edge e_{i,j} in the graph G represents that stage s_j reads the data generated by stage s_i; the weight of a node represents the computation cost of the stage corresponding to the node, and the weight of an edge is the cost of data transmission between the two stages, including the cost for stage s_i to write the data and for stage s_j to read it;
step S4022: sequencing all edges in the graph G according to the weight sequence of the edges in the graph G to obtain sequencing results;
step S4023: determining whether two stages corresponding to the top-ranked edge in the ranking result can be placed together on available resources of the computing cluster;
step S4024: if the two stages corresponding to the top-ranked edge can be co-placed on the available resources of the computing cluster, setting the weight of that edge to 0 to update the graph G; if the two stages corresponding to the top-ranked edge cannot be co-placed on the available resources of the computing cluster, deleting that edge from the sorting result to update the sorting result, and returning to step S4023; if no edge remains in the sorting result whose two stages can be co-placed on the available resources of the computing cluster, determining a second parallel scheduling scheme for each task in each stage of the data analysis job according to the graph G, and ending the execution of the steps;
step S4025: determining the optimal parallelism ratio between each pair of stages according to the respective running cost models of the stages in the graph G;
step S4026: determining the total parallelism according to the currently available resources of the computing cluster, and determining the parallelism of each stage in the graph G according to the total parallelism and the optimal parallelism ratio between each pair of stages;
step S4027: determining the latest weights of the stages and edges in the graph G according to the parallelism of each stage in the graph G so as to update the weights in the graph G, and returning the weight-updated graph G to step S4022 for execution.
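A step-S4026-style split of the total parallelism budget can be sketched as follows: given per-stage proportional weights (derived, for instance, from the pairwise optimal ratios anchored at one stage), divide the total parallelism among the stages. The integer-rounding policy and all names here are illustrative assumptions.

```python
# Sketch: allocate a total parallelism budget across stages in
# proportion to per-stage weights, giving each stage at least 1 task.

def allocate(total, ratios):
    """ratios: per-stage proportional weights, e.g. {"map": 4, "reduce": 1}.
    Returns an integer parallelism per stage (floor rounding, min 1)."""
    s = sum(ratios.values())
    return {stage: max(1, int(total * r / s)) for stage, r in ratios.items()}
```

For example, a budget of 10 split 4:1 between two stages yields parallelism 8 and 2.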
A second aspect of the present invention provides a parallel scheduling system for server unaware data analysis, the system comprising:
the target model determining module is used for determining a target model corresponding to the optimization target according to the optimization target;
the target relation determining module is used for acquiring the running time data information and the parallelism data information in the data analysis job under the condition that the target model is a running time model and determining the relation between the running time and the parallelism of each stage of the data analysis job;
The running time model building module is used for obtaining the running time model of each stage by fitting according to the relation between the running time and the parallelism of each stage;
the first parallel scheduling scheme determining module is used for determining a first parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the running time model of each stage and the current available resources of the computing cluster;
and the task execution module is used for controlling, according to the first parallel scheduling scheme, the function execution module on each server in the computing cluster to execute the tasks of the data analysis job distributed to the server where the function execution module is located.
Compared with the prior art, the present invention has the following advantages:
according to the parallel scheduling method for server non-perception data analysis, firstly, a target model corresponding to an optimization target is determined according to the optimization target; acquiring running time data information and parallelism data information in a data analysis job under the condition that the target model is a running time model, and determining the relation between the running time and parallelism of each stage of the data analysis job; fitting to obtain respective running time models of the stages according to the relation between the running time and the parallelism of the stages; determining a first parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective running time model of each stage and the current available resources of the computing cluster; and controlling a function execution module on each server in the computing cluster to execute tasks in the data analysis job distributed to the server where the function execution module is located according to the first parallel scheduling scheme. Therefore, the parallel scheduling performance of the server without perceived data analysis can be effectively improved.
The foregoing is merely an overview of the technical solution of the present invention; it is provided so that the technical means of the invention can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features and advantages of the present invention will be more readily apparent.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
FIG. 1 is a flow chart of a parallel scheduling method for server unaware data analysis provided by an embodiment of the invention;
fig. 2 is a schematic diagram of weighting DAGs of data analysis jobs in a parallel scheduling method for server non-aware data analysis according to an embodiment of the present invention;
fig. 3 is a schematic diagram of merging stages in a graph G of a data analysis job in a parallel scheduling method for server non-aware data analysis according to an embodiment of the present invention;
fig. 4 is another schematic diagram of merging stages in a graph G of a data analysis job in a parallel scheduling method for server non-aware data analysis according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of determining parallelism of each stage of a data analysis job in a parallel scheduling method for server non-aware data analysis according to an embodiment of the present invention;
fig. 6 is another schematic diagram of determining parallelism of each stage of a data analysis job in a parallel scheduling method for server non-aware data analysis according to an embodiment of the present invention;
fig. 7 is a schematic diagram of a parallel scheduling system with server unaware data analysis according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a parallel scheduling method for server unaware data analysis, which is provided in an embodiment of the present invention, as shown in fig. 1, where the method is applied to a data analysis job scheduler, and the method includes:
step S101: determining a target model corresponding to an optimization target according to the optimization target;
step S102: acquiring running time data information and parallelism data information in a data analysis job under the condition that the target model is a running time model, and determining the relation between the running time and parallelism of each stage of the data analysis job;
Step S103: fitting to obtain respective running time models of the stages according to the relation between the running time and the parallelism of the stages;
step S104: determining a first parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective running time model of each stage and the current available resources of the computing cluster;
step S105: and controlling a function execution module on each server in the computing cluster to execute tasks in the data analysis job distributed to the server where the function execution module is located according to the first parallel scheduling scheme.
In the embodiment of the present invention, the execution of a data analysis job comprises multiple stages, and each stage is executed by multiple tasks in parallel. The tasks executed in parallel within the same stage are identical: a stage is divided into multiple tasks that are executed in parallel, the task content of these tasks is the same, and the resources and time needed to execute them are the same. The parallel scheduling method for server-unaware data analysis is applied to a data analysis job scheduler, to which the user submits data analysis jobs. The user specifies an optimization target for the data analysis job, and the corresponding target model is determined according to the optimization target specified by the user. When the user-specified optimization target is the job completion time (JCT), the user cares about the completion time of the submitted data analysis job and wishes to reduce it as much as possible to improve its completion efficiency. Completion-time optimization targets and runtime models correspond one-to-one, so whenever the user-specified optimization target is the completion time, the corresponding target model is determined to be the runtime model.
After the target model is determined to be the runtime model, the runtime data information and the parallelism data information in the data analysis job submitted by the user are acquired, the two kinds of data information are analyzed, and the relation between the runtime and the parallelism of each stage in the data analysis job is determined. For any one of the stages in the data analysis job, the determined relations between the runtime and the parallelism belonging to that stage are fitted into a function curve, and this function curve is the runtime model of that stage. The expression of the runtime model is t = a/d + b, where t is the runtime of the corresponding stage, d is the parallelism of the corresponding stage, and a and b are the first fitting parameter and the second fitting parameter to be fitted. They represent, respectively, the part of the runtime that decreases as the parallelism increases and the part that remains constant. For example, the task processing time of the stage belongs to the decreasing part, because as the number of tasks into which the stage is divided increases, the amount of data each task must process decreases, and so does its processing time; the task start-up time and task initialization time of the stage belong to the part that remains constant as the parallelism increases.
Illustratively, the data analysis job submitted by the user comprises stage 1, stage 2 and stage 3. By analyzing the acquired runtime data information and parallelism information in the data analysis job, the runtime-parallelism relations belonging to stage 1 are determined to be: runtime t11 corresponds to parallelism d11, t12 to d12, t13 to d13, ..., t1m to d1m; the relations belonging to stage 2 are: t21 to d21, t22 to d22, t23 to d23, ..., t2n to d2n; and the relations belonging to stage 3 are: t31 to d31, t32 to d32, t33 to d33, ..., t3p to d3p. By fitting the relations belonging to stage 1, a function curve of stage 1 is obtained, i.e. the runtime model of stage 1, t1 = a1/d1 + b1; by fitting the relations belonging to stage 2, the runtime model of stage 2, t2 = a2/d2 + b2, is obtained; and by fitting the relations belonging to stage 3, the runtime model of stage 3, t3 = a3/d3 + b3, is obtained.
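Assuming the runtime model takes the form t = a/d + b (a part that shrinks with parallelism plus a constant part), the fit is linear in x = 1/d, so ordinary least squares suffices. A minimal stdlib-only sketch, not the patent's exact fitting procedure:

```python
# Fit t = a/d + b from (parallelism, runtime) samples by ordinary least
# squares on the transformed variable x = 1/d.

def fit_runtime_model(samples):
    """samples: list of (d, t) pairs. Returns (a, b) for t = a/d + b."""
    xs = [1.0 / d for d, _ in samples]
    ts = [t for _, t in samples]
    n = len(samples)
    mx, mt = sum(xs) / n, sum(ts) / n
    a = sum((x - mx) * (t - mt) for x, t in zip(xs, ts)) / \
        sum((x - mx) ** 2 for x in xs)          # slope w.r.t. 1/d
    b = mt - a * mx                             # intercept
    return a, b
```

On noiseless samples generated from a = 100, b = 2 the fit recovers both parameters exactly (up to floating-point error).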
In an embodiment of the present invention, after the run time model of each stage of the data analysis job submitted by the user is obtained, a first parallel scheduling scheme for each task in each stage of the data analysis job is determined by the data analysis job scheduler according to the DAG (Directed Acyclic Graph) of the data analysis job submitted by the user, the run time model of each stage of the data analysis job, and the current available resources of the computing cluster. After the data analysis job scheduler determines the first parallel scheduling scheme, the first parallel scheduling scheme is sent to each server in the computing cluster. Each server parses the received first parallel scheduling scheme to determine the tasks in the data analysis job that are distributed to it for execution, and then executes those tasks through the function execution module on the server.
According to the parallel scheduling method for server non-perception data analysis, firstly, a target model corresponding to an optimization target is determined according to the optimization target; acquiring running time data information and parallelism data information in a data analysis job under the condition that the target model is a running time model, and determining the relation between the running time and parallelism of each stage of the data analysis job; fitting to obtain respective running time models of the stages according to the relation between the running time and the parallelism of the stages; determining a first parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective running time model of each stage and the current available resources of the computing cluster; and controlling a function execution module on each server in the computing cluster to execute tasks in the data analysis job distributed to the server where the function execution module is located according to the first parallel scheduling scheme. Therefore, the parallel scheduling performance of the server without perceived data analysis can be effectively improved.
In the invention, the determining a first parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective running time model of each stage and the current available resources of the computing cluster comprises the following steps:
Step S4011: weighting the DAG of the data analysis job to obtain a graph G; wherein each node v_i in the graph G represents a stage of the data analysis job, and each edge e(v_i, v_j) of the graph G represents stage v_j reading the data generated by stage v_i; the weight of a node represents the computation time of the stage corresponding to the node, and the edge weight of an edge is the data transmission time between the two stages, including the write data of stage v_i and the read data of stage v_j;
step S4012: sorting all edges in the graph G according to the order of the total weights of the paths in the graph G and the order of the edge weights of the edges, so as to obtain a sorting result;
step S4013: determining whether two stages corresponding to the top-ranked edge in the ranking result can be placed together on available resources of the computing cluster;
step S4014: setting the edge weight of the top-ranked edge to 0 to update the graph G, in the case that the two stages corresponding to the top-ranked edge can be co-placed on the available resources of the computing cluster; deleting the top-ranked edge from the sorting result to update the sorting result and returning to step S4013, in the case that the two stages corresponding to the top-ranked edge cannot be co-placed on the available resources of the computing cluster; and determining a first parallel scheduling scheme for each task in each stage of the data analysis job according to the graph G and ending the execution of the steps, in the case that the sorting result no longer contains any edge whose two stages can be co-placed on the available resources of the computing cluster;
Step S4015: determining the optimal parallelism ratio between every two stages according to the relationship between the stages in the graph G and the run time model of each stage;
step S4016: determining whether a sibling stage or a parent-child stage exists at the maximum depth in the graph G;
step S4017: merging the sibling stages to update the graph G in the case that a sibling stage exists at the maximum depth of the graph G, determining a first fitting parameter in the run time model of the merged stage based on the optimal parallelism ratio of the sibling stages and a first algorithm, and returning to step S4016; merging the two stages of the parent-child pair to update the graph G in the case that no sibling stage exists at the maximum depth of the graph G but a parent-child stage does, determining a first fitting parameter in the run time model of the merged stage based on the optimal parallelism ratio of the parent-child stages and a second algorithm, and returning to step S4016; and obtaining the graph G comprising a single merged stage and performing step S4018, in the case that neither a sibling stage nor a parent-child stage exists at the maximum depth in the graph G;
Step S4018: determining the total parallelism according to the current available resources of the computing cluster, and determining the parallelism number of each stage in the graph G as it was before the merging process, according to the total parallelism and the first fitting parameters in the run time models of the merged stages;
step S4019: determining the latest weight of each stage and each edge in the graph G as it was before the merging process, according to the parallelism of each stage in that graph G, so as to update its weights, and returning the weight-updated graph G to step S4012 for execution.
In an embodiment of the present invention, one implementation of step S104 is:
step S4011: after the run time model of each stage of the data analysis job submitted by the user is determined through step S103, the computation time of each stage in the data analysis job and the data transmission time between stages are determined according to those run time models. The DAG of the data analysis job is weighted according to the determined computation times of the stages and the data transmission times between stages, so as to obtain a graph G, wherein each node v_i in the obtained graph G represents one stage of the data analysis job, each edge e(v_i, v_j) represents stage v_j reading the data generated by stage v_i, the weight of a node represents the computation time of the stage corresponding to the node, and the edge weight of an edge is the data transmission time between the two stages corresponding to the edge, including the write data of stage v_i and the read data of stage v_j. As shown in FIG. 2, FIG. 2 shows the graph G obtained by weighting the original DAG of the data analysis job. In FIG. 2, the weight of stage 1 is 3, indicating that the computation time of stage 1 is 3, and the edge weight of the edge formed by stage 1 and stage 10 is 2, indicating that the data transmission time between stage 1 and stage 10 is 2; the weight of stage 3 is 2, indicating that the computation time of stage 3 is 2, and the edge weight of the edge formed by stage 3 and stage 9 is 5, indicating that the data transmission time between stage 3 and stage 9 is 5.
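As a minimal sketch of the weighted graph of step S4011, the two weights quoted above from FIG. 2 can be stored as node and edge weights; the dictionary representation and the zero placeholder weights for stages 9 and 10 (whose computation times are not quoted above) are assumptions for illustration.

```python
# Node weight = computation time of a stage; edge weight = data
# transmission time between two stages (write data plus read data).
node_weight = {1: 3, 3: 2, 9: 0, 10: 0}  # 0 = placeholder, not from FIG. 2
edge_weight = {(1, 10): 2, (3, 9): 5}

def path_total_weight(path):
    """Total weight of a path: stage weights plus edge weights along it."""
    total = sum(node_weight[s] for s in path)
    total += sum(edge_weight[(u, v)] for u, v in zip(path, path[1:]))
    return total

w = path_total_weight([1, 10])  # 3 + 0 + 2 = 5
```

The same helper is what the path ranking of step S4012 needs: it sums the computation times of the stages on a path with the transmission times of its edges.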
After step S4011 is executed and the graph G corresponding to the data analysis job submitted by the user is obtained, step S4012 is executed: after the weighted graph G is obtained, all edges in the graph G are sorted. First, a plurality of paths are formed, each starting from a stage of maximum depth and proceeding to a stage of minimum depth. Taking FIG. 2 as an example and starting from the stages of maximum depth, a path consisting of stage 1, stage 10, stage 11 and stage 13 is obtained, as well as a path consisting of stage 2, stage 10, stage 11 and stage 13, a path consisting of stage 3, stage 9, stage 11 and stage 13, a path consisting of stage 4, stage 9, stage 11 and stage 13, a path consisting of stage 5, stage 8, stage 12 and stage 13, and a path consisting of stage 6, stage 7, stage 12 and stage 13, so that 6 paths are obtained in total. For each obtained path, the total weight of the path is calculated, i.e. the sum of the weights of the stages on the path and the edge weights of its edges. After the total weight of each path is obtained, the paths are ranked: the larger the total weight, the earlier the path ranks. The edges in the path with the largest total weight are sorted first, edges with larger edge weight ranking earlier; then the edges in the path with the second largest total weight are sorted in the same manner, and so on until all edges in the graph G have participated in the sorting, so that a sorting result is obtained.
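The ordering of step S4012 can be sketched as follows: paths are ranked by descending total weight and, within each path, edges by descending edge weight, with an edge that lies on several paths keeping only its position in the highest-ranked path (the deduplication rule the next paragraph describes). The toy weights below are illustrative.

```python
def order_edges(paths, path_weight, edge_weight):
    """Sort edges by (path total weight desc, edge weight desc), keeping
    only the first occurrence of an edge that lies on several paths."""
    ordered, seen = [], set()
    for path in sorted(paths, key=path_weight, reverse=True):
        edges = list(zip(path, path[1:]))
        for e in sorted(edges, key=lambda e: edge_weight[e], reverse=True):
            if e not in seen:
                seen.add(e)
                ordered.append(e)
    return ordered

# Two toy paths sharing the edge (10, 11): it keeps its slot from the
# heavier path and is not re-ranked in the lighter one.
ew = {(1, 10): 2, (10, 11): 6, (2, 10): 1}
pw = {(1, 10, 11): 10, (2, 10, 11): 8}
result = order_edges([(1, 10, 11), (2, 10, 11)], lambda p: pw[p], ew)
```

On this toy input the shared edge (10, 11) is emitted once, from the heavier path, so every edge appears exactly once in the result.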
In an embodiment of the present invention, since the sorting serves to take edges from the sorting result in turn and determine whether the two stages corresponding to each edge can be co-placed on the available resources of the computing cluster, once it is determined that the two stages corresponding to an edge cannot be co-placed, that edge can never be co-placed, no matter how many times it appears in the sorting result. Therefore, in order to improve scheduling efficiency, an edge that lies on multiple paths (such as the edge formed by stage 11 and stage 13, the edge formed by stage 12 and stage 13, or the edge formed by stage 10 and stage 11 in FIG. 2) takes its position in the highest-ranked of those paths as its final position, and does not participate in the sorting of any lower-ranked path among them. For example, taking FIG. 2 as an example, the edge formed by stage 10 and stage 11 appears both in the path consisting of stage 1, stage 10, stage 11 and stage 13 and in the path consisting of stage 2, stage 10, stage 11 and stage 13. If the path consisting of stage 1, stage 10, stage 11 and stage 13, whose total weight is 25, ranks ahead of the path consisting of stage 2, stage 10, stage 11 and stage 13, then the edge formed by stage 10 and stage 11 takes its position in the sorting of the former path as its final position, and does not participate again when the edges of the latter path are sorted.
It should be appreciated that each edge in the graph G of the data analysis job appears exactly once in the sorting result, since an edge that lies on several paths only participates in the sorting of the highest-ranked of those paths.
In the embodiment of the present invention, this implementation of step S104 is a process of loop optimization of the scheduling scheme: step S4012 is performed in every loop round, but only in the first round does step S4012 sort the weighted graph G corresponding to the data analysis job submitted by the user. In subsequent rounds, the graph G sorted in step S4012 is the graph G whose weights were updated in step S4019 of the previous round.
After step S4012 is executed and the corresponding sorting result is obtained, step S4013 is executed: it is determined whether the two stages corresponding to the top-ranked edge in the sorting result of the current round can be co-placed on the available resources of the computing cluster. The decision condition for determining whether two stages can be co-placed on the available resources of the computing cluster is: whether the available computing resources of some single server in the computing cluster can satisfy the total computing resources required by the two stages. If the total computing resources required by the two stages are 100, and the server that can provide the most computing resources among the available resources of the computing cluster can only provide 90, it is determined that the two stages cannot be co-placed on the available resources of the computing cluster, that is, the two stages cannot be co-placed on the same server.
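The co-placement decision described above can be sketched as a check that some single server's free resources cover the two stages' combined demand; the flat per-server resource numbers are an assumed model for illustration.

```python
def can_coplace(demand_a, demand_b, server_free_resources):
    """True iff one server's free resources can hold both stages."""
    need = demand_a + demand_b
    return any(free >= need for free in server_free_resources)

# The example above: the two stages need 100 in total, but the largest
# server only offers 90, so the pair cannot be co-placed.
ok = can_coplace(60, 40, [90, 50, 30])
```

Note the check is per server, not cluster-wide: 170 units spread over three servers do not help if no single server can hold the pair.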
After the result of whether the two stages corresponding to the top-ranked edge in the sorting result of the current round can be co-placed on the available resources of the computing cluster is obtained, step S4014 is executed: when the result indicates that the two stages corresponding to the top-ranked edge can be co-placed on the available resources of the computing cluster, the edge weight of the top-ranked edge is set to 0, at which point the graph G of the data analysis job submitted by the user is updated; the update refers to setting the edge weight of the top-ranked edge in the graph G to 0. Then the subsequent step S4015 of the current round continues to be executed based on the graph G updated in step S4014 of the current round.
When the result indicates that the two stages corresponding to the top-ranked edge cannot be co-placed on the available resources of the computing cluster, the top-ranked edge is deleted from the sorting result, so that the next edge in the sorting result of the current round becomes the new top-ranked edge; the method then returns to step S4013 with the new sorting result, and it is determined whether the two stages corresponding to the current top-ranked edge in the new sorting result can be co-placed on the available resources of the computing cluster. If they cannot be co-placed, the current top-ranked edge in the new sorting result is deleted in turn to further update the sorting result, and the method again returns to step S4013 with the further updated sorting result, continuing until it is determined that the two stages corresponding to the current top-ranked edge can be co-placed on the available resources of the computing cluster, at which point the subsequent step S4015 of the current round continues to be executed.
If all edges in the sorting result have been deleted without finding an edge whose two corresponding stages can be co-placed on the available resources of the computing cluster, the current graph G of the data analysis job is a graph G on which no further scheduling optimization can be performed, and at this point the first parallel scheduling scheme for each task in each stage of the data analysis job is determined directly from the graph G. Specifically: according to the edge weights of the edges between stages in the graph G, the two stages corresponding to each edge with edge weight 0 are jointly assigned to the same server capable of accommodating both stages, and the number of tasks of each stage is determined according to the parallelism of each stage, so that the final first parallel scheduling scheme is obtained. The parallelism number of a stage is the same as its number of tasks.
Since determining the first parallel scheduling scheme for each task in each stage of the data analysis job is a loop optimization process, the parallelism of each stage is determined in the subsequent steps. Whenever the loop returns to step S4012 for another round, the steps following step S4014 have already been executed at least once, so the parallelism of each stage is known at that point. In the embodiment of the present invention, the determination of the parallelism of each stage is described in the subsequent steps.
For example, the ordering result in the current round includes an edge a, an edge B, an edge C, and an edge D in the ordering order, and it is first determined whether two phases corresponding to the edge a with the forefront ordering can be co-placed on the available resources of the computing cluster, and when two phases corresponding to the edge a cannot be co-placed on the available resources of the computing cluster, the edge a in the ordering result is deleted. At this time, the ordering result only includes the edge B, the edge C, and the edge D, and then it is continuously determined whether two phases corresponding to the edge B with the forefront ordering can be co-placed on the available resources of the computing cluster, and when two phases corresponding to the edge B cannot be co-placed on the available resources of the computing cluster, the edge B in the ordering result is deleted. At this time, the ordering result only includes the edge C and the edge D, and then it is continuously determined whether the two phases corresponding to the edge C with the forefront ordering can be placed together on the available resources of the computing cluster, and when the two phases corresponding to the edge C cannot be placed together on the available resources of the computing cluster, the edge C in the ordering result is deleted. At this time, the ordering result includes only the edge D, and then it is continuously determined whether two phases corresponding to the edge D with the highest ordering can be co-placed on the available resources of the computing cluster, and when two phases corresponding to the edge D can be co-placed on the available resources of the computing cluster, the subsequent step S4015 of the current round of circulation is continuously performed at this time.
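The scan through edges A, B, C and D above amounts to popping edges off the front of the sorting result until a co-placeable pair is found; a minimal sketch with an assumed placeability predicate:

```python
def first_placeable_edge(ordering, placeable):
    """Pop edges from the front of `ordering` until one whose two stages
    can be co-placed is found (step S4013); non-placeable edges are
    deleted (step S4014). Returns None if the ordering empties."""
    while ordering:
        edge = ordering[0]
        if placeable(edge):
            return edge
        ordering.pop(0)  # delete the top-ranked edge and retry
    return None

order = ["A", "B", "C", "D"]
found = first_placeable_edge(order, lambda e: e == "D")
```

When the loop returns None, the caller falls through to the direct-scheduling branch described above for an exhausted sorting result.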
After step S4014 of the current round is performed, step S4015 of the current round is performed based on the updated graph G obtained in step S4014 of the current round.
For example, as shown in FIG. 2, the graph G of the data analysis job is originally the weighted graph G shown in FIG. 2. When the two stages corresponding to the top-ranked edge are determined to be stage 10 and stage 11 in FIG. 2, the edge weight 6 of the edge formed by stage 10 and stage 11 is set to 0, so that an updated graph G is obtained; that is, the node structure and all other weights are unchanged, and the updated graph G differs only in that the edge weight 6 of the edge formed by stage 10 and stage 11 is set to 0.
The optimal parallelism ratio between every two phases can be determined according to the relationship between each phase in the updated graph G obtained in step S4014 in the current cycle and the respective running time model of each phase.
In the present invention, the step S4015 includes: determining a relationship between each two phases according to the graph G; under the condition that the relation between two phases is a brother phase, determining the optimal parallelism proportion of the two phases according to the running time models of the two phases respectively through a first proportion relation; and under the condition that the relation between the two phases is a parent-child phase, determining the optimal parallelism proportion of the two phases according to the respective running time models of the two phases through a second proportion relation.
In an embodiment of the present invention, one implementation of step S4015 is: from the graph G, the relationship between each pair of stages can be determined, including whether the two stages have a sibling relationship or a parent-child relationship.
Illustratively, taking FIG. 2 as an example, for phase 1 and phase 2 in FIG. 2, both belong to sibling phases; for stage 3 and stage 4 in FIG. 2, both belong to sibling stages; for stage 5 and stage 8 in FIG. 2, both belong to the parent-child stage; for phase 1 and phase 12 in fig. 2, both belong to sibling phases. It should be understood that the specific structure of the sibling and parent-child phases is only exemplary and not all of the sibling and parent-child phases in fig. 2 are listed here.
In the case that the relationship between two stages is a sibling relationship, the optimal parallelism ratio of the two stages is determined from their respective run time models through a first proportional relationship, where the first proportional relationship is the ratio between the parts of the two run time models that decrease with increasing parallelism. Illustratively, as shown in FIG. 2, stage 1 and stage 2 are sibling stages; the first proportional relationship between stage 1 and stage 2 is a1 : a2, where a1 is the part of the run time model of stage 1 that decreases with increasing parallelism and a2 is the part of the run time model of stage 2 that decreases with increasing parallelism, and the optimal parallelism ratio between stage 1 and stage 2 is determined from this first proportional relationship to be d1 : d2 = a1 : a2.
In the case that the relationship between two stages is a parent-child relationship, the optimal parallelism ratio of the two stages is determined from their respective run time models through a second proportional relationship, where the second proportional relationship is the ratio between the square roots of the corresponding parts of the two run time models. Illustratively, as shown in FIG. 2, stage 5 and stage 8 are parent-child stages; the second proportional relationship between stage 5 and stage 8 is √a5 : √a8, where a5 is the part of the run time model of stage 5 that decreases with increasing parallelism and a8 is the part of the run time model of stage 8 that decreases with increasing parallelism, and the optimal parallelism ratio between stage 5 and stage 8 is determined from this second proportional relationship to be d5 : d8 = √a5 : √a8.
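The two proportional relations can be sketched under the reconstructed model t_i(d) = a_i/d + b_i: sibling stages, which run concurrently, get parallelism in the ratio a_i : a_j, while parent-child stages get the ratio √a_x : √a_y; the derivations in the comments are an interpretation consistent with the text, not the patent's literal proof.

```python
import math

def sibling_ratio(a_i, a_j):
    """Optimal d_i/d_j for sibling stages: equalising the concurrent
    times a_i/d_i = a_j/d_j gives d_i/d_j = a_i/a_j."""
    return a_i / a_j

def parent_child_ratio(a_x, a_y):
    """Optimal d_x/d_y for parent-child stages: minimising the summed
    time a_x/d_x + a_y/d_y under a fixed d_x + d_y gives
    d_x/d_y = sqrt(a_x)/sqrt(a_y)."""
    return math.sqrt(a_x) / math.sqrt(a_y)

r_sibling = sibling_ratio(12.0, 3.0)            # 4.0
r_parent_child = parent_child_ratio(16.0, 4.0)  # 2.0
```

The square root in the parent-child case comes from the Cauchy-Schwarz style optimum of the summed sequential time, whereas siblings only need their concurrent finishing times balanced.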
After step S4015 of the current round is executed and the optimal parallelism ratio between every two stages is obtained, step S4016 of the current round is executed: determining whether a sibling stage or a parent-child stage exists at the maximum depth in the updated graph G obtained in step S4014 of the current round. Step S4017 of the current round is then executed according to the obtained result.
In step S4017 of the current round, if the result obtained in step S4016 of the current round indicates that a sibling stage exists at the maximum depth in the updated graph G, the sibling stages are merged to further update the updated graph G. At the same time, the first fitting parameter in the run time model of the merged stage is determined based on the optimal parallelism ratio of the sibling stages and the first algorithm. The first algorithm is a_ij = a_i + a_j, where a_i and a_j are respectively the part of the run time model of stage i that decreases with increasing parallelism and the part of the run time model of stage j that decreases with increasing parallelism, stage i and stage j are sibling stages, and a_ij, the part of the run time model of the merged stage ij that decreases with increasing parallelism, is the first fitting parameter in the run time model of the stage ij obtained by merging stage i and stage j.
Taking fig. 2 as an example and continuing to use the above example, in the above example, after setting the edge weights of the edges corresponding to the stage 10 and the stage 11 to 0 in the current cycle, an updated graph G is obtained, and for the updated graph G, in step S4016 in the current cycle, it is determined that a sibling stage exists to the maximum depth in the updated graph G, which is a sibling stage consisting of the stage 1 and the stage 2, and a sibling stage consisting of the stage 3 and the stage 4.
At this time, the sibling stages formed by stage 1 and stage 2 are merged into one stage v12; then, based on the run time models of stage 1 and stage 2 and the first algorithm, the first fitting parameter in the run time model of stage v12 is determined to be a12 = a1 + a2, where a1 is the part of the run time model of stage 1 that decreases with increasing parallelism, a2 is the part of the run time model of stage 2 that decreases with increasing parallelism, and a12 is the part of the run time model of the merged stage v12 that decreases with increasing parallelism, i.e. the first fitting parameter in the run time model of the merged stage v12. At the same time, the sibling stages formed by stage 3 and stage 4 are merged into one stage v34; then, based on the run time models of stage 3 and stage 4 and the first algorithm, the first fitting parameter in the run time model of stage v34 is determined to be a34 = a3 + a4, where a3 is the part of the run time model of stage 3 that decreases with increasing parallelism, a4 is the part of the run time model of stage 4 that decreases with increasing parallelism, and a34 is the part of the run time model of the merged stage v34 that decreases with increasing parallelism, i.e. its first fitting parameter. The updated graph G is thus further updated in step S4017 of the current round, as shown in FIG. 3.
After the first fitting parameter in the run time model of the stage obtained by merging the sibling stages at the maximum depth is calculated, the method returns, based on the further updated graph G, to step S4016 of the current round: it is determined whether a sibling stage or a parent-child stage exists at the maximum depth in the further updated graph G. Step S4017 of the current round is then executed according to the obtained result.
In step S4017 of the current round, if the result obtained in step S4016 of the current round indicates that no sibling stage exists at the maximum depth in the further updated graph G but a parent-child stage does, the parent-child stages are merged to further update the graph G. At the same time, the first fitting parameter in the run time model of the merged stage is determined based on the optimal parallelism ratio of the parent-child stages and the second algorithm. The second algorithm is a_xy = (√a_x + √a_y)², where a_x and a_y are respectively the part of the run time model of stage x that decreases with increasing parallelism and the part of the run time model of stage y that decreases with increasing parallelism, stage x and stage y are parent-child stages, and a_xy, the part of the run time model of the merged stage xy that decreases with increasing parallelism, is the first fitting parameter in the run time model of the stage xy obtained by merging stage x and stage y.
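The two merge rules can be sketched together; the closed forms below (coefficients add for siblings, and combine under the square root for parent-child pairs) follow the reconstructed first and second algorithms and are an illustration rather than the patent's literal code.

```python
import math

def merge_siblings(a_i, a_j):
    """First algorithm: with times equalised, a_i/d_i = (a_i + a_j)/(d_i + d_j),
    so the merged stage's first fitting parameter is the plain sum."""
    return a_i + a_j

def merge_parent_child(a_x, a_y):
    """Second algorithm: with d_x : d_y = sqrt(a_x) : sqrt(a_y), the summed
    time a_x/d_x + a_y/d_y equals (sqrt(a_x) + sqrt(a_y))**2 / (d_x + d_y)."""
    return (math.sqrt(a_x) + math.sqrt(a_y)) ** 2

a_sib = merge_siblings(12.0, 3.0)     # 15.0
a_pc = merge_parent_child(16.0, 4.0)  # (4 + 2)**2 = 36.0
```

Either way, the merged stage again looks like a single stage with model a/d, which is what lets step S4016 repeat until one stage remains.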
Illustratively, taking FIG. 3 as an example and continuing the above example, the further updated graph G shown in FIG. 3 is obtained. For this graph it is determined in step S4016 of the current round that no sibling stage exists at the maximum depth in the further updated graph G, but parent-child stages do exist: the parent-child stage formed by the stage v12 (obtained by merging stage 1 and stage 2) and stage 10, the parent-child stage formed by the stage v34 (obtained by merging stage 3 and stage 4) and stage 9, the parent-child stage formed by stage 5 and stage 8, and the parent-child stage formed by stage 6 and stage 7.
At this time, the parent-child stage formed by stage v12 (obtained by merging stage 1 and stage 2) and stage 10 is merged into one stage w1; then, based on the run time models of stage v12 and stage 10 and the second algorithm, the first fitting parameter in the run time model of stage w1 is determined to be aw1 = (√a12 + √a10)², where a12 is the part of the run time model of stage v12 that decreases with increasing parallelism, a10 is the part of the run time model of stage 10 that decreases with increasing parallelism, and aw1 is the part of the run time model of the merged stage w1 that decreases with increasing parallelism, i.e. the first fitting parameter in its run time model. Based on the same embodiment, the parent-child stage formed by stage v34 (obtained by merging stage 3 and stage 4) and stage 9 is merged into stage w2, whose first fitting parameter is aw2 = (√a34 + √a9)². At the same time, the parent-child stage formed by stage 5 and stage 8 is merged into one stage w3; then, based on the run time models of stage 5 and stage 8 and the second algorithm, the first fitting parameter in the run time model of stage w3 is determined to be aw3 = (√a5 + √a8)², where a5 is the part of the run time model of stage 5 that decreases with increasing parallelism, a8 is the part of the run time model of stage 8 that decreases with increasing parallelism, and aw3 is the part of the run time model of the merged stage w3 that decreases with increasing parallelism, i.e. its first fitting parameter. Based on the same embodiment, the parent-child stage formed by stage 6 and stage 7 can be merged into stage w4, whose first fitting parameter is aw4 = (√a6 + √a7)². The further updated graph G is thus updated again, as shown in FIG. 4.
After the first fitting parameters in the run time models of the stages obtained by merging the parent-child stages at the maximum depth are calculated, the method returns, based on the again updated graph G, to step S4016 of the current round, until the graph G has been merged into a single stage in which no sibling stage or parent-child stage exists any more, at which point step S4018 of the current round is executed.
Step S4018 of the current round: the currently available total parallelism is determined according to the current available resources of the computing cluster, and the whole of this total parallelism participates in the parallelism scheduling of the graph G that now contains only a single stage in the current round. Then, according to the obtained total parallelism, the first fitting parameters in the run time models of the stages obtained at each merge, and the optimal parallelism ratio of each pair of stages in the graph G before the merging process in the current round, the optimal parallelism allocation result that distributes the total parallelism to all stages of the graph G as it was before the merging process is determined by reverse deduction. The allocation result comprises the optimal parallelism number of each stage in the graph G before the merging process in the current round, where the graph G before the merging process in the current round refers to the graph G obtained when the execution of step S4014 of the current round was completed.
Illustratively, taking fig. 5 as an example, the maximum-depth stages s_1 and s_2 in the graph G are sibling stages. The sibling pair formed by s_1 and s_2 is merged into a single stage s_4; then, based on the runtime models of s_1 and s_2 and the first algorithm, the first fitting parameter a_4 in the runtime model of the merged stage s_4 is determined from a_1 and a_2, where a_1 and a_2 are the parts of the runtime models of s_1 and s_2, respectively, that decrease with increasing parallelism, and a_4 is the part of the runtime model of the merged stage s_4 that decreases with increasing parallelism, i.e. its first fitting parameter. After s_1 and s_2 are merged into s_4, a new graph G is formed that contains only stage s_4 and stage s_3. The maximum-depth stage s_4 in this new graph forms a parent-child pair with s_3, so s_4 and s_3 are now merged into a single stage s_5; based on the runtime models of s_4 and s_3 and the second algorithm, the first fitting parameter a_5 in the runtime model of the merged stage s_5 is determined from a_4 and a_3. The total parallelism currently available to the computing cluster, denoted N in this example, can then be determined from the cluster's currently available resources.
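As a concrete sketch of the two merge algorithms, assume (a simplifying assumption, not stated verbatim in this excerpt) that each stage's runtime model has the form t(p) = a/p + b, where a is the first fitting parameter. Sibling stages run concurrently and share the parallelism pool, while parent-child stages run sequentially, which suggests the following merged parameters:

```python
import math

def merge_siblings(a1: float, a2: float) -> float:
    # Concurrent stages: splitting the pool N as p_i proportional to a_i
    # equalizes finish times, so max(a1/p1, a2/p2) = (a1 + a2) / N,
    # i.e. the merged first fitting parameter is a1 + a2.
    return a1 + a2

def merge_parent_child(a1: float, a2: float) -> float:
    # Sequential stages: minimizing a1/p1 + a2/p2 subject to p1 + p2 = N
    # gives p_i proportional to sqrt(a_i), with total runtime
    # (sqrt(a1) + sqrt(a2))**2 / N.
    return (math.sqrt(a1) + math.sqrt(a2)) ** 2
```

Under these formulas a merged stage again has a runtime model of the same a/p form, which is what allows the merging to recurse until the graph G collapses to a single stage.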
Since the total parallelism N is known, and stage s_5 was obtained by merging s_4 and s_3, in the current round the sum of the parallelism of s_4 and s_3 equals the total parallelism N. Stages s_4 and s_3 form a parent-child pair, so the optimal parallelism ratio between them is known, given by the second ratio relation from a_4 and a_3. Here a_3 is the first fitting parameter of a stage originally present in the weighted graph G corresponding to the data analysis job submitted by the user, already calculated in step S103, so a_3 is a known quantity. Likewise, a_4 was already calculated from a_1 and a_2 in step S4017 of the current round, so a_4 is a known quantity. Therefore, since the sum of the parallelism of s_4 and s_3 equals the known total parallelism N and the optimal parallelism ratio between them is known, the respective optimal parallelism numbers of s_4 and s_3 in the current round can be obtained.
Since the optimal parallelism number of s_4 in the current round has already been calculated, and s_4 was obtained by merging s_1 and s_2, in the current round the sum of the parallelism of s_1 and s_2 equals the parallelism number of s_4. Meanwhile, s_1 and s_2 form a sibling pair, so their optimal parallelism ratio is given by the first ratio relation from a_1 and a_2; a_1 and a_2 are first fitting parameters of stages originally present in the weighted graph G corresponding to the data analysis job submitted by the user and were already calculated in step S103, so both are known quantities. Therefore, since the sum of the parallelism of s_1 and s_2 equals the already-calculated parallelism number of s_4 and the optimal parallelism ratio between s_1 and s_2 is known, the respective optimal parallelism numbers of s_1 and s_2 in the current round can be calculated. In this way the respective optimal parallelism numbers of all stages originally present in the unmerged graph G of the current round are obtained.
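The reverse deduction can be sketched as follows, assuming the runtime model t(p) = a/p + b: a sequential (parent-child) merge is undone by splitting the parent's share in the ratio sqrt(a1):sqrt(a2), and a concurrent (sibling) merge in the ratio a1:a2. These ratios are a plausible instantiation, not quoted from the patent:

```python
import math

def split_parent_child(a1, a2, n):
    # Sequential pair: optimal split p1:p2 = sqrt(a1):sqrt(a2),
    # which minimizes a1/p1 + a2/p2 under p1 + p2 = n.
    p1 = n * math.sqrt(a1) / (math.sqrt(a1) + math.sqrt(a2))
    return p1, n - p1

def split_siblings(a1, a2, n):
    # Concurrent pair: optimal split p1:p2 = a1:a2, which equalizes
    # the finish times a1/p1 and a2/p2.
    p1 = n * a1 / (a1 + a2)
    return p1, n - p1
```

Unwinding the merges in reverse order — first the last merge against the total N, then each earlier merge against its parent's share — yields the optimal parallelism number of every original stage.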
After the respective optimal parallelism numbers of the stages originally present in the unmerged graph G of the current round have been obtained by executing step S4018, step S4019 of the current round is executed: the optimal parallelism numbers calculated in step S4018 are substituted into the corresponding runtime models of the stages of the unmerged graph G, yielding a new runtime for each stage, and the weight of each stage and the edge weight of each edge in the unmerged graph G of the current round are updated according to these new runtimes. The graph G with updated weights and edge weights is then returned to step S4012 for a new round, further optimizing the parallelism of the stages and the co-placement between stages, until the final graph G no longer contains two stages that can be co-placed on the available resources of the computing cluster. That is, in the new round, the graph G used when executing step S4012 is the graph G finally obtained in the round preceding it.
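The weight refresh of step S4019 can be sketched as follows, again assuming the t(p) = a/p + b runtime model (the function and variable names are illustrative):

```python
def refresh_weights(models, alloc):
    """Recompute each stage's node weight (its new runtime) after a
    parallelism allocation.

    models: {stage: (a, b)} runtime-model parameters per stage
    alloc:  {stage: p} allocated parallelism per stage
    """
    return {s: a / alloc[s] + b for s, (a, b) in models.items()}
```

For example, a stage with parameters (a=8, b=2) allocated parallelism 4 gets the new weight 8/4 + 2 = 4, and the graph's node and edge weights are rewritten from these values before the next round begins.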
In the embodiment of the invention, parallelism configuration and stage co-placement are jointly and iteratively optimized over the stages of the data analysis job. In each iteration, two stages are first selected and co-placed, and the parallelism of each stage is then optimized after the co-placement; this repeats until, in the final iteration, the obtained graph G no longer contains two stages that can be co-placed. At that point the optimal scheduling scheme corresponding to the data analysis job can be obtained from the current graph G and the calculated parallelism numbers of the stages.
In the present invention, acquiring the runtime data information and parallelism data information of a data analysis job and determining the relationship between the runtime and parallelism of each stage of the data analysis job includes: when the data information of the data analysis job includes historical run information, determining the relationship between the runtime and parallelism of each stage of the data analysis job by analyzing that historical run information; and when the data information of the data analysis job does not include historical run information, running the data analysis job multiple times under several preset, distinct parallelism configurations to obtain the relationship between the runtime and parallelism of each stage of the data analysis job.
In an embodiment of the present invention, one implementation of acquiring the runtime data information and parallelism data information of a data analysis job and determining the relationship between the runtime and parallelism of each stage is as follows. When the data analysis job submitted by the user has previously been processed by the data analysis job scheduler, its data information includes historical run information; the scheduler can directly obtain this historical run information from the submitted job's data information and analyze it to obtain the relationship between the runtime and parallelism of each stage. When the submitted job has not been processed before, its data information contains no historical run information; in that case several distinct parallelism configurations are preset, and the scheduler controls the data analysis job to be executed once under each of them to obtain the relationship between the runtime and parallelism of each stage. It should be appreciated that the number of (runtime, parallelism) relationships obtained for each stage equals the number of preset parallelism configurations.
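Fitting a stage's runtime model from the profiled (parallelism, runtime) pairs reduces to ordinary least squares after the substitution x = 1/p, assuming the model has the form t(p) = a/p + b (an assumed form consistent with a part that decreases with increasing parallelism plus a constant part):

```python
def fit_runtime_model(samples):
    """Least-squares fit of t = a/p + b from (parallelism, runtime) pairs."""
    xs = [1.0 / p for p, _ in samples]
    ys = [t for _, t in samples]
    n = len(samples)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope of the regression of t on 1/p is the first fitting parameter a;
    # the intercept is the constant part b.
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx
```

Running the job under, say, parallelism 1, 2, 4 and 8 gives four samples per stage, which this fit turns into that stage's two model parameters.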
In the present invention, the method further comprises: and under the condition that the data analysis job completes the new operation, updating the respective operation time model of each stage of the data analysis job according to the operation data of the new operation.
In the embodiment of the invention, after the same data analysis job is submitted and executed for a plurality of times, the newly submitted data analysis job can generate new operation data after being executed, and based on the obtained new operation data, the operation time models of the stages in the data analysis job are updated so as to ensure that the operation time models of the stages in the data analysis job are in the latest state.
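One way to keep the models current is sketched below, assuming the t = a/p + b runtime-model form and maintaining running sums so that each newly completed run refreshes the least-squares fit in O(1); the class name and structure are illustrative:

```python
class StageRuntimeModel:
    """Incrementally refit t = a/p + b as new runs of the stage complete."""

    def __init__(self):
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def add_run(self, parallelism, runtime):
        # Accumulate sufficient statistics over x = 1/p, y = runtime.
        x = 1.0 / parallelism
        self.n += 1
        self.sx += x
        self.sy += runtime
        self.sxx += x * x
        self.sxy += x * runtime

    def params(self):
        # Closed-form simple linear regression from the running sums.
        d = self.n * self.sxx - self.sx ** 2
        a = (self.n * self.sxy - self.sx * self.sy) / d
        return a, (self.sy - a * self.sx) / self.n
```

Each re-submission of the same job thus tightens the fit without re-reading old run logs, keeping the per-stage runtime models in their latest state.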
In the present invention, the method further comprises: receiving an execution result of the task executed by the function execution module; and controlling the function execution module to terminate the task or re-execute the task under the condition that the execution result indicates that the execution abnormality exists.
In the embodiment of the invention, the function execution model of the server in the computing cluster executes the task, the function execution result and the corresponding abnormal information are reported to the monitoring module of the data analysis job scheduler in the execution process, and when the abnormality occurs, the function execution module of the server assists the monitoring module to terminate the execution of the corresponding task or re-execute the task.
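The monitoring behaviour — re-execute a task whose result reports an abnormality, or terminate it — can be sketched as a simple retry wrapper; the names and retry budget here are illustrative, not taken from the patent:

```python
def run_with_monitor(task, max_retries=1):
    """Invoke task(); on an abnormal result, re-execute it up to
    max_retries times, then terminate by propagating the failure."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except RuntimeError:
            if attempt == max_retries:
                raise  # give up: the monitoring module terminates the task
```

In this sketch the function execution module would report failures as exceptions; the monitoring module decides between re-execution and termination via the retry budget.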
In the present invention, the method further comprises: in the case that the target model is a running cost model, acquiring the runtime data information, parallelism data information, data-throughput data information and memory occupation information of the data analysis job, and determining the relationship between the running cost and parallelism of each stage of the data analysis job; fitting the respective running cost model of each stage according to the relationship between its running cost and parallelism; determining a second parallel scheduling scheme for each task in each stage of the data analysis job according to the DAG of the data analysis job, the respective running cost models of the stages, and the currently available resources of the computing cluster; and controlling the function execution module on each server in the computing cluster to execute, according to the second parallel scheduling scheme, the tasks of the data analysis job allocated to the server where that function execution module is located.
In the embodiment of the invention, when the optimization target specified by the user is the operation cost, the target model corresponding to the operation cost optimization target is determined to be the operation cost model. Wherein the running cost of the data analysis job is proportional to the product between the resource consumption of the data analysis job and the running time, wherein the memory size occupied by the data analysis job represents the resource consumption of the data analysis job.
After the target model is determined to be the running cost model, the runtime data information, parallelism data information, data-throughput data information and memory occupation information in the data analysis job submitted by the user are acquired, these four kinds of data information are analyzed, and the relationship between the running cost and parallelism of each stage in the data analysis job is determined. For any one of the stages of the data analysis job, the multiple (running cost, parallelism) relationships determined to belong to that stage are fitted into a function curve, and this function curve is the running cost model of that stage. The running cost model has the expression C = m · t, where C is the running cost of the corresponding stage, m is its memory occupation, t is its runtime, D is its data throughput, m_0 is the memory overhead inherent to each task in the stage, and p is the parallelism. With m = D + m_0 · p and the runtime model t = a/p + b, it follows that C = (D + m_0 · p)(a/p + b) = D·a/p + D·b + m_0·a + m_0·b·p. Since the inherent time b is negligible relative to a/p, and the inherent memory overhead m_0·p of all tasks is negligible relative to the memory D occupied by the data processed in the stage, the term m_0·b·p is negligible relative to D·a/p and can be ignored. The final running cost model can therefore be expressed as C(p) = a'/p + b'; since D is constant, the running cost model of each stage has the same form as its runtime model, where a' denotes the third fitting parameter and b' the fourth fitting parameter to be fitted in the running cost model.
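Numerically, this derivation collapses the measured quantities into two cost-model parameters. A sketch, assuming the cost C(p) = (D + m0·p)(a/p + b) with D the stage's data throughput, m0 the per-task memory overhead, and a, b the runtime-model parameters (all symbol names are illustrative):

```python
def cost_model_params(a, b, D, m0):
    """Third and fourth fitting parameters of C(p) ~ a3/p + b4, obtained
    from C(p) = (D + m0*p)(a/p + b) after dropping the negligible m0*b*p."""
    return D * a, D * b + m0 * a

def running_cost(a3, b4, p):
    # The fitted cost model has the same shape as the runtime model.
    return a3 / p + b4
```

For D much larger than m0·p the approximation tracks the exact product closely, which is what justifies dropping the m0·b·p term.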
Since the total running cost of the data analysis job is the sum of the running costs of all tasks in all stages in the data analysis job, the problem of minimizing the running cost is similar to the problem of minimizing the running time based on the graph G described above.
In an embodiment of the present invention, after the running cost model of each stage of the data analysis job submitted by the user has been obtained, the data analysis job scheduler determines a second parallel scheduling scheme for each task in each stage of the data analysis job according to the DAG (Directed Acyclic Graph) of the submitted job, the running cost models of its stages, and the currently available resources of the computing cluster; the scheduling purpose of the second parallel scheduling scheme is to minimize the running cost of the data analysis job. After the scheduler determines the second parallel scheduling scheme, it sends the scheme to each server in the computing cluster. Each server parses the received scheme to determine which tasks of the data analysis job are allocated to it for execution, and then executes those tasks through the function execution module on that server.
In the invention, the determining a second parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the running cost model of each stage and the current available resources of the computing cluster comprises the following steps:
step S4021: obtaining a graph by weighting the DAG of the data analysis jobThe method comprises the steps of carrying out a first treatment on the surface of the Wherein each node in the graph G is +.>Representing a phase of the data analysis operation, each side of the graph G is +.>Representation phaseReading stage->The generated data, the weight of the node represents the calculation cost of the corresponding stage of the node, the edge weight is the cost of data transmission between two stages, including the stage +.>Write data and phase->Is a read data of (a);
step S4022: sequencing all edges in the graph G according to the weight sequence of the edges in the graph G to obtain sequencing results;
step S4023: determining whether two stages corresponding to the top-ranked edge in the ranking result can be placed together on available resources of the computing cluster;
step S4024: in the case that the two stages corresponding to the top-ranked edge can be co-placed on the available resources of the computing cluster, setting the edge weight of that edge to 0 to update the graph G; in the case that the two stages corresponding to the top-ranked edge cannot be co-placed on the available resources of the computing cluster, deleting that edge from the ranking result to update the ranking result, and returning to step S4023; and in the case that the ranking result no longer contains any edge whose two stages can be co-placed on the available resources of the computing cluster, determining the second parallel scheduling scheme for each task in each stage of the data analysis job according to the graph G and ending the execution of these steps;
Step S4025: determining the optimal parallelism ratio between every two stages according to the respective running cost models of the stages in the graph G;
step S4026: determining total parallelism according to the current available resources of the computing cluster, and determining the number of parallelism of each stage in the graph G according to the total parallelism and the optimal parallelism proportion between every two stages;
step S4027: and determining the latest weight of each stage and each side in the graph G according to the parallelism quantity of each stage in the graph G so as to update the weight in the graph G, and returning the graph G after the weight update to the step S4022 for execution.
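Steps S4022 to S4024 amount to a greedy pass over the edges in descending order of weight; a minimal sketch follows, in which the `can_coplace` predicate standing in for the cluster-resource check is hypothetical:

```python
def coplace_pass(edges, can_coplace):
    """edges: {(parent, child): transfer_cost}. Try edges heaviest-first
    and co-place the first feasible pair by zeroing its transfer cost."""
    for e in sorted(edges, key=edges.get, reverse=True):
        if can_coplace(*e):
            edges[e] = 0   # co-placed: the data moves locally, cost vanishes
            return e
    return None            # no co-placeable pair remains; the loop ends
```

Repeating this pass, with the weights refreshed between passes, mirrors the outer loop that terminates once no two stages can still be co-placed on the cluster's available resources.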
In the embodiment of the present invention, determining, according to the DAG of the data analysis job, the running cost model of each stage, and the current available resources of the computing cluster, an implementation manner of the second parallel scheduling scheme for each task in each stage of the data analysis job is similar to the above-described determining, according to the DAG of the data analysis job, the running time model of each stage, and the current available resources of the computing cluster, the first parallel scheduling scheme for each task in each stage of the data analysis job.
The implementation of determining the second parallel scheduling scheme for each task in each stage of the data analysis job according to the DAG of the data analysis job, the respective running cost models of the stages, and the currently available resources of the computing cluster differs from the implementation of determining the first parallel scheduling scheme according to the DAG of the data analysis job, the respective runtime models of the stages, and the currently available resources of the computing cluster in the following points:
at point 1, in step S4025, the target model used to determine the optimal parallelism ratio between each two phases in the data analysis job is an operation cost model, unlike the target model used to determine the optimal parallelism ratio between each two phases in the data analysis job in step S4015, which is a runtime model.
In the present invention, the step S4025 includes: and determining the optimal parallelism ratio between every two stages according to the respective running cost models of the stages in the graph G.
In the embodiment of the present invention, the implementation of step S4025 is: the optimal parallelism ratio between every two stages can be determined from the respective running cost models of the stages in the graph G; for any two stages s_i and s_j in the graph G, the optimal parallelism ratio between them is determined by the third fitting parameters a'_i and a'_j of their running cost models.
At the 2 nd point, in the implementation of the second parallel scheduling scheme, the number of parallelism of each stage in the graph G can be obtained directly according to the optimal parallelism ratio between any two stages in the graph G and the total parallelism determined according to the current available resources of the computing cluster.
Illustratively, fig. 6 shows a graph G containing stages s_1, s_2 and s_3. The optimal parallelism ratio between every two of these stages can be determined from their running cost models obtained by the above calculation, giving the ratios for the pairs (s_1, s_2), (s_1, s_3) and (s_2, s_3), respectively. The total parallelism currently available to the computing cluster, denoted N in this example, can then be determined from the cluster's currently available resources.
Since the total parallelism N is known, and the stages included in the graph G are exactly s_1, s_2 and s_3, the sum of their parallelism satisfies p_1 + p_2 + p_3 = N. Here N is a known quantity, and the third fitting parameters a'_1, a'_2 and a'_3 of the running cost models of the respective stages have been calculated in the above steps and therefore also belong to the known quantities. With a'_1, a'_2 and a'_3 known, the optimal parallelism ratio between any two stages in the graph G is also known, so from p_1 + p_2 + p_3 = N together with these ratios the optimal parallelism number of each stage in the current round can be calculated. Thus the respective optimal parallelism numbers of the stages present in the graph G of the current round are obtained.
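Because the total cost is a plain sum of per-stage terms of the form a3_i/p_i (plus constants), the allocation needs no merging: minimizing the sum subject to p_1 + ... + p_k = N gives p_i proportional to sqrt(a3_i), a standard Lagrange-multiplier result offered here as a plausible instantiation of the pairwise ratios, not a formula quoted from the patent:

```python
import math

def allocate_by_cost(a3, total):
    """a3: {stage: third fitting parameter}; returns {stage: parallelism}
    minimizing sum(a3[s] / p[s]) subject to sum(p) == total."""
    w = {s: math.sqrt(v) for s, v in a3.items()}
    z = sum(w.values())
    return {s: total * wv / z for s, wv in w.items()}
```

With three stages whose third fitting parameters are 1, 4 and 4 and total parallelism 10, this yields the split 2 : 4 : 4, and any pairwise ratio can be read off directly from the square roots.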
At point 3, after the respective optimal parallelism numbers of the stages originally present in the graph G of the current round have been obtained by executing step S4026, step S4027 of the current round is executed: the optimal parallelism numbers calculated in step S4026 are substituted into the corresponding running cost models of the stages, yielding a new running cost for each stage, and the weight of each stage and the edge weight of each edge in the graph G of the current round are updated according to these new running costs. The graph G with updated weights and edge weights is then returned to step S4022 for a new round, further optimizing the parallelism of the stages and the co-placement between stages, until the final graph G no longer contains two stages that can be co-placed on the available resources of the computing cluster. That is, in the new round, the graph G used when executing step S4022 is the graph G finally obtained in the round preceding it.
The flexible parallel scheduling method for server-agnostic data analysis provided by the invention automatically configures the parallelism of each stage of a data analysis job while optimizing the task placement of each stage according to resource conditions, thereby greatly reducing the time and cost of data transmission and in turn reducing the job completion time and the running cost for users.
A second aspect of the present invention provides a flexible parallel scheduling system for server-agnostic data analysis, as shown in fig. 7, the system 700 includes:
the target model determining module 701 is configured to determine, according to an optimization target, a target model corresponding to the optimization target;
a target relationship determining module 702, configured to obtain, when the target model is a runtime model, runtime data information and parallelism data information in a data analysis job, and determine a relationship between runtime and parallelism of each stage of the data analysis job;
a runtime model construction module 703, configured to obtain, by fitting, a respective runtime model of each stage according to a relationship between the runtime and parallelism of each stage;
a first parallel scheduling scheme determining module 704, configured to determine a first parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective runtime model of each stage, and the current available resources of the computing cluster;
And the task execution module 705 is configured to control, according to the first parallel scheduling scheme, a function execution module on each server in the computing cluster to execute a task in the data analysis job allocated to the server where the task execution module is located.
Optionally, the first parallel scheduling scheme determining module 704 includes:
a graph construction module for obtaining a graph G by weighting the DAG of the data analysis job; wherein each node in the graph G represents a stage of the data analysis job, and each edge (s, s') of the graph G represents that stage s' reads the data generated by stage s; the weight of a node represents the computation time of the stage corresponding to that node, and the edge weight is the data transmission time between the two stages, including stage s writing its data and stage s' reading it;
the ordering module is used for ordering all edges in the graph G according to the total weight size sequence of the paths and the weight size sequence of the edges in the graph G to obtain an ordering result;
the co-placement determining module is used for determining whether two stages corresponding to the first side in the sorting result can be co-placed on available resources of the computing cluster;
An edge weight updating module, configured to set an edge weight of the top-ranked edge to 0 to update the graph G, in a case where two phases corresponding to the top-ranked edge can be co-placed on an available resource of the computing cluster; and the processing module is used for deleting the first side from the sorting result to update the sorting result under the condition that two stages corresponding to the first side cannot be placed on the available resources of the computing cluster together, and controlling the co-placement determining module to execute; and determining a first parallel scheduling scheme for each task in each stage in the data analysis job according to the graph G if no available resources are co-located on the computing cluster in the ordering result;
the optimal parallelism ratio determining module is used for determining the optimal parallelism ratio between every two stages according to the relation between the stages in the graph G and the running time model of each stage;
the phase relation determining module is used for determining whether a brother phase and a father-son phase exist in the maximum depth in the graph G;
A stage merging module, configured to, in a case where a sibling stage exists at a maximum depth in the graph G, merge the sibling stages to update the graph G, and determine a first fitting parameter in a runtime model of a stage obtained by merging based on an optimal parallelism ratio of the sibling stages and a first algorithm, so as to control the stage relationship determining module to execute; and the step of merging the two phases corresponding to the parent-child phases under the condition that no brother phase exists at the maximum depth in the graph G and the parent-child phase exists at the maximum depth in the graph G so as to update the graph G, and determining a first fitting parameter in a runtime model of the phase obtained by merging based on the optimal parallelism ratio of the parent-child phases and a second algorithm, and controlling the phase relation determining module to execute; and obtaining the graph G including a single stage obtained by merging, in the case where there is no sibling stage and parent-child stage of maximum depth in the graph G;
the parallelism quantity determining module is used for determining total parallelism according to the current available resources of the computing cluster and determining the quantity of parallelism of each stage in the graph G when the merging process is not performed according to the total parallelism and a first fitting parameter in a running time model of each merging obtained stage;
And the weight updating module is used for determining the latest weight of each stage and each side in the graph G when the combination processing is not performed according to the parallelism quantity of each stage in the graph G when the combination processing is not performed so as to update the weight in the graph G when the combination processing is not performed, and controlling the sequencing module to execute based on the graph G when the combination processing is not performed after the weight updating.
Optionally, the optimal parallelism ratio determining module includes:
a first stage relationship determining module, configured to determine a relationship between every two stages according to the graph G;
the first optimal parallelism ratio determining module is used for determining the optimal parallelism ratio of the two phases according to the respective running time models of the two phases through a first ratio relation under the condition that the relation between the two phases is a brother phase;
and the second optimal parallelism ratio determining module is used for determining the optimal parallelism ratio of the two phases according to the respective running time models of the two phases through a second ratio relation under the condition that the relation between the two phases is a parent-child phase.
Optionally, the target relationship determination module 702 includes:
A first target relationship determining module, configured to determine, when the data information of the data analysis job includes historical operation information, a relationship between an operation time and a parallelism of each stage of the data analysis job by analyzing the historical operation information;
and the second target relation determining module is used for operating the data analysis job for a plurality of times through a plurality of preset different parallelism configurations under the condition that the data information of the data analysis job does not comprise historical operation information so as to obtain the relation between the operation time and the parallelism of each stage of the data analysis job.
Optionally, the system further comprises:
and the running time model updating module is used for updating the running time models of each stage of the data analysis job according to the running data of the new running when the data analysis job finishes the new running.
Optionally, the system further comprises:
the execution result receiving module is used for receiving the execution result of the task executed by the function executing module;
and the monitoring sub-module is used for controlling the function execution module to terminate the task or re-execute the task under the condition that the execution result represents that the execution abnormality exists.
Optionally, the system further comprises:
the third target relation determining module is used for acquiring the runtime data information, parallelism data information, data-throughput data information and memory occupation information of the data analysis job in the case that the target model is a running cost model, and determining the relationship between the running cost and parallelism of each stage of the data analysis job;
the operation cost model construction module is used for obtaining respective operation cost models of the stages by fitting according to the relation between the operation cost and the parallelism of the stages;
the second parallel scheduling scheme determining module is used for determining a second parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective running cost model of each stage and the current available resources of the computing cluster;
and the first task execution module is used for controlling the function execution modules on all servers in the computing cluster to execute the tasks in the data analysis job distributed to the servers where the first task execution module is located according to the second parallel scheduling scheme.
Optionally, the second parallel scheduling scheme determining module includes:
A first graph construction module, configured to obtain a graph G = (V, E) by weighting the DAG of the data analysis job; wherein each node v_i in the graph G represents a stage of the data analysis job, each edge e_ij of the graph G represents stage s_j reading the data generated by stage s_i, the weight of a node represents the computation cost of the stage corresponding to that node, and the weight of an edge is the cost of data transmission between the two stages, including the write data of stage s_i and the read data of stage s_j;
the first sorting module is used for sorting all edges in the graph G in order of edge weight to obtain a sorting result;
the first co-placement determining module is used for determining whether the two stages corresponding to the top-ranked edge in the sorting result can be co-placed on the available resources of the computing cluster;
a first edge weight updating module, configured to set the edge weight of the top-ranked edge to 0 to update the graph G when the two stages corresponding to the top-ranked edge can be co-placed on the available resources of the computing cluster; and a processing module, configured to delete the top-ranked edge from the sorting result to update the sorting result when the two stages corresponding to the top-ranked edge cannot be co-placed on the available resources of the computing cluster, and to control the co-placement determining module to execute; and to determine a second parallel scheduling scheme for each task in each stage of the data analysis job according to the graph G when no edge in the sorting result corresponds to two stages that can be co-placed on the available resources of the computing cluster;
The third optimal parallelism ratio determining module is used for determining the optimal parallelism ratio between every two stages according to the respective running cost models of the stages in the graph G;
the first parallelism quantity determining module is used for determining total parallelism according to the current available resources of the computing cluster and determining the parallelism quantity of each stage in the graph G according to the total parallelism and the optimal parallelism proportion between every two stages;
the first weight updating module is used for determining the latest weights of each stage and each edge in the graph G according to the parallelism quantity of each stage in the graph G so as to update the weights in the graph G, and for controlling the first sorting module to execute based on the weight-updated graph G.
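The sorting and co-placement loop performed by these modules can be illustrated with a small sketch. Here the cluster resource check is reduced to a hypothetical `can_coplace` predicate, and the recomputation of per-stage parallelism and weights after each co-placement is elided; names are invented for the sketch and do not appear in the patent:

```python
def greedy_coplacement(edges, can_coplace):
    """edges: {(u, v): transfer_cost}. Repeatedly try to co-place the two
    stages joined by the heaviest remaining edge; a successful co-placement
    drives that edge's transfer cost to 0 (data then moves via shared memory)."""
    ranking = sorted(edges, key=edges.get, reverse=True)
    while ranking:
        top = ranking[0]
        if can_coplace(*top):
            edges[top] = 0.0  # co-placed on one server: transfer cost vanishes
            # Re-rank the remaining positive-weight edges (in the patent the
            # stage parallelism and graph weights are also recomputed here).
            ranking = sorted((e for e in ranking if edges[e] > 0),
                             key=edges.get, reverse=True)
        else:
            ranking.pop(0)    # cannot fit on one server: keep cost, move on
    return edges
```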
In the embodiment of the invention, the parallel scheduling system for server-unaware data analysis mainly comprises three parts: a data analysis job scheduler, a computing cluster and a storage cluster. The data analysis job scheduler is further divided into a job analysis module, an elastic parallel scheduling module and a monitoring module. The job analysis module is used for analyzing the relevant running data information of each stage of the data analysis job and establishing the respective target models of each stage. Therefore, the above-mentioned target model determining module, target relation determining module, runtime model constructing module, runtime model updating module, third target relation determining module and running cost model constructing module all belong to the job analysis module. The elastic parallel scheduling module is used for formulating a parallel scheduling scheme according to the target models established by the job analysis module, the DAG of the data analysis job and the current available resources of the computing cluster. Therefore, the first parallel scheduling scheme determining module and the second parallel scheduling scheme determining module both belong to the elastic parallel scheduling module. The monitoring module is used for monitoring the available resources reported by the function execution module on each server, including idle CPU, memory and the like. In addition, the monitoring module tracks the execution of the function corresponding to each task in the data analysis job, acquires information such as task state, execution result, execution time and running cost through communication with the function execution module, and handles abnormal events. When an abnormality occurs, the monitoring module notifies the function execution module to terminate the abnormal task and re-execute the task.
Therefore, the execution result receiving module and the monitoring sub-module both belong to the monitoring module.
In the embodiment of the invention, the computing cluster comprises a plurality of servers, and the function execution module on each server is responsible for executing the tasks in the data analysis job according to the scheduling decisions of the data analysis job scheduler and reporting the execution status of the function corresponding to each task to the monitoring module of the data analysis job scheduler. When an abnormality occurs during execution, the function execution module reports the abnormality information to the monitoring module of the scheduler and assists in handling the abnormality. Functions on the same server can share data at high speed, with approximately zero time cost, through shared memory, while functions on different servers transmit data through the remote storage cluster. Therefore, the task execution module and the first task execution module belong to the function execution modules in the servers. The storage cluster provides object storage semantics for the computing cluster, and the servers of the computing cluster can access the storage cluster through the network to read and write data. The storage cluster is responsible for providing data sharing for functions on different servers in the computing cluster: the functions on different servers write the data to be transmitted to the storage cluster and then read it from the storage cluster in turn to realize data sharing.
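As a toy illustration of this two-level data sharing, the sketch below stands in shared memory and the storage cluster with two dictionaries; the names `DataChannel`, `put` and `get` are invented for the sketch and do not appear in the patent:

```python
class DataChannel:
    """Two-level data exchange: functions on the same server share an
    in-process dict (standing in for shared memory, ~zero-cost access);
    cross-server transfers go through a 'remote' dict (standing in for the
    storage cluster's object store, reached over the network)."""

    def __init__(self):
        self.local = {}   # per-server shared memory
        self.remote = {}  # storage-cluster object store

    def put(self, key, value, same_server):
        # The producer writes to shared memory when the consumer runs on the
        # same server, otherwise to the remote object store.
        (self.local if same_server else self.remote)[key] = value

    def get(self, key):
        # Consumers prefer shared memory and fall back to remote storage.
        return self.local.get(key, self.remote.get(key))
```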
The server-unaware parallel scheduling system of the present invention was evaluated experimentally on four SQL database query jobs from the open-source TPC-DS data analysis benchmark, with performance assessed mainly in terms of job completion time, running cost and scheduling time. The results show that, compared with existing baseline methods, the proposed system reduces the completion time of data analysis jobs by 1.26-2.5 times and the running cost by 1.09-1.83 times, while the scheduling time is negligible.
In the above embodiments, the implementation may be realized in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the flows or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example by wire (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), etc.
It is noted that relational terms such as first and second are used herein solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises that element.
In this specification, the embodiments are described in a progressive manner; identical or similar parts among the embodiments may be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are described relatively briefly since they are substantially similar to the method embodiments; for relevant details, refer to the corresponding parts of the description of the method embodiments.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention are included in the protection scope of the present invention.

Claims (8)

1. A parallel scheduling method for server unaware data analysis, characterized in that the method is applied to a data analysis job scheduler, and comprises the following steps:
determining a target model corresponding to an optimization target according to the optimization target, wherein the target model comprises: a run-time model and a memory footprint model;
acquiring running time data information and parallelism data information in a data analysis job under the condition that the target model is a running time model, and determining the relation between the running time and parallelism of each stage of the data analysis job;
fitting to obtain respective running time models of the stages according to the relation between the running time and the parallelism of the stages;
determining a first parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective running time model of each stage and the current available resources of the computing cluster;
According to the first parallel scheduling scheme, controlling a function execution module on each server in the computing cluster to execute tasks in the data analysis job distributed to the server where the function execution module is located;
acquiring running time data information, parallelism data information, data processing amount data information and memory occupation information in a data analysis operation under the condition that the target model is a memory occupation model, and determining the relation between the running cost and parallelism of each stage of the data analysis operation;
fitting to obtain respective operation cost models of the stages according to the relation between the operation cost and the parallelism of the stages;
determining a second parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective running cost model of each stage and the current available resources of the computing cluster;
and controlling a function execution module on each server in the computing cluster to execute tasks in the data analysis job distributed to the server where the function execution module is located according to the second parallel scheduling scheme.
2. The parallel scheduling method of server unaware data analysis according to claim 1, wherein the determining a first parallel scheduling scheme for each task in each stage of the data analysis job according to the DAG of the data analysis job, the respective runtime model of each stage, and the current available resources of the computing cluster includes:
Step S4011: obtaining a graph G = (V, E) by weighting the DAG of the data analysis job; wherein each node v_i in the graph G represents a stage of the data analysis job, each edge e_ij of the graph G represents stage s_j reading the data generated by stage s_i, the weight of a node represents the computation time of the stage corresponding to that node, and the weight of an edge is the data transmission time between the two stages, including the write data of stage s_i and the read data of stage s_j;
step S4012: sorting all edges in the graph G according to the order of the total path weights and the order of the edge weights in the graph G to obtain a sorting result;
step S4013: determining whether two stages corresponding to the top-ranked edge in the ranking result can be placed together on available resources of the computing cluster;
step S4014: setting the edge weight of the top-ranked edge to 0 to update the graph G when the two stages corresponding to the top-ranked edge can be co-placed on the available resources of the computing cluster; deleting the top-ranked edge from the sorting result to update the sorting result when the two stages corresponding to the top-ranked edge cannot be co-placed on the available resources of the computing cluster, and returning to step S4013; and determining a first parallel scheduling scheme for each task in each stage of the data analysis job according to the graph G, and ending the execution of the steps, when no edge in the sorting result corresponds to two stages that can be co-placed on the available resources of the computing cluster;
Step S4015: determining the optimal parallelism ratio between every two phases according to the relation between each phase in the graph G and the running time model of each phase;
step S4016: determining whether a sibling stage and a parent-child stage exist in the maximum depth in the graph G;
step S4017: if a sibling stage exists in the maximum depth of the graph G, merging the sibling stages to update the graph G, determining a first fitting parameter in a runtime model of the merged stage based on the optimal parallelism ratio of the sibling stages and a first algorithm, and returning to step S4016; merging the two phases corresponding to the parent-child phases under the condition that no brother phase exists in the maximum depth of the graph G and the parent-child phases exist in the maximum depth of the graph G, so as to update the graph G, determining a first fitting parameter in a runtime model of the phase obtained by merging based on the optimal parallelism ratio of the parent-child phases and a second algorithm, and returning to step S4016; obtaining the graph G including a single phase obtained by merging, in the case where there is no sibling phase and parent-child phase of the maximum depth in the graph G, performing step S4018;
Step S4018: determining total parallelism according to the current available resources of the computing cluster, and determining the number of parallelism of each stage in the graph G when the merging process is not performed according to the total parallelism and a first fitting parameter in a running time model of each stage obtained by merging;
step S4019: and determining the latest weight of each stage and each side in the graph G when the combination processing is not performed according to the parallelism of each stage in the graph G when the combination processing is not performed, so as to update the weight in the graph G when the combination processing is not performed, and returning the graph G when the combination processing is not performed after the weight update to step S4012 for execution.
3. The parallel scheduling method of server unaware data analysis according to claim 2, wherein the step S4015 comprises:
determining a relationship between each two phases according to the graph G;
under the condition that the relation between two phases is a brother phase, determining the optimal parallelism proportion of the two phases according to the running time models of the two phases respectively through a first proportion relation;
and under the condition that the relation between the two phases is a parent-child phase, determining the optimal parallelism proportion of the two phases according to the respective running time models of the two phases through a second proportion relation.
4. The parallel scheduling method of server unaware data analysis according to claim 1, wherein acquiring the runtime data information and the parallel data information in the data analysis job, determining the relationship between the runtime and the parallelism of each stage of the data analysis job, comprises:
when the data information of the data analysis job comprises historical operation information, determining the relation between the operation time and the parallelism of each stage of the data analysis job by analyzing the historical operation information;
and under the condition that the data information of the data analysis job does not comprise historical operation information, carrying out multiple operations on the data analysis job through a plurality of preset different parallelism configurations so as to obtain the relation between the operation time and the parallelism of each stage of the data analysis job.
5. The parallel scheduling method of server unaware data analysis of claim 4, further comprising: and under the condition that the data analysis job completes the new operation, updating the respective operation time model of each stage of the data analysis job according to the operation data of the new operation.
6. The parallel scheduling method of server unaware data analysis of claim 1, further comprising:
receiving an execution result of the task executed by the function execution module;
and controlling the function execution module to terminate the task or re-execute the task under the condition that the execution result indicates that the execution abnormality exists.
7. The parallel scheduling method of server unaware data analysis according to claim 1, wherein the determining a second parallel scheduling scheme for each task in each stage of the data analysis job according to the DAG of the data analysis job, the running cost model of each stage, and the current available resources of the computing cluster includes:
step S4021: obtaining a graph G = (V, E) by weighting the DAG of the data analysis job; wherein each node v_i in the graph G represents a stage of the data analysis job, each edge e_ij of the graph G represents stage s_j reading the data generated by stage s_i, the weight of a node represents the computation cost of the stage corresponding to that node, and the weight of an edge is the cost of data transmission between the two stages, including the write data of stage s_i and the read data of stage s_j;
step S4022: sequencing all edges in the graph G according to the weight sequence of the edges in the graph G to obtain sequencing results;
step S4023: determining whether two stages corresponding to the top-ranked edge in the ranking result can be placed together on available resources of the computing cluster;
step S4024: setting the edge weight of the top-ranked edge to 0 to update the graph G when the two stages corresponding to the top-ranked edge can be co-placed on the available resources of the computing cluster; deleting the top-ranked edge from the sorting result to update the sorting result when the two stages corresponding to the top-ranked edge cannot be co-placed on the available resources of the computing cluster, and returning to step S4023; and determining a second parallel scheduling scheme for each task in each stage of the data analysis job according to the graph G, and ending the execution of the steps, when no edge in the sorting result corresponds to two stages that can be co-placed on the available resources of the computing cluster;
step S4025: determining the optimal parallelism ratio between every two stages according to the respective running cost models of the stages in the graph G;
Step S4026: determining total parallelism according to the current available resources of the computing cluster, and determining the number of parallelism of each stage in the graph G according to the total parallelism and the optimal parallelism proportion between every two stages;
step S4027: and determining the latest weight of each stage and each side in the graph G according to the parallelism quantity of each stage in the graph G so as to update the weight in the graph G, and returning the graph G after the weight update to the step S4022 for execution.
8. A flexible parallel scheduling system for server unaware data analysis, the system comprising:
the target model determining module is used for determining a target model corresponding to an optimization target according to the optimization target, wherein the target model comprises: a run-time model and a memory footprint model;
the target relation determining module is used for acquiring the running time data information and the parallelism data information in the data analysis job under the condition that the target model is a running time model and determining the relation between the running time and the parallelism of each stage of the data analysis job;
the running time model building module is used for obtaining the running time model of each stage by fitting according to the relation between the running time and the parallelism of each stage;
The first parallel scheduling scheme determining module is used for determining a first parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the running time model of each stage and the current available resources of the computing cluster;
the task execution module is used for controlling the function execution module on each server in the computing cluster to execute the tasks distributed to the data analysis job on the server per se according to the first parallel scheduling scheme;
the third target relation determining module is used for acquiring running time data information, parallelism data information, data processing amount data information and memory occupation information in the data analysis operation under the condition that the target model is a memory occupation model, and determining the relation between the running cost and the parallelism of each stage of the data analysis operation;
the operation cost model construction module is used for obtaining respective operation cost models of the stages by fitting according to the relation between the operation cost and the parallelism of the stages;
the second parallel scheduling scheme determining module is used for determining a second parallel scheduling scheme for each task in each stage in the data analysis job according to the DAG of the data analysis job, the respective running cost model of each stage and the current available resources of the computing cluster;
And the first task execution module is used for controlling the function execution modules on all servers in the computing cluster to execute the tasks in the data analysis job distributed to the servers where the first task execution module is located according to the second parallel scheduling scheme.
CN202311126413.XA 2023-09-04 2023-09-04 Parallel scheduling method and system for server non-perception data analysis Active CN116860419B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311126413.XA CN116860419B (en) 2023-09-04 2023-09-04 Parallel scheduling method and system for server non-perception data analysis


Publications (2)

Publication Number Publication Date
CN116860419A CN116860419A (en) 2023-10-10
CN116860419B true CN116860419B (en) 2023-11-24

Family

ID=88228949


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117573374B (en) * 2024-01-15 2024-04-05 北京大学 System and method for server to have no perceived resource allocation

Citations (3)

Publication number Priority date Publication date Assignee Title
CN106371924A (en) * 2016-08-29 2017-02-01 东南大学 Task scheduling method for maximizing MapReduce cluster energy consumption
US10176435B1 (en) * 2015-08-01 2019-01-08 Shyam Sundar Sarkar Method and apparatus for combining techniques of calculus, statistics and data normalization in machine learning for analyzing large volumes of data
CN114691327A (en) * 2022-03-23 2022-07-01 华南理工大学 Multi-objective group intelligent optimization method and system for two-stage task scheduling


Non-Patent Citations (1)

Title
Function scheduling based on spatio-temporal characteristics in serverless computing scenarios; Jin Xin et al.; Journal of Computer Research and Development; Vol. 60, No. 09; pp. 2000-2014 *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant