CN111126668A - Spark job time prediction method and device based on graph convolution network

Spark job time prediction method and device based on graph convolution network

Info

Publication number
CN111126668A
CN111126668A (application CN201911187393.0A)
Authority
CN
China
Prior art keywords
operator
graph
graph convolution
convolution network
characteristic value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911187393.0A
Other languages
Chinese (zh)
Other versions
CN111126668B (en)
Inventor
李东升
胡智尧
赖志权
梅松竹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National University of Defense Technology
Original Assignee
National University of Defense Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National University of Defense Technology
Priority to CN201911187393.0A
Publication of CN111126668A
Application granted
Publication of CN111126668B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90: Details of database functions independent of the retrieved data types
    • G06F 16/901: Indexing; Data structures therefor; Storage structures
    • G06F 16/9024: Graphs; Linked lists
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods
    • G06N 3/084: Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Human Resources & Organizations (AREA)
  • Game Theory and Decision Science (AREA)
  • Development Economics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application relates to a Spark job time prediction method and device based on a graph convolution network. The method comprises the following steps: acquiring a directed acyclic graph of a Spark job; constructing a multivariate vector for each operator according to the operation information of each operator in the directed acyclic graph, and obtaining a node attribute matrix from the multivariate vectors; inputting the node attribute matrix into a graph convolution network, outputting operator execution times, and obtaining a loss function of the graph convolution network according to the predicted and actual execution time of each operator; training the graph convolution network through back propagation according to the loss function; inputting the node attribute matrix into the trained graph convolution network and extracting the convolution layer output to obtain dependency feature values of the graph-like dependency relationships among operators; extracting explicit feature values of the Spark job and concatenating them with the dependency feature values to obtain sample features; and training a prediction model according to the sample features and the loss function, then predicting the Spark job time with the prediction model. The method can improve the accuracy of time prediction.

Description

Spark job time prediction method and device based on graph convolution network
Technical Field
The present application relates to the field of computer technologies, and in particular, to a Spark job time prediction method and device based on a graph convolution network.
Background
Programmers of big data jobs optimize job execution by adjusting the job's configuration parameters (e.g., the number of computing tasks), ultimately reducing the job completion time. Among the many candidate configurations there is an optimal one under which the completion time of the job is minimal. Existing prediction methods can distinguish the optimal configuration from suboptimal ones by predicting the job completion time under different configurations.
Currently, there are three main prediction methods. (1) Ernest is a numerical-fitting modeling method. It analyzes three different patterns of network communication during the execution of a data-parallel job and models a relation function among the number of machines, the data size, and the job completion time. The mathematical form of this relation function is fixed, but its parameters must be estimated from sample data of the data-parallel job; Ernest estimates them with a non-negative least squares method. This places high demands on the collected samples. For example, to predict a big data job with an input data size of 100 GB, Ernest must test the execution time of the job under several different input data sizes. This limits the applicability of Ernest: to predict another job, samples must be collected again, so Ernest can model only a single data-parallel job rather than a class of applications. (2) The random forest model method models the map tasks and the reduce tasks of a data-parallel job separately. However, this method is difficult to extend to complex data-parallel jobs; for example, a data-parallel job on the Spark platform may involve many operators besides map and reduce, and there are graph-like dependencies between those operators. (3) The hierarchical modeling approach uses multiple sub-models, each a regression tree, organized hierarchically to reduce prediction error. Unlike Ernest and the random forest method, it does not analyze the underlying execution of data-parallel jobs; it considers the various configuration parameters of the Spark platform, which are explicit features, but does not sufficiently consider the execution process of the data-parallel job. In general, the prediction accuracy of the methods described above is low.
Disclosure of Invention
In view of the above, it is necessary to provide a Spark job time prediction method and device based on a graph convolution network that can address the low prediction accuracy of existing prediction models.
A Spark job time prediction method based on a graph convolution network, the method comprising:
acquiring a directed acyclic graph of a Spark job;
constructing a multivariate vector for each operator according to the operation information of each operator in the directed acyclic graph, and obtaining a node attribute matrix from the multivariate vectors;
inputting the node attribute matrix into a graph convolution network, outputting operator execution times, and obtaining a loss function of the graph convolution network according to the operator execution times and the actual execution time of each operator;
training the graph convolution network through back propagation according to the loss function, inputting the node attribute matrix into the trained graph convolution network, and extracting the convolution layer output to obtain dependency feature values of the graph-like dependency relationships of the operators;
extracting explicit feature values of the Spark job, and concatenating the explicit feature values with the dependency feature values to obtain sample features; and
training a prediction model according to the sample features and the loss function, and predicting the Spark job time according to the prediction model.
In one embodiment, the method further comprises the following steps: constructing the multivariate vector of each operator according to the operator type, data partition size, amount of memory resources, number of CPU cores, and number of computing tasks of each operator in the directed acyclic graph, wherein the operator type is embedded into the multivariate vector as a word vector; and topologically sorting the operators in the directed acyclic graph by breadth-first search and concatenating the multivariate vectors according to the resulting operator order to obtain the node attribute matrix.
In one embodiment, the method further comprises the following steps: calculating the sum of squared differences between the predicted execution time and the actual execution time of each operator to obtain the loss function of the graph convolution network.
In one embodiment, the graph convolution network is a graph convolution neural network created from a directed acyclic graph convolution function based on a propagation rule; the graph convolution neural network comprises a directed acyclic graph convolution layer and a regression layer.
In one embodiment, the method further comprises the following steps: inputting the node attribute matrix into the trained graph convolution network, and taking the output of the convolution layer of the graph convolution network through a forward propagation algorithm to obtain the dependency feature values of the graph-like dependency relationships of the operators.
In one embodiment, the method further comprises the following steps: extracting the size of the input data of the Spark job, the amount of memory resources allocated to the Spark job, and the amount of computing resources allocated to the Spark job as the explicit feature values; and concatenating the explicit feature values with the dependency feature values to obtain the sample features.
In one embodiment, the prediction model is a fully connected neural network model trained with a Bayesian regularization back-propagation function.
A Spark job time prediction device based on a graph convolution network, the device comprising:
an implicit feature acquisition module, configured to acquire a directed acyclic graph of a Spark job; construct a multivariate vector for each operator according to the operation information of each operator in the directed acyclic graph, and obtain a node attribute matrix from the multivariate vectors; input the node attribute matrix into a graph convolution network, output operator execution times, and obtain a loss function of the graph convolution network according to the operator execution times and the actual execution time of each operator; and train the graph convolution network through back propagation according to the loss function, input the node attribute matrix into the trained graph convolution network, and extract the convolution layer output to obtain dependency feature values of the graph-like dependency relationships of the operators;
a concatenation module, configured to extract explicit feature values of the Spark job and concatenate the explicit feature values with the dependency feature values to obtain sample features; and
a time prediction module, configured to train a prediction model according to the sample features and the loss function, and predict the Spark job time according to the prediction model.
A computer device comprising a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:
acquiring a directed acyclic graph of a Spark job;
constructing a multivariate vector for each operator according to the operation information of each operator in the directed acyclic graph, and obtaining a node attribute matrix from the multivariate vectors;
inputting the node attribute matrix into a graph convolution network, outputting operator execution times, and obtaining a loss function of the graph convolution network according to the operator execution times and the actual execution time of each operator;
training the graph convolution network through back propagation according to the loss function, inputting the node attribute matrix into the trained graph convolution network, and extracting the convolution layer output to obtain dependency feature values of the graph-like dependency relationships of the operators;
extracting explicit feature values of the Spark job, and concatenating the explicit feature values with the dependency feature values to obtain sample features; and
training a prediction model according to the sample features and the loss function, and predicting the Spark job time according to the prediction model.
A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, implements the following steps:
acquiring a directed acyclic graph of a Spark job;
constructing a multivariate vector for each operator according to the operation information of each operator in the directed acyclic graph, and obtaining a node attribute matrix from the multivariate vectors;
inputting the node attribute matrix into a graph convolution network, outputting operator execution times, and obtaining a loss function of the graph convolution network according to the operator execution times and the actual execution time of each operator;
training the graph convolution network through back propagation according to the loss function, inputting the node attribute matrix into the trained graph convolution network, and extracting the convolution layer output to obtain dependency feature values of the graph-like dependency relationships of the operators;
extracting explicit feature values of the Spark job, and concatenating the explicit feature values with the dependency feature values to obtain sample features; and
training a prediction model according to the sample features and the loss function, and predicting the Spark job time according to the prediction model.
According to the above Spark job time prediction method, device, computer equipment, and storage medium based on a graph convolution network, a node attribute matrix is extracted from the directed acyclic graph of the Spark job so that the graph-like dependency relationships among operators can be analyzed by the graph convolution network as implicit features; these are then combined with the explicit features of the Spark job to predict the job completion time. Compared with a traditional prediction model, a prediction model that combines implicit and explicit features achieves higher prediction accuracy.
Drawings
FIG. 1 is a flow chart illustrating a Spark job time prediction method based on a graph convolution network according to an embodiment;
FIG. 2 is a schematic block diagram illustrating a graph convolution network in one embodiment;
FIG. 3 is a flowchart illustrating a node update step according to an embodiment;
FIG. 4 is a block diagram of a prediction module in one embodiment;
FIG. 5 is a block diagram illustrating the structure of a Spark job time prediction device based on a graph convolution network according to an embodiment;
FIG. 6 is a diagram illustrating the internal structure of a computer device according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a Spark job time prediction method based on a graph convolution network is provided. The method is applicable to a terminal that runs the Spark platform, and when executed on the terminal it includes the following steps:
Step 102, acquiring a directed acyclic graph of the Spark job.
For a complex big data job, a single pass of data-parallel computation cannot accomplish the intended data processing. In this case, the big data job is divided into a data-parallel job comprising multiple computing stages. Each computing stage contains a batch of parallel computing tasks, and a fixed execution order exists between the computing stages: the output of the previous stage serves as the input of the next. This fixed execution order between computing stages is called a dependency relationship, and according to these dependencies the job can be represented as a directed acyclic graph (DAG).
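For illustration only, a Spark job's operator dependencies might be captured in an adjacency-list DAG as in the following Python sketch; the operator names and edges are hypothetical examples, not taken from the patent:

```python
from collections import defaultdict

# Hypothetical operator-level dependencies of a small Spark job; an edge
# (u, v) means the output of operator u feeds operator v.
edges = [
    ("textFile", "map"),
    ("map", "reduceByKey"),
    ("reduceByKey", "saveAsTextFile"),
]

dag = defaultdict(list)  # adjacency list: operator -> downstream operators
for upstream, downstream in edges:
    dag[upstream].append(downstream)

print(dict(dag))
```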
Step 104, constructing a multivariate vector for each operator according to the operation information of each operator in the directed acyclic graph, and obtaining a node attribute matrix from the multivariate vectors.
In a directed acyclic graph, big data is processed stage by stage along the directed edges that represent data flow, finally producing the big data analysis result. Within each computing stage, the data is partitioned and distributed to a batch of computing tasks executed in parallel, and these computing tasks involve a series of operations, such as the map and reduce operations in Hadoop and Spark. Such operations are referred to as operators, and each operator is a node in the directed acyclic graph.
In the node attribute matrix, each row represents all attribute values of one node; these attribute values are determined from the operation information.
Step 106, inputting the node attribute matrix into the graph convolution network, outputting operator execution times, and obtaining the loss function of the graph convolution network according to the operator execution times and the actual execution time of each operator.
Graph convolution networks take graph data as their object of study, which makes them well suited to analyzing directed acyclic graphs. The graph convolution network outputs an execution time for each operator.
Step 108, training the graph convolution network through back propagation according to the loss function, inputting the node attribute matrix into the trained graph convolution network, and extracting the convolution layer output to obtain the dependency feature values of the graph-like dependency relationships of the operators.
Step 110, extracting the explicit feature values of the Spark job, and concatenating the explicit feature values with the dependency feature values to obtain the sample features.
The explicit feature values of a Spark job are features that can be extracted manually, such as the number of computing tasks and the amount of memory resources.
Step 112, training a prediction model according to the sample features and the loss function, and predicting the Spark job time according to the prediction model.
According to this Spark job time prediction method based on a graph convolution network, a node attribute matrix is extracted from the directed acyclic graph of the Spark job so that the graph-like dependency relationships among operators can be analyzed by the graph convolution network as implicit features; these are then combined with the explicit features of the Spark job to predict the job completion time. Compared with a traditional prediction model, a prediction model combining implicit and explicit features achieves higher prediction accuracy.
In one embodiment, the step of constructing the node attribute matrix comprises: constructing the multivariate vector of each operator according to the operator type, data partition size, amount of memory resources, number of CPU cores, and number of computing tasks of each operator in the directed acyclic graph, wherein the operator type is embedded into the multivariate vector as a word vector; topologically sorting the operators in the directed acyclic graph by breadth-first search; and concatenating the multivariate vectors according to the resulting operator order to obtain the node attribute matrix.
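The following Python sketch (ours, with hypothetical operator names, attribute values, and embedding vectors) illustrates this construction: a breadth-first (Kahn-style) topological order is computed over the DAG, and one row per operator, consisting of the word-embedded operator type followed by the numeric attributes, is stacked into the node attribute matrix:

```python
import numpy as np
from collections import deque

# Hypothetical per-operator job information:
# (operator type, partition size in MB, memory in GB, CPU cores, tasks).
ops = {
    "textFile":    ("textFile",    128, 4, 2, 8),
    "map":         ("map",         128, 4, 2, 8),
    "reduceByKey": ("reduceByKey",  64, 4, 2, 4),
}
edges = [("textFile", "map"), ("map", "reduceByKey")]

# Toy word-embedding table for operator types; a real system would use
# trained embedding vectors.
embed = {"textFile": [0.1, 0.9], "map": [0.7, 0.2], "reduceByKey": [0.4, 0.5]}

# Breadth-first (Kahn-style) topological sort of the DAG.
indeg = {name: 0 for name in ops}
succ = {name: [] for name in ops}
for u, v in edges:
    succ[u].append(v)
    indeg[v] += 1
queue = deque(name for name in ops if indeg[name] == 0)
order = []
while queue:
    u = queue.popleft()
    order.append(u)
    for v in succ[u]:
        indeg[v] -= 1
        if indeg[v] == 0:
            queue.append(v)

# One row per operator: [type embedding | partition, memory, cores, tasks].
X = np.array([embed[ops[name][0]] + list(ops[name][1:]) for name in order],
             dtype=float)
print(X.shape)  # (3, 6): 3 operators, 2 embedding dims + 4 numeric attributes
```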
In another embodiment, the step of deriving the loss function comprises: calculating the sum of squared differences between the predicted execution time and the actual execution time of each operator to obtain the loss function of the graph convolution network.
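Written out, this loss takes the following form, where $T_i^{out}$ denotes the execution time output by the graph convolution network for operator $i$, $T_i^{op}$ its actual execution time, and $N$ the set of operators (notation ours, consistent with the training description below):

$$L = \sum_{i \in N} \left( T_i^{out} - T_i^{op} \right)^2$$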
In one embodiment, the graph convolution network is a graph convolution neural network created from a directed acyclic graph convolution function based on a propagation rule, and the graph convolution neural network comprises a directed acyclic graph convolution layer and a regression layer.
In one embodiment, the step of obtaining the dependency feature values comprises: inputting the node attribute matrix into the trained graph convolution network, and taking the output of the convolution layers of the graph convolution network through a forward propagation algorithm to obtain the dependency feature values of the graph-like dependency relationships of the operators.
Specifically, the structure of the graph convolution neural network is shown in fig. 2. The first layer is a DAG convolution layer; in this layer a graph convolution neural network is created by using a DAG convolution function based on propagation rules. The second layer is a regression layer comprising ten neurons (the number can be configured as required). The input of the graph convolution neural network is the node attribute matrix, in which each row represents all attribute values of one node, namely the operator type, the data partition size, the amount of memory resources, the number of CPU cores, and the number of computing tasks. Note that the operator type is not a numerical value, so word vector embedding is applied to it.
In the DAG convolution layer, node attributes in the DAG are transmitted to neighboring nodes along the DAG dependencies (i.e., the directed edges). This transmission is applied to all nodes: after a node receives the node attributes of its neighboring nodes, it computes its own node representation, and in each iteration of the neural network training the representation of a node is updated. As shown in fig. 3, when the i-th node is updated, the upstream nodes it depends on send their own node attributes to the i-th node, and the representation of the i-th node can be written as

$$v_i = \sum_{j \in N_i} c_{ij} \, \theta \, v_j$$

where $v_i$ denotes the representation of the i-th node, $\theta$ denotes the network parameters of the DAG convolution layer, $N_i$ denotes the set of nodes the i-th node depends on, and $c_{ij}$ is a normalization coefficient whose value is

$$c_{ij} = \frac{1}{\sqrt{\tilde{D}_{ii} \, \tilde{D}_{jj}}}$$

where $\tilde{D}$ denotes the sum of the diagonal degree matrix $D$ and the identity matrix $I$. It can be seen that the complexity of this iterative process is related to the number of edges of the DAG.
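As a concrete illustration, the following minimal numpy sketch (ours, not taken from the patent) performs one such propagation step; the DAG, the attribute and parameter dimensions, and the ReLU activation are all assumptions for the example:

```python
import numpy as np

# Hypothetical 3-node DAG: A[i, j] = 1 means node i depends on node j.
A = np.array([[0, 0, 0],
              [1, 0, 0],   # node 1 depends on node 0
              [0, 1, 0]])  # node 2 depends on node 1

X = np.random.rand(3, 6)        # node attribute matrix: 3 nodes, 6 attributes
theta = np.random.rand(6, 10)   # DAG convolution layer parameters (assumed)

A_tilde = A + np.eye(3)                  # add self-loops
deg = A_tilde.sum(axis=1)                # diagonal of D~ = D + I
# c_ij = 1 / sqrt(D~_ii * D~_jj), applied as D~^(-1/2) A~ D~^(-1/2).
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt

V = np.maximum(0.0, A_hat @ X @ theta)   # one propagation step (ReLU assumed)
print(V.shape)                           # (3, 10): one representation per node
```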
After the node attribute matrix has been processed by the DAG convolution layer, the forward propagation function of the DAG convolution layer converts the node attributes, the DAG dependencies, and related information into node representations. These node representations are exactly the hidden features of the DAG that need to be obtained; once the DAG convolution layer has been trained, the dependency feature values of the graph-like dependency relationships can be extracted.
For the training process: in the graph convolution neural network, the output of the DAG convolution layer is used as the input of the regression layer, and the regression layer maps each node representation of the DAG to the execution time of the corresponding operator. The role of the regression layer is to model the functional relationship between an operator and its execution time. Denoting the actual execution time of an operator by $T^{op}$ and the output of the graph convolution neural network by $T^{out}$, the training process uses $\sum_{i \in N} (T_i^{out} - T_i^{op})^2$ as the loss function and updates the network parameters with a standard stochastic gradient descent algorithm.
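A minimal numpy sketch of this training step follows; for brevity it updates only the regression layer by gradient descent on the sum-of-squares loss (in the scheme above, the DAG convolution layer parameters are updated through the same loss by back propagation), and all shapes and values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.random((3, 10))            # node representations from the DAG conv layer
T_op = np.array([1.2, 3.4, 0.8])   # actual operator execution times (examples)

W = rng.random((10, 1)) * 0.1      # regression layer weights
b = 0.0                            # regression layer bias
lr = 0.01                          # learning rate

for step in range(200):
    T_out = (H @ W).ravel() + b               # predicted execution times
    err = T_out - T_op
    loss = np.sum(err ** 2)                   # sum_i (T_out_i - T_op_i)^2
    W -= lr * 2 * (H.T @ err.reshape(-1, 1))  # gradient of the squared loss
    b -= lr * 2 * err.sum()
```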
In one embodiment, the step of obtaining the sample features comprises: extracting the size of the input data of the Spark job, the amount of memory resources allocated to the Spark job, and the amount of computing resources allocated to the Spark job as the explicit feature values, and concatenating the explicit feature values with the dependency feature values to obtain the sample features.
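For example, the concatenation might look like the following sketch (feature values are placeholders, not from the patent):

```python
import numpy as np

# Dependency feature values taken from the DAG convolution layer output
# (implicit features); placeholder numbers.
dep_features = np.array([0.42, 0.17, 0.93, 0.28])

# Explicit feature values of the Spark job: input data size (GB), allocated
# memory (GB), allocated CPU cores; illustrative numbers only.
explicit_features = np.array([100.0, 16.0, 8.0])

# Concatenating both gives one sample feature vector for the predictor.
sample = np.concatenate([explicit_features, dep_features])
print(sample.shape)  # (7,)
```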
In one embodiment, the prediction model is a fully connected neural network model trained with a Bayesian regularization back-propagation function.
Specifically, the prediction module for predicting the Spark job time comprises the graph convolution neural network and a fully connected neural network model; the specific structure is shown in fig. 4. The graph convolution neural network is used to obtain the implicit features (the dependency feature values) contained in the DAG; the implicit features of the DAG are then input, together with the other, explicit features, into the prediction model, which is used to predict the completion time of the data-parallel job. The predictor adopts a fully connected neural network model comprising an input layer, five hidden layers (the number can be configured as required), and an output layer. The output layer needs only one neuron, whose output is the predicted value of the job completion time. The neurons are fully connected, and a Bayesian regularization back-propagation function is used to train the fully connected neural network model.
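The following numpy sketch (ours) shows the shape of such a predictor: an input layer, five fully connected hidden layers, and a single output neuron. The layer widths, activations, and regularization coefficients are assumptions; Bayesian regularization back propagation minimizes a weighted sum of the squared errors and the squared network weights, which is sketched as the objective below:

```python
import numpy as np

rng = np.random.default_rng(1)
layer_sizes = [7, 32, 32, 32, 32, 32, 1]  # input, five hidden layers, output
weights = [rng.standard_normal((m, n)) * 0.1
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def predict(x):
    """Forward pass; the single output neuron is the predicted job time."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ W + b)                  # fully connected hidden layers
    return (h @ weights[-1] + biases[-1])[0]    # linear output neuron

def regularized_objective(preds, targets, alpha=0.01, beta=1.0):
    # Bayesian regularization minimizes beta * sum(errors^2) +
    # alpha * sum(weights^2); alpha and beta are re-estimated during training.
    sq_err = np.sum((np.asarray(preds) - np.asarray(targets)) ** 2)
    sq_w = sum(np.sum(W ** 2) for W in weights)
    return beta * sq_err + alpha * sq_w

# Example: predict from a 7-dimensional concatenated sample feature vector.
y = predict(np.concatenate([[100.0, 16.0, 8.0], [0.42, 0.17, 0.93, 0.28]]))
```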
It should be understood that, although the steps in the flowchart of fig. 1 are displayed in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise, the execution of these steps is not strictly order-restricted, and they may be performed in other orders. Moreover, at least some of the steps in fig. 1 may comprise several sub-steps or stages that are not necessarily carried out at the same moment but may be performed at different times; the order of their execution is not necessarily sequential, and they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 5, a Spark job time prediction device based on a graph convolution network is provided, comprising an implicit feature acquisition module 502, a concatenation module 504, and a time prediction module 506, wherein:
the implicit feature acquisition module 502 is configured to acquire a directed acyclic graph of a Spark job; construct a multivariate vector for each operator according to the operation information of each operator in the directed acyclic graph, and obtain a node attribute matrix from the multivariate vectors; input the node attribute matrix into a graph convolution network, output operator execution times, and obtain a loss function of the graph convolution network according to the operator execution times and the actual execution time of each operator; and train the graph convolution network through back propagation according to the loss function, input the node attribute matrix into the trained graph convolution network, and extract the convolution layer output to obtain dependency feature values of the graph-like dependency relationships of the operators;
the concatenation module 504 is configured to extract explicit feature values of the Spark job and concatenate the explicit feature values with the dependency feature values to obtain sample features; and
the time prediction module 506 is configured to train a prediction model according to the sample features and the loss function, and predict the Spark job time according to the prediction model.
In one embodiment, the implicit feature acquisition module 502 is further configured to construct the multivariate vector of each operator according to the operator type, data partition size, amount of memory resources, number of CPU cores, and number of computing tasks of each operator in the directed acyclic graph, wherein the operator type is embedded into the multivariate vector as a word vector; and to topologically sort the operators in the directed acyclic graph by breadth-first search and concatenate the multivariate vectors according to the resulting operator order to obtain the node attribute matrix.
In one embodiment, the implicit feature acquisition module 502 is further configured to calculate the sum of squared differences between the predicted execution time and the actual execution time of each operator to obtain the loss function of the graph convolution network.
In one embodiment, the graph convolution network used by the implicit feature acquisition module 502 is a graph convolution neural network created from a directed acyclic graph convolution function based on a propagation rule; the graph convolution neural network comprises a directed acyclic graph convolution layer and a regression layer.
In one embodiment, the implicit feature acquisition module 502 is further configured to input the node attribute matrix into the trained graph convolution network and take the output of the convolution layer of the graph convolution network through a forward propagation algorithm to obtain the dependency feature values of the graph-like dependency relationships of the operators.
In one embodiment, the concatenation module 504 is further configured to extract the size of the input data of the Spark job, the amount of memory resources allocated to the Spark job, and the amount of computing resources allocated to the Spark job as the explicit feature values, and to concatenate the explicit feature values with the dependency feature values to obtain the sample features.
In one embodiment, the prediction model used by the time prediction module 506 is a fully connected neural network model trained with a Bayesian regularization back-propagation function.
For specific limitations of the Spark job time prediction device based on a graph convolution network, reference may be made to the above limitations of the Spark job time prediction method based on a graph convolution network, which are not repeated here. The modules of the above device may be implemented wholly or partially in software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or independent of, a processor of a computer device, or stored in software form in a memory of the computer device, so that the processor can invoke and perform the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a terminal whose internal structure may be as shown in fig. 6. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a Spark job time prediction method based on a graph convolution network. The display screen of the computer device may be a liquid crystal display or an electronic ink display, and the input device may be a touch layer covering the display screen, a key, a trackball, or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad, or mouse.
Those skilled in the art will appreciate that the structure shown in fig. 6 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In an embodiment, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the method in the above embodiments when the processor executes the computer program.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the method in the above-mentioned embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware; the program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application, and their description is specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

1. A Spark job time prediction method based on a graph convolution network, the method comprising:
acquiring a directed acyclic graph of a Spark job;
constructing a multivariate vector for each operator according to the operation information of each operator in the directed acyclic graph, and obtaining a node attribute matrix from the multivariate vectors;
inputting the node attribute matrix into a graph convolution network, outputting operator execution times, and obtaining a loss function of the graph convolution network according to the operator execution times and the actual execution time of each operator;
training the graph convolution network through back propagation according to the loss function, inputting the node attribute matrix into the trained graph convolution network, and extracting the convolution layer output to obtain dependency feature values of the graph-like dependency relationships of the operators;
extracting explicit feature values of the Spark job, and concatenating the explicit feature values with the dependency feature values to obtain sample features; and
training a prediction model according to the sample features and the loss function, and predicting the Spark job time according to the prediction model.
2. The method according to claim 1, wherein constructing a multivariate vector for each operator according to the operation information of each operator in the directed acyclic graph and obtaining a node attribute matrix from the multivariate vectors comprises:
constructing the multivariate vector of each operator according to the operator type, data partition size, amount of memory resources, number of CPU cores, and number of computing tasks of each operator in the directed acyclic graph, wherein the operator type is embedded into the multivariate vector as a word vector; and
topologically sorting the operators in the directed acyclic graph by breadth-first search, and concatenating the multivariate vectors according to the resulting operator order to obtain the node attribute matrix.
3. The method of claim 1, wherein obtaining a loss function of the graph convolution network according to the operator execution times and the actual execution time of each operator comprises:
calculating the sum of squared differences between the predicted execution time and the actual execution time of each operator to obtain the loss function of the graph convolution network.
4. The method of claim 1, wherein the graph convolution network is a graph convolution neural network created from a directed acyclic graph convolution function based on a propagation rule, and the graph convolution neural network comprises a directed acyclic graph convolution layer and a regression layer.
5. The method according to any one of claims 1 to 4, wherein inputting the node attribute matrix into the trained graph convolution network and extracting the convolution layer output to obtain dependency feature values of the graph-like dependency relationships of the operators comprises:
inputting the node attribute matrix into the trained graph convolution network, and taking the output of the convolution layer of the graph convolution network through a forward propagation algorithm to obtain the dependency feature values of the graph-like dependency relationships of the operators.
6. The method according to any one of claims 1 to 4, wherein extracting explicit feature values of the Spark job and concatenating the explicit feature values with the dependency feature values to obtain sample features comprises:
extracting the size of the input data of the Spark job, the amount of memory resources allocated to the Spark job, and the amount of computing resources allocated to the Spark job as the explicit feature values; and
concatenating the explicit feature values with the dependency feature values to obtain the sample features.
7. The method according to any one of claims 1 to 4, wherein the prediction model is a fully connected neural network model trained with a Bayesian regularization back-propagation function.
8. A Spark job time prediction device based on a graph convolution network, the device comprising:
an implicit feature acquisition module, configured to acquire a directed acyclic graph of a Spark job; construct a multivariate vector for each operator according to the operation information of each operator in the directed acyclic graph, and obtain a node attribute matrix from the multivariate vectors; input the node attribute matrix into a graph convolution network, output operator execution times, and obtain a loss function of the graph convolution network according to the operator execution times and the actual execution time of each operator; and train the graph convolution network through back propagation according to the loss function, input the node attribute matrix into the trained graph convolution network, and extract the convolution layer output to obtain dependency feature values of the graph-like dependency relationships of the operators;
a concatenation module, configured to extract explicit feature values of the Spark job and concatenate the explicit feature values with the dependency feature values to obtain sample features; and
a time prediction module, configured to train a prediction model according to the sample features and the loss function, and predict the Spark job time according to the prediction model.
9. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN201911187393.0A 2019-11-28 2019-11-28 Spark job time prediction method and device based on graph convolution network Active CN111126668B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911187393.0A CN111126668B (en) 2019-11-28 2019-11-28 Spark job time prediction method and device based on graph convolution network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911187393.0A CN111126668B (en) 2019-11-28 2019-11-28 Spark operation time prediction method and device based on graph convolution network

Publications (2)

Publication Number Publication Date
CN111126668A 2020-05-08
CN111126668B (en) 2022-06-21

Family

ID=70497289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911187393.0A Active CN111126668B (en) 2019-11-28 2019-11-28 Spark operation time prediction method and device based on graph convolution network

Country Status (1)

Country Link
CN (1) CN111126668B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708923A (en) * 2020-06-24 2020-09-25 北京松鼠山科技有限公司 Method and device for determining graph data storage structure
CN112101538A (en) * 2020-09-23 2020-12-18 成都市深思创芯科技有限公司 Graph neural network hardware computing system and method based on memory computing
CN112287603A (en) * 2020-10-29 2021-01-29 上海淇玥信息技术有限公司 Prediction model construction method and device based on machine learning and electronic equipment
CN112286990A (en) * 2020-10-29 2021-01-29 上海淇玥信息技术有限公司 Method and device for predicting platform operation execution time and electronic equipment
CN112633516A (en) * 2020-12-18 2021-04-09 上海壁仞智能科技有限公司 Performance prediction and machine learning compilation optimization method and device
CN113095491A (en) * 2021-06-09 2021-07-09 北京星天科技有限公司 Sea chart drawing prediction model training and sea chart drawing workload prediction method and device
CN113391907A (en) * 2021-06-25 2021-09-14 中债金科信息技术有限公司 Task placement method, device, equipment and medium
CN113703741A (en) * 2021-10-29 2021-11-26 深圳思谋信息科技有限公司 Neural network compiler configuration method and device, computer equipment and storage medium
EP3975060A1 (en) * 2020-09-29 2022-03-30 Samsung Electronics Co., Ltd. Method and apparatus for analysing neural network performance
CN114565001A (en) * 2020-11-27 2022-05-31 深圳先进技术研究院 Automatic tuning method for graph data processing framework based on random forest

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190065336A1 (en) * 2017-08-24 2019-02-28 Tata Consultancy Services Limited System and method for predicting application performance for large data size on big data cluster
CN110263869A (en) * 2019-06-25 2019-09-20 咪咕文化科技有限公司 Method and device for predicting duration of Spark task
CN110321222A (en) * 2019-07-01 2019-10-11 中国人民解放军国防科技大学 Decision tree prediction-based data parallel operation resource allocation method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190065336A1 (en) * 2017-08-24 2019-02-28 Tata Consultancy Services Limited System and method for predicting application performance for large data size on big data cluster
CN110263869A (en) * 2019-06-25 2019-09-20 咪咕文化科技有限公司 Method and device for predicting duration of Spark task
CN110321222A (en) * 2019-07-01 2019-10-11 中国人民解放军国防科技大学 Decision tree prediction-based data parallel operation resource allocation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
刘思宇等: "Research on Task Execution Time Prediction Methods on the Spark Platform" (Spark平台中任务执行时间预测方法研究), 《软件导刊》 (Software Guide) *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111708923A (en) * 2020-06-24 2020-09-25 北京松鼠山科技有限公司 Method and device for determining graph data storage structure
CN112101538A (en) * 2020-09-23 2020-12-18 成都市深思创芯科技有限公司 Graph neural network hardware computing system and method based on memory computing
CN112101538B (en) * 2020-09-23 2023-11-17 成都市深思创芯科技有限公司 Graphic neural network hardware computing system and method based on memory computing
EP3975060A1 (en) * 2020-09-29 2022-03-30 Samsung Electronics Co., Ltd. Method and apparatus for analysing neural network performance
CN112286990A (en) * 2020-10-29 2021-01-29 上海淇玥信息技术有限公司 Method and device for predicting platform operation execution time and electronic equipment
CN112287603A (en) * 2020-10-29 2021-01-29 上海淇玥信息技术有限公司 Prediction model construction method and device based on machine learning and electronic equipment
CN114565001A (en) * 2020-11-27 2022-05-31 深圳先进技术研究院 Automatic tuning method for graph data processing framework based on random forest
CN112633516A (en) * 2020-12-18 2021-04-09 上海壁仞智能科技有限公司 Performance prediction and machine learning compilation optimization method and device
CN112633516B (en) * 2020-12-18 2023-06-27 上海壁仞智能科技有限公司 Performance prediction and machine learning compiling optimization method and device
CN113095491A (en) * 2021-06-09 2021-07-09 北京星天科技有限公司 Sea chart drawing prediction model training and sea chart drawing workload prediction method and device
CN113391907A (en) * 2021-06-25 2021-09-14 中债金科信息技术有限公司 Task placement method, device, equipment and medium
CN113703741A (en) * 2021-10-29 2021-11-26 深圳思谋信息科技有限公司 Neural network compiler configuration method and device, computer equipment and storage medium
CN113703741B (en) * 2021-10-29 2022-02-22 深圳思谋信息科技有限公司 Neural network compiler configuration method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111126668B (en) 2022-06-21

Similar Documents

Publication Publication Date Title
CN111126668B (en) Spark job time prediction method and device based on graph convolution network
JP7413580B2 (en) Generating integrated circuit floorplans using neural networks
CN112434448B (en) Proxy model constraint optimization method and device based on multipoint adding
CN110991649A (en) Deep learning model building method, device, equipment and storage medium
CN109360105A (en) Product risks method for early warning, device, computer equipment and storage medium
CN109614231A (en) Idle server resource discovery method, device, computer equipment and storage medium
CN113703741B (en) Neural network compiler configuration method and device, computer equipment and storage medium
US20140330758A1 (en) Formal verification result prediction
Lehký et al. Reliability calculation of time-consuming problems using a small-sample artificial neural network-based response surface method
CN112749495A (en) Multipoint-point-adding-based proxy model optimization method and device and computer equipment
CN111339724B (en) Method, apparatus and storage medium for generating data processing model and layout
CN110766145A (en) Learning task compiling method of artificial intelligence processor and related product
CN110990135A (en) Spark operation time prediction method and device based on deep migration learning
CN110909975B (en) Scientific research platform benefit evaluation method and device
CN114997036A (en) Network topology reconstruction method, device and equipment based on deep learning
Specking et al. Evaluating a set-based design tradespace exploration process
Zakharova et al. Evaluating State Effectiveness in Control Model of a Generalized Computational Experiment
CN113222014A (en) Image classification model training method and device, computer equipment and storage medium
CN112734008A (en) Classification network construction method and classification method based on classification network
CN110766146B (en) Learning task compiling method of artificial intelligence processor and related product
EP4246375A1 (en) Model processing method and related device
CN112925723B (en) Test service recommendation method and device, computer equipment and storage medium
US20240103920A1 (en) Method and system for accelerating the convergence of an iterative computation code of physical parameters of a multi-parameter system
CN110599377A (en) Knowledge point ordering method and device for online learning
JP7424373B2 (en) Analytical equipment, analytical methods and analytical programs

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant