CN109189572B - Resource estimation method and system, electronic equipment and storage medium - Google Patents


Info

Publication number
CN109189572B
CN109189572B (application CN201810868916.7A)
Authority
CN
China
Prior art keywords
execution graph
cost
physical execution
physical
graph
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810868916.7A
Other languages
Chinese (zh)
Other versions
CN109189572A (en)
Inventor
严欢
夏正勋
吕阿斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yi Tai Fei Liu Information Technology LLC
Original Assignee
Yi Tai Fei Liu Information Technology LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yi Tai Fei Liu Information Technology LLC filed Critical Yi Tai Fei Liu Information Technology LLC
Priority to CN201810868916.7A
Publication of CN109189572A
Application granted
Publication of CN109189572B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005: Allocation of resources to service a request
    • G06F 9/5027: Allocation of resources to service a request, the resource being a machine, e.g. CPUs, servers, terminals
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00: Error detection; error correction; monitoring
    • G06F 11/30: Monitoring
    • G06F 11/34: Recording or statistical evaluation of computer activity, e.g. of down time or of input/output operation; recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F 11/3442: Recording or statistical evaluation of computer activity for planning or managing the needed capacity
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061: Partitioning or combining of resources
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00: Indexing scheme relating to G06F 9/00
    • G06F 2209/50: Indexing scheme relating to G06F 9/50
    • G06F 2209/501: Performance criteria
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00: Indexing scheme relating to G06F 9/00
    • G06F 2209/50: Indexing scheme relating to G06F 9/50
    • G06F 2209/5017: Task decomposition

Abstract

Embodiments of the invention relate to the technical field of big data and disclose a resource estimation method and system, an electronic device, and a storage medium. The resource estimation method comprises the following steps: generating all physical execution graphs from an acquired logical execution graph, wherein the logical execution graph corresponds to at least one physical execution graph; determining a cost value for each physical execution graph according to a cost estimation model; and matching the cost value of each physical execution graph against an acquired performance policy to determine an optimal physical execution graph, wherein the cost value of the optimal physical execution graph is minimal. The method avoids obtaining the data volume by estimation during cost estimation, obtains the user's performance requirements at run time, and ensures that the computed physical execution plan meets those requirements.

Description

Resource estimation method and system, electronic equipment and storage medium
Technical Field
Embodiments of the invention relate to the technical field of big data, and in particular to a resource estimation method and system, an electronic device, and a storage medium.
Background
DAG is an abbreviation for Directed Acyclic Graph. In big data processing, DAG computation typically refers to decomposing a computing task into several subtasks and organizing those subtasks into a DAG structure according to their logical dependencies or ordering.
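As a minimal illustration of this decomposition (not part of the patent), the subtasks and their dependency edges can be held as pairs, and a valid execution order derived with a standard topological sort; all subtask names here are hypothetical:

```python
from collections import defaultdict, deque

def topological_order(edges):
    """Return one valid execution order for subtasks linked by
    dependency edges; (u, v) means subtask v consumes u's output."""
    adj, indeg, nodes = defaultdict(list), defaultdict(int), set()
    for u, v in edges:
        adj[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    ready = deque(n for n in nodes if indeg[n] == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for m in adj[n]:
            indeg[m] -= 1
            if indeg[m] == 0:
                ready.append(m)
    if len(order) != len(nodes):
        raise ValueError("cycle detected: not a DAG")
    return order

# A tiny linear pipeline: read -> map -> filter -> count
edges = [("read", "map"), ("map", "filter"), ("filter", "count")]
order = topological_order(edges)  # ['read', 'map', 'filter', 'count']
```

The cycle check matters because the acyclicity of the DAG is exactly what guarantees such an order exists.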
DAGs are a very common structure in distributed computing, with applications in many subfields, for example Dryad (Microsoft's parallel software platform), FlumeJava (a concurrent programming framework), and Tez. A DAG is generally divided into a logical execution graph and a physical execution graph. The core purpose of the logical execution graph is expressive convenience: it lets application developers rapidly describe or construct applications. The physical execution graph belongs to the DAG execution-engine layer, whose main purpose is to translate and map the DAG computing task expressed by the upper layer onto the lower-layer cluster of physical machines for execution. This layer is the core component of DAG computing: it must handle the scheduling of computing tasks, fault tolerance of the underlying hardware, transmission of data and management information, and the management and normal operation of the whole system. The physical execution graph is ultimately distributed across the physical cluster.
In the process of converting the logical execution graph into a physical execution graph, the data distribution strategy and execution mode must be chosen according to the characteristics of the data; this process is called physical execution plan optimization. For example, when operating on data, transferring data between different computing nodes should generally be avoided as much as possible to reduce IO (Input/Output) latency, so data is usually transferred between partitions on the same computing node. However, when the data volume is small and cannot be distributed evenly across partitions, this distribution strategy is inefficient: the data should first be rebalanced across the partitions and then processed further, so that every partition can be fully used for parallel processing. As this example shows, physical execution optimization depends on external conditions such as the attributes of the data and is not fixed, and it is crucial to the performance of the entire distributed execution.
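The rebalancing decision described above can be sketched as a simple skew check; the function name and the threshold are illustrative assumptions, not the patent's method:

```python
def should_rebalance(partition_sizes, skew_threshold=4.0):
    """Heuristic: rebalance when some partitions are empty or the
    sizes are badly skewed, so that every partition can be used for
    parallel processing. The threshold is an illustrative assumption."""
    nonzero = [s for s in partition_sizes if s > 0]
    if len(nonzero) < len(partition_sizes):
        return True  # some partitions carry no data at all
    return max(nonzero) / min(nonzero) > skew_threshold

balanced = should_rebalance([100, 98, 102, 101])  # False: roughly even
skewed = should_rebalance([400, 2, 1, 0])         # True: rebalance first
```

A real optimizer would weigh the rebalancing cost (extra network IO) against the parallelism gained, which this sketch omits.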
Current mainstream big data systems use a cost estimation (Cost Estimate) model as the basis for physical execution plan optimization. Such a model encapsulates a set of cost estimation factors and provides computational methods (addition, subtraction, multiplication, division) on cost objects, as well as identification and validation of unknown values of these factors. The factors are generally divided into two broad categories. Quantifiable cost estimation factors are those that can be calculated by tracking a measurable index (such as the number of bytes of network or disk I/O). Heuristic cost estimation factors are those that cannot be quantified, for which only qualitative empirical values are given. The factors included in the cost estimate are: network cost; disk I/O cost; Central Processing Unit (CPU) cost; heuristic network cost; heuristic disk cost; and heuristic CPU cost.
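A cost object of this kind, bundling the quantifiable and heuristic factors named above and supporting arithmetic, might look like the following sketch; the field names and operations are assumptions for illustration, not any system's actual API:

```python
from dataclasses import dataclass

@dataclass
class Cost:
    """Illustrative cost object; fields mirror the six factor
    categories in the text, units are left abstract."""
    network: float = 0.0    # quantifiable: bytes over the network
    disk_io: float = 0.0    # quantifiable: bytes read/written
    cpu: float = 0.0        # quantifiable: CPU time
    h_network: float = 0.0  # heuristic network cost (empirical)
    h_disk: float = 0.0     # heuristic disk cost (empirical)
    h_cpu: float = 0.0      # heuristic CPU cost (empirical)

    def _vals(self):
        return (self.network, self.disk_io, self.cpu,
                self.h_network, self.h_disk, self.h_cpu)

    def __add__(self, other):
        # combine the costs of two stages factor by factor
        return Cost(*(a + b for a, b in zip(self._vals(), other._vals())))

    def scale(self, k):
        # e.g. a stage repeated k times
        return Cost(*(k * a for a in self._vals()))

    def total(self, weights=(1, 1, 1, 1, 1, 1)):
        # collapse to one scalar; equal weights are an assumption
        return sum(w * v for w, v in zip(weights, self._vals()))

stage = Cost(network=1e6, disk_io=2e6, cpu=0.5)
job = stage + stage.scale(2)  # cost of one stage plus two repetitions
```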
The inventors found at least the following problems in the prior art: although cost estimation covers network IO, disk IO, CPU, and other indexes, these indexes are measured by "input data volume", which is inaccurate because it is obtained by estimation before the DAG executes. For example, when the network bandwidth of a few nodes in a cluster is a bottleneck, the network IO of those nodes inevitably suffers, and such a bottleneck is difficult to predict from the "input data volume" index alone. In addition, for the same business logic, different users may care about different indicators: some users want low latency while others want high throughput. In such cases the physical execution plan computed from a "data volume" prediction is unlikely to fully meet user requirements.
Disclosure of Invention
Embodiments of the invention aim to provide a resource estimation method and system, an electronic device, and a storage medium, so that the data volume does not have to be obtained by estimation during cost estimation, the user's performance requirements are obtained at run time, and the computed physical execution plan is guaranteed to meet the user's requirements.
In order to solve the above technical problem, an embodiment of the present invention provides a resource estimation method, including the following steps:
generating all physical execution graphs from an acquired logical execution graph, wherein the logical execution graph corresponds to at least one physical execution graph;
determining a cost value for each physical execution graph according to a cost estimation model; and
matching the cost value of each physical execution graph against an acquired performance policy to determine an optimal physical execution graph, wherein the cost value of the optimal physical execution graph is minimal.
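The claimed three-step flow can be sketched end to end as follows; every callable here is a hypothetical placeholder for the surrounding system, not the patent's implementation:

```python
def estimate_resources(logical_graph, generate_physical, cost_of, meets_policy):
    """Sketch of the claimed flow: enumerate candidates, cost each,
    filter by the performance policy, return the cheapest survivor."""
    candidates = generate_physical(logical_graph)              # step 1
    costed = [(cost_of(g), g) for g in candidates]             # step 2
    feasible = [(c, g) for c, g in costed if meets_policy(c)]  # step 3
    if not feasible:
        return None  # no physical graph satisfies the policy
    return min(feasible, key=lambda cg: cg[0])[1]  # minimal cost wins

# Toy run with three candidate graphs and a cost ceiling of 6.0:
best = estimate_resources(
    "logical_graph",
    lambda lg: ["gA", "gB", "gC"],
    lambda g: {"gA": 5.0, "gB": 3.0, "gC": 9.0}[g],
    lambda c: c <= 6.0,
)
# best == "gB": cheapest among the graphs that meet the policy
```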
The embodiment of the present invention further provides a resource estimation system, including: the device comprises a generating module, a first determining module and a second determining module;
the generating module is used for generating all the physical execution graphs according to the acquired logic execution graphs; wherein, the logic execution diagram corresponds to at least one physical execution diagram;
the first determining module is used for determining a cost value of each physical execution graph according to the cost estimation model;
the second determining module is used for matching the cost value of each physical execution graph with the performance strategy to determine the optimal physical execution graph; wherein the cost value of the optimal physical execution graph is minimal.
An embodiment of the present invention also provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the resource estimation method.
The embodiment of the invention also provides a computer readable storage medium, which stores a computer program, wherein the computer program realizes the resource estimation method when being executed by a processor.
Compared with the prior art, embodiments of the invention determine the cost value of each physical execution graph according to a cost estimation model after all physical execution graphs are generated, avoiding the inaccurate index data that results from determining physical execution cost directly from a preset data volume. The cost value of each physical execution graph is then matched against the acquired performance policy, which promptly feeds the user's requirements back into the resource estimation system, so the determined optimal physical execution graph meets the user's requirements and user experience is improved.
In addition, the cost estimation model comprises at least one evaluation index;
Determining a cost value for each physical execution graph according to the cost estimation model comprises: running each physical execution graph for a preset duration; acquiring each evaluation index value while each physical execution graph runs; and determining the cost value of the physical execution graph from the evaluation index values.
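The run-and-measure step can be sketched as a timed sampling loop; both callables are stand-ins for the real execution engine and monitoring hooks, and the durations are illustrative:

```python
import time

def run_and_sample(run_step, sample_metrics, duration_s=1.0, interval_s=0.1):
    """Run one candidate graph for a bounded wall-clock window while
    periodically sampling its evaluation indexes, then average them."""
    samples = []
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        run_step()                        # advance the candidate graph
        samples.append(sample_metrics())  # e.g. {"cpu": ..., "net_io": ...}
        time.sleep(interval_s)
    if not samples:
        return {}
    return {k: sum(s[k] for s in samples) / len(samples)
            for k in samples[0]}
```

Averaging over the window is one plausible reduction; a real system might also track peaks or percentiles per index.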
In this embodiment, the cost estimation model comprises at least one evaluation index, and the evaluation index values are determined by actually running each graph for a preset duration, which improves the accuracy of resource estimation and better meets user requirements.
In addition, the performance policy includes a mapping relationship between the cost data and the performance requirement;
Matching the cost value of each physical execution graph against the acquired performance policy to determine the optimal physical execution graph comprises: determining the corresponding cost data according to the acquired performance requirement; matching the cost data against the cost value of each physical execution graph; and selecting, among the graphs that satisfy the performance policy, the physical execution graph with the minimum cost value.
In this embodiment, concrete cost data corresponding to the user's performance requirement converts an abstract requirement into a quantitative one, improving user experience and further improving the accuracy of resource estimation.
In addition, before the cost value of each physical execution graph is matched with the performance strategy and the optimal physical execution graph is determined, the resource estimation method further comprises the following steps: and acquiring the performance strategy input by the user.
In the embodiment, the performance strategy input by the user is obtained in the resource estimation process, so that the finally determined physical execution diagram better meets the user requirements, the use requirements of the user are met, and the user experience and the accuracy of resource estimation are further improved.
In addition, the evaluation index includes: network input/output cost, disk input/output cost, Central Processing Unit (CPU) cost, heuristic network input/output cost, heuristic disk cost, and heuristic CPU cost.
In addition, before generating all the physical execution diagrams according to the acquired logic execution diagram, the resource estimation method further includes: generating an operator according to the written program; generating a logic execution graph of the directed acyclic graph DAG according to the operator; wherein the logic execution graph comprises operators.
In addition, generating all physical execution graphs from the acquired logical execution graph comprises: determining all partition data distribution strategies according to the operators in the logical execution graph, wherein different strategies differ in the data distribution of at least two partitions; and determining the physical execution graph corresponding to each partition data distribution strategy, thereby obtaining all physical execution graphs.
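Enumerating all physical execution graphs from per-operator strategy choices can be sketched with a Cartesian product; the operator and strategy names below are illustrative assumptions:

```python
from itertools import product

# Strategy names echo the distribution options described in the text.
STRATEGIES = ["same_node", "hash_by_key", "random"]

def all_physical_graphs(operators):
    """Assign one distribution strategy to each operator; every
    assignment is one candidate physical execution graph (a sketch)."""
    return [dict(zip(operators, choice))
            for choice in product(STRATEGIES, repeat=len(operators))]

plans = all_physical_graphs(["map", "filter"])
# 3 strategies ** 2 operators = 9 candidate physical graphs
```

The combinatorial growth here is why the cost model and policy matching are needed to prune to one optimal graph.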
Drawings
One or more embodiments are illustrated by the corresponding figures in the drawings, which are not meant to be limiting.
FIG. 1 is a flowchart illustrating a resource estimation method according to a first embodiment of the present invention;
FIG. 2 is a flowchart illustrating a resource estimation method according to a second embodiment of the present invention;
FIG. 3 is a diagram illustrating a resource estimation system according to a third embodiment of the present invention;
FIG. 4 is a diagram illustrating a resource estimation system according to a fourth embodiment of the present invention;
fig. 5 is a block diagram of an electronic apparatus according to a fifth embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the invention clearer, the embodiments are described in detail below with reference to the accompanying drawings. Those of ordinary skill in the art will appreciate that numerous technical details are set forth to provide a better understanding of the application; however, the technical solutions claimed in the application can be implemented without these technical details, and with various changes and modifications based on the following embodiments.
A first embodiment of the present invention relates to a resource estimation method. The specific flow is shown in figure 1. The method comprises the following steps:
step 101: and generating all physical execution diagrams according to the acquired logic execution diagram.
Wherein, the logic execution diagram corresponds to at least one physical execution diagram. Specifically, after the program written by the user is acquired, a logic execution diagram is generated according to the program written by the user, and then all physical execution diagrams are determined.
Step 101 specifically includes: determining all partition data distribution strategies according to the operators in the logical execution graph, wherein different strategies differ in the data distribution of at least two partitions; and determining the physical execution graph corresponding to each strategy, thereby obtaining all physical execution graphs.
In a specific implementation, the program written by the user is also called the user's business logic. Generally, the user writes the program through an Application Programming Interface (API) provided by the system. The API contains predefined functions, and each predefined function contains an operator, which is a mapping relationship within the function. Therefore, after the user writes the business logic, an operator-based logical execution graph is generated; this graph focuses on describing the user's business logic in terms of operators.
It is worth mentioning that when writing business logic the user processes data through the API. Specific processing methods include, but are not limited to: Map (data transformation): processes a single element and outputs a single element; FlatMap (data transformation): processes a single element and outputs multiple elements; Filter (filtering): filters the data; Count (statistics): counts the data. After the user has written the business logic, a logical execution graph is generated according to the logical order in which the APIs were called.
Specifically, the possible partition distribution strategies are determined according to the operators in the logical execution graph, and all physical execution plans are derived from these data distribution strategies. The partition distribution strategies include, but are not limited to, the following: data is transferred between partitions with the same operator number on the same computing node, which is efficient because no network transmission is involved; data is transferred to a target partition across computing nodes according to the hash value (Hash) of an attribute (Key) in the data, which guarantees that data pairs with the same Key reach the same partition, so inter-node transfer depends on network output; or data is randomly distributed among partitions on all computing nodes. All physical execution graphs are then generated by enumerating all possibilities of data distribution in each partition.
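The hash-based and random distribution strategies named above can be sketched as follows; this is illustrative only, since a real engine routes data across nodes rather than Python lists:

```python
import random
from zlib import crc32  # stable across runs, unlike Python's salted hash()

def hash_partition(records, n):
    """Send every (key, value) pair to partition crc32(key) % n, so
    all pairs sharing a Key land in the same partition (Hash strategy)."""
    parts = [[] for _ in range(n)]
    for key, value in records:
        parts[crc32(key.encode()) % n].append((key, value))
    return parts

def random_partition(records, n, seed=0):
    """Spread records uniformly at random over n partitions."""
    rng = random.Random(seed)
    parts = [[] for _ in range(n)]
    for rec in records:
        parts[rng.randrange(n)].append(rec)
    return parts

pairs = [("a", 1), ("b", 2), ("a", 3)]
hashed = hash_partition(pairs, 4)  # ("a", 1) and ("a", 3) share a partition
```

`crc32` stands in for whatever hash function a real engine uses; the point is that it must be deterministic per key.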
Step 102: and determining a cost value of each physical execution graph according to the cost estimation model.
Specifically, the cost estimation model comprises at least one evaluation index; wherein the evaluation index includes: network input/output cost, disk input/output cost, Central Processing Unit (CPU) cost, heuristic network input/output cost, heuristic disk cost, and heuristic CPU cost.
Specifically, the evaluation indexes further include CPU utilization during execution of the physical execution graph, network Input/Output (I/O), disk I/O, network delay, and the like. CPU utilization denotes the average CPU utilization of the computing nodes during execution; network I/O denotes the average network I/O during execution; disk I/O denotes the average disk I/O during execution; and network delay denotes the end-to-end latency of the network channel during execution.
It should be noted that the evaluation index data collected while the physical execution graph runs determines its cost value. The more evaluation index data is collected, the more faithfully the cost value reflects the physical execution graph. The evaluation indexes listed above are merely exemplary and are not limiting.
Specifically, step 102 specifically includes: running each physical execution graph within a preset time length; acquiring each evaluation index value in the running process of each physical execution graph; and determining a cost value of the physical execution graph according to each evaluation index value.
After executing the physical execution plan for the preset duration, the values of evaluation indexes such as CPU utilization, network I/O, disk I/O, and network latency are obtained, and a total cost value, for example Cn, is computed from them, where n denotes the n-th physical execution plan.
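One plausible way to collapse the collected evaluation-index averages into a single cost value Cn is a weighted sum; the patent does not fix an aggregation formula, so the weights below are pure assumptions:

```python
def cost_value(metrics, weights):
    """Collapse per-run evaluation-index averages into one scalar Cn.
    Only indexes named in `weights` contribute; weights normalize the
    very different units (ratios, bytes/s, milliseconds)."""
    return sum(weights[k] * metrics[k] for k in weights)

metrics = {"cpu_util": 0.35, "net_io_bps": 1.2e6,
           "disk_io_bps": 8.0e5, "latency_ms": 4.0}
weights = {"cpu_util": 100.0, "net_io_bps": 1e-5,
           "disk_io_bps": 1e-5, "latency_ms": 2.0}
Cn = cost_value(metrics, weights)  # one comparable scalar per plan
```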
It should be noted that determining the cost value by executing the physical execution graph for a preset duration ensures that the cost the current system incurs for that graph is obtained during resource estimation; this can be understood as dynamically obtaining the cost of running the user's physical execution graph on the current system.
Step 103: and matching the cost value of each physical execution graph with the obtained performance strategy to determine the optimal physical execution graph.
Wherein the cost value of the optimal physical execution graph is minimal.
In particular, the performance policy includes a mapping between cost data and performance requirements, so before step 103 the performance policy entered by the user must be acquired. For example, a user's performance policy may be high throughput or low latency. Low latency (LOW_LATENCY) means that, under this policy, the end-to-end delay in the network channel is kept as small as possible; high throughput (HIGH_THROUGHPUT) means that the amount of data processed per unit time is maximized, which maximizes utilization of the CPU, disk I/O, network I/O, and so on in the corresponding performance indexes. These policies are merely exemplary.
Specifically, after the user sets the performance policy, the system converts the performance requirement into numerical limits on specific evaluation indexes. For example, under the LOW_LATENCY policy, CPU utilization must not exceed 10%, the average disk I/O read rate must not exceed 10000 bytes per second, the network I/O delay must be 0.001 ms (for 64-byte messages), and so on. When setting the performance policy, the user may also add evaluation index requirements as needed; this is not specifically limited here.
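The conversion from a named policy to concrete evaluation-index limits might be tabulated as follows; the LOW_LATENCY bounds echo the example in the text, while everything else is an assumption:

```python
# Suffix "_max"/"_min" on each limit name tells which direction passes.
POLICY_LIMITS = {
    "LOW_LATENCY": {
        "cpu_util_max": 0.10,          # CPU utilization <= 10%
        "disk_read_bps_max": 10_000,   # avg disk read <= 10000 B/s
        "net_latency_ms_max": 0.001,   # network I/O delay <= 0.001 ms
    },
    "HIGH_THROUGHPUT": {
        "throughput_min": 1_000_000,   # records/s floor (assumed value)
    },
}

def meets_policy(metrics, policy):
    """True when every limit of the chosen policy is satisfied."""
    for name, bound in POLICY_LIMITS[policy].items():
        key, kind = name.rsplit("_", 1)
        value = metrics.get(key)
        if value is None:
            return False  # index not measured: treat as not satisfied
        if (kind == "max" and value > bound) or (kind == "min" and value < bound):
            return False
    return True

m = {"cpu_util": 0.05, "disk_read_bps": 9_000, "net_latency_ms": 0.001}
ok = meets_policy(m, "LOW_LATENCY")  # True: all three limits hold
```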
Specifically, step 103 specifically includes: determining corresponding cost data according to the acquired performance requirements; matching the cost data with the cost value of each physical execution graph; and determining the physical execution graph with the minimum cost value meeting the performance strategy.
After the user's performance requirement is obtained, the corresponding cost values are determined from the mapping between performance requirements and cost data. Matching the cost data against the cost value of each physical execution graph screens out the physical execution graphs that meet the user's performance requirement; the cost values of the qualifying graphs are sorted in ascending order, and the graph with the minimum cost value is selected as the final physical execution graph.
It should be noted that, in determining the optimal physical execution graph, the user's performance requirement is obtained and the search covers all physical execution graphs, each of which is run once for the preset duration to obtain its cost value. This avoids local optima, better meets user requirements than typical open-source data processing, and achieves higher execution efficiency.
Compared with the prior art, after all physical execution graphs are generated, the cost value of each is determined according to the cost estimation model, avoiding the inaccurate index data that results from determining physical execution cost directly from a preset data volume. The cost value of each physical execution graph is then matched against the acquired performance policy, which promptly feeds the user's requirements back into the resource estimation system, so the determined optimal physical execution graph meets the user's requirements and user experience is improved.
A second embodiment of the present invention relates to a resource estimation method. The second embodiment is substantially the same as the first embodiment, and mainly differs therefrom in that: in the second embodiment of the present invention, the execution steps are performed before all the physical execution diagrams are generated. The specific flow is shown in fig. 2.
Specifically, the resource estimation method comprises the following steps:
it should be noted that steps 203 to 205 are the same as steps 101 to 103 in the first embodiment, and are not described again here.
Step 201: and generating an operator according to the written program.
Step 202: generating a logic execution graph of the directed acyclic graph DAG according to the operator; wherein the logic execution graph comprises operators.
Specifically, before the physical execution graphs are determined, a corresponding DAG logical execution graph must be generated from the program written by the user: the computing task (i.e., the user's program) is decomposed into several subtasks, and the logical relationships or ordering among these subtasks are built into a DAG structure.
It should be noted that, the relationship between the user-written program and the corresponding operator has already been described in step 101 in the first embodiment, and is not described herein again.
The steps of the above methods are divided for clarity, and the implementation may be combined into one step or split some steps, and the steps are divided into multiple steps, so long as the same logical relationship is included, which are all within the protection scope of the present patent; it is within the scope of the patent to add insignificant modifications to the algorithms or processes or to introduce insignificant design changes to the core design without changing the algorithms or processes.
A third embodiment of the present invention relates to a resource estimation system, as shown in fig. 3, including: a generating module 301, a first determining module 302 and a second determining module 303;
a generating module 301, configured to generate all physical execution graphs from the acquired logical execution graph, wherein the logical execution graph corresponds to at least one physical execution graph;
a first determining module 302, configured to determine a cost value for each physical execution graph according to a cost estimation model; and
a second determining module 303, configured to match the cost value of each physical execution graph against the performance policy and determine the optimal physical execution graph, wherein the cost value of the optimal physical execution graph is minimal.
It should be noted that this embodiment is a system embodiment corresponding to the first embodiment, and this embodiment can be implemented in cooperation with the first embodiment. The related technical details mentioned in the first embodiment are still valid in this embodiment, and are not described herein again in order to reduce repetition.
It should be noted that each module in this embodiment is a logical module. In practical applications, a logical unit may be a single physical unit, part of a physical unit, or a combination of multiple physical units. In addition, to highlight the innovative part of the invention, units less closely related to solving the technical problem proposed by the invention are not introduced in this embodiment, but this does not mean that no other units exist.
The fourth embodiment of the present invention relates to a resource estimation system. The fourth embodiment is substantially the same as the third embodiment and mainly differs in that the resource estimation system further includes an operator generation module and a DAG generation module; the specific structure is shown in fig. 4.
It should be noted that only the added modules are described in this embodiment, and the modules described in the third embodiment are not described again.
The resource estimation system further comprises an operator generation module 401 and a DAG generation module 402.
The operator generation module 401 is configured to generate operators according to the written program.
The DAG generation module 402 is configured to generate a directed acyclic graph (DAG) logic execution graph according to the operators, wherein the logic execution graph comprises the operators.
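What the operator generation and DAG generation modules produce can be sketched as follows, assuming each operator carries references to its upstream operators; the `Operator` class, its field names, and the toy pipeline are illustrative, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Operator:
    """One node of the logic execution graph (names are illustrative)."""
    name: str
    inputs: list = field(default_factory=list)  # upstream operators

def build_logic_execution_graph(operators):
    """Return the DAG as adjacency lists: operator name -> downstream names."""
    dag = {op.name: [] for op in operators}
    for op in operators:
        for parent in op.inputs:
            dag[parent.name].append(op.name)  # edge parent -> op
    return dag

# A toy three-operator pipeline: read -> map -> reduce.
read = Operator("read")
map_ = Operator("map", inputs=[read])
reduce_ = Operator("reduce", inputs=[map_])
dag = build_logic_execution_graph([read, map_, reduce_])
print(dag)  # {'read': ['map'], 'map': ['reduce'], 'reduce': []}
```

The resulting adjacency structure is the logic execution graph that the generating module then expands into candidate physical execution graphs.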
Since the second embodiment corresponds to this embodiment, this embodiment can be implemented in cooperation with the second embodiment. The related technical details mentioned in the second embodiment remain valid here, the technical effects achievable in the second embodiment can also be achieved in this embodiment, and they are not repeated in order to reduce repetition.
A fifth embodiment of the present invention relates to an electronic device, the specific structure of which is shown in fig. 5. The electronic device comprises at least one processor 501 and a memory 502 communicatively coupled to the at least one processor 501. The memory 502 stores instructions executable by the at least one processor 501, and the instructions are executed by the at least one processor 501 to enable the at least one processor 501 to perform the resource estimation method.
In this embodiment, the processor 501 is a central processing unit (CPU), and the memory 502 is a random access memory (RAM). The processor 501 and the memory 502 may be connected by a bus or by other means; fig. 5 takes the bus connection as an example. The memory 502, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs, and modules, such as the programs implementing the resource estimation method in the embodiments of the present application. The processor 501 executes the various functional applications and data processing of the device, i.e., implements the resource estimation method, by running the non-volatile software programs, instructions, and modules stored in the memory 502.
The memory 502 may include a storage program area and a storage data area, wherein the storage program area may store an operating system and an application program required by at least one function, and the storage data area may store a list of options and the like. Further, the memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the memory 502 may optionally include memory located remotely from the processor 501, which may be connected to an external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more program modules are stored in the memory 502 and, when executed by the one or more processors 501, perform the resource estimation method of the first or second method embodiments described above.
The product can execute the resource estimation method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to the method. Technical details not described in detail in this embodiment can be found in the resource estimation method provided by the embodiments of the present application.
A sixth embodiment of the present invention relates to a computer-readable storage medium storing computer instructions that enable a computer to execute the resource estimation method according to the first or second method embodiment of the present application.
Those skilled in the art can understand that all or part of the steps in the methods of the foregoing embodiments may be implemented by a program instructing the related hardware. The program is stored in a storage medium and includes several instructions to enable a device (which may be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
It will be understood by those of ordinary skill in the art that the foregoing embodiments are specific embodiments for practicing the invention and that, in practice, various changes may be made in form and detail without departing from the spirit and scope of the invention.

Claims (7)

1. A resource estimation method, applied to a directed acyclic graph (DAG) that is divided into a logic execution graph and a physical execution graph, wherein the logic execution graph decomposes a computing task into a DAG structure formed by a plurality of subtasks, and the physical execution graph deploys, through conversion and mapping, the DAG computing task expressed at an upper layer into a physical machine cluster at a lower layer for operation, the method comprising the following steps:
generating operators according to the written program, and generating a directed acyclic graph (DAG) logic execution graph according to the operators, wherein the logic execution graph comprises the operators;
generating all physical execution graphs according to the acquired logic execution graph, wherein the logic execution graph corresponds to at least one physical execution graph; this step specifically comprises: determining all partition data distribution strategies according to operators in the logic execution graph, wherein in different partition data distribution strategies the data distribution of at least two partitions is different; and determining a physical execution graph corresponding to each partition data distribution strategy to obtain all the physical execution graphs;
determining a cost value of each physical execution graph according to a cost estimation model; and
matching the cost value of each physical execution graph with the obtained performance policy to determine an optimal physical execution graph, wherein the cost value of the optimal physical execution graph is minimal;
wherein the performance policy comprises a mapping relationship between cost data and performance requirements; and
matching the cost value of each physical execution graph with the obtained performance policy to determine the optimal physical execution graph comprises: determining corresponding cost data according to the acquired performance requirement; matching the cost data with the cost value of each physical execution graph; and determining the physical execution graph with the minimum cost value that meets the performance policy.
2. The resource estimation method according to claim 1, wherein the cost estimation model includes at least one evaluation index;
determining a cost value for each of the physical execution graphs according to a cost estimation model, comprising:
running each physical execution graph for a preset time length;
acquiring each evaluation index value in the running process of each physical execution graph;
and determining a cost value of the physical execution graph according to each evaluation index value.
3. The resource estimation method according to claim 1, further comprising, before matching the cost value of each physical execution graph with the performance policy to determine the optimal physical execution graph:
acquiring the performance policy input by the user.
4. The resource estimation method according to claim 2, wherein the evaluation index includes: network input/output cost, disk input/output cost, Central Processing Unit (CPU) cost, heuristic network input/output cost, heuristic disk cost, and heuristic CPU cost.
5. A resource estimation system, applied to a directed acyclic graph (DAG) that is divided into a logic execution graph and a physical execution graph, wherein the logic execution graph decomposes a computing task into a DAG structure formed by a plurality of subtasks, and the physical execution graph deploys, through conversion and mapping, the DAG computing task expressed at an upper layer into a physical machine cluster at a lower layer for operation, the system comprising:
a generating module, configured to generate operators according to the written program; generate a directed acyclic graph (DAG) logic execution graph according to the operators, wherein the logic execution graph comprises the operators; and generate all physical execution graphs according to the acquired logic execution graph, wherein the logic execution graph corresponds to at least one physical execution graph; generating all the physical execution graphs specifically comprises: determining all partition data distribution strategies according to operators in the logic execution graph, wherein in different partition data distribution strategies the data distribution of at least two partitions is different; and determining a physical execution graph corresponding to each partition data distribution strategy to obtain all the physical execution graphs;
a first determining module, configured to determine a cost value of each physical execution graph according to a cost estimation model; and
a second determining module, configured to match the cost value of each physical execution graph with a performance policy to determine an optimal physical execution graph, wherein the cost value of the optimal physical execution graph is minimal;
wherein the performance policy comprises a mapping relationship between cost data and performance requirements; and
matching the cost value of each physical execution graph with the obtained performance policy to determine the optimal physical execution graph comprises: determining corresponding cost data according to the acquired performance requirement; matching the cost data with the cost value of each physical execution graph; and determining the physical execution graph with the minimum cost value that meets the performance policy.
6. An electronic device, comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the resource estimation method of any one of claims 1-4.
7. A computer-readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the resource estimation method of any one of claims 1 to 4.
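Taken together, the claims above describe enumerating candidate physical execution graphs from partition data distribution strategies (claim 1) and scoring each graph from its evaluation index values (claims 2 and 4). A minimal sketch of those two steps follows; the strategy names ("hash", "range"), the index weights, and the weighted-sum cost formula are invented for illustration and are not specified by the patent.

```python
import itertools

# Hypothetical weights for evaluation indices of the kind named in claim 4.
WEIGHTS = {"net_io": 1.0, "disk_io": 0.8, "cpu": 0.5}

def enumerate_physical_graphs(operators, strategies):
    """Yield one candidate physical execution graph per combination of
    per-operator partition data distribution strategy."""
    for combo in itertools.product(strategies, repeat=len(operators)):
        yield dict(zip(operators, combo))  # operator -> chosen strategy

def cost_value(measured_indices):
    """Cost of one physical graph: weighted sum of the evaluation index
    values measured while running the graph for a preset time length."""
    return sum(WEIGHTS[name] * value for name, value in measured_indices.items())

# Two operators, two strategies each -> 2**2 = 4 candidate physical graphs.
plans = list(enumerate_physical_graphs(["map", "join"], ["hash", "range"]))
print(len(plans))
print(cost_value({"net_io": 10.0, "disk_io": 5.0, "cpu": 20.0}))  # 24.0
```

Any monotone aggregation of the index values would serve equally well here; the weighted sum is just the simplest choice consistent with "determining a cost value according to each evaluation index value".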
CN201810868916.7A 2018-08-02 2018-08-02 Resource estimation method and system, electronic equipment and storage medium Active CN109189572B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810868916.7A CN109189572B (en) 2018-08-02 2018-08-02 Resource estimation method and system, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN109189572A CN109189572A (en) 2019-01-11
CN109189572B true CN109189572B (en) 2021-06-04

Family

ID=64920460

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810868916.7A Active CN109189572B (en) 2018-08-02 2018-08-02 Resource estimation method and system, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN109189572B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111796917A (en) * 2019-04-09 2020-10-20 华为技术有限公司 Operator operation scheduling method and device
CN111158901A (en) * 2019-12-09 2020-05-15 北京迈格威科技有限公司 Optimization method and device of computation graph, computer equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105117286A (en) * 2015-09-22 2015-12-02 北京大学 Task scheduling and pipelining executing method in MapReduce
CN105868019A (en) * 2016-02-01 2016-08-17 中国科学院大学 Automatic optimization method for performance of Spark platform
CN107038070A (en) * 2017-04-10 2017-08-11 郑州轻工业学院 The Parallel Task Scheduling method that reliability is perceived is performed under a kind of cloud environment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8607188B2 (en) * 2011-09-06 2013-12-10 International Business Machines Corporation Modeling task-site allocation networks
US9626227B2 (en) * 2015-03-27 2017-04-18 Intel Corporation Technologies for offloading and on-loading data for processor/coprocessor arrangements

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant