WO2022110446A1 - Simulation method and apparatus for heterogeneous cluster scheduling, computer device, and storage medium - Google Patents


Info

Publication number
WO2022110446A1
WO2022110446A1 (PCT/CN2020/139683; priority CN2020139683W)
Authority
WO
WIPO (PCT)
Prior art keywords
deep learning
scheduling
learning load
heterogeneous
load
Prior art date
Application number
PCT/CN2020/139683
Other languages
English (en)
Chinese (zh)
Inventor
叶可江
陈文艳
须成忠
Original Assignee
中国科学院深圳先进技术研究院 (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院
Publication of WO2022110446A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F9/45533 Hypervisors; Virtual machine monitors
    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061 Partitioning or combining of resources
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • the present application relates to the field of information technology, and in particular, to a simulation method, apparatus, computer equipment and storage medium for scheduling of heterogeneous clusters.
  • a MapReduce simulator can simulate a large-scale cluster with a small number of nodes, and accurately simulate the running process of the job, thus providing a complete Hadoop cluster performance test platform to help solve the test problem of large-scale clusters.
  • Apache also provides the YARN Scheduler Load Simulator (SLS), a tool that can replay application loads on a single machine to simulate a large-scale YARN cluster.
  • The simulator uses the actual YARN ResourceManager within the same Java virtual machine, and removes the network factor by processing and scheduling NM/AM heartbeat events, thereby simulating the NodeManager and the ApplicationMaster.
  • The purpose of the embodiments of the present application is to propose a simulation method, apparatus, computer device, and storage medium for heterogeneous cluster scheduling, so as to at least solve the problems of the traditional heterogeneous cluster scheduling approach, namely the high complexity of preparing the experimental environment and the expensive cost of purchasing hardware resources.
  • an embodiment of the present application provides a simulation method for heterogeneous cluster scheduling, which adopts the following technical solutions:
  • based on the historical heterogeneous resource configuration information, the operation mode for executing instructions and the instruction scheduling policy are set for the pre-trained deep learning workload; based on the execution instructions, the deep learning workload is made to run according to the operation mode and the instruction scheduling policy;
  • based on the kubernetes virtualization technology, cluster-node simulation expansion is performed on the running deep learning workload, and large-scale heterogeneous cluster operation of the workload is simulated to obtain its running behavior characteristics and running status data.
  • the method also includes:
  • the basic deep learning load benchmark is trained based on the training data set, and the trained deep learning load benchmark is obtained;
  • Making the deep learning load run according to the operation mode and the instruction scheduling strategy based on the execution instructions includes:
  • the deep learning workload benchmark is run according to the operation mode and the instruction scheduling policy.
  • the method also includes:
  • the method also includes:
  • the scheduling performance data is evaluated based on the preset performance index, and the scheduling performance evaluation result is obtained.
  • the embodiment of the present application also provides a simulation device for heterogeneous cluster scheduling, which adopts the following technical solutions:
  • the request receiving module is used to receive the simulation running request sent by the user terminal;
  • the information reading module is used to respond to the simulation operation request and read the local database to obtain the historical heterogeneous resource configuration information
  • the instruction setting module is used to set the operation mode of executing the instruction and the instruction scheduling policy for the pre-trained deep learning load based on the historical heterogeneous resource configuration information;
  • the load operation module is used to make the deep learning load run according to the operation mode and the instruction scheduling strategy based on the execution instruction;
  • the simulation running module is used to perform cluster-node simulation expansion on the running deep learning workload based on the kubernetes virtualization technology, simulate large-scale heterogeneous cluster operation of the workload, and obtain its running behavior characteristics and running status data.
  • the device also includes:
  • the data set acquisition module is used to read the local database and obtain the training data set;
  • the load training module is used to train the basic deep learning load benchmark based on the training data set to obtain the trained deep learning load benchmark;
  • the load operation module includes:
  • the load running unit is used to make the deep learning load benchmark run according to the running mode and the instruction scheduling policy based on the execution instruction.
  • the device also includes:
  • the device also includes:
  • the performance indicator collection module is used to collect scheduling performance indicators for the running deep learning load based on the indicator collection configuration file to obtain scheduling performance data;
  • the performance evaluation module is used to evaluate the scheduling performance data based on the preset performance index to obtain the scheduling performance evaluation result.
  • the embodiment of the present application also provides a computer device, which adopts the following technical solutions:
  • The computer device includes a memory and a processor; a computer program is stored in the memory, and when the processor executes the computer program, the steps of the above simulation method for heterogeneous cluster scheduling are implemented.
  • the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:
  • a computer program is stored on the computer-readable storage medium, and when the computer program is executed by the processor, the steps of the above-mentioned simulation method for scheduling a heterogeneous cluster are implemented.
  • The present application provides a method for simulating heterogeneous cluster scheduling, including: receiving a simulation running request sent by a user terminal; in response to the simulation running request, reading a local database to obtain historical heterogeneous resource configuration information; based on the historical heterogeneous resource configuration information, setting the operation mode for executing instructions and the instruction scheduling policy for the pre-trained deep learning workload; based on the execution instructions, making the deep learning workload run according to the operation mode and the instruction scheduling policy; and, based on the kubernetes virtualization technology, performing cluster-node simulation expansion on the running deep learning workload to simulate large-scale heterogeneous cluster operation and obtain the workload's running behavior characteristics and running status data.
  • In this way, cluster nodes are simulated and expanded for the running deep learning workload so as to simulate its large-scale heterogeneous cluster operation and obtain running status data. This not only reproduces the large-scale cluster deployment mode of kubernetes, but also simulates the application scenarios of large-scale heterogeneous clusters with a small number of nodes, thus providing a low-cost experimental environment for scheduling R&D personnel and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
  • FIG. 1 is a schematic diagram of an exemplary principle to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for simulating heterogeneous cluster scheduling according to the present application;
  • FIG. 3 is a flowchart of deep learning load training in the simulation method for heterogeneous cluster scheduling according to the present application;
  • FIG. 4 is a flowchart of performance data evaluation in the simulation method for heterogeneous cluster scheduling according to the present application;
  • FIG. 5 is a schematic structural diagram of an embodiment of a simulation apparatus for heterogeneous cluster scheduling according to the present application;
  • FIG. 6 is a schematic structural diagram of deep learning load training in the simulation apparatus for heterogeneous cluster scheduling according to the present application;
  • FIG. 7 is a schematic structural diagram of performance data evaluation in the simulation apparatus for heterogeneous cluster scheduling according to the present application;
  • FIG. 8 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • Referring to FIGS. 1-2, a flowchart of an embodiment of a method for simulating heterogeneous cluster scheduling according to the present application is shown. For convenience of description, only the parts related to the present application are shown.
  • In step S1, a simulation running request sent by the user terminal is received.
  • The simulation running request is a request sent by scheduling R&D personnel who need environment support in order to obtain the runtime characteristics of deep learning workloads under different high-performance heterogeneous hardware configurations.
  • In step S2, in response to the simulation running request, the local database is read to obtain historical heterogeneous resource configuration information.
  • The historical heterogeneous resource configuration information is collected before the simulation runs, so that the operation mode and API calling relationships of the deep learning workload's application programs on chips of different architectures can be set in a targeted manner.
  • The architecture data can specifically include CPU models (AtomicSimple, TimingSimple, InOrder, O3, etc.), GPU architectures (Tesla, Fermi, Kepler, Volta, Turing, etc.), and data on other hardware such as FPGAs and TPUs.
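The record layout below is a minimal illustrative sketch (not part of the patent) of how the historical heterogeneous resource configuration information read from the local database might be represented; all class, field, and function names are hypothetical.

```python
# Hypothetical record layout for historical heterogeneous resource
# configuration information; field names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class HeterogeneousResourceConfig:
    cpu_model: str            # e.g. "AtomicSimple", "TimingSimple", "InOrder", "O3"
    gpu_architecture: str     # e.g. "Tesla", "Fermi", "Kepler", "Volta", "Turing"
    accelerators: list = field(default_factory=list)  # e.g. ["FPGA", "TPU"]

def load_history(records):
    """Parse raw database rows (dicts) into configuration objects."""
    return [HeterogeneousResourceConfig(r["cpu"], r["gpu"], r.get("accel", []))
            for r in records]

rows = [{"cpu": "O3", "gpu": "Volta", "accel": ["TPU"]}]
configs = load_history(rows)
print(configs[0].cpu_model, configs[0].gpu_architecture)  # O3 Volta
```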
  • In step S3, based on the historical heterogeneous resource configuration information, the operation mode for executing instructions and the instruction scheduling policy are set for the pre-trained deep learning workload.
  • The operation mode refers to the manner, such as serial operation or parallel operation, in which an application program implementing the deep learning workload executes instructions under different architectures.
  • The instruction scheduling policy refers to the scheduling policy applied when such an application executes instructions under different architectures, together with the parameter configuration files of the different hardware combinations capable of executing that policy.
  • Setting the operation mode for executing instructions and the instruction scheduling policy for the pre-trained deep learning workload based on the historical heterogeneous resource configuration information may specifically be: analyzing the existing hardware types and architectures recorded in that information, and then setting the operation mode (serial or parallel) and the scheduling policy for the workload's application programs under different architectures, together with the parameter configuration files of the hardware combinations capable of executing the policy. In this way the running state of the deep learning workload's applications under different architectures can later be simulated, and the workload's runtime characteristics under different hardware configurations can be further analyzed. The autonomous configuration of different hardware architecture combinations simulates the underlying resource configuration strategy of heterogeneous clusters, simplifies the configuration of heterogeneous environments, and saves the cost of purchasing physical hardware.
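As a rough illustration of step S3, the sketch below derives an operation mode (serial or parallel) and a scheduling-policy configuration from a hardware description. The mapping rules, policy names, and function signature are assumptions for illustration, not the patent's actual method.

```python
# Hypothetical sketch of step S3: deriving an operation mode and an
# instruction scheduling policy from the historical configuration.
def set_execution_plan(cpu_model, gpu_architecture):
    # Assumption: simple in-order CPU models without a GPU run serially,
    # while out-of-order models or GPU-backed configurations run in parallel.
    serial_cpus = {"AtomicSimple", "TimingSimple", "InOrder"}
    mode = ("serial"
            if cpu_model in serial_cpus and gpu_architecture is None
            else "parallel")
    # The policy names ("fifo", "gang") are illustrative placeholders.
    return {
        "operation_mode": mode,
        "hardware": {"cpu": cpu_model, "gpu": gpu_architecture},
        "scheduling_policy": "fifo" if mode == "serial" else "gang",
    }

plan = set_execution_plan("O3", "Volta")
print(plan["operation_mode"])  # parallel
```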
  • In step S4, the deep learning workload is made to run according to the operation mode and the instruction scheduling policy based on the execution instructions.
  • Specifically, the application programs of the deep learning workload execute in serial or parallel operation modes under the different architectures, and under each architecture the application executes the scheduling policy according to the parameter configuration files of the different hardware combinations. This simulates the running state of the workload's applications under different architectures, so that the workload's runtime characteristics under different hardware configurations can be analyzed. It thereby realizes autonomous configuration of different hardware architecture combinations, simulates the underlying resource configuration strategy of heterogeneous clusters, simplifies the configuration of heterogeneous environments, and saves the cost of purchasing physical hardware.
  • In step S5, based on the kubernetes virtualization technology, cluster-node simulation expansion is performed on the running deep learning workload, and large-scale heterogeneous cluster operation of the workload is simulated to obtain its running behavior characteristics and running status data.
  • Kubernetes has become a container orchestration tool widely used in industry and academia due to its features of portability, scalability and self-healing.
  • The cluster-node simulation expansion of the running deep learning workload based on the kubernetes virtualization technology may be implemented by deploying the cluster with k8s and, based on the k8s container orchestration tool, having a small number of nodes simulate the characteristics of physical machines so as to emulate a cluster of large-scale nodes. In this way the large-scale heterogeneous cluster operation of the deep learning workload can be simulated, accurately obtaining running status data that intuitively reflect the running state of the cluster and providing environment support for scheduling research.
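One possible way to sketch this cluster-node simulation expansion is to generate Kubernetes Node manifests for simulated nodes that advertise the resources of physical machines, in the spirit of hollow-node tools such as kubemark. The manifest fields below follow the core/v1 Node schema, but the node names, labels, and resource figures are illustrative assumptions; applying them to a real cluster would go through the Kubernetes API.

```python
# Illustrative sketch: expand a handful of physical machines into a
# large simulated cluster by generating core/v1 Node manifests.
def make_simulated_node(index, cpu="32", memory="128Gi", gpus=4):
    # Node name, label key, and capacity figures are hypothetical.
    return {
        "apiVersion": "v1",
        "kind": "Node",
        "metadata": {
            "name": f"sim-node-{index}",
            "labels": {"type": "simulated"},
        },
        "status": {
            "capacity": {
                "cpu": cpu,
                "memory": memory,
                "nvidia.com/gpu": str(gpus),
            }
        },
    }

cluster = [make_simulated_node(i) for i in range(100)]
print(len(cluster), cluster[0]["metadata"]["name"])  # 100 sim-node-0
```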
  • Referring to FIG. 3, a flowchart of deep learning load training in the simulation method for heterogeneous cluster scheduling according to the present application is shown. For convenience of description, only the parts related to the present application are shown.
  • Before the above step S4, the method further includes step S301 and step S302, and the above step S4 includes step S303.
  • In step S301, a local database is read to obtain a training data set.
  • In step S302, the basic deep learning load benchmark is trained based on the training data set to obtain a trained deep learning load benchmark.
  • In step S303, the deep learning load benchmark is made to run according to the operation mode and the instruction scheduling policy based on the execution instructions.
  • The basic deep learning workload benchmarks are representative benchmarks selected from several different types of existing deep learning workloads, and are mainly used to test data such as the execution time, transmission speed, throughput, and resource occupancy of the workload.
  • The training data set consists of training data sets of different scales corresponding to each benchmark.
  • The selected basic deep learning load benchmarks are trained on the training data sets of different scales collected for each benchmark, yielding deep learning load benchmarks whose application programs can be used to test the simulated workload under different architectures. When these applications execute in serial or parallel operation modes under different architectures, and execute the scheduling policy according to the parameter configuration files of different hardware combinations, the running state of the benchmark's applications under different architectures can be simulated and the benchmark's runtime characteristics under different hardware configurations analyzed. This realizes autonomous configuration of different hardware architecture combinations, simulates the underlying resource configuration strategy of heterogeneous clusters, simplifies the configuration of heterogeneous environments, and saves the cost of purchasing physical hardware.
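A minimal sketch of how a benchmark run might record the test data mentioned above (execution time and throughput); the workload here is a placeholder function rather than an actual deep learning benchmark, and the harness is an illustrative assumption.

```python
# Hypothetical benchmark harness: time a workload over n samples and
# report execution time and throughput.
import time

def run_benchmark(workload, n_samples):
    start = time.perf_counter()
    for i in range(n_samples):
        workload(i)
    elapsed = time.perf_counter() - start
    return {
        "execution_time_s": elapsed,
        "throughput_samples_per_s": n_samples / elapsed if elapsed > 0 else float("inf"),
    }

# Placeholder workload standing in for a deep learning load benchmark.
stats = run_benchmark(lambda i: i * i, 10_000)
print(sorted(stats))  # ['execution_time_s', 'throughput_samples_per_s']
```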
  • the method further includes:
  • Script editing refers to setting a configuration file in which the performance indicators and the sampling interval can be freely selected.
  • the performance indicators of the application layer of the deep learning load benchmark include CPU utilization, memory utilization, disk IO size, network bandwidth, and the like.
  • the performance indicators of the micro-architecture layer of the deep learning load benchmark are data such as IPC (Instructions per Cycle), branch prediction (Branch Predict), and cache misses (Cache misses).
  • In this embodiment, based on the performance indicators of the application layer of the deep learning load benchmark (CPU utilization, memory utilization, disk IO size, network bandwidth, etc.) and the performance indicators of the micro-architecture layer (IPC, branch prediction, cache misses, etc.), the script is set to obtain an indicator collection configuration file with the function of freely selecting performance indicators and the sampling interval. Scheduling performance indicators can subsequently be collected for the running deep learning workload based on this configuration file, and the collected data can be evaluated to obtain performance evaluation results, so as to effectively evaluate the performance of the scheduling algorithm and, to a certain extent, effectively reduce the evaluation time and cost of heterogeneous clusters.
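The fragment below is one hypothetical shape for such an indicator collection configuration file, combining freely selected application-layer and micro-architecture-layer indicators with a sampling interval; the JSON format and all key names are assumptions, not the patent's specification.

```python
# Hypothetical indicator collection configuration: selected indicators
# plus a sampling interval, serialized as JSON.
import json

collection_config = {
    "sampling_interval_s": 5,
    "application_layer": ["cpu_utilization", "memory_utilization",
                          "disk_io", "network_bandwidth"],
    "microarchitecture_layer": ["ipc", "branch_predict", "cache_misses"],
}

def selected_indicators(config):
    """Flatten both indicator layers into one collection list."""
    return config["application_layer"] + config["microarchitecture_layer"]

# Round-trip through JSON, as a config file on disk would be.
loaded = json.loads(json.dumps(collection_config))
print(len(selected_indicators(loaded)))  # 7
```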
  • Referring to FIG. 4, a flowchart of performance data evaluation in the simulation method for heterogeneous cluster scheduling according to the present application is shown. For convenience of description, only the parts related to the present application are shown.
  • the method further includes: step S401 and step S402.
  • In step S401, scheduling performance indicators are collected for the running deep learning workload based on the indicator collection configuration file to obtain scheduling performance data.
  • In step S402, the scheduling performance data are evaluated based on preset performance indicators to obtain a scheduling performance evaluation result.
  • The preset performance indicators are set according to actual application requirements and can be used to select key indicators for evaluating scheduling performance, such as the time cost of completing each scheduling decision, the average job turnaround time, and changes in the overall resource utilization of the cluster; no specific restriction is imposed here.
  • the scheduling performance evaluation result is an evaluation index that can be used to intuitively reflect the scheduling performance when the deep learning load benchmark is simulated and run.
  • Scheduling performance indicators are collected for the deep learning workload during the simulation run, yielding scheduling performance data that intuitively reflect the scheduling performance of the benchmark's simulated run. Key indicators that meet the preset performance index requirements, such as the time cost of completing each scheduling decision, the average job turnaround time, and changes in the overall cluster resource utilization, are then screened out of the scheduling performance data, so as to obtain scheduling performance evaluation results that intuitively reflect the scheduling performance of the simulated run and, to a certain extent, effectively reduce the evaluation time and cost of heterogeneous clusters.
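As a sketch of the evaluation step, the function below computes two of the key indicators named above, the per-decision scheduling time cost and the average job turnaround time, from job timestamp records. The record fields and function name are hypothetical; real inputs would come from the indicator collection step.

```python
# Hypothetical evaluation of scheduling performance data.
def evaluate_scheduling(jobs):
    # jobs: list of dicts with submit, schedule, and finish timestamps (seconds)
    avg_sched_cost = sum(j["schedule"] - j["submit"] for j in jobs) / len(jobs)
    avg_turnaround = sum(j["finish"] - j["submit"] for j in jobs) / len(jobs)
    return {"avg_scheduling_cost_s": avg_sched_cost,
            "avg_turnaround_s": avg_turnaround}

jobs = [{"submit": 0, "schedule": 2, "finish": 10},
        {"submit": 5, "schedule": 6, "finish": 25}]
print(evaluate_scheduling(jobs))
# {'avg_scheduling_cost_s': 1.5, 'avg_turnaround_s': 15.0}
```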
  • In summary, the present application provides a method for simulating heterogeneous cluster scheduling, including: receiving a simulation running request sent by a user terminal; in response to the request, reading a local database to obtain historical heterogeneous resource configuration information; based on that information, setting the operation mode for executing instructions and the instruction scheduling policy for the pre-trained deep learning workload; making the workload run according to the operation mode and the scheduling policy based on the execution instructions; and, based on the kubernetes virtualization technology, performing cluster-node simulation expansion on the running workload to simulate large-scale heterogeneous cluster operation and obtain the workload's running behavior characteristics and running status data. The basic deep learning load benchmark is trained on the training data set to obtain the trained benchmark, and the operation mode and instruction scheduling policy are set for it based on the pre-collected historical heterogeneous resource configuration information; the benchmark is then run accordingly to accurately obtain the workload's running behavior characteristics, and cluster-node simulation expansion based on kubernetes simulates its large-scale heterogeneous cluster operation to obtain running status data. Next, the indicator collection configuration file is set based on the performance indicators of the benchmark's application layer and micro-architecture layer, scheduling performance indicators are collected for the running workload, and the collected scheduling performance data are evaluated to quickly obtain scheduling performance evaluation results. This not only reproduces the large-scale cluster deployment of kubernetes, but also simulates large-scale heterogeneous cluster application scenarios with a small number of nodes, providing a low-cost experimental environment for scheduling R&D personnel and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).
  • The present application provides an embodiment of a simulation apparatus for heterogeneous cluster scheduling; the apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
  • The simulation apparatus 100 for heterogeneous cluster scheduling includes: a request receiving module 101, an information reading module 102, an instruction setting module 103, a load running module 104, and a simulation running module 105, wherein:
  • a request receiving module 101 configured to receive a simulation running request sent by a user terminal
  • an information reading module 102 configured to read a local database to obtain historical heterogeneous resource configuration information in response to the simulation operation request;
  • the instruction setting module 103 is used for setting the operation mode of executing the instruction and the instruction scheduling strategy for the pre-trained deep learning load based on the historical heterogeneous resource configuration information;
  • a load running module 104, configured to make the deep learning load run according to the running mode and the instruction scheduling policy based on the execution instruction;
  • making the deep learning load run according to the operation mode and the instruction scheduling strategy based on the execution instructions may specifically comprise: having the application program of the deep learning load execute under different architectures in the set operation mode, such as serial operation or parallel operation, and having it execute the scheduling strategy based on the parameter configuration files of the different hardware combinations. This simulates the running state of the deep learning load application under different architectures, allowing the runtime characteristics of the deep learning load under different hardware configurations to be analyzed. It thereby realizes the autonomous configuration of different hardware architecture combinations, simulates the underlying resource configuration strategy of heterogeneous clusters, simplifies the configuration of heterogeneous environments, and saves the cost of purchasing physical hardware.
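The patent gives no code for the operation modes it names. As an illustration only, the serial versus parallel operation modes could be modeled along the following lines; `run_workload` and its `mode` parameter are hypothetical names, and the toy tasks merely stand in for deep learning work:

```python
import concurrent.futures

def run_workload(tasks, mode="serial", workers=4):
    """Run a list of no-argument callables either serially or in parallel.

    mode="serial"  -> one task after another, in order.
    mode="parallel"-> tasks dispatched to a thread pool; results keep order.
    """
    if mode == "serial":
        return [task() for task in tasks]
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda t: t(), tasks))

# Toy "deep learning load" tasks: each simulates one unit of work.
tasks = [lambda i=i: i * i for i in range(8)]

serial_results = run_workload(tasks, mode="serial")
parallel_results = run_workload(tasks, mode="parallel")
assert serial_results == parallel_results  # same results, different schedules
```

The point of the sketch is that the same load can be driven under either mode by a configuration value, which is what the module's "operation mode" setting amounts to.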
  • the simulation operation module 105 is used for performing cluster node simulation expansion on the running deep learning load based on the Kubernetes virtualization technology, simulating large-scale heterogeneous cluster operation of the deep learning load, and obtaining the running behavior characteristics and running status data of the deep learning load.
  • Kubernetes has become a container orchestration tool widely used in industry and academia due to its features of portability, scalability and self-healing.
  • the cluster node simulation expansion of the running deep learning load based on the Kubernetes virtualization technology may be implemented by deploying the cluster with k8s and, based on the k8s container orchestration tool, using a small number of nodes that simulate the characteristics of physical machines to emulate a cluster of large-scale nodes. This simulates the large-scale heterogeneous cluster operation of the deep learning load, so that running status data intuitively reflecting the running state of the cluster can be obtained accurately, providing environment support for subsequent evaluation.
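The patent describes the expansion at the Kubernetes level; the pure-Python sketch below only models the bookkeeping side of that idea, i.e. deriving many simulated node descriptors from a few physical machines. `expand_cluster`, the field names, and the even split of resources are all assumptions for illustration, not the patent's implementation:

```python
def expand_cluster(physical_nodes, scale_factor):
    """Derive simulated node descriptors from a few physical machines.

    Each physical machine hosts `scale_factor` simulated nodes that inherit
    an equal share of its resource profile, mimicking the expansion of a
    small cluster into a large-scale heterogeneous one.
    """
    simulated = []
    for phys in physical_nodes:
        for i in range(scale_factor):
            simulated.append({
                "name": f"{phys['name']}-sim-{i}",
                "cpu": phys["cpu"] / scale_factor,
                "memory_gb": phys["memory_gb"] / scale_factor,
                "arch": phys["arch"],  # heterogeneity survives the split
            })
    return simulated

physical = [
    {"name": "node-a", "cpu": 64, "memory_gb": 256, "arch": "x86_64"},
    {"name": "node-b", "cpu": 32, "memory_gb": 128, "arch": "arm64"},
]
cluster = expand_cluster(physical, scale_factor=50)
print(len(cluster))  # 100 simulated nodes from 2 physical machines
```

In a real deployment the simulated nodes would be registered with the Kubernetes control plane rather than kept as dictionaries, but the resource-splitting logic is the same.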
  • the present application provides a simulation device for heterogeneous cluster scheduling which: sets, based on pre-collected historical heterogeneous resource configuration information, the operation mode and the instruction scheduling strategy for executing instructions on a pre-trained deep learning load; makes the deep learning load run according to the operation mode and the instruction scheduling strategy based on the execution instructions, so as to accurately obtain the running behavior characteristics of the deep learning load; and further performs cluster node simulation expansion on the running deep learning load based on the Kubernetes virtualization technology, so as to simulate large-scale heterogeneous cluster operation of the deep learning load.
  • Referring to FIG. 6, there is shown a schematic structural diagram of deep learning load training of a simulation device for heterogeneous cluster scheduling according to the present application. For convenience of description, only the parts related to the present application are shown.
  • the apparatus further includes: a data set acquisition module 601 and a load training module 602 ; the above-mentioned load operation module 104 includes: a load operation unit 603 .
  • the load training module 602 is used to train the basic deep learning load benchmark based on the training data set to obtain the trained deep learning load benchmark;
  • the load running unit 603 is configured to run the deep learning load benchmark according to the running mode and the instruction scheduling policy based on the execution instruction.
  • the basic deep learning workload benchmarks are representative benchmarks selected from several different types of existing deep learning workloads, and are mainly used to test data such as the execution time, transmission speed, throughput, and resource occupancy of the workload.
  • the training data set is a training data set of different scales corresponding to each benchmark.
  • the selected basic deep learning load benchmark is trained by collecting training data sets of different scales corresponding to each benchmark, so as to obtain a trained deep learning load benchmark whose application can be used to test the simulated deep learning load running under different architectures. Then, when the application of the deep learning load benchmark executes under different architectures in operation modes such as serial operation and parallel operation, and executes the scheduling strategy based on the parameter configuration files of different hardware combinations, the running state of the application under different architectures can be simulated and the runtime characteristics of the benchmark under different hardware configurations analyzed. This realizes the autonomous configuration of different hardware architecture combinations, simulates the underlying resource configuration strategy of heterogeneous clusters, simplifies the configuration of heterogeneous environments, and saves the cost of purchasing physical hardware.
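The benchmarks above are said to test execution time and throughput, among other data. A minimal sketch of such a measurement harness might look as follows; `benchmark` and `toy_training_step` are hypothetical names, and the loop only stands in for real training work:

```python
import time

def benchmark(workload, n_items):
    """Measure execution time and throughput of a workload callable."""
    start = time.perf_counter()
    workload(n_items)
    elapsed = time.perf_counter() - start
    return {
        "execution_time_s": elapsed,
        "throughput_items_per_s": n_items / elapsed if elapsed > 0 else float("inf"),
    }

def toy_training_step(n):
    # Stand-in for real training work (e.g. iterating over n samples).
    total = 0
    for i in range(n):
        total += i * i
    return total

result = benchmark(toy_training_step, 100_000)
```

Transmission speed and resource occupancy would be collected analogously, from network and OS counters sampled around the same timed region.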
  • the device further includes:
  • script editing refers to setting of a configuration file that can freely select performance indicators and sampling intervals.
  • the performance indicators of the application layer of the deep learning load benchmark include CPU utilization, memory utilization, disk IO size, network bandwidth, and the like.
  • the performance indicators of the micro-architecture layer of the deep learning load benchmark are data such as IPC (Instructions per Cycle), branch prediction (Branch Predict), and cache misses (Cache misses).
  • by performing script editing based on the application-layer performance indicators of the deep learning load benchmark (such as CPU utilization, memory utilization, disk IO size, and network bandwidth) and the micro-architecture-layer performance indicators (such as IPC (Instructions per Cycle), branch prediction (Branch Predict), and cache misses (Cache Misses)), an indicator collection configuration file is obtained that supports freely selecting performance indicators and the sampling interval. Scheduling performance indicators can subsequently be collected for the running deep learning load based on this collection configuration file, and the collected data can be evaluated for performance to obtain scheduling performance evaluation results, thereby effectively evaluating the performance of the scheduling algorithm and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
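A collection configuration file of the kind described, one that freely selects indicators and a sampling interval, could be sketched as follows. The structure, names, and stub probes are all assumptions; in practice the probes would read OS counters or hardware performance events rather than return constants:

```python
import time

# Hypothetical indicator-collection configuration: which indicators to
# sample, at what interval, and for how many samples.
COLLECTION_CONFIG = {
    "indicators": ["cpu_utilization", "memory_utilization"],
    "sampling_interval_s": 0.01,
    "samples": 3,
}

# Stub probes standing in for real collectors (OS counters, perf events).
PROBES = {
    "cpu_utilization": lambda: 42.0,
    "memory_utilization": lambda: 63.5,
    "ipc": lambda: 1.8,
}

def collect(config, probes):
    """Sample only the configured indicators at the configured interval."""
    records = []
    for _ in range(config["samples"]):
        records.append({name: probes[name]() for name in config["indicators"]})
        time.sleep(config["sampling_interval_s"])
    return records

data = collect(COLLECTION_CONFIG, PROBES)
assert len(data) == 3 and "ipc" not in data[0]  # unselected indicators skipped
```

The key property is the one the text emphasizes: the set of indicators and the sampling interval are configuration data, not code, so they can be changed without touching the collector.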
  • Referring to FIG. 7, a schematic structural diagram of the performance data evaluation of the simulation apparatus for heterogeneous cluster scheduling according to the present application is shown. For convenience of description, only the parts related to the present application are shown.
  • the apparatus further includes: a performance index collection module 701 and a performance evaluation module 702 .
  • a performance indicator collection module 701, configured to collect scheduling performance indicators for the running deep learning load based on the indicator collection configuration file to obtain scheduling performance data;
  • the performance evaluation module 702 is configured to evaluate the scheduling performance data based on a preset performance index to obtain a scheduling performance evaluation result.
  • the preset performance indicators are set according to actual application requirements and can be used to select key indicators for evaluating scheduling performance, such as the time cost of completing each scheduling, the average job turnaround time, and changes in the overall resource utilization of the cluster; no specific restriction is imposed here.
  • the scheduling performance evaluation result is an evaluation index that can be used to intuitively reflect the scheduling performance when the deep learning load benchmark is simulated and run.
  • scheduling performance indicators are collected for the deep learning load in the simulation run, yielding scheduling performance data that intuitively reflects the scheduling performance of the benchmark simulation. Key indicators meeting the preset performance index requirements, such as the time cost of completing each scheduling, the average job turnaround time, and changes in the overall resource utilization of the cluster, are then screened out from the scheduling performance data, so as to obtain scheduling performance evaluation results that intuitively reflect the scheduling performance of the deep learning load benchmark at simulation runtime, which to a certain extent effectively reduces the evaluation time and cost of heterogeneous clusters.
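Two of the key indicators named above, per-scheduling time cost and average job turnaround time, follow directly from per-job timestamps. A minimal evaluation sketch, with hypothetical record fields (`submit`, `start`, `finish` in seconds):

```python
def evaluate_scheduling(jobs):
    """Compute key scheduling indicators from per-job timestamp records.

    Turnaround time   = finish - submit  (total time the job spent in the system)
    Scheduling cost   = start - submit   (time spent waiting to be placed)
    """
    turnaround = [j["finish"] - j["submit"] for j in jobs]
    sched_cost = [j["start"] - j["submit"] for j in jobs]
    return {
        "avg_turnaround_s": sum(turnaround) / len(jobs),
        "avg_scheduling_cost_s": sum(sched_cost) / len(jobs),
    }

jobs = [
    {"submit": 0.0, "start": 1.0, "finish": 10.0},
    {"submit": 2.0, "start": 2.5, "finish": 8.0},
]
metrics = evaluate_scheduling(jobs)
print(metrics["avg_turnaround_s"])  # (10.0 + 6.0) / 2 = 8.0
```

Cluster-wide resource utilization changes would be derived from the sampled indicator records rather than from job timestamps, but would feed into the same evaluation result.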
  • the present application provides a simulation device for heterogeneous cluster scheduling, including: a request receiving module for receiving a simulation running request sent by a user terminal; an information reading module for, in response to the simulation running request, reading a local database to obtain historical heterogeneous resource configuration information; an instruction setting module for setting the operation mode of executing instructions and the instruction scheduling strategy for the pre-trained deep learning load based on the historical heterogeneous resource configuration information; a load operation module for making the deep learning load run according to the operation mode and the instruction scheduling strategy based on the execution instructions; and a simulation operation module for performing cluster node simulation expansion on the running deep learning load based on the Kubernetes virtualization technology, simulating large-scale heterogeneous cluster operation of the deep learning load, and obtaining the running behavior characteristics and running status data of the deep learning load.
  • the basic deep learning workload benchmark is trained on the training data set to obtain the trained deep learning workload benchmark, and the operation mode of executing instructions and the instruction scheduling strategy are set for the benchmark based on the pre-collected historical heterogeneous resource configuration information. The benchmark is then run according to the operation mode and the instruction scheduling strategy based on the execution instructions, so as to accurately obtain the running behavior characteristics of the deep learning load; cluster node simulation expansion is performed on the running deep learning load based on the Kubernetes virtualization technology, simulating large-scale heterogeneous cluster operation of the deep learning load to obtain its running status data. Based on the application-layer and micro-architecture-layer performance indicators of the benchmark, the indicator collection configuration file is set, scheduling performance indicators are collected for the running deep learning load, and the collected scheduling performance data is evaluated to quickly obtain scheduling performance evaluation results. This not only simulates the large-scale cluster deployment mode of Kubernetes, but also simulates application scenarios of large-scale heterogeneous clusters through a small number of nodes, thus providing a low-cost experimental environment for scheduling R&D personnel and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
  • FIG. 8 is a block diagram of a basic structure of a computer device according to this embodiment.
  • the computer device 8 includes a memory 81, a processor 82, and a network interface 83 that communicate with each other through a system bus. It should be noted that only the computer device 8 with components 81-83 is shown in the figure, but it should be understood that implementation of all shown components is not required; more or fewer components may be implemented instead.
  • the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, microprocessors, Application-Specific Integrated Circuits (ASIC), Field-Programmable Gate Arrays (FPGA), Digital Signal Processors (DSP), embedded devices, and the like.
  • the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
  • the memory 81 includes at least one type of readable storage medium, the readable storage medium including flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 81 may be an internal storage unit of the computer device 8 , such as a hard disk or a memory of the computer device 8 .
  • the memory 81 may also be an external storage device of the computer device 8, such as a plug-in hard disk equipped on the computer device 8, a smart memory card (Smart Media Card, SMC), Secure Digital (SD) card, Flash Card (Flash Card), etc.
  • the memory 81 may also include both the internal storage unit of the computer device 8 and its external storage device.
  • the memory 81 is generally used to store the operating system and various application software installed on the computer device 8 , such as program codes of a method for simulating heterogeneous cluster scheduling.
  • the memory 81 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 82 may, in some embodiments, be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip.
  • the processor 82 is typically used to control the overall operation of the computer device 8 .
  • the processor 82 is configured to run the program code stored in the memory 81 or process data, for example, the program code for executing the simulation method of the heterogeneous cluster scheduling.
  • the network interface 83 may include a wireless network interface or a wired network interface, and the network interface 83 is generally used to establish a communication connection between the computer device 8 and other electronic devices.
  • the present application also provides another embodiment: a computer-readable storage medium storing a simulation program for heterogeneous cluster scheduling, the simulation program being executable by at least one processor to cause the at least one processor to execute the steps of the above-described simulation method for heterogeneous cluster scheduling.
  • the method of the above embodiment can be implemented by means of software plus a necessary general hardware platform, and can of course also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or CD-ROM) and includes several instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, etc.) to execute the methods described in the various embodiments of this application.


Abstract

Embodiments of the present application relate to the field of information technology, and to a simulation method for heterogeneous cluster scheduling, comprising: receiving a simulation operation request sent by a user terminal; in response to the simulation operation request, reading a local database to obtain historical heterogeneous resource configuration information; setting, on the basis of the historical heterogeneous resource configuration information, the operation mode of an execution instruction and an instruction scheduling policy for a pre-trained deep learning load; enabling, on the basis of the execution instruction, the deep learning load to run according to the operation mode and the instruction scheduling policy; and performing, on the basis of Kubernetes virtualization technology, cluster node simulation expansion on the running deep learning load, so as to simulate large-scale heterogeneous cluster operation of the deep learning load and obtain the operating behavior characteristics and operating status data of the deep learning load. The present application also uses a heterogeneous cluster scheduling system, a computing device, and a storage medium. The present application can provide a low-cost experimental environment and, to a certain extent, effectively reduce the evaluation time of a heterogeneous cluster.
PCT/CN2020/139683 2020-11-30 2020-12-25 Procédé et appareil de simulation pour planification de grappes hétérogènes, dispositif informatique et support de stockage WO2022110446A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011375112.7 2020-11-30
CN202011375112.7A CN112433819B (zh) 2020-11-30 2020-11-30 异构集群调度的模拟方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2022110446A1 true WO2022110446A1 (fr) 2022-06-02

Family

ID=74697516

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/139683 WO2022110446A1 (fr) 2020-11-30 2020-12-25 Procédé et appareil de simulation pour planification de grappes hétérogènes, dispositif informatique et support de stockage

Country Status (2)

Country Link
CN (1) CN112433819B (fr)
WO (1) WO2022110446A1 (fr)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115237581A (zh) * 2022-09-21 2022-10-25 之江实验室 一种面向异构算力的多策略智能调度方法和装置
CN116523045A (zh) * 2023-03-13 2023-08-01 之江实验室 一种面向多芯粒芯片的深度学习推理模拟器
CN117271268A (zh) * 2023-11-20 2023-12-22 成都大征创智科技有限公司 一种数字化计算平台中的集群架构性能评估方法

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420517B (zh) * 2021-05-28 2023-01-06 清华大学 面向云端深度学习推理的fpga虚拟化硬件系统栈设计
CN113298176B (zh) * 2021-06-10 2023-04-25 中国科学技术大学 异构模型自适应协作方法
CN113377540A (zh) * 2021-06-15 2021-09-10 上海商汤科技开发有限公司 集群资源调度方法及装置、电子设备和存储介质
CN113504966B (zh) * 2021-06-22 2023-10-31 中国科学院计算技术研究所 Gpu集群调度策略模拟方法及gpu集群模拟器
CN113391925A (zh) * 2021-06-25 2021-09-14 北京字节跳动网络技术有限公司 云资源管理方法、系统、介质、计算机设备
CN113553140B (zh) * 2021-09-17 2022-03-18 阿里云计算有限公司 资源调度方法、设备及系统
CN113973049B (zh) * 2021-10-13 2022-08-02 中国科学院计算技术研究所 一种fpga集群管理与部署比特流的方法
CN114637650B (zh) * 2022-03-11 2023-04-18 电子科技大学 一种基于Kubernetes集群的弹性伸缩方法
CN116170518B (zh) * 2023-04-26 2023-07-18 北京太极信息系统技术有限公司 一种国产芯片容器云跨架构管理的方法及设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102236582A (zh) * 2011-07-15 2011-11-09 浙江大学 虚拟化集群负载在多台物理机中均衡分配的方法
CN105205003A (zh) * 2015-10-28 2015-12-30 努比亚技术有限公司 一种基于集群化系统的自动化测试方法和装置
US20200186616A1 (en) * 2018-12-11 2020-06-11 Sap Se Kubernetes as a distributed operating system for multitenancy/multiuser
CN112000421A (zh) * 2020-07-15 2020-11-27 北京计算机技术及应用研究所 基于超融合架构的管理调度技术

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11526370B2 (en) * 2019-03-10 2022-12-13 Microsoft Technology Licensing, Llc. Cloud resource management using machine learning
CN111274036B (zh) * 2020-01-21 2023-11-07 南京大学 一种基于速度预测的深度学习任务的调度方法
CN111966484A (zh) * 2020-06-23 2020-11-20 北京大学 一种基于深度强化学习的集群资源管理和任务调度方法及系统

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102236582A (zh) * 2011-07-15 2011-11-09 浙江大学 虚拟化集群负载在多台物理机中均衡分配的方法
CN105205003A (zh) * 2015-10-28 2015-12-30 努比亚技术有限公司 一种基于集群化系统的自动化测试方法和装置
US20200186616A1 (en) * 2018-12-11 2020-06-11 Sap Se Kubernetes as a distributed operating system for multitenancy/multiuser
CN112000421A (zh) * 2020-07-15 2020-11-27 北京计算机技术及应用研究所 基于超融合架构的管理调度技术

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115237581A (zh) * 2022-09-21 2022-10-25 之江实验室 一种面向异构算力的多策略智能调度方法和装置
CN115237581B (zh) * 2022-09-21 2022-12-27 之江实验室 一种面向异构算力的多策略智能调度方法和装置
CN116523045A (zh) * 2023-03-13 2023-08-01 之江实验室 一种面向多芯粒芯片的深度学习推理模拟器
CN116523045B (zh) * 2023-03-13 2023-11-07 之江实验室 一种面向多芯粒芯片的深度学习推理模拟器
CN117271268A (zh) * 2023-11-20 2023-12-22 成都大征创智科技有限公司 一种数字化计算平台中的集群架构性能评估方法
CN117271268B (zh) * 2023-11-20 2024-01-30 成都大征创智科技有限公司 一种数字化计算平台中的集群架构性能评估方法

Also Published As

Publication number Publication date
CN112433819A (zh) 2021-03-02
CN112433819B (zh) 2024-04-19

Similar Documents

Publication Publication Date Title
CN112433819B (zh) 异构集群调度的模拟方法、装置、计算机设备及存储介质
Han et al. Benchmarking big data systems: A review
Liu et al. FogWorkflowSim: An automated simulation toolkit for workflow performance evaluation in fog computing
Di et al. Characterizing and modeling cloud applications/jobs on a Google data center
WO2012033909A2 (fr) Procédé et système de simulation d'un centre de données
US11429434B2 (en) Elastic execution of machine learning workloads using application based profiling
CN103955373A (zh) 一种sdn应用集成开发环境的设计方法
CN103593192A (zh) 一种基于slurm调度的算法集成与评测平台及方法
Huang et al. A simulation-based optimization approach for reliability-aware service composition in edge computing
CN105630575A (zh) 针对kvm虚拟化服务器的性能评估方法
Shen et al. Performance prediction of parallel computing models to analyze cloud-based big data applications
Rizvandi et al. On modeling dependency between mapreduce configuration parameters and total execution time
JP2012509546A (ja) 組み込みシステムをシミュレートするための方法及びデータ処理システム
Huang et al. Performance and replica consistency simulation for quorum-based NoSQL system cassandra
Hauser et al. Predictability of resource intensive big data and hpc jobs in cloud data centres
Oladimeji et al. A comprehensive survey on cloud computing simulators
Zhang et al. Repeatable multi-dimensional virtual network embedding in cloud service platform
Yu et al. A two steps method of resources utilization predication for large Hadoop data center
Sinaei et al. Run-time mapping algorithm for dynamic workloads using association rule mining
Jawaddi et al. Autoscaling in serverless computing: Taxonomy and OpenChallenges
Kim et al. RETRACTED ARTICLE: Simulator considering modeling and performance evaluation for high-performance computing of collaborative-based mobile cloud infrastructure
He et al. Dynamic scalable stochastic petri net: A novel model for designing and analysis of resource scheduling in cloud computing
Amar et al. Tunable scheduling in a GridRPC framework
Nandhini et al. An assessment survey of cloud simulators for fault identification
An et al. Resource Demand Forecasting Approach Based on Generic Cloud Workload Model

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20963334

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20963334

Country of ref document: EP

Kind code of ref document: A1