CN116720544A - Model training time-consuming prediction method, device and system based on heterogeneous computing system - Google Patents

Publication number
CN116720544A
Authority
CN
China
Prior art keywords
computing
training
computing system
time
heterogeneous
Prior art date
Legal status
Granted
Application number
CN202310974618.7A
Other languages
Chinese (zh)
Other versions
CN116720544B (en)
Inventor
唐轶男
李仁刚
赵雅倩
郭振华
王丽
高开
曹芳
Current Assignee
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd filed Critical Inspur Electronic Information Industry Co Ltd
Priority to CN202310974618.7A
Publication of CN116720544A
Application granted
Publication of CN116720544B
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a model training time-consuming prediction method, device and system based on a heterogeneous computing system, relating to the field of neural networks. The method sets up a plurality of corresponding simplified sub-computing systems according to the types of computing devices contained in the heterogeneous computing system; the target model and training data are then issued to each sub-computing system, and each sub-computing system is controlled to perform multiple rounds of iterative training on the target model with the training data, so as to record the time-consumption information and data transmission amount corresponding to each computing device in each sub-computing system. The actually measured time-consumption information and data transmission amounts, together with the communication bandwidth between the computing devices in the heterogeneous computing system, are then input into a preset mathematical model for time prediction, yielding the predicted time for the heterogeneous computing system to train the target model. In this way the method overcomes the inability of the related art to accurately predict the time a heterogeneous computing system needs to train a model.

Description

Model training time-consuming prediction method, device and system based on heterogeneous computing system
Technical Field
The invention relates to the field of neural networks, in particular to a model training time-consuming prediction method, device and system based on a heterogeneous computing system.
Background
As the size of neural network models continues to grow, the difficulty and complexity of training them rise accordingly. To train a large-scale model efficiently, the model is typically trained with a distributed computing system. To facilitate optimization of the model training process and of the distributed computing system itself, the related art usually needs to predict how long training a neural network model on the distributed computing system will take. However, the related art cannot accurately predict the training time for a heterogeneous computing system (a distributed computing system containing multiple types of computing devices), and actually running a large-scale distributed training job on the complete heterogeneous computing system and measuring its duration incurs substantial time and compute costs, which in turn hinders optimization of the model training process and of the heterogeneous computing system.
Disclosure of Invention
The invention aims to provide a model training time-consuming prediction method, device and system based on a heterogeneous computing system. Using a plurality of simplified sub-computing systems corresponding to the heterogeneous computing system, the method actually measures the time-consumption information and data transmission amount of each type of computing device when training the target model, so that the time the heterogeneous computing system needs to train the target model can be accurately predicted at low cost from the measured data, the communication bandwidth between the computing devices in the heterogeneous computing system, and mathematical modeling.
In order to solve the technical problems, the invention provides a model training time-consuming prediction method based on a heterogeneous computing system, which comprises the following steps:
acquiring a target model, a training set, various computing equipment types contained in a heterogeneous computing system, training data amounts corresponding to the computing equipment types and communication bandwidths among the computing equipment in the heterogeneous computing system;
setting sub-computing systems corresponding to the types of the computing devices, and distributing training data to the computing devices in the sub-computing systems by utilizing the training data quantity and the training set; each of the sub-computing systems includes a plurality of computing devices of the same type, the number of the computing devices in the sub-computing system being less than the number of the computing devices in the heterogeneous computing system;
controlling each sub-computing system to perform multiple rounds of iterative training on the target model by utilizing the training data, and recording time-consuming information and data transmission quantity corresponding to each computing device in each sub-computing system;
and inputting the time-consuming information, the communication bandwidth and the data transmission quantity into a preset mathematical model for time-consumption prediction, so as to obtain the predicted time consumption of the heterogeneous computing system for training the target model.
Optionally, the recording the time-consuming information and the data transmission amount corresponding to each computing device in each sub-computing system includes:
when it is determined that each sub-computing system has completed a preset number of training iterations, recording the time-consuming information and data transmission quantity corresponding to each computing device in each sub-computing system.
Optionally, the acquiring the communication bandwidth between each computing device in the heterogeneous computing system includes:
acquiring network address information among all computing devices in the heterogeneous computing system;
and measuring the communication bandwidth among all the computing devices in the heterogeneous computing system according to the network address information.
Optionally, the measuring the communication bandwidth between the computing devices in the heterogeneous computing system according to the network address information includes:
and measuring the communication bandwidth among all the computing devices in the heterogeneous computing system by using a network testing tool according to the network address information.
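The claim does not name a particular network testing tool; `iperf3` would be a common choice. A minimal sketch, assuming `iperf3` servers are already running on the target hosts (the helper names are illustrative, not from the patent):

```python
import json
import subprocess

def parse_iperf3_bits_per_second(output_json: str) -> float:
    """Extract the measured sender throughput (bits/s) from iperf3 JSON output."""
    report = json.loads(output_json)
    return report["end"]["sum_sent"]["bits_per_second"]

def measure_bandwidth(host: str, port: int = 5201) -> float:
    """Run iperf3 against `host` (a network address from the collected
    address information) and return the measured bandwidth in bits/s."""
    result = subprocess.run(
        ["iperf3", "-c", host, "-p", str(port), "-J"],  # -J: JSON report
        capture_output=True, text=True, check=True,
    )
    return parse_iperf3_bits_per_second(result.stdout)
```

Running the measurement pairwise over all device addresses would populate the bandwidth matrix the later prediction steps consume.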
Optionally, the acquiring the communication bandwidth between each computing device in the heterogeneous computing system includes:
an input communication bandwidth between computing devices in the heterogeneous computing system is received.
Optionally, the setting a sub-computing system corresponding to each computing device type includes:
selecting, for each computing device type, a plurality of target computing devices from the heterogeneous computing system;
and setting sub-computing systems corresponding to the computing device types by utilizing target computing devices corresponding to the computing device types.
Optionally, each of the sub-computing systems comprises two computing devices of the same type.
Optionally, the allocating training data to the computing devices in each of the sub-computing systems using the training data amount and the training set includes:
randomly extracting data of the training data amount from the training set as the training data.
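As a minimal illustration of this step (the claim prescribes only that the extraction be random; the function name and seeding are our additions):

```python
import random

def draw_training_data(training_set, amount, seed=None):
    """Randomly extract `amount` samples (without replacement) from the
    training set to serve as one computing device's training data."""
    rng = random.Random(seed)
    return rng.sample(list(training_set), amount)

# Example: 64 samples for one device out of a 1000-sample training set
subset = draw_training_data(range(1000), 64, seed=0)
```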
Optionally, the recording the time-consuming information and the data transmission amount corresponding to each computing device in each sub-computing system includes:
and recording time-consuming information and data transmission quantity corresponding to each computing device in each sub-computing system by using a model performance analysis tool.
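The patent leaves the "model performance analysis tool" open (PyTorch's `torch.profiler` would be one concrete option). A framework-agnostic sketch of what gets recorded per device, with illustrative names:

```python
import time

class StepRecorder:
    """Records per-iteration wall-clock time and gradient payload size
    for one computing device during the multi-round iterative training."""
    def __init__(self):
        self.step_times = []   # seconds per recorded iteration
        self.bytes_sent = []   # data transmission amount per iteration

    def record(self, step_fn, payload_bytes):
        start = time.perf_counter()
        step_fn()              # one forward + backward iteration
        self.step_times.append(time.perf_counter() - start)
        self.bytes_sent.append(payload_bytes)

recorder = StepRecorder()
for _ in range(5):             # a few iterations, as in the multi-round training
    recorder.record(lambda: sum(i * i for i in range(10_000)),
                    payload_bytes=4_000_000)
```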
Optionally, the acquiring the target model includes:
and receiving the target model input in the form of codes, and analyzing the target model by utilizing a preset script.
Optionally, the target model is trained based on the ring all-reduce (ring-based full-reduction) mode, and the inputting of the time-consuming information, the communication bandwidth and the data transmission amount into a preset mathematical model to perform time prediction, so as to obtain the predicted time for the heterogeneous computing system to train the target model, includes:
inputting the time-consuming information, the communication bandwidth and the data transmission amount into the preset mathematical model to obtain the single-iteration time required by each computing device in the heterogeneous computing system to execute a single iteration on the target model;
taking the maximum value among the single-iteration times as the total single-iteration time required by the heterogeneous computing system to execute a single iteration on the target model;
and determining the predicted time for the heterogeneous computing system to train the target model based on the total single-iteration time.
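In synchronous data-parallel training every device must finish before the next iteration can begin, so the system-level iteration time is bounded by the slowest device. A direct transcription of this step (function names are illustrative):

```python
def system_iteration_time(per_device_times):
    """Total single-iteration time = the slowest device's single-iteration time."""
    return max(per_device_times)

def predicted_training_time(per_device_times, num_iterations):
    """Predicted training time from the total single-iteration time."""
    return num_iterations * system_iteration_time(per_device_times)
```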
Optionally, the time-consuming information includes a data processing time and a computation time, where the data processing time is the time taken by the central processing unit to issue training data to the corresponding computing device, and the computation time includes the time taken by the computing device to perform forward and backward computation.
Optionally, the inputting of the time-consuming information, the communication bandwidth and the data transmission amount into the preset mathematical model to obtain the single-iteration time required by each computing device in the heterogeneous computing system to execute a single iteration on the target model includes:
determining the average data processing time and average computation time for the computing device type to which each sub-computing system belongs from the data processing times and computation times recorded in that sub-computing system, and taking these per-type averages as the data processing time and computation time of each computing device of that type in the heterogeneous computing system;
determining the average data transmission amount from the recorded data transmission amounts, and determining the ring all-reduce operation time of each computing device in the heterogeneous computing system from the average data transmission amount, the communication bandwidth and the ring all-reduce operation order among the computing devices in the heterogeneous computing system;
and determining the single-iteration time required by each computing device in the heterogeneous computing system to execute a single iteration on the target model from the data processing time, the computation time and the ring all-reduce operation time of each computing device.
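The per-type averaging step above can be sketched as follows; the record layout is our assumption, not prescribed by the patent:

```python
from statistics import mean

def per_type_averages(records):
    """records: {device_type: [(t_data_processing, t_compute), ...]}
    measured in that type's sub-computing system. Returns the per-type
    mean data-processing and compute times, which the method then applies
    to every device of that type in the full heterogeneous system."""
    return {
        dtype: (mean(t for t, _ in samples), mean(t for _, t in samples))
        for dtype, samples in records.items()
    }
```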
Optionally, the training data issuing action is executed in parallel with the forward and backward computation and the ring all-reduce operation, and determining the single-iteration time required by each computing device in the heterogeneous computing system to perform a single iteration on the target model from the data processing time, the computation time and the ring all-reduce operation time of each computing device includes:
determining the device processing time of the computing device from the computation time and the ring all-reduce operation time of the computing device;
and taking the maximum of the data processing time and the device processing time as the single-iteration time required by the computing device to perform a single iteration on the target model.
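Because the host's data issuing for the next batch overlaps with device-side compute and communication, one device's iteration is bounded by the slower of the two pipelines. A one-line sketch (names illustrative):

```python
def device_iteration_time(t_data_processing, t_device_processing):
    """Host data issuing runs in parallel with device compute + all-reduce,
    so a single iteration takes the max of the two overlapping pipelines."""
    return max(t_data_processing, t_device_processing)
```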
Optionally, the determining of the device processing time of the computing device from the computation time and the ring all-reduce operation time of the computing device includes:
extracting, from the computation time, the total forward computation time of the computing device and the backward computation time of its first backward computation;
and determining the device processing time of the computing device from the total forward computation time, the backward computation time and the ring all-reduce operation time of each computing device.
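A plausible reading of this formula, which is our assumption rather than the verbatim patent equation, is that backward steps after the first overlap with gradient communication, so only the forward pass, the first backward step and the non-hidden communication sit on the critical path:

```python
def device_processing_time(t_forward_total, t_backward_first, t_allreduce):
    """Illustrative reading: forward compute + first backward step +
    ring all-reduce time; later backward steps are assumed to be
    hidden behind gradient communication."""
    return t_forward_total + t_backward_first + t_allreduce
```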
Optionally, the determining of the ring all-reduce operation time of each computing device in the heterogeneous computing system from the average data transmission amount, the communication bandwidth and the ring all-reduce operation order among the computing devices includes:
determining, according to the ring all-reduce operation order, the adjacent computing device with which the computing device performs the ring all-reduce operation;
determining the time for the computing device and the adjacent computing device to execute a single ring all-reduce step from the average data transmission amount and the communication bandwidth between them;
determining the total number of ring all-reduce steps the computing device performs with the adjacent computing device from the number of computing devices contained in the heterogeneous computing system;
and determining the ring all-reduce operation time of the computing device from the total number of steps and the time of a single ring all-reduce step.
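The patent does not spell out the step counts, but in the standard ring all-reduce (the ring-based full-reduction described above) each of N devices exchanges a 1/N-sized chunk with its ring neighbor over 2(N-1) steps (N-1 reduce-scatter plus N-1 all-gather). A sketch under that standard cost model, which we assume here:

```python
def ring_allreduce_time(avg_volume_bytes, bandwidth_bytes_per_s, num_devices):
    """T = 2*(N-1) * (V/N) / B, the standard ring all-reduce cost model.

    avg_volume_bytes: average gradient payload V measured in the sub-systems
    bandwidth_bytes_per_s: bandwidth B to the ring neighbor (in a synchronous
        ring, the slowest neighbor link bounds each step)
    num_devices: N, the device count of the full heterogeneous system
    """
    n = num_devices
    steps = 2 * (n - 1)                       # reduce-scatter + all-gather
    per_step = (avg_volume_bytes / n) / bandwidth_bytes_per_s
    return steps * per_step
```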
Optionally, the determining of the predicted time for the heterogeneous computing system to train the target model based on the total single-iteration time includes:
determining the number of single iterations required for the heterogeneous computing system to fully train the target model with the training set, according to the total data amount of the training set, the training data amount corresponding to each computing device type, the mini-batch size used by each computing device in the heterogeneous computing system for a single iteration, and the number of computing devices contained in the heterogeneous computing system;
and determining the predicted time for the heterogeneous computing system to fully train the target model with the training set from the number of iterations and the total single-iteration time.
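With D total samples, a per-device mini-batch of b and N devices, one synchronous iteration consumes a global batch of N*b samples, so a full pass over the training set needs ceil(D/(N*b)) iterations. A sketch assuming a uniform mini-batch size for brevity (the patent allows per-type amounts):

```python
import math

def iterations_per_epoch(total_samples, minibatch_per_device, num_devices):
    """Number of single iterations for one full pass over the training set."""
    global_batch = minibatch_per_device * num_devices
    return math.ceil(total_samples / global_batch)

def predicted_epoch_time(total_samples, minibatch_per_device, num_devices,
                         single_iteration_time):
    """Predicted time for the system to fully train once over the training set."""
    return iterations_per_epoch(total_samples, minibatch_per_device,
                                num_devices) * single_iteration_time
```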
Optionally, after determining the predicted time for the heterogeneous computing system to fully train the target model with the training set from the number of iterations and the total single-iteration time, the method further includes:
determining, from the predicted time, the predicted total time for the heterogeneous computing system to fully train the target model with the training set a plurality of times.
The invention also provides a model training time-consuming prediction device based on the heterogeneous computing system, which comprises:
the information collection module is used for obtaining a target model, a training set, each computing equipment type contained in the heterogeneous computing system, training data quantity corresponding to each computing equipment type and communication bandwidth among computing equipment in the heterogeneous computing system;
the task allocation module is used for setting sub-computing systems corresponding to the types of the computing devices and allocating training data to the computing devices in the sub-computing systems by utilizing the training data quantity and the training set; each of the sub-computing systems includes a plurality of computing devices of the same type, the number of the computing devices in the sub-computing system being less than the number of the computing devices in the heterogeneous computing system;
the data collection module is used for controlling each sub-computing system to perform multi-round iterative training on the target model by utilizing the training data, and recording time-consuming information and data transmission quantity corresponding to each computing device in each sub-computing system;
and the time consumption prediction module is used for inputting the time consumption information, the communication bandwidth and the data transmission quantity into a preset mathematical model to perform time consumption prediction, so as to obtain the predicted time consumption of the heterogeneous computing system for training the target model.
The present invention also provides an electronic device including:
a memory for storing a computer program;
a processor for implementing the model training time-consuming prediction method based on the heterogeneous computing system as described above when executing the computer program.
The invention also provides a model training time-consuming prediction system based on the heterogeneous computing system, which comprises: an electronic device and a plurality of sub-computing systems set according to the computing device types contained in the heterogeneous computing system, wherein each sub-computing system contains a plurality of computing devices of the same type, and the number of computing devices in each sub-computing system is smaller than the number of computing devices in the heterogeneous computing system;
the electronic equipment is used for executing the model training time-consuming prediction method based on the heterogeneous computing system;
the plurality of sub-computing systems are used for jointly performing multiple rounds of iterative training on the target model by utilizing training data under the control of the electronic equipment.
The invention also provides a computer readable storage medium, wherein the computer readable storage medium stores computer executable instructions, and when the computer executable instructions are loaded and executed by a processor, the model training time-consuming prediction method based on the heterogeneous computing system is realized.
The invention provides a model training time-consuming prediction method based on a heterogeneous computing system, which comprises the following steps: acquiring a target model, a training set, various computing equipment types contained in a heterogeneous computing system, training data amounts corresponding to the computing equipment types and communication bandwidths among the computing equipment in the heterogeneous computing system; setting sub-computing systems corresponding to the types of the computing devices, and distributing training data to the computing devices in the sub-computing systems by utilizing the training data quantity and the training set; each of the sub-computing systems includes a plurality of computing devices of the same type, the number of the computing devices in the sub-computing system being less than the number of the computing devices in the heterogeneous computing system; controlling each sub-computing system to perform multiple rounds of iterative training on the target model by utilizing the training data, and recording time-consuming information and data transmission quantity corresponding to each computing device in each sub-computing system; and inputting the time-consuming information, the communication bandwidth and the data transmission quantity into a preset mathematical model to conduct time-consuming prediction, so as to obtain the predicted time consumption of training the target model by the heterogeneous computing system.
The invention has the following beneficial effects. The method first obtains the target model, the training set, each computing device type contained in the heterogeneous computing system, the training data amount corresponding to each computing device type and the communication bandwidth among the computing devices in the heterogeneous computing system; it then sets up sub-computing systems corresponding to each computing device type and allocates training data to the computing devices in each sub-computing system using the training data amount and the training set, where each sub-computing system comprises a plurality of computing devices of the same type and the number of computing devices in a sub-computing system is smaller than in the heterogeneous computing system. In other words, the invention sets up a plurality of simplified sub-computing systems according to the relevant information of the heterogeneous computing system. The invention then controls each sub-computing system to perform multiple rounds of iterative training on the target model with the training data and records the time-consuming information and data transmission amount corresponding to each computing device in each sub-computing system; that is, the target model to be actually trained is deployed into the plurality of simplified sub-computing systems, which are controlled to iteratively train it with real training data, so as to record the actual time consumption and data transmission amount of each type of computing device when training the target model.
Finally, the time-consuming information, the communication bandwidth and the data transmission amount are input into the preset mathematical model for time prediction, yielding the predicted time for the heterogeneous computing system to train the target model. In other words, using the measured data generated while each type of computing device trains the target model, the network communication conditions among the computing devices in the heterogeneous computing system, and mathematical modeling, the time required for the heterogeneous computing system to train the target model can be accurately predicted at low cost, thereby overcoming the inability of the related art to accurately predict the time needed to train a model on a heterogeneous computing system. The invention also provides a model training time-consuming prediction device, an electronic device, a system and a computer-readable storage medium based on the heterogeneous computing system, which have the same beneficial effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a model training time-consuming prediction method based on a heterogeneous computing system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the ring all-reduce mode according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a single-iteration time breakdown according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another single-iteration time breakdown provided by an embodiment of the present invention;
FIG. 5 is a flowchart of another model training time-consuming prediction method based on heterogeneous computing systems according to an embodiment of the present invention;
FIG. 6 is a block diagram of a model training time-consuming prediction apparatus based on a heterogeneous computing system according to an embodiment of the present invention;
fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention;
FIG. 8 is a block diagram of a model training time-consuming prediction system based on a heterogeneous computing system according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As the scale of neural network models continues to grow, the difficulty and complexity of training them rise accordingly. To efficiently train a large-scale model, the model is typically trained with a distributed computing system. To facilitate optimization of the model training process and of the distributed computing system, the related art usually needs to predict the time required to train a neural network model on the distributed computing system. However, the related art cannot accurately predict the training time for a heterogeneous computing system (a distributed computing system containing multiple types of computing devices), and actually running a large-scale distributed training job on the complete heterogeneous computing system and measuring the time tends to incur significant time and compute costs. In view of this, an embodiment of the invention provides a model training time-consuming prediction method based on a heterogeneous computing system: a plurality of simplified sub-computing systems corresponding to the heterogeneous computing system are used to actually measure the time-consuming information and data transmission amount of each type of computing device when training the target model, so that the training time of the heterogeneous computing system can be accurately predicted at low cost from the measured data, the communication bandwidth between the computing devices in the heterogeneous computing system, and mathematical modeling.
It should be noted that, the embodiment of the present invention is not limited to specific hardware devices for executing the method, and may be a personal computer, a server, etc., and may be set according to actual application requirements.
Referring to fig. 1, fig. 1 is a flowchart of a model training time-consuming prediction method based on a heterogeneous computing system according to an embodiment of the present invention, where the method may include:
s100, acquiring a target model, a training set, various computing equipment types contained in the heterogeneous computing system, training data amounts corresponding to the computing equipment types and communication bandwidths among the computing equipment in the heterogeneous computing system.
The information collected in the embodiment of the invention can be divided into three types: training task information, hardware configuration information, and hardware network information. The training task information can comprise a target model and a training set, where the target model is the model the heterogeneous computing system is actually to train, including the details of each layer in the model, activation functions, and the like; the training set contains the training data required to train the target model. Of course, the training task information may further include training mode information, i.e., information required for training the target model, such as the loss function, the training mode, and the optimization algorithm, so that the model training process can be configured in more detail according to the training mode information. The hardware configuration information may include each computing device type included in the heterogeneous computing system and the amount of training data corresponding to each computing device type. The hardware network information is specifically the communication bandwidth between computing devices in the heterogeneous computing system. It should be noted that the embodiment of the present invention does not limit the specific type of computing device, which may be, for example, a CPU (central processing unit), a GPU (graphics processing unit), an FPGA (field-programmable gate array), etc., and may further include subdivided classes, for example, different models of GPU.
Further, the embodiment of the present invention does not limit the specific method for acquiring the above information; for example, the information may be entered by a user through manual input, or acquired through code analysis, actual measurement, or the like. Specifically, for the target model, the target model manually configured and input by the user can be received, or the target model can be received in the form of code (such as a code file written with an artificial intelligence computing framework such as TensorFlow or PyTorch) and parsed with a preset script. Of course, if the training task information further includes training mode information, that information may likewise be entered by manual configuration or code input. For another example, the communication bandwidth between the computing devices in the heterogeneous computing system may be received as input, or the communication bandwidth between the computing devices may be measured based on the communication addresses of the computing devices.
Based on this, acquiring the communication bandwidth between computing devices in the heterogeneous computing system may include:
s101: network address information between computing devices in a heterogeneous computing system is obtained.
S102: communication bandwidth between computing devices in the heterogeneous computing system is measured based on the network address information.
Specifically, the embodiment of the invention can perform the measurement with a network testing tool (such as iperf); for the specific measurement process, reference may be made to the related documentation of the network testing tool.
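As an illustrative sketch of such a measurement (the function name and sample values below are assumptions for illustration, not part of the embodiment), one could run iperf3 against each peer address and parse the sender throughput out of the JSON report it prints with the `-J` flag:

```python
import json

def parse_iperf3_bandwidth(json_report: str) -> float:
    """Extract the average sender throughput, in bits per second, from the
    JSON report produced by running `iperf3 -c <server_ip> -J`."""
    report = json.loads(json_report)
    return report["end"]["sum_sent"]["bits_per_second"]

# In practice the report would come from running iperf3 against each peer,
# e.g. subprocess.run(["iperf3", "-c", peer_ip, "-J"], capture_output=True);
# here a truncated sample report stands in for a real run.
sample_report = '{"end": {"sum_sent": {"bits_per_second": 9.4e9}}}'
bandwidth_bps = parse_iperf3_bandwidth(sample_report)
```

Repeating this for every pair of computing-device addresses yields the bandwidth entries collected in S1021.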
Based on this, measuring the communication bandwidth between computing devices in the heterogeneous computing system based on the network address information may include:
s1021: and measuring the communication bandwidth among all the computing devices in the heterogeneous computing system by using a network testing tool according to the network address information.
It should be noted that, for ease of recording and operation, the communication bandwidth between computing devices may be recorded based on an adjacency matrix; for example, each row in the matrix may be represented as (computing device $ip_1$, computing device $ip_2$, $B$), where $ip$ denotes the network address of a computing device and $B$ denotes the communication bandwidth between the two computing devices.
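The adjacency-style recording above can be sketched as follows; the helper name and the assumption that bandwidth is symmetric in both directions are illustrative, not prescribed by the embodiment:

```python
def build_bandwidth_matrix(rows):
    """Store (ip1, ip2, bandwidth) rows as a symmetric lookup table,
    mirroring the adjacency-matrix recording described above."""
    table = {}
    for ip1, ip2, bw in rows:
        table[(ip1, ip2)] = bw
        table[(ip2, ip1)] = bw  # assume the link bandwidth is symmetric
    return table

# One row per measured device pair, bandwidth in bits per second
rows = [("10.0.0.1", "10.0.0.2", 9.4e9),
        ("10.0.0.2", "10.0.0.3", 4.7e9)]
bandwidths = build_bandwidth_matrix(rows)
```

A lookup such as `bandwidths[("10.0.0.2", "10.0.0.1")]` then returns the bandwidth between those two devices regardless of direction.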
S200, setting sub-computing systems corresponding to the types of the computing devices, and distributing training data to the computing devices in the sub-computing systems by utilizing the training data quantity and the training set; each sub-computing system contains a plurality of computing devices of the same type, the number of computing devices in the sub-computing system being less than the number thereof in the heterogeneous computing system.
After the collection of the information related to the target model and the heterogeneous computing system is completed, the embodiment of the invention first sets a sub-computing system corresponding to each computing device type according to the multiple computing device types contained in the heterogeneous computing system. Each sub-computing system is made up of multiple computing devices of the same type, and the number of computing devices in the sub-computing system is less than that in the heterogeneous computing system. Meanwhile, the sub-computing systems need to work together to jointly perform a training test on the target model based on a distributed training mode. Then, according to the training data amount corresponding to each computing device type, the embodiment of the invention distributes training data to the computing devices in each sub-computing system, so as to send the target model and the training data to each sub-computing system and control the sub-computing systems to perform the training test on the target model. In other words, the multiple sub-computing systems are simplified versions of the heterogeneous computing system, and the embodiment of the present invention can use them to actually measure, at low expense, the specific time consumption of training the target model and the data transmission amount between computing devices, so that the time the heterogeneous computing system needs to train the target model can be predicted from the measured time-consuming information and data transmission amount.
It is worth pointing out that the embodiment of the invention uses a plurality of simplified sub-computing systems to actually measure the time-consuming information and data transmission amount generated by training the target model, and uses this measured data for time-consumption prediction. Because the complexity of the sub-computing systems is lower than that of the heterogeneous computing system and the number of computing devices they require is far smaller, the difficulty of building and adjusting the sub-computing systems is significantly lower than that of the complete heterogeneous computing system. Therefore, the invention can accurately predict the time required for the heterogeneous computing system to train the target model by means of sub-computing-system tests and mathematical modeling alone, without actually running large-scale distributed training on the complete heterogeneous computing system; this reduces the time cost and computation cost of model-training time-consumption prediction and facilitates optimization of the model training process and the heterogeneous computing system. In addition, compared with a purely mathematical modeling approach, the embodiment of the invention collects measured data that reflects the actual working conditions of the computing devices, so the prediction can incorporate the actual working condition of each computing device, ensuring that the predicted result better fits the actual operation of the heterogeneous computing system and improving the accuracy of the predicted training time.
It should be noted that, the embodiment of the present invention does not limit how to set the above-mentioned sub-computing systems, and when the heterogeneous computing system is already built, the embodiment of the present invention may select, for each computing device type, a plurality of target computing devices from the heterogeneous computing system, and set the sub-computing system corresponding to each computing device type by using the target computing device corresponding to each computing device type; when the heterogeneous computing system is not built, the embodiment of the invention can also re-build the corresponding sub-computing system according to the type of the computing equipment contained in the heterogeneous computing system.
Based on this, setting the sub-computing system corresponding to each computing device type may include:
s211: for each computing device type, a plurality of target computing devices are selected from the heterogeneous computing systems.
S212: and setting a sub-computing system corresponding to each computing device type by using the target computing device corresponding to each computing device type.
It should be noted that the embodiment of the present invention does not limit the number of computing devices included in each sub-computing system; at least two computing devices may be provided, and other numbers are also possible. The purpose of arranging a plurality of computing devices in a sub-computing system is to enable measurement of the data transmission amount between computing devices, so each sub-computing system contains at least two computing devices of the same kind to meet the measurement requirement. Of course, considering that two computing devices already satisfy the measurement requirement at the lowest cost, the number of computing devices of the same type in each sub-computing system is preferably two.
Further, embodiments of the present invention are not limited to how training data is distributed, for example, data of a training data amount may be randomly extracted from a training set as training data, so as to ensure that the training data is sufficiently random.
Based on this, assigning training data to computing devices in each sub-computing system using the training data amount and the training set may include:
s221: data of a training data amount is randomly extracted from the training set as training data.
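The random extraction in S221 can be sketched as below; the per-type amounts and the seeded generator are illustrative assumptions (seeding merely makes the draw reproducible):

```python
import random

def draw_training_data(training_set, amount, seed=None):
    """Randomly extract `amount` samples (without replacement) from the
    training set to serve as a device's training data."""
    rng = random.Random(seed)
    return rng.sample(training_set, amount)

training_set = list(range(1000))        # stand-in for real training samples
per_type_amount = {"gpu_a": 64, "fpga": 32}  # training data amount per type
assignments = {dev_type: draw_training_data(training_set, n, seed=0)
               for dev_type, n in per_type_amount.items()}
```

Sampling without replacement keeps the extracted data sufficiently random while avoiding duplicate samples within one device's share.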
And S300, controlling each sub-computing system to perform multi-round iterative training on the target model by utilizing training data, and recording time-consuming information and data transmission quantity corresponding to each computing device in each sub-computing system.
After the setting of the plurality of sub-computing systems and the distribution of training data are completed, the embodiment of the invention can control each sub-computing system to perform multiple rounds of iterative training on the target model by utilizing the training data. During the iterative training, the embodiment of the invention records the time-consuming information and the data transmission amount corresponding to each computing device in each sub-computing system, where the time-consuming information reflects the total time the computing device needs to execute a single iteration of training on the target model, and the data transmission amount refers to the amount of data transmitted between computing devices; the latter is recorded because, in distributed training, the sub-training results generated by each computing device need to be aggregated into a complete total training result, so the computing devices must transmit data to exchange their sub-training results.
It should be noted that the embodiment of the present invention does not limit the specific manner of recording the time-consuming information and the data transmission amount; for example, they may be recorded with a model performance analysis tool (such as a profiler) of an artificial intelligence computing framework (such as PyTorch, TensorFlow, etc.), or with a self-developed performance analysis module.
Based on this, recording the time-consuming information and the data transmission amount corresponding to each computing device in each sub-computing system may include:
s311: and recording time-consuming information and data transmission quantity corresponding to each computing device in each sub-computing system by using a model performance analysis tool.
Further, it should be noted that, since the model cannot train smoothly in the initial iterations, the time-consuming information and data transmission amount recorded for each computing device then are inaccurate and easily interfere with the prediction of model training time. Therefore, in the embodiment of the invention, the time-consuming information and data transmission amount corresponding to each computing device in each sub-computing system may be recorded only after it is determined that the sub-computing systems have jointly completed a preset number of iterations of training, that is, after it is determined that the model can train smoothly. For example, for a total of 100 iterations, only the time-consuming information and data transmission amounts corresponding to each computing device during the 20th to 100th iterations may be recorded.
Based on this, recording the time-consuming information and the data transmission amount corresponding to each computing device in each sub-computing system may include:
s321, when it is determined that the sub-computing systems have jointly completed the preset number of iterative training, recording time-consuming information and data transmission quantity corresponding to the computing devices in the sub-computing systems.
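The warmup-skipping logic of S321 can be sketched as a simple filter over per-iteration measurements; the record layout and the warmup count of 20 are illustrative assumptions taken from the 100-iteration example above:

```python
def filter_warmup(records, warmup=20):
    """Keep only per-iteration measurements taken after the first `warmup`
    iterations, whose timings are distorted by start-up effects."""
    return [r for r in records if r["iteration"] >= warmup]

# e.g. 100 recorded iterations; the early ones are artificially slow
records = [{"iteration": i, "step_time_s": 0.5 if i >= 20 else 2.0}
           for i in range(100)]
stable = filter_warmup(records, warmup=20)
```

Only the `stable` records would then feed the time-consumption and data-transmission statistics used for prediction.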
S400, time-consuming information, communication bandwidth and data transmission quantity are input into a preset mathematical model to conduct time-consuming prediction, and the predicted time consumption of the heterogeneous computing system training target model is obtained.
After the time-consuming information, the communication bandwidth and the data transmission amount are obtained, the embodiment of the invention can utilize the preset mathematical model to estimate the time each computing device in the heterogeneous computing system spends on iterative training and on inter-device data transmission, and can thereby effectively obtain the predicted time consumption of the heterogeneous computing system training the target model.
It should be noted that the embodiment of the present invention does not limit the specific setting of the preset mathematical model; it is related to the specific model training mode of the neural network model (for example, all-reduce, i.e., full specification, or ring all-reduce, i.e., annular full specification), and the corresponding preset mathematical model may be set according to the actually selected model training mode.
Based on the above embodiment, the present invention first obtains a target model, a training set, each computing device type included in a heterogeneous computing system, a training data amount corresponding to each computing device type, and a communication bandwidth between computing devices in the heterogeneous computing system, and may set a sub-computing system corresponding to each computing device type, and distributes training data to computing devices in each sub-computing system by using the training data amount and the training set, where each sub-computing system includes a plurality of computing devices of the same type, and the number of computing devices in the sub-computing system is smaller than the number of computing devices in the heterogeneous computing system. In other words, the present invention can provide a plurality of simplified sub-computing systems according to the related information of the heterogeneous computing systems. Then, the invention can control each sub-computing system to perform multi-round iterative training on the target model by utilizing the training data, record the time-consuming information and the data transmission quantity corresponding to each computing device in each sub-computing system, namely, the target model to be actually trained can be deployed into the plurality of simplified sub-computing systems, and control the sub-computing systems to perform multi-round iterative training on the target model by utilizing the actual training data, so as to record the actual time-consuming condition and the data transmission quantity of each type of computing device when training the target model. 
Finally, the time-consuming information, the communication bandwidth and the data transmission amount can be input into the preset mathematical model to perform time-consumption prediction, obtaining the predicted time consumption of the heterogeneous computing system training the target model. In other words, by using the measured data generated while each type of computing device trains the target model, the network communication conditions among the computing devices in the heterogeneous computing system, and mathematical modeling, the time required for the heterogeneous computing system to train the target model can be accurately predicted at low cost, thereby overcoming the defect that the related art cannot accurately predict the time required to train a model on a heterogeneous computing system.
Based on the above embodiment, the specific process of time-consumption prediction is described in detail below for a specific model training mode. In one possible case, the target model is trained based on the annular full-specification (ring all-reduce) mode, and inputting the time-consuming information, the communication bandwidth and the data transmission amount into the preset mathematical model to perform time-consumption prediction, so as to obtain the predicted time consumption of the heterogeneous computing system training the target model, may include:
s401, time-consuming information, communication bandwidth and data transmission quantity are input into a preset mathematical model to conduct time-consuming prediction, and single iteration time consumption required by each computing device in the heterogeneous computing system to execute single iteration on the target model is obtained.
For ease of understanding, please refer to fig. 2, which is a schematic diagram of the annular full-specification mode according to an embodiment of the present invention. The left side of fig. 2 depicts a heterogeneous computing system comprising a plurality of heterogeneous computing devices that can communicate with each other through inter-board communication within a server or through network communication between servers. The right side of fig. 2 depicts distributed training based on the ring full-specification mode: the algorithm groups multiple computing devices into a ring (e.g., three computing devices form a ring in fig. 2), where the edges on the ring represent the communication links between two computing nodes. Based on this ring, the computing devices perform parameter synchronization in each single iteration (step) of the distributed training, thereby completing the distributed training. In the ring full-specification mode, each computing device executes its computation actions and parameter synchronization actions in parallel: for example, computing devices 1, 2 and 3 simultaneously perform forward and backward computation on the target model, and while computing device 1 synchronizes parameters to computing device 2, computing device 2 also synchronizes parameters to computing device 3 and computing device 3 to computing device 1. Meanwhile, the computing devices differ in computing efficiency and read/write rate, i.e., the time each computing device takes to execute a single iteration is not the same, so the total time the heterogeneous computing system takes to execute a single iteration is in effect equal to the single-iteration time of the slowest computing device in the system, which can be expressed specifically as:
$$T_{\text{total}} = \max_i\left(T_i\right)$$

where $T_i$ denotes the single-iteration time consumption of the $i$-th computing device. For model training time-consumption prediction in the annular full-specification mode, the time-consuming information, the communication bandwidth and the data transmission amount first need to be input into the preset mathematical model to perform time-consumption prediction, so as to obtain the single-iteration time required by each computing device in the heterogeneous computing system to perform a single iteration on the target model, and the maximum single-iteration time among the computing devices is taken as the total single-iteration time of the heterogeneous computing system. It should be noted that this prediction method, which derives the total single-iteration time of the heterogeneous computing system from the single-iteration time of each computing device, is strongly tied to the annular full-specification mode: it is applicable to model-training time-consumption prediction in the annular full-specification mode but not to other training modes, and likewise the prediction methods for other training modes are not applicable to the annular full-specification mode. Further, since a computing device performs a number of actions during a single iteration, the time-consuming information may also include, for improved prediction accuracy, the time consumption corresponding to each of these actions.
For example, the time-consuming information may include data processing time and computation time, where the data processing time is the time taken by the central processor to issue training data to the computing device, and the computation time includes the time taken by the computing device to perform forward and backward computations. It is worth noting that the embodiment of the invention specifically collects the time taken by the central processor to issue training data to the computing device, thereby fully taking into account the influence of the actual IO (input/output) conditions between the central processor and the computing device on model training time. It should be noted that the embodiment of the present invention does not limit the specific content of the training data issuing action performed by the central processor, which may include, for example, extraction, preprocessing and issuing of batch data (batch_size); nor does it limit the specific content of the forward and backward computations performed by the computing device, which may include, for example, multiple forward computations and multiple backward computations.
Based on the two time-consuming information, a single iteration time-consuming prediction process required for each computing device in the heterogeneous computing system to perform a single iteration on the target model will be described in detail.
Based on this, the time-consuming information, the communication bandwidth, and the data transmission amount are input to a preset mathematical model to perform time-consuming prediction, so as to obtain a single iteration time required by each computing device in the heterogeneous computing system to perform a single iteration on the target model, which may include:
s4011: and determining the data processing time consumption average value and the calculation time consumption average value corresponding to the type of the computing equipment to which the sub-computing system belongs by using the data processing time consumption and the calculation time consumption recorded from the same sub-computing system, and determining the data processing time consumption and the calculation time consumption of each computing equipment in the heterogeneous computing system by using the data processing time consumption average value and the calculation time consumption average value corresponding to each type of the computing equipment.
In order to improve accuracy, the embodiment of the invention can determine the data processing time consumption average value and the calculation time consumption average value corresponding to the type of the computing equipment to which the sub-computing system belongs by utilizing the data processing time consumption and the calculation time consumption recorded from the same sub-computing system. Furthermore, according to the embodiment of the invention, the corresponding data processing time-consuming mean value and the corresponding computing time-consuming mean value can be used as the data processing time-consuming and the computing time-consuming of the computing equipment according to the type of the computing equipment to which each computing equipment belongs in the heterogeneous computing system.
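The averaging in S4011 can be sketched as below; the device-type names and timing values are illustrative assumptions, with one (data-processing time, computation time) pair per recorded iteration:

```python
from statistics import mean

def per_type_means(measurements):
    """For each device type, average the (data-processing time, computation
    time) pairs recorded across that type's sub-computing system."""
    return {dev_type: (mean(t_dl for t_dl, _ in recs),
                       mean(t_comp for _, t_comp in recs))
            for dev_type, recs in measurements.items()}

# (data-processing time, computation time) per recorded iteration, in seconds
measurements = {"gpu_a": [(0.010, 0.110), (0.014, 0.090)],
                "fpga":  [(0.020, 0.300), (0.020, 0.340)]}
means = per_type_means(measurements)
```

Each computing device in the heterogeneous system then inherits the mean pair of its own device type as its data-processing time and computation time.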
S4012: and determining a data transmission quantity average value by utilizing the data transmission quantity, and determining the annular full-specification operation time consumption of each computing device in the heterogeneous computing system by utilizing the data transmission quantity average value, the communication bandwidth and the annular full-specification operation sequence among the computing devices in the heterogeneous computing system.
Likewise, to improve accuracy, the embodiment of the present invention may determine a mean value of the data transmission amount using the data transmission amounts counted from each sub-computing system. Furthermore, because the annular full-specification operation order among the computing devices in the heterogeneous computing system is preset information, the embodiment of the invention can determine, based on this information, the adjacent computing device that performs the annular full-specification operation with a given computing device, and can then determine the annular full-specification operation time consumed between the computing device and its adjacent computing device according to the data transmission amount mean value and the communication bandwidth between them.
Based on this, determining the time-consuming circular full-specification operation of each computing device in the heterogeneous computing system using the data traffic average, the communication bandwidth, and the circular full-specification operation order among each computing device in the heterogeneous computing system may include:
step 11: according to the circular full specification operation sequence, adjacent computing devices which execute circular full specification operation with the computing devices are determined.
Step 12: and determining the time consumption of the computing device and the adjacent computing device for executing the single annular full specification operation by using the data transmission quantity average value and the communication bandwidth between the computing device and the adjacent computing device.
Step 13: the number of computing devices included in the heterogeneous computing system is used to determine a total number of times the computing device performs a circular full specification operation with the neighboring computing devices.
Step 14: the total number of times and the time consumed by the computing device to perform a single ring full specification operation with the adjacent computing device are utilized to determine the ring full specification operation time consumed by the computing device.
In particular, the circular full specification operating time of a computing device can be expressed as:
$$T_{AR} = K \cdot \frac{\bar{V}}{N \cdot B}$$

where $T_{AR}$ denotes the ring full-specification operation time of the computing device, $\bar{V}$ denotes the data transmission amount mean value, $B$ denotes the communication bandwidth between the computing device and its adjacent computing device, $N$ denotes the number of computing devices in the heterogeneous computing system, and $K$ denotes the total number of times the computing device performs the ring full-specification operation with the adjacent computing device.
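Steps 11 to 14 can be sketched as below; the default $K = 2(N-1)$ is an assumption based on the standard ring all-reduce algorithm (each device exchanges $\bar{V}/N$-sized chunks with its ring neighbor over $2(N-1)$ steps), since the text leaves $K$ to be derived from the device count:

```python
def ring_allreduce_time(v_mean, bandwidth, n_devices, k_ops=None):
    """T_AR = K * (V_mean / N) / B: each of the K ring steps transfers a
    V_mean/N-sized chunk over the neighbor link of bandwidth B."""
    if k_ops is None:
        # Standard ring all-reduce: N-1 reduce-scatter + N-1 all-gather steps
        k_ops = 2 * (n_devices - 1)
    return k_ops * (v_mean / n_devices) / bandwidth

# 1 GB mean gradient volume, 10 GB/s neighbor link, 4 devices in the ring
t_ar = ring_allreduce_time(v_mean=1e9, bandwidth=10e9, n_devices=4)
```

With these illustrative values the device performs 6 ring steps of 0.25 GB each, giving $T_{AR}$ of roughly 0.15 s.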
S4013: and determining single iteration time required by each computing device in the heterogeneous computing system to execute single iteration on the target model by using the time consumption of data processing, the time consumption of computation and the time consumption of ring full-specification operation of each computing device in the heterogeneous computing system.
After the time consumption of data processing, calculation and annular full-specification operation of each computing device are obtained, modeling is only required according to the execution sequence of the three actions of the training data issuing action, forward and reverse calculation and annular full-specification operation, so that the time consumption of single iteration required by each computing device for executing single iteration on the target model can be obtained.
The modeling process is described in detail below based on the specific execution sequence of the three actions of training data delivery actions, forward reverse computation, and circular full specification operation. In one possible scenario, the training data issuing action is performed in parallel with the forward and backward computation and the circular full specification operation, and determining a single iteration time required for each computing device in the heterogeneous computing system to perform a single iteration on the target model using the data processing time, the computation time, and the circular full specification operation time of each computing device in the heterogeneous computing system may include:
Step 11: the device processing time consumption of the computing device is determined according to the computing time consumption of the computing device and the ring full specification operation time consumption.
Step 12: the maximum of the data processing time and the device processing time is taken as a single iteration time required by the computing device to perform a single iteration on the target model.
In the embodiment of the invention, because the training data issuing action has a different execution body from the forward and backward computation and the ring full-specification operation, it can be executed in parallel with them: while the computing device performs the forward and backward computation and the ring full-specification operation, the central processor prepares and issues the training data required for the next iteration. It will be appreciated that the computing device can enter the next iteration only after the current iteration is completed and the next batch of training data issued by the central processor has been received. Since the time taken to perform the training data issuing action may differ from the device processing time jointly consumed by the forward and backward computation and the ring full-specification operation, the time the computing device takes to perform a single iteration is actually equal to the maximum of the device processing time (composed of the computation time and the ring full-specification operation time of the computing device) and the data processing time.
Further, since the computing device typically needs to perform multiple forward and backward computations in sequence, and the ring full-specification operation typically begins once the first backward computation completes and then runs in parallel with the remaining backward computations, the device processing time of the computing device equals the sum of the total forward computation time, the time of the first backward computation, and the ring full-specification operation time.
In one possible scenario, determining a device processing time consuming of a computing device from a computing time consuming of the computing device and a ring full specification operation time consuming, comprising:
step 21: the total forward computation time of the computing device and the time of its first backward computation are extracted from the recorded computation time.
Step 22: the device processing time of the computing device is determined by using the forward computing total time consumption, the backward computing time consumption and the ring full specification operation time consumption of each computing device.
Specifically, the single-iteration time of a computing device may be expressed as:
t_iter = max(t_DL, t_fwd + t_bwd,1 + t_ring)
wherein t_fwd represents the total time the computing device spends performing forward computation, t_bwd,1 represents the time of the first backward computation, t_DL represents the data processing time (DL, data loading / data reading), t_ring represents the ring full-specification operation time of the computing device, and max(·, ·) takes the larger of its two arguments.
For ease of understanding, please refer to fig. 3 and fig. 4, which are two schematic diagrams of single-iteration time consumption provided by an embodiment of the present invention. As can be seen from the figures, each iteration necessarily contains the forward computation, the first backward computation, and the ring full-specification operation. The end of each iteration, however, depends on both the ring full-specification operation and the data reading process. In some cases the data reading process (DL for next, i.e. the data read for the next iteration) is relatively fast (fig. 3), while in other cases it is slow (fig. 4); the total time formula for a single iteration of the heterogeneous computing system therefore includes a max term, which models the difference between these two cases.
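As a minimal sketch of this relation (function and parameter names are our own, not part of the embodiment), the single-iteration time of one computing device is the larger of the data-loading time and the device processing time:

```python
def single_iteration_time(t_dl, t_fwd_total, t_bwd_first, t_ring):
    """Single-iteration time of one computing device.

    t_dl        -- data processing time: loading the next mini-batch
    t_fwd_total -- total time of all forward computations
    t_bwd_first -- time of the first backward computation
    t_ring      -- ring all-reduce time (overlaps the remaining backward ops)
    """
    t_device = t_fwd_total + t_bwd_first + t_ring  # device processing time
    return max(t_dl, t_device)

# Fast data loading (the fig. 3 case): device processing dominates.
fast = single_iteration_time(t_dl=0.5, t_fwd_total=1.0, t_bwd_first=0.25, t_ring=0.75)
# Slow data loading (the fig. 4 case): data reading dominates.
slow = single_iteration_time(t_dl=3.0, t_fwd_total=1.0, t_bwd_first=0.25, t_ring=0.75)
```

The two calls illustrate why the max term is needed: with fast loading the iteration is device-bound (2.0 here), with slow loading it is data-bound (3.0 here).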
S402, taking the maximum value in the single iteration time as the total single iteration time required by the heterogeneous computing system to execute the single iteration on the target model.
S403, determining the predicted time consumption of the heterogeneous computing system training target model based on the total time consumption of single iteration.
After determining the total single-iteration time required by the heterogeneous computing system to execute a single iteration on the target model, the embodiment of the invention can determine the number of single iterations required for the heterogeneous computing system to completely train the target model with the training set (i.e. one epoch), according to the total data amount of the training set, the training data amount corresponding to each computing device type, the mini-batch training data amount used by each computing device for a single iteration, and the number of computing devices contained in the heterogeneous computing system. The predicted time for the heterogeneous computing system to completely train the target model with the training set can then be determined from the number of single iterations and the total single-iteration time.
Based on this, a predicted time-consuming for training the target model by the heterogeneous computing system is determined based on the single iteration total time-consuming, including:
s4031: and determining the single iteration times required by the heterogeneous computing system to completely train the target model by using the training set according to the total data volume of the training set, the training data volume corresponding to each computing device type, the small-batch training data volume used by each computing device in the heterogeneous computing system for executing single iteration and the number of computing devices contained in the heterogeneous computing system.
S4032: and determining the prediction time consumption of the heterogeneous computing system for completely training the target model by using the training set by using the single iteration times and the total time consumption of single iteration.
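Steps S4031 and S4032 can be sketched as follows; the helper names, the mini-batch sizes, and the cluster composition are illustrative assumptions, not values from the embodiment:

```python
import math

def predict_epoch_time(total_samples, per_type_batches, t_step):
    """Number of single iterations and predicted time for one epoch.

    per_type_batches -- {device_type: (mini_batch_size, device_count)}
    t_step           -- predicted total single-iteration time of the system
    """
    global_batch = sum(b * n for b, n in per_type_batches.values())
    n_iters = math.ceil(total_samples / global_batch)  # iterations per epoch
    return n_iters, n_iters * t_step

def predict_total_time(epoch_time, n_epochs):
    """Predicted total time for completely training the model several times."""
    return epoch_time * n_epochs

# Hypothetical cluster: 100 devices of each type with assumed mini-batch sizes.
iters, epoch_t = predict_epoch_time(
    1_000_000, {"MLU370": (32, 100), "A100": (64, 100)}, t_step=2.0)
```

With these made-up numbers the global batch is 9600 samples, so one epoch needs 105 iterations; multiplying by t_step gives the epoch prediction, and multiplying the epoch prediction by the epoch count gives the total prediction of the optional step S4033.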
In addition, the embodiment of the invention can also use the predicted time consumption to determine the predicted total time consumption of the heterogeneous computing system to completely train the target model (i.e. a plurality of epochs) for a plurality of times by using the training set.
Based on this, after determining the predicted time consuming of the heterogeneous computing system to fully train the target model with the training set using the number of single iterations and the total time consuming of the single iterations, it may further include:
s4033: the predicted total time consumption of the heterogeneous computing system to complete training of the target model multiple times using the training set is determined using the predicted time consumption.
Based on the above embodiment, the present application can accurately predict the time required by the heterogeneous computing system for a single-step iteration on the target model relying only on tests of the sub-computing systems and mathematical modeling, without actually running the large-scale distributed training on the complete heterogeneous computing system. This reduces the time cost and computational cost of model-training time prediction and facilitates optimization of the model training process and of the heterogeneous computing system.
The model training time-consuming prediction method is fully described below based on a specific example. Referring to fig. 5, fig. 5 is a flowchart of another model training time-consuming prediction method based on a heterogeneous computing system according to an embodiment of the present invention, wherein A, B, C, D, E, F are step numbers. In one possible scenario, a training task needs to be deployed in a real heterogeneous computing system consisting of 100 Cambricon MLU370 boards and 100 NVIDIA A100 boards, performing distributed training with the ring all-reduce (ring-allreduce) model. The target model is resnet50 and is trained on a 200 GB data set, with each MLU370 assigned a 1 GB subset and each A100 likewise assigned a 1 GB subset. The training task uses the pytorch artificial intelligence computing framework. The purpose of this embodiment is to predict and evaluate the single-iteration (step) time of the task before it is actually deployed.
As shown in fig. 5, the present embodiment includes 5 modules in total, which are respectively:
i. the distributed training task information collection module: for collecting information of distributed training tasks that need to be predicted.
Test task distribution module: generates distributed training subtasks according to the distributed training task information and issues them to the sub-computing systems composed of the various computing devices, in order to performance-test each type of computing device.
Test result data collection module: and collecting the running information results of the distributed training subtasks of each sub-computing system.
Heterogeneous computing system information collection module: collects information about the heterogeneous computing system for training-task time prediction. The heterogeneous computing system can be a real heterogeneous computing system or a not-yet-built heterogeneous computing system to be simulated; its information may accordingly be obtained by measurement or by manual input.
And v. a distributed training task time consumption prediction module: based on the running information result of the distributed training subtasks, heterogeneous computing system information and distributed training task information, the time consumption of the distributed training tasks is predicted by combining the distributed training time consumption mathematical modeling defined by the invention, and a prediction result is output.
Since the computing devices involved in the present embodiment include the MLU370 and the A100, the sub-computing systems used in the test task are formed of MLU370 boards and A100 boards respectively.
As shown in fig. 5, during the training process, the whole workflow and the respective modules function as follows:
A. The distributed training task collection module collects the training information of the distributed training task (here, resnet50) and sends it to the test task distribution module. The training information of the distributed training task comprises:
i. training task information: details of each layer in the resnet50 model, the activation function of each layer, the loss function used to train the model, and the training set. In an embodiment, this part is submitted in code form (for example, tensorflow or pytorch code) and parsed by a script.
information of hardware configuration:
the hardware configuration information includes: the computing device types called by the training task are MLU370 and A100, and each computing device is assigned a 1 GB data set.
Hardware network information:
this information includes: the number of nodes participating in training is 200, the specific node sequence connected by the ring-allreduce ring, and the IP information of each MLU370 and A100 called by the training task in the heterogeneous computing system.
B. The test task distribution module constructs a distributed training subtask for each of the MLU370 and the A100 based on the received training information. Specifically, for both subtasks, the task model includes all details of the original distributed training task; the training data used is 2 GB of data split off and randomly extracted from the total data set, with each computing device receiving 1 GB of training data. The two distributed training subtasks are sent to the corresponding sub-computing systems for execution.
C. The sub-computing system, consisting of MLU370 and a100, will send training information to the test result data collection module during the training process.
D. The test result data collection module receives the distributed training subtask operation information results of the sub-computing systems; the collection process is implemented with the model performance analyzer (profiler) built into pytorch. The collected subtask operation information results mainly include:
i. data processing time-consuming mean value:
the CPU reads one mini-batch (batch_size) of training data from storage; the time consumed for the CPU to preprocess the data and transmit it to the GPU is recorded as the data processing time t_DL.
ii. computing time-consuming mean value:
That is, within one iteration, the time of every operator of this sub-training task is recorded, from the start of the iteration to the last backward operator; the time of each forward operator is recorded as t_fwd,i and the time of each backward operator as t_bwd,i, where i represents the operator sequence number.
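A sketch of how such operator-level records reduce to the two quantities the time model uses (the total forward time and the first backward time); the function name and the assumption that the lists are in execution order are ours:

```python
def device_compute_times(fwd_op_times, bwd_op_times):
    """Reduce per-operator profiler timings to the model's inputs.

    fwd_op_times -- times of the forward operators, in execution order
    bwd_op_times -- times of the backward operators, in execution order
    """
    t_fwd_total = sum(fwd_op_times)  # every forward operator contributes
    t_bwd_first = bwd_op_times[0]    # later backward ops overlap the all-reduce
    return t_fwd_total, t_bwd_first
```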
iii. data transmission amount mean value:
the amount of data each computing device transmits outwards when performing ring-allreduce in each iteration is recorded as d.
These statistics are obtained over the 20th to 100th iterations of the distributed training subtask and then averaged.
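The warm-up-discarding average can be sketched as follows (the window bounds follow the embodiment; the function name is an assumption):

```python
def steady_state_mean(per_iter_values, first=20, last=100):
    """Average a per-iteration measurement over iterations first..last
    (1-based, inclusive), discarding the warm-up iterations before `first`."""
    window = per_iter_values[first - 1:last]
    return sum(window) / len(window)
```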
E. The distributed training task information collection module sends the collected training information of the distributed training task to the distributed training task time-consumption prediction module. The heterogeneous computing system information collection module collects information about the heterogeneous computing system and sends it to the same module. For the information about the heterogeneous computing system, the embodiment measures it with a network test tool such as iperf and records it as an adjacency matrix between the computing devices; each row of the matrix can be represented as: (compute-node ip1, compute-node ip2, bandwidth).
Of course, in one possible case, the bandwidth adjacency matrix of the heterogeneous computing system to be built can also be input directly; each entry of the matrix is of the form: (compute-node ip1, compute-node ip2, bandwidth).
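Either source (iperf measurement or manual input) yields rows of the same shape, which might be stored as a symmetric lookup; the IP addresses and bandwidth values below are invented for illustration:

```python
def to_bandwidth_matrix(rows):
    """Build a bandwidth lookup from rows of the form (ip1, ip2, bandwidth).

    Assumes links are symmetric, so each row fills both directions.
    """
    bw = {}
    for ip1, ip2, b in rows:
        bw[(ip1, ip2)] = b
        bw[(ip2, ip1)] = b
    return bw

rows = [("10.0.0.1", "10.0.0.2", 100.0), ("10.0.0.2", "10.0.0.3", 25.0)]
bw = to_bandwidth_matrix(rows)
```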
F. The distributed training task time consumption prediction module collects training information of the distributed training task, information in heterogeneous computing systems and distributed training subtask operation information results of all the sub-computing systems, then predicts single step training time consumption of the distributed training task according to mathematical modeling facing the distributed training task time consumption in the heterogeneous computing systems, and outputs time consumption prediction results. The specific calculation formula is as follows:
Here, according to the modeling, the per-node single-iteration time — the maximum of the node's data processing time and its device processing time (total forward time plus first backward time plus ring full-specification operation time) — must be computed for each of the 200 nodes, and the maximum of these 200 values is then taken as the final result.
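This final step — a per-node time for every node, then the maximum — can be sketched as follows (node names and timings are invented for illustration):

```python
def predict_step_time(nodes):
    """System-level single-step time.

    nodes -- {node_id: (t_dl, t_fwd_total, t_bwd_first, t_ring)}
    Every node must finish before the next synchronous step, so the
    slowest node sets the pace (outer max over the per-node inner max).
    """
    return max(
        max(t_dl, t_fwd + t_bwd1 + t_ring)  # per-node single-iteration time
        for t_dl, t_fwd, t_bwd1, t_ring in nodes.values()
    )

# Two entries standing in for the 200-node MLU370 + A100 cluster.
cluster = {
    "mlu370-0": (0.5, 1.5, 0.25, 0.75),  # compute-bound: device time 2.5
    "a100-0":   (3.0, 1.0, 0.25, 0.75),  # data-bound: loading time 3.0
}
```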
The following describes a model training time-consuming prediction device, an electronic device, a system and a computer readable storage medium based on a heterogeneous computing system, where the model training time-consuming prediction device, the electronic device, the system and the computer readable storage medium described below and the model training time-consuming prediction method of the heterogeneous computing system described above can be referred to correspondingly.
Referring to fig. 6, fig. 6 is a block diagram of a model training time-consuming prediction apparatus based on a heterogeneous computing system according to an embodiment of the present invention, where the apparatus may include:
The information collection module 601 is configured to obtain a target model, a training set, each computing device type included in the heterogeneous computing system, a training data amount corresponding to each computing device type, and a communication bandwidth between each computing device in the heterogeneous computing system;
the task allocation module 602 is configured to set sub-computing systems corresponding to the types of the computing devices, and allocate training data to the computing devices in each sub-computing system by using the training data amount and the training set; each sub-computing system comprises a plurality of computing devices of the same type, and the number of the computing devices in the sub-computing system is smaller than that of the computing devices in the heterogeneous computing system;
the data collection module 603 is configured to control each sub-computing system to perform multiple rounds of iterative training on the target model by using training data, and record time-consuming information and data transmission amount corresponding to each computing device in each sub-computing system;
the time consumption prediction module 604 is configured to input time consumption information, a communication bandwidth, and a data transmission amount to a preset mathematical model to perform time consumption prediction, so as to obtain predicted time consumption of the heterogeneous computing system training target model.
Optionally, the data collection module 603 is specifically configured to:
when the fact that each sub-computing system completes the preset number of iterative training is determined, time-consuming information and data transmission quantity corresponding to each computing device in each sub-computing system are recorded.
Alternatively, the information collecting module 601 may include:
the network address information collecting sub-module is used for obtaining network address information among all computing devices in the heterogeneous computing system;
and the bandwidth measurement sub-module is used for measuring the communication bandwidth among all the computing devices in the heterogeneous computing system according to the network address information.
Optionally, the bandwidth measurement submodule is specifically configured to:
and measuring the communication bandwidth among all the computing devices in the heterogeneous computing system by using a network testing tool according to the network address information.
Alternatively, the information collecting module 601 may include:
and the bandwidth input sub-module is used for receiving the communication bandwidth among all computing devices in the input heterogeneous computing system.
Optionally, the task allocation module 602 includes:
a selection sub-module for selecting, for each computing device type, a plurality of target computing devices from among the heterogeneous computing systems;
the setting sub-module is used for setting the sub-computing system corresponding to each computing device type by utilizing the target computing device corresponding to each computing device type.
Alternatively, each sub-computing system contains two computing devices of the same type.
Optionally, the task allocation module 602 may include:
And the training data distribution sub-module is used for randomly extracting data of the training data quantity from the training set as training data.
Optionally, the data collection module 603 is specifically configured to:
and recording time-consuming information and data transmission quantity corresponding to each computing device in each sub-computing system by using a model performance analysis tool.
Alternatively, the information collecting module 601 may include:
and the model information receiving sub-module is used for receiving the target model input in the form of codes and analyzing the target model by utilizing a preset script.
Alternatively, when the target model is trained based on the ring full-specification model, the time-consuming prediction module 604 may include:
the single iteration time consumption prediction sub-module is used for inputting time consumption information, communication bandwidth and data transmission quantity into a preset mathematical model to perform time consumption prediction, so as to obtain single iteration time consumption required by each computing device in the heterogeneous computing system to execute single iteration on the target model;
the single iteration total time consumption prediction sub-module is used for taking the maximum value in single iteration time consumption as the single iteration total time consumption required by the heterogeneous computing system to execute single iteration on the target model;
and the predicted time consumption generation sub-module is used for determining the predicted time consumption of the heterogeneous computing system training target model based on the total time consumption of single iteration.
Optionally, the time-consuming information includes a data processing time-consuming and a computing time-consuming, where the data processing time-consuming is a time-consuming for the central processor to perform the training data issuing action to the corresponding computing device, and the computing time-consuming includes a time-consuming for the computing device to perform the forward and backward computation.
Alternatively, the single iteration time-consuming prediction submodule may include:
the first computing unit is used for determining a data processing time consumption average value and a computing time consumption average value corresponding to the computing equipment type of the sub-computing system by utilizing the data processing time consumption and the computing time consumption recorded from the same sub-computing system, and determining the data processing time consumption and the computing time consumption of each computing equipment in the heterogeneous computing system by utilizing the data processing time consumption average value and the computing time consumption average value corresponding to each computing equipment type;
the second calculation unit is used for determining a data transmission quantity average value by utilizing the data transmission quantity, and determining annular full-specification operation time consumption of each calculation device in the heterogeneous calculation system by utilizing the data transmission quantity average value, the communication bandwidth and the annular full-specification operation sequence among the calculation devices in the heterogeneous calculation system;
the third computing unit is used for determining single iteration time required by each computing device in the heterogeneous computing system to execute single iteration on the target model by using the time consumption of data processing, the time consumption of computation and the time consumption of ring full-specification operation of each computing device in the heterogeneous computing system.
Optionally, the training data issuing action is performed in parallel with the forward reverse calculation and the circular full specification operation, and the third calculation unit may include:
a first computing subunit, configured to determine a device processing time consumption of the computing device according to the computing time consumption and the ring full specification operation time consumption of the computing device;
and the second computing subunit is used for taking the maximum value of the data processing time consumption and the device processing time consumption as single iteration time consumption required by the computing device to perform single iteration on the target model.
Optionally, the first computing subunit is specifically configured to:
extracting total forward computing time consumption of the computing device for executing forward computing and backward computing time consumption for executing first backward computing from the computing time consumption;
the device processing time of the computing device is determined by using the forward computing total time consumption, the backward computing time consumption and the ring full specification operation time consumption of each computing device.
Optionally, the second computing unit may include:
a third computing subunit, configured to determine, according to the circular full specification operation order, a neighboring computing device that performs a circular full specification operation with the computing device;
a fourth computing subunit, configured to determine time consumption of the computing device and the adjacent computing device for executing the single circular full specification operation by using the average value of the data transmission amount and the communication bandwidth between the computing device and the adjacent computing device;
A fifth computing subunit, configured to determine, using the number of computing devices included in the heterogeneous computing system, a total number of times the computing device performs a ring full specification operation with the neighboring computing device;
and a sixth computing subunit configured to determine a ring full specification operation time of the computing device using the total number of times and a time consumed by the computing device to perform a single ring full specification operation with the neighboring computing device.
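The third through sixth computing subunits can be sketched with the classic ring all-reduce cost model, in which each device exchanges 2·(N−1) chunks of size d/N with its ring neighbour; the assumption that a single bottleneck bandwidth bounds every step is ours, made to keep the sketch minimal:

```python
def ring_allreduce_time(data_bytes, bottleneck_bandwidth, n_devices):
    """Time for one ring all-reduce across n_devices.

    data_bytes           -- gradient volume d each device contributes
    bottleneck_bandwidth -- slowest link bandwidth in the ring (bytes/s)
    """
    chunk = data_bytes / n_devices  # each step transmits d/N bytes
    steps = 2 * (n_devices - 1)     # N-1 reduce-scatter + N-1 all-gather steps
    return steps * chunk / bottleneck_bandwidth
```

For example, 1000 bytes reduced over 4 devices at 100 bytes/s takes 6 steps of 250 bytes each, i.e. 15 s under this model.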
Optionally, the prediction time-consuming generation sub-module may include:
the fourth calculation unit is used for determining the number of single iterations required by the heterogeneous computing system to completely train the target model with the training set, according to the total data amount of the training set, the training data amount corresponding to each computing device type, the mini-batch training data amount used by each computing device in the heterogeneous computing system for executing a single iteration, and the number of computing devices contained in the heterogeneous computing system;
and the fifth calculation unit is used for determining the prediction time consumption of the heterogeneous calculation system for completely training the target model by using the training set by using the single iteration times and the single iteration total time consumption.
Optionally, the prediction time-consuming generation sub-module may further include:
and the sixth calculation unit is used for determining the predicted total time consumption of the heterogeneous calculation system for completely training the target model by using the training set for a plurality of times by using the predicted time consumption.
Referring to fig. 7, fig. 7 is a block diagram of an electronic device according to an embodiment of the present invention, and an electronic device 70 according to an embodiment of the present invention includes a processor 71 and a memory 72; wherein the memory 72 is for storing a computer program; the processor 71 is configured to execute the model training time-consuming prediction method based on the heterogeneous computing system provided in the foregoing embodiment when executing the computer program.
For the specific process of the model training time-consuming prediction method based on the heterogeneous computing system, reference may be made to the corresponding content provided in the foregoing embodiment, and no further description is given here.
The memory 72 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the storage mode may be a temporary storage or a permanent storage.
In addition, the electronic device 70 further includes a power supply 73, a communication interface 74, an input-output interface 75, and a communication bus 76; wherein the power supply 73 is configured to provide an operating voltage for each hardware device on the electronic device 70; the communication interface 74 can create a data transmission channel between the electronic device 70 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present invention, which is not specifically limited herein; the input/output interface 75 is used for obtaining external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
Referring to fig. 8, fig. 8 is a block diagram of a model training time-consuming prediction system based on a heterogeneous computing system according to an embodiment of the present invention, and the embodiment of the present invention further provides a model training time-consuming prediction system based on a heterogeneous computing system, including: the electronic device 810 and the plurality of sub-computing systems 820 are arranged according to the types of the computing devices contained in the heterogeneous computing systems, each sub-computing system contains a plurality of computing devices 821 of the same type, and the number of the computing devices 821 in the sub-computing systems 820 is smaller than the number of the computing devices 821 in the heterogeneous computing systems.
An electronic device 810 for performing the heterogeneous computing system based model training time-consuming prediction method described above;
a plurality of sub-computing systems 820 for performing multiple rounds of iterative training together on the target model using training data under control of the electronic device.
Since the embodiments of the system portion correspond to the embodiments of the model training time-consuming prediction method portion based on the heterogeneous computing system, the embodiments of the system portion refer to the descriptions of the embodiments of the model training time-consuming prediction method portion based on the heterogeneous computing system, which are not repeated herein.
The embodiment of the invention also provides a computer readable storage medium, and a computer program is stored on the computer readable storage medium, and when the computer program is executed by a processor, the steps of the model training time-consuming prediction method based on the heterogeneous computing system in any embodiment are realized.
Since the embodiments of the computer readable storage medium portion and the embodiments of the model training time-consuming prediction method portion based on the heterogeneous computing system correspond to each other, the embodiments of the storage medium portion are referred to for a description of the embodiments of the model training time-consuming prediction method portion based on the heterogeneous computing system, and are not repeated herein.
In the description, each embodiment is described in a progressive manner, and each embodiment is mainly described by the differences from other embodiments, so that the same similar parts among the embodiments are mutually referred. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The method, the device and the system for predicting the model training time consumption based on the heterogeneous computing system provided by the invention are described in detail. The principles and embodiments of the present invention have been described herein with reference to specific examples, the description of which is intended only to facilitate an understanding of the method of the present invention and its core ideas. It should be noted that it will be apparent to those skilled in the art that various modifications and adaptations of the invention can be made without departing from the principles of the invention and these modifications and adaptations are intended to be within the scope of the invention as defined in the following claims.

Claims (22)

1. A model training time-consuming prediction method based on a heterogeneous computing system, comprising:
acquiring a target model, a training set, the computing device types contained in the heterogeneous computing system, the training data amount corresponding to each computing device type, and the communication bandwidths between the computing devices in the heterogeneous computing system;
setting up a sub-computing system corresponding to each computing device type, and allocating training data to the computing devices in each sub-computing system using the training data amounts and the training set, wherein each sub-computing system comprises a plurality of computing devices of the same type, and the number of computing devices in a sub-computing system is smaller than the number of computing devices in the heterogeneous computing system;
controlling each sub-computing system to perform multiple rounds of iterative training on the target model using the training data, and recording the time-consuming information and the data transmission amount corresponding to each computing device in each sub-computing system; and
inputting the time-consuming information, the communication bandwidths, and the data transmission amounts into a preset mathematical model for time-consuming prediction, to obtain the predicted time consumption for the heterogeneous computing system to train the target model.
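As a rough illustration of the profiling step in claim 1, the sketch below times a few training iterations on a small same-type sub-system and averages the measurements. Every name and structure here is a hypothetical stand-in for exposition, not the patent's implementation.

```python
import time

def profile_sub_system(run_iteration, rounds=5):
    """Time a few iterations on a small same-type sub-system.

    run_iteration: hypothetical callable performing one training
    iteration and returning the number of bytes exchanged during
    gradient synchronization.
    Returns the average per-iteration wall time and transfer volume.
    """
    times, volumes = [], []
    for _ in range(rounds):
        start = time.perf_counter()
        volumes.append(run_iteration())
        times.append(time.perf_counter() - start)
    return sum(times) / rounds, sum(volumes) / rounds
```

The averaged measurements per device type would then feed the "preset mathematical model" of the final step.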
2. The model training time-consuming prediction method of claim 1, wherein the recording of the time-consuming information and the data transmission amount corresponding to each computing device in each sub-computing system comprises:
upon determining that each sub-computing system has completed a preset number of rounds of iterative training, recording the time-consuming information and the data transmission amount corresponding to each computing device in each sub-computing system.
3. The model training time-consuming prediction method of claim 1, wherein the acquiring of the communication bandwidths between the computing devices in the heterogeneous computing system comprises:
acquiring the network address information of each computing device in the heterogeneous computing system; and
measuring the communication bandwidths between the computing devices in the heterogeneous computing system according to the network address information.
4. The model training time-consuming prediction method of claim 3, wherein the measuring of the communication bandwidths between the computing devices in the heterogeneous computing system according to the network address information comprises:
measuring the communication bandwidths between the computing devices in the heterogeneous computing system with a network testing tool, according to the network address information.
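The "network testing tool" of claim 4 could be, for example, iperf3 (an assumption; the claim does not name a tool). A minimal sketch that extracts link bandwidth from the JSON report of an `iperf3 -c <host> -J` run, assuming a TCP test's report layout:

```python
import json

def bandwidth_mb_per_s(iperf3_json: str) -> float:
    """Parse receiver-side throughput from an `iperf3 -J` report and
    convert bits/s to MB/s. Field names assume iperf3's TCP JSON output."""
    report = json.loads(iperf3_json)
    bits_per_s = report["end"]["sum_received"]["bits_per_second"]
    return bits_per_s / 8 / 1e6  # bits/s -> bytes/s -> MB/s
```

Running such a measurement pairwise over the network addresses collected in claim 3 would fill in the bandwidth matrix the prediction model needs.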
5. The model training time-consuming prediction method of claim 1, wherein the acquiring of the communication bandwidths between the computing devices in the heterogeneous computing system comprises:
receiving, as input, the communication bandwidths between the computing devices in the heterogeneous computing system.
6. The model training time-consuming prediction method of claim 1, wherein the setting up of the sub-computing system corresponding to each computing device type comprises:
selecting, for each computing device type, a plurality of target computing devices from the heterogeneous computing system; and
setting up the sub-computing system corresponding to each computing device type using the target computing devices corresponding to that computing device type.
7. The model training time-consuming prediction method of claim 1, wherein each of the sub-computing systems comprises two computing devices of the same type.
8. The model training time-consuming prediction method of claim 1, wherein the allocating of training data to the computing devices in each sub-computing system using the training data amounts and the training set comprises:
randomly extracting, from the training set, data of the corresponding training data amount as the training data.
9. The model training time-consuming prediction method of claim 1, wherein the recording of the time-consuming information and the data transmission amount corresponding to each computing device in each sub-computing system comprises:
recording the time-consuming information and the data transmission amount corresponding to each computing device in each sub-computing system with a model performance analysis tool.
10. The model training time-consuming prediction method of claim 1, wherein the acquiring of the target model comprises:
receiving the target model input in the form of code, and parsing the target model with a preset script.
11. The model training time-consuming prediction method of any one of claims 1 to 10, wherein the target model is trained in a ring all-reduce manner, and the inputting of the time-consuming information, the communication bandwidths, and the data transmission amounts into the preset mathematical model for time-consuming prediction, to obtain the predicted time consumption for the heterogeneous computing system to train the target model, comprises:
inputting the time-consuming information, the communication bandwidths, and the data transmission amounts into the preset mathematical model for time-consuming prediction, to obtain the single-iteration time consumption required by each computing device in the heterogeneous computing system to perform a single iteration on the target model;
taking the maximum of the single-iteration time consumptions as the total single-iteration time consumption required by the heterogeneous computing system to perform a single iteration on the target model; and
determining the predicted time consumption for the heterogeneous computing system to train the target model based on the total single-iteration time consumption.
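The reduction in claim 11 from per-device times to a system-level prediction is simple to state in code. The sketch below is a hypothetical paraphrase of the claim with made-up names, not the patent's actual model:

```python
def total_single_iteration_time(per_device_times_s):
    """A synchronous heterogeneous system advances at the pace of its
    slowest device, so one iteration costs the maximum of the
    per-device single-iteration times."""
    return max(per_device_times_s)

def predicted_training_time(per_device_times_s, num_iterations):
    """Predicted time = number of iterations x slowest-device iteration time."""
    return num_iterations * total_single_iteration_time(per_device_times_s)
```

Taking the maximum (rather than, say, the mean) reflects the barrier imposed by synchronous gradient exchange: no device can start the next iteration before all gradients are reduced.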
12. The model training time-consuming prediction method of claim 11, wherein the time-consuming information comprises a data processing time consumption and a computing time consumption, the data processing time consumption being the time taken by a central processor to issue training data to the corresponding computing device, and the computing time consumption comprising the time taken by the computing device to perform forward and backward computation.
13. The model training time-consuming prediction method of claim 12, wherein the inputting of the time-consuming information, the communication bandwidths, and the data transmission amounts into the preset mathematical model for time-consuming prediction, to obtain the single-iteration time consumption required by each computing device in the heterogeneous computing system to perform a single iteration on the target model, comprises:
determining, from the data processing time consumptions and computing time consumptions recorded in the same sub-computing system, the average data processing time consumption and the average computing time consumption corresponding to the computing device type to which that sub-computing system belongs, and determining the data processing time consumption and the computing time consumption of each computing device in the heterogeneous computing system from the average values corresponding to its computing device type;
determining an average data transmission amount from the recorded data transmission amounts, and determining the ring all-reduce operation time consumption of each computing device in the heterogeneous computing system from the average data transmission amount, the communication bandwidths, and the ring all-reduce operation order among the computing devices in the heterogeneous computing system; and
determining the single-iteration time consumption required by each computing device in the heterogeneous computing system to perform a single iteration on the target model from the data processing time consumption, the computing time consumption, and the ring all-reduce operation time consumption of each computing device in the heterogeneous computing system.
14. The model training time-consuming prediction method of claim 13, wherein the training data issuing action is performed in parallel with the forward-backward computation and the ring all-reduce operation, and the determining of the single-iteration time consumption required by each computing device in the heterogeneous computing system to perform a single iteration on the target model comprises:
determining the device processing time consumption of the computing device from the computing time consumption and the ring all-reduce operation time consumption of the computing device; and
taking the maximum of the data processing time consumption and the device processing time consumption as the single-iteration time consumption required by the computing device to perform a single iteration on the target model.
15. The model training time-consuming prediction method of claim 14, wherein the determining of the device processing time consumption of the computing device from the computing time consumption and the ring all-reduce operation time consumption comprises:
extracting, from the computing time consumption, the total forward computation time consumption of the computing device and the time consumption of its first backward computation; and
determining the device processing time consumption of the computing device from the total forward computation time consumption, the backward computation time consumption, and the ring all-reduce operation time consumption of each computing device.
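One plausible reading of claim 15 (an assumption on our part; the claim does not state the exact formula) is that gradient communication overlaps with the backward pass after the first backward chunk, so the device's critical path is forward time plus first backward time plus communication time:

```python
def device_processing_time(forward_total_s, first_backward_s, allreduce_s):
    """Hypothetical reading of claim 15: gradients start becoming
    available after the first backward step, and the remaining backward
    work is assumed to overlap with the ring all-reduce, leaving
    communication on the critical path thereafter."""
    return forward_total_s + first_backward_s + allreduce_s
```

Under this reading, the three quantities the claim names (forward total, first backward, all-reduce) are exactly the non-overlapped segments of one iteration.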
16. The model training time-consuming prediction method of claim 13, wherein the determining of the ring all-reduce operation time consumption of each computing device in the heterogeneous computing system from the average data transmission amount, the communication bandwidths, and the ring all-reduce operation order among the computing devices in the heterogeneous computing system comprises:
determining, according to the ring all-reduce operation order, the adjacent computing device with which the computing device performs the ring all-reduce operation;
determining the time consumption of a single ring all-reduce exchange between the computing device and the adjacent computing device from the average data transmission amount and the communication bandwidth between the computing device and the adjacent computing device;
determining, from the number of computing devices contained in the heterogeneous computing system, the total number of times the computing device performs the ring all-reduce exchange with the adjacent computing device; and
determining the ring all-reduce operation time consumption of the computing device from the total number of times and the time consumption of the single ring all-reduce exchange.
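The per-device cost in claim 16 can be sketched with the standard ring all-reduce schedule. The 2*(N-1) step count and the volume/N per-step payload below are the textbook ring algorithm, which we assume is what the claim's "total number of times determined from the device count" refers to:

```python
def ring_all_reduce_time(volume_mb, n_devices, link_bandwidth_mb_s):
    """Standard ring all-reduce: a reduce-scatter pass plus an
    all-gather pass give 2*(N-1) neighbor exchanges, each moving
    volume/N MB over the link to the adjacent device in ring order."""
    step_s = (volume_mb / n_devices) / link_bandwidth_mb_s  # one exchange
    total_steps = 2 * (n_devices - 1)  # assumed reading of claim 16's count
    return total_steps * step_s
```

With per-link bandwidths, as in the claim, each device would use the bandwidth to its own ring neighbor in `step_s`; the uniform `link_bandwidth_mb_s` here is a simplification.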
17. The model training time-consuming prediction method of claim 11, wherein the determining of the predicted time consumption for the heterogeneous computing system to train the target model based on the total single-iteration time consumption comprises:
determining the number of single iterations required by the heterogeneous computing system to train the target model on the complete training set, according to the total data amount of the training set, the training data amount corresponding to each computing device type, the mini-batch size used by each computing device in the heterogeneous computing system for a single iteration, and the number of computing devices contained in the heterogeneous computing system; and
determining, from the number of single iterations and the total single-iteration time consumption, the predicted time consumption for the heterogeneous computing system to train the target model on the complete training set.
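Under a simple data-parallel assumption, the iteration count in claim 17 is the dataset size divided by the global batch, i.e. the sum of the per-device mini-batches consumed in one synchronous iteration. The sketch below is illustrative, not the patent's exact formula:

```python
import math

def single_iteration_count(total_samples, per_device_batch_sizes):
    """One pass over the training set needs ceil(dataset / global batch)
    synchronous iterations, where the global batch is the sum of the
    mini-batches processed by all devices in one iteration."""
    global_batch = sum(per_device_batch_sizes)
    return math.ceil(total_samples / global_batch)

def predicted_epoch_time(total_samples, per_device_batch_sizes, iter_time_s):
    """Predicted time for one full pass = iteration count x total
    single-iteration time; training for several epochs (claim 18)
    would multiply this again by the epoch count."""
    return single_iteration_count(total_samples, per_device_batch_sizes) * iter_time_s
```
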
18. The model training time-consuming prediction method of claim 17, further comprising, after determining the predicted time consumption for the heterogeneous computing system to train the target model on the complete training set:
determining, from the predicted time consumption, the predicted total time consumption for the heterogeneous computing system to train the target model on the complete training set multiple times.
19. A model training time-consuming prediction apparatus based on a heterogeneous computing system, comprising:
an information collection module, configured to acquire a target model, a training set, the computing device types contained in the heterogeneous computing system, the training data amount corresponding to each computing device type, and the communication bandwidths between the computing devices in the heterogeneous computing system;
a task allocation module, configured to set up a sub-computing system corresponding to each computing device type and to allocate training data to the computing devices in each sub-computing system using the training data amounts and the training set, wherein each sub-computing system comprises a plurality of computing devices of the same type, and the number of computing devices in a sub-computing system is smaller than the number of computing devices in the heterogeneous computing system;
a data collection module, configured to control each sub-computing system to perform multiple rounds of iterative training on the target model using the training data, and to record the time-consuming information and the data transmission amount corresponding to each computing device in each sub-computing system; and
a time-consumption prediction module, configured to input the time-consuming information, the communication bandwidths, and the data transmission amounts into a preset mathematical model for time-consuming prediction, to obtain the predicted time consumption for the heterogeneous computing system to train the target model.
20. An electronic device, comprising:
a memory for storing a computer program; and
a processor for implementing the model training time-consuming prediction method based on a heterogeneous computing system of any one of claims 1 to 18 when executing the computer program.
21. A model training time-consuming prediction system based on a heterogeneous computing system, comprising an electronic device and a plurality of sub-computing systems set up according to the computing device types contained in the heterogeneous computing system, wherein each sub-computing system contains a plurality of computing devices of the same type, and the number of computing devices in a sub-computing system is smaller than the number of computing devices in the heterogeneous computing system;
the electronic device is configured to perform the model training time-consuming prediction method based on a heterogeneous computing system of any one of claims 1 to 18; and
the plurality of sub-computing systems are configured to jointly perform multiple rounds of iterative training on the target model using the training data, under the control of the electronic device.
22. A computer-readable storage medium having stored therein computer-executable instructions which, when loaded and executed by a processor, implement the model training time-consuming prediction method based on a heterogeneous computing system of any one of claims 1 to 18.
CN202310974618.7A 2023-08-04 2023-08-04 Model training time-consuming prediction method, device and system based on heterogeneous computing system Active CN116720544B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310974618.7A CN116720544B (en) 2023-08-04 2023-08-04 Model training time-consuming prediction method, device and system based on heterogeneous computing system


Publications (2)

Publication Number Publication Date
CN116720544A true CN116720544A (en) 2023-09-08
CN116720544B CN116720544B (en) 2023-11-07

Family

ID=87869979

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310974618.7A Active CN116720544B (en) 2023-08-04 2023-08-04 Model training time-consuming prediction method, device and system based on heterogeneous computing system

Country Status (1)

Country Link
CN (1) CN116720544B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9582781B1 (en) * 2016-09-01 2017-02-28 PagerDuty, Inc. Real-time adaptive operations performance management system using event clusters and trained models
CN110516805A (en) * 2019-08-23 2019-11-29 广东浪潮大数据研究有限公司 The training duration prediction method and device of training pattern
CN115511186A (en) * 2022-09-29 2022-12-23 苏州浪潮智能科技有限公司 Prediction management method, device and equipment for deep learning training duration
CN116244159A (en) * 2023-05-08 2023-06-09 浪潮电子信息产业股份有限公司 Training duration prediction method and device, multi-heterogeneous computing equipment and medium


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
M. A. Gonzalez et al., "A Comparison of GPU Execution Time Predicting Using Machine Learning and Analytical Modeling", 2016 IEEE 15th International Symposium on Network Computing and Applications (NCA), pages 326-333.
梁正友 (Liang Zhengyou) et al., "Research on Execution Time Prediction of Applications in Heterogeneous Distributed Computing Environments", 2005 National Conference on Open Distributed and Parallel Computing, pages 120-121.

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116954873A (en) * 2023-09-21 2023-10-27 浪潮电子信息产业股份有限公司 Heterogeneous computing system, and method, device, equipment and medium for selecting power nodes of heterogeneous computing system
CN116954873B (en) * 2023-09-21 2024-01-23 浪潮电子信息产业股份有限公司 Heterogeneous computing system, and method, device, equipment and medium for selecting power nodes of heterogeneous computing system
CN117971630A (en) * 2024-04-01 2024-05-03 浪潮电子信息产业股份有限公司 Heterogeneous computing platform, task simulation and time consumption prediction method, device and equipment thereof

Also Published As

Publication number Publication date
CN116720544B (en) 2023-11-07


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant