WO2022105295A1 - nGraph-based GPU back-end distributed training method and system - Google Patents

nGraph-based GPU back-end distributed training method and system Download PDF

Info

Publication number
WO2022105295A1
Authority
WO
WIPO (PCT)
Prior art keywords
training
nccl
ngraph
distributed
gpu
Prior art date
Application number
PCT/CN2021/109206
Other languages
English (en)
French (fr)
Inventor
王丽
曹芳
邱志勇
郭振华
Original Assignee
苏州浪潮智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 苏州浪潮智能科技有限公司 filed Critical 苏州浪潮智能科技有限公司
Priority to US18/034,566 priority Critical patent/US12001960B2/en
Publication of WO2022105295A1 publication Critical patent/WO2022105295A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/098Distributed learning, e.g. federated learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00Arrangements for software engineering
    • G06F8/40Transformation of program code
    • G06F8/41Compilation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/509Offload
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of machine learning, and in particular, to an nGraph-based GPU back-end distributed training method, system, and related devices.
  • DNN (Deep Neural Networks)
  • the nGraph framework as a deep neural network model compiler for various devices and frameworks, can greatly simplify the complexity of deep learning performance optimization across frameworks and hardware platforms, and expand the applicability of deep learning models. sex and portability.
  • the front-end deep learning frameworks that nGraph supports or is developing support for include TensorFlow, MXNet, PaddlePaddle, etc.
  • the back-end hardware acceleration devices that are supported or under development include CPUs, NNPs (Neural Network Processors), and various GPUs.
  • NVIDIA GPU acceleration devices are mainly used in deep learning application scenarios to realize cross-device distributed parallel training of large-scale neural network models.
  • OpenMPI (open Message Passing Interface)
  • the current version of the nGraph framework only supports single-machine, single-card training on back ends such as CPU and GPU, which greatly limits its application scope.
  • the purpose of this application is to provide an nGraph-based GPU back-end distributed training method, system, computer-readable storage medium and electronic device, which can improve the performance of deep learning network training.
  • the present application provides a GPU back-end distributed training method based on nGraph, and the specific technical solutions are as follows:
  • the NCCL communication interface is called according to the training request to configure and obtain a training model;
  • the NCCL communication interface is a communication operation interface located at the GPU back end of the nGraph framework and based on the NCCL library file;
  • GPU backend training is performed on the training data by using the training model.
  • before receiving the training request and obtaining the corresponding training data, the method further includes:
  • modifying the compilation file of the nGraph framework to enable the NCCL function in the distributed functionality of the nGraph framework, and, when the NCCL function is enabled, executing the step of obtaining the NCCL library file through the system path of the NCCL library file linked by the nGraph framework.
  • the method further includes:
  • the distributed training type of the training model is determined according to the training request; the distributed training type includes multi-machine distributed and single-machine distributed.
  • before using the training model to perform GPU back-end training on the training data, the method further includes: performing environment training initialization;
  • if the distributed training type of the training model is multi-machine distributed, the environment training initialization includes: performing MPI initialization and NCCL library initialization;
  • if the distributed training type of the training model is single-machine distributed, the environment training initialization includes: performing NCCL library initialization.
  • after using the training model to perform GPU back-end training on the training data, the method further includes:
  • the occupied memory resources and process resources are released, and the calling step of the NCCL communication interface is ended.
  • before calling the NCCL communication interface according to the training request to configure and obtain the training model, the method further includes:
  • obtaining a communication operation function; performing parameter parsing on the communication operation function; and establishing a function calling relationship with the corresponding operations of the NCCL library for the parameters obtained by parsing, so as to obtain the NCCL communication interface.
  • the NCCL communication interface includes an NCCL-based aggregation operation, an NCCL-based broadcast operation, an NCCL-based sending operation, and an NCCL-based receiving operation.
  • the application also provides an nGraph-based GPU back-end distributed training system, including:
  • the request receiving module is used to receive training requests and obtain corresponding training data
  • the file obtaining module is used to obtain the NCCL library file through the system path of the NCCL library file linked by the nGraph framework;
  • a model generation module configured to call the NCCL communication interface according to the training request to obtain a training model;
  • the NCCL communication interface is a communication operation interface located at the GPU back end of the nGraph framework and based on the NCCL library file;
  • a training module configured to perform GPU backend training on the training data by using the training model.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps of the above-described method.
  • the present application also provides an electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor implements the steps of the above method when the computer program in the memory is invoked.
  • the present application provides an nGraph-based GPU back-end distributed training method, including: receiving a training request and obtaining corresponding training data; obtaining an NCCL library file through the system path of the NCCL library file linked by the nGraph framework; calling the NCCL communication interface according to the training request to configure and obtain a training model, where the NCCL communication interface is a communication operation interface located at the GPU back end of the nGraph framework and based on the NCCL library file; and performing GPU back-end training on the training data by using the training model.
  • This application integrates the NCCL library in the server system into the nGraph framework, so that it can not only support using the communication interface functions in the NCCL library to optimize the communication operations of the nGraph GPU back end, but also support users in independently choosing NCCL as the distributed training method during the compilation process.
  • in addition, support for NCCL communication interfaces such as Allreduce is implemented at the GPU back end.
  • after distributed training on the GPU back end of the nGraph framework is implemented based on this design, nGraph can support distributed training of deep learning networks on the GPU back end, which expands the application scope of the nGraph framework, so that the nGraph framework can not only support a variety of deep learning frameworks, but also meet users' urgent needs for distributed neural network training based on the nGraph GPU back end, further improving the performance of deep learning network training.
  • the present application also provides a GPU back-end distributed training system, a computer-readable storage medium, and an electronic device, which have the above-mentioned beneficial effects, and will not be repeated here.
  • FIG. 1 is a flowchart of an nGraph-based GPU back-end distributed training method provided by an embodiment of the present application
  • FIG. 2 is a schematic structural diagram of an nGraph-based GPU back-end distributed training system provided by an embodiment of the present application.
  • Nvidia's NCCL library (Nvidia Collective multi-GPU Communication Library)
  • PCIe, NVLink (a bus and its communication protocol)
  • InfiniBand (IB), a computer communication standard
  • the NCCL library installed in the server system needs to be integrated into the nGraph framework, so that the communication operations in the NCCL library can be directly used in the subsequent steps. Integrating the NCCL library into the nGraph framework is mainly divided into two processes:
  • because the GPU back end of the nGraph framework provides a list of unsupported operations, which includes communication-related operations such as allreduce (aggregation), send, and recv (receive), it cannot support distributed network training on the GPU back end.
  • it is necessary to add communication interface support to the GPU backend of the nGraph framework, so that the GPU backend can not only support communication operations such as Allreduce, but also call the implementation of distributed operations in the NCCL library.
  • the added communication-related operation support therefore mainly includes Allreduce, Broadcast, Send, and Recv; these operations all have optimized implementations in the NCCL library, corresponding respectively to ncclAllReduce, ncclBroadcast, ncclSend, and ncclRecv, i.e., the NCCL-based aggregation operation, NCCL-based broadcast operation, NCCL-based send operation, and NCCL-based receive operation. It should be noted that each operation corresponds to a respective interface, and those skilled in the art can also configure interfaces for other communication-related operations on this basis, which should also fall within the protection scope of the present application.
  • Step 1: obtain the communication operation function;
  • Step 2: perform parameter parsing on the communication operation function;
  • Step 3: establish a function calling relationship with the corresponding operations of the NCCL library for the parameters obtained by parsing, to obtain the NCCL communication interface.
  • in step 1, the communication operation function needs to be obtained; the communication operation function includes, but is not limited to, Allreduce, Broadcast, Send, and Recv mentioned above, and those skilled in the art can also configure corresponding communication operation interfaces for the operations required in the training process.
  • in step 1, the operation function corresponding to the communication operation needs to be determined.
  • the operation function contains the operation object and operation mode of the communication operation and is defined in the form of a function, thereby obtaining the corresponding communication operation function.
  • the process of configuring the communication operation interface corresponding to the NCCL library is actually to establish a mapping between the nGraph GPU back-end communication operation and the corresponding communication operation in the NCCL library.
  • after the above configuration is completed, if a GPU acceleration device is specified in the user's deep learning training program, the deep learning distributed parallel training process of the GPU back end under the nGraph framework can be realized.
  • FIG. 1 is a flowchart of an nGraph-based GPU back-end distributed training method provided by an embodiment of the application, and the method includes:
  • S101 Receive a training request, and obtain corresponding training data
  • the purpose of this step is to receive training requests and obtain corresponding training data.
  • the purpose of this step is to obtain the NCCL library file according to the system path of the NCCL library file. Since the NCCL library file has already been linked into the nGraph framework in the configuration process described above, the NCCL library file can be obtained directly according to the recorded address information.
  • the purpose of this step is to call the NCCL communication interface to process the training data.
  • the NCCL communication interface is the communication operation interface located at the GPU back end of the nGraph framework and based on the NCCL library file.
  • the NCCL library file obtained in step S102 is the basis for invoking the NCCL communication interface; that is, the NCCL library file contains the corresponding instructions or code of the NCCL communication interface.
  • a module can be integrated in the GPU backend, so that the NCCL communication interface in the module can be directly called when performing distributed training.
  • the training model in this step is actually a function computation graph; that is, the training model describes the execution flow of the subsequent training rather than the actual data processing itself. In other words, in this step, parameters of the execution process, such as which NCCL communication interfaces are called for the training data and the calling order, are added to the training model, so that when the training model is trained, training proceeds according to the execution flow recorded in the training model.
  • in the related art, since the NCCL library is not integrated into the nGraph framework, the NCCL library file and the NCCL communication interface cannot be called in the process of generating the training model, so deep learning distributed parallel training on the nGraph GPU back end based on the NCCL library naturally cannot be realized.
  • the distributed training type of the training model may also be determined according to the training request during the execution of this step, and the distributed training type includes multi-machine distributed and single-machine distributed.
  • either distributed training type includes four processes: environment initialization, GPU device allocation, communication operation implementation, and device resource release.
  • multi-machine distributed environment initialization includes MPI (Message Passing Interface) initialization and NCCL initialization
  • single-machine distributed only includes NCCL initialization.
  • the GPU device allocation process mainly allocates tasks to different GPUs according to the parallel count and rank of the distributed computation.
  • the communication operation implementation process needs to complete the mapping of the communication-related operations customized by the nGraph GPU backend to the communication operations configured in the NCCL library.
  • This module includes operations such as data reading and data type processing.
  • S104 Use the training model to perform GPU backend training on the training data.
  • the training model can be used to perform GPU back-end training on the training data.
  • the communication interface support in the NCCL library can be added to the GPU back end of the nGraph framework on the above basis, so that the GPU back end in the distributed training process can directly support communication operations such as ncclAllReduce.
  • the specific execution process of the GPU backend training is not specifically limited here, and usually includes processes such as creating a GPU backend and initializing the environment.
  • the occupied memory resources and process resources can also be released, and the calling step of the NCCL communication interface is ended.
  • the occupied device memory, MPI process and other resources are released, and the calling step of the NCCL communication interface is ended, which is beneficial to reduce the occupation of system resources and improve the system performance.
  • the embodiment of the present application integrates the NCCL library in the server system into the nGraph framework, so that it can not only support using the communication interface functions in the NCCL library to optimize the communication operations of the nGraph GPU back end, but also support users in independently choosing NCCL as the distributed training method during the compilation process.
  • in addition, support for NCCL communication interfaces such as Allreduce is implemented at the GPU back end.
  • after distributed training on the GPU back end of the nGraph framework is implemented based on this design, nGraph can support distributed training of deep learning networks on the GPU back end, which expands the application scope of the nGraph framework, so that the nGraph framework can not only support a variety of deep learning frameworks, but also meet users' urgent needs for distributed neural network training based on the nGraph GPU back end, further improving the performance of deep learning network training.
  • the first step is to build a function computation graph;
  • the second step is to create a GPU back end;
  • the third step is to input data;
  • the fourth step is to allocate storage space for the input data;
  • the fifth step is to write the input data into the model and perform distributed training according to the function computation graph;
  • the sixth step is to output the training results.
  • the function computation graph contains the configuration data of the training process, including the training method (multi-machine distributed or single-machine distributed), as well as the resource allocation and device allocation methods, and it also covers related processes such as obtaining the NCCL library file and calling the NCCL communication interface; that is, the function computation graph is equivalent to an "instruction manual" for distributed training that contains the configuration data and the training flow, so it is only necessary to input the data and then execute distributed training.
  • the distributed training program will contain communication operations such as Allreduce, which aggregates multi-node gradient data.
  • users only need to specify GPU as the back end in the back-end creation part of their distributed training code, and GPU back-end distributed training can then be realized.
  • the training request in the previous embodiment can be placed in the function computation graph as configuration data, and the input data can be trained by calling the NCCL communication interface according to the information in the function computation graph to configure and obtain the training model.
  • the following describes an nGraph-based GPU back-end distributed training system provided by the embodiments of the present application.
  • the GPU back-end distributed training system described below and the nGraph-based GPU back-end distributed training method described above can be referred to in correspondence with each other.
  • the application also provides an nGraph-based GPU back-end distributed training system, including:
  • a request receiving module 100 configured to receive a training request and obtain corresponding training data
  • the file acquisition module 200 is used to acquire the NCCL library file through the system path of the NCCL library file linked by the nGraph framework;
  • the model generation module 300 is used to call the NCCL communication interface configuration according to the training request to obtain a training model;
  • the NCCL communication interface is a communication operation interface located at the GPU back end of the nGraph framework and based on the NCCL library file;
  • a training module 400 is configured to perform GPU back-end training on the training data by using the training model.
  • a type determination module configured to determine the distributed training type of the training model according to the training request; the distributed training type includes multi-machine distributed and single-machine distributed.
  • the environment initialization module is used to perform environment training initialization before the training model is used to perform GPU back-end training on the training data;
  • if the distributed training type of the training model is multi-machine distributed, the environment initialization module is a module for performing MPI initialization and NCCL library initialization;
  • if the distributed training type of the training model is single-machine distributed, the environment initialization module is a module for performing NCCL library initialization.
  • the resource release module is used for releasing occupied memory resources and process resources, and ending the calling steps of the NCCL communication interface.
  • the communication operation interface configuration module is used to obtain a communication operation function, perform parameter parsing on the communication operation function, and establish a function calling relationship with the corresponding operations of the NCCL library for the parameters obtained by parsing, to obtain the NCCL communication interface.
  • the present application also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed, can implement the steps of the nGraph-based GPU backend distributed training method provided by the above embodiments.
  • the storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
  • the present application also provides an electronic device, which may include a memory and a processor, where a computer program is stored in the memory, and when the processor calls the computer program in the memory, the steps of the nGraph-based GPU back-end distributed training method provided by the above embodiments can be implemented.
  • of course, the electronic device may also include various components such as network interfaces and a power supply.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Stored Programmes (AREA)
  • Computer And Data Communications (AREA)
  • Multi Processors (AREA)

Abstract

An nGraph-based GPU back-end distributed training method, a GPU back-end distributed training system, a computer-readable storage medium, and an electronic device, the method including: receiving a training request and obtaining corresponding training data; obtaining an NCCL library file through the system path of the NCCL library file linked by the nGraph framework; calling the NCCL communication interface according to the training request to configure and obtain a training model, where the NCCL communication interface is a communication operation interface located at the GPU back end of the nGraph framework and based on the NCCL library file; and performing GPU back-end training on the training data by using the training model. The present application can meet users' urgent needs for distributed neural network training based on the nGraph GPU back end, and further improves the performance of deep learning network training.

Description

nGraph-based GPU back-end distributed training method and system
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on November 19, 2020, with application number 202011302180.0 and the invention title "nGraph-based GPU back-end distributed training method and system", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of machine learning, and in particular to an nGraph-based GPU back-end distributed training method, system, and related devices.
Background
In recent years, with the rise of artificial intelligence, deep neural networks (DNNs) have been widely used in fields such as image and video classification, speech recognition, and language translation. As training datasets grow larger and network structures become increasingly complex, the enormous computational demands of deep learning have driven continuous innovation in hardware architectures. The various deep learning frameworks (TensorFlow, PyTorch, etc.) are devoted to deeply modifying the framework for their own application scenarios to improve training performance on each hardware back end (CPU, GPU, FPGA, and ASIC). When developing different deep learning applications, users not only need to adapt to various frameworks but also need to support various AI acceleration devices, and they have to spend a great deal of effort and time on migration and optimization, which greatly limits the efficiency of artificial intelligence application development. To address these problems, the nGraph framework, as a deep neural network model compiler for various devices and frameworks, can greatly simplify the complexity of achieving deep learning performance optimization across frameworks and hardware platforms, and expands the applicability and portability of deep learning models. Currently, the front-end deep learning frameworks that nGraph supports or is developing support for include TensorFlow, MXNet, PaddlePaddle, etc., and the back-end hardware acceleration devices that are supported or under development include CPUs, NNPs (Neural Network Processors), and various GPUs.
GPUs are currently the main acceleration devices for large-scale neural network model training. To improve the performance of neural network model training, deep learning application scenarios mainly use NVIDIA GPU acceleration devices to realize cross-device distributed parallel training of large-scale neural network models. Early versions of nGraph provided multi-machine distributed parallel training support for the CPU back end based on OpenMPI (open Message Passing Interface); however, in later version updates, in order to focus on optimizing single-machine, single-card training performance, support for distributed training was removed. The current version of the nGraph framework only supports single-machine, single-card training on back ends such as CPU and GPU, which greatly limits its application scope.
Summary
The purpose of the present application is to provide an nGraph-based GPU back-end distributed training method, system, computer-readable storage medium, and electronic device, which can improve the performance of deep learning network training.
To solve the above technical problem, the present application provides an nGraph-based GPU back-end distributed training method, the specific technical solution of which is as follows:
receiving a training request, and obtaining corresponding training data;
obtaining an NCCL library file through the system path of the NCCL library file linked by the nGraph framework;
calling the NCCL communication interface according to the training request to configure and obtain a training model, where the NCCL communication interface is a communication operation interface located at the GPU back end of the nGraph framework and based on the NCCL library file;
performing GPU back-end training on the training data by using the training model.
Optionally, before receiving the training request and obtaining the corresponding training data, the method further includes:
adding the system path of the NCCL library file to the source code of the nGraph framework;
modifying the compilation file of the nGraph framework, enabling the NCCL function in the distributed functionality of the nGraph framework, and, when the NCCL function is enabled, executing the step of obtaining the NCCL library file through the system path of the NCCL library file linked by the nGraph framework.
Optionally, when calling the NCCL communication interface according to the training request to configure and obtain the training model, the method further includes:
determining the distributed training type of the training model according to the training request, where the distributed training type includes multi-machine distributed and single-machine distributed.
Optionally, before using the training model to perform GPU back-end training on the training data, the method further includes:
performing environment training initialization;
if the distributed training type of the training model is multi-machine distributed, the environment training initialization includes:
performing MPI initialization and NCCL library initialization;
if the distributed training type of the training model is single-machine distributed, the environment training initialization includes:
performing NCCL library initialization.
Optionally, after using the training model to perform GPU back-end training on the training data, the method further includes:
releasing the occupied memory resources and process resources, and ending the calling step of the NCCL communication interface.
Optionally, before calling the NCCL communication interface according to the training request to configure and obtain the training model, the method further includes:
obtaining a communication operation function;
performing parameter parsing on the communication operation function;
establishing a function calling relationship with the corresponding operations of the NCCL library for the parameters obtained by parsing, to obtain the NCCL communication interface.
Optionally, the NCCL communication interface includes an NCCL-based aggregation operation, an NCCL-based broadcast operation, an NCCL-based send operation, and an NCCL-based receive operation.
The present application also provides an nGraph-based GPU back-end distributed training system, including:
a request receiving module, used to receive a training request and obtain corresponding training data;
a file acquisition module, used to obtain an NCCL library file through the system path of the NCCL library file linked by the nGraph framework;
a model generation module, used to call the NCCL communication interface according to the training request to configure and obtain a training model, where the NCCL communication interface is a communication operation interface located at the GPU back end of the nGraph framework and based on the NCCL library file;
a training module, used to perform GPU back-end training on the training data by using the training model.
The present application also provides a computer-readable storage medium on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method described above.
The present application also provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and the processor implements the steps of the method described above when calling the computer program in the memory.
The present application provides an nGraph-based GPU back-end distributed training method, including: receiving a training request and obtaining corresponding training data; obtaining an NCCL library file through the system path of the NCCL library file linked by the nGraph framework; calling the NCCL communication interface according to the training request to configure and obtain a training model, where the NCCL communication interface is a communication operation interface located at the GPU back end of the nGraph framework and based on the NCCL library file; and performing GPU back-end training on the training data by using the training model.
The present application integrates the NCCL library in the server system into the nGraph framework, so that it can not only support using the communication interface functions in the NCCL library to optimize the communication operations of the nGraph GPU back end, but also support users in independently choosing NCCL as the distributed training method during the compilation process. Secondly, support for NCCL communication interfaces such as Allreduce is implemented at the GPU back end. After distributed training on the GPU back end of the nGraph framework is implemented based on this design, nGraph can support distributed training of deep learning networks on the GPU back end, which expands the application scope of the nGraph framework, so that the nGraph framework can not only support a variety of deep learning frameworks, but also meet users' urgent needs for distributed neural network training based on the nGraph GPU back end, further improving the performance of deep learning network training.
The present application also provides a GPU back-end distributed training system, a computer-readable storage medium, and an electronic device, which have the above beneficial effects and are not described again here.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the accompanying drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are merely embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a flowchart of an nGraph-based GPU back-end distributed training method provided by an embodiment of the present application;
FIG. 2 is a schematic structural diagram of an nGraph-based GPU back-end distributed training system provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
Deep learning often requires multi-GPU parallel training, and Nvidia's NCCL library (Nvidia Collective multi-GPU Communication Library) is frequently used for multi-card parallelism in the major deep learning frameworks (Caffe/TensorFlow/Torch/Theano). Nvidia has heavily optimized the communication strategies in the NCCL library to achieve high communication speeds over PCIe, NVLink (a bus and its communication protocol), and InfiniBand (literally "infinite bandwidth", abbreviated IB, a computer communication standard). To achieve the goal of distributed training on the GPU back end of the nGraph framework and to make full use of the advantages of the NVIDIA GPU communication library NCCL, the present invention proposes a GPU back-end distributed training method. To describe the method more clearly, the configuration steps performed before executing the solution are explained first:
To use the NCCL library in the nGraph framework, the NCCL library installed in the server system needs to be integrated into the nGraph framework, so that the communication operations in the NCCL library can be used directly in subsequent steps. Integrating the NCCL library into the nGraph framework mainly consists of two processes:
(1) Adding the system path of the NCCL library file to the nGraph framework source code: specifically, a FindNCCL.cmake file can be added to the cmake module of the nGraph source code, so that the nGraph framework can automatically identify the NCCL library already installed in the system and link to the system path where the NCCL library file is located.
(2) Modifying the compilation file of the nGraph framework and enabling the NCCL function in the distributed functionality of the nGraph framework:
An NCCL option is added to the nGraph distributed functionality, so that when the user enables the distributed NCCL function, the above NCCL library file path is passed to the nGraph compilation file. After the NCCL library integration is completed, cmake is run again and nGraph is compiled and installed; that is, the NCCL library is integrated into the nGraph source framework, making it convenient for other files in nGraph to use the NCCL library. After the compilation file of the nGraph framework is modified, the NCCL function is in the enabled state.
In addition to integrating the NCCL library into the nGraph framework, the communication operation interfaces corresponding to the NCCL library need to be configured to facilitate communication operations. Because the GPU back end of the nGraph framework provides a list of unsupported operations, which includes communication-related operations such as allreduce (aggregation), send, and recv (receive), it cannot support distributed network training on the GPU back end. To realize distributed training of deep learning tasks on the GPU back end, communication interface support needs to be added to the GPU back end of the nGraph framework, so that the GPU back end can not only support communication operations such as Allreduce but can also call the implementations of the distributed operations in the NCCL library. The added communication-related operation support therefore mainly includes Allreduce, Broadcast, Send, and Recv. These operations all have optimized implementations in the NCCL library, corresponding respectively to ncclAllReduce, ncclBroadcast, ncclSend, and ncclRecv, i.e., the NCCL-based aggregation operation, NCCL-based broadcast operation, NCCL-based send operation, and NCCL-based receive operation. It should be noted that each operation corresponds to a respective interface, and those skilled in the art may also configure interfaces for other communication-related operations on this basis, which should also fall within the protection scope of the present application.
A specific process for configuring the communication operation interfaces corresponding to the NCCL library is provided here:
Step 1: obtain a communication operation function;
Step 2: perform parameter parsing on the communication operation function;
Step 3: establish a function calling relationship with the corresponding operations of the NCCL library for the parameters obtained by parsing, to obtain the NCCL communication interface.
In step 1, a communication operation function needs to be obtained. The communication operation function includes, but is not limited to, Allreduce, Broadcast, Send, and Recv described above, and those skilled in the art may also configure corresponding communication operation interfaces for the operations required in the training process. In step 1, the operation function corresponding to the communication operation needs to be determined. The operation function contains the operation object and operation mode of the communication operation and is defined in the form of a function, thereby obtaining the corresponding communication operation function. After that, parameter parsing is performed on the communication operation function to obtain parameters including the operation object and operation mode, and a function call is configured with the corresponding operation in the NCCL library, so that when the user performs training on the GPU back end, the selected communication operation function can act directly on the corresponding operation in the NCCL library, thereby implementing the corresponding communication operation in the NCCL library.
In other words, the process of configuring the communication operation interfaces corresponding to the NCCL library is in fact establishing a mapping between the nGraph GPU back-end communication operations and the corresponding communication operations in the NCCL library.
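As an illustration of such a mapping, the sketch below shows one way a GPU back-end allreduce operation could be forwarded to the NCCL library. The wrapper class and the way the back end would invoke it are assumptions made for illustration and are not taken from the application; ncclAllReduce and ncclGetErrorString are the standard NCCL API calls.

```cpp
// Minimal sketch (assumed wrapper, not the patented implementation):
// forward an nGraph-style GPU back-end Allreduce to ncclAllReduce.
#include <nccl.h>
#include <cuda_runtime.h>
#include <stdexcept>

class NcclAllreduceOp {
public:
    NcclAllreduceOp(ncclComm_t comm, cudaStream_t stream)
        : m_comm(comm), m_stream(stream) {}

    // The buffers, element count, and data type would come from the
    // parameter-parsing step described above.
    void execute(const void* send_buf, void* recv_buf, size_t count,
                 ncclDataType_t dtype) const {
        // Sum-reduce gradient data across all ranks in the communicator.
        ncclResult_t rc = ncclAllReduce(send_buf, recv_buf, count, dtype,
                                        ncclSum, m_comm, m_stream);
        if (rc != ncclSuccess) {
            throw std::runtime_error(ncclGetErrorString(rc));
        }
    }

private:
    ncclComm_t   m_comm;    // NCCL communicator created during initialization
    cudaStream_t m_stream;  // CUDA stream the GPU back end computes on
};
```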
After the above configuration is completed, if the use of a GPU acceleration device is specified in the user's deep learning training program, the deep learning distributed parallel training process of the GPU back end under the nGraph framework can be realized.
Please refer to FIG. 1. FIG. 1 is a flowchart of an nGraph-based GPU back-end distributed training method provided by an embodiment of the present application. The method includes:
S101: receiving a training request, and obtaining corresponding training data;
This step aims to receive a training request and obtain the corresponding training data. How the training request is received and how the corresponding training data is obtained are not specifically limited here.
S102: obtaining an NCCL library file through the system path of the NCCL library file linked by the nGraph framework;
This step aims to obtain the NCCL library file according to the system path of the NCCL library file. Since the NCCL library file has already been linked into the nGraph framework in the configuration process described above, the NCCL library file can be obtained directly according to the recorded address information.
S103: calling the NCCL communication interface according to the training request to configure and obtain a training model;
This step aims to call the NCCL communication interface to process the training data. The NCCL communication interface is a communication operation interface located at the GPU back end of the nGraph framework and based on the NCCL library file. In other words, the NCCL library file obtained in step S102 is the basis for invoking the NCCL communication interface; that is, the NCCL library file contains the corresponding instructions or code of the NCCL communication interface. Specifically, a module can be integrated in the GPU back end so that the NCCL communication interface in the module can be called directly when distributed training is performed.
It should be noted that the training model in this step is actually a function computation graph; that is, the training model describes the execution flow of the subsequent training rather than the actual data processing itself. In other words, in this step, parameters of the execution process, such as which NCCL communication interfaces are to be called for the training data and the calling order, are added to the training model, so that when the training model is trained, training proceeds according to the execution flow recorded in the training model.
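For intuition only, the sketch below shows what a tiny function computation graph containing a communication node could look like, written against the historical nGraph C++ op API; the helper name make_allreduce_graph, the op constructors, and the graph shape are assumptions for illustration and may differ between nGraph versions.

```cpp
// Simplified sketch (assumed historical nGraph C++ op API):
// a gradient tensor flows through an AllReduce node in the graph.
#include <memory>
#include <ngraph/ngraph.hpp>

std::shared_ptr<ngraph::Function> make_allreduce_graph() {
    using namespace ngraph;
    // Parameter standing in for a locally computed gradient tensor.
    auto grad = std::make_shared<op::Parameter>(element::f32, Shape{1024});
    // Communication node; at run time the GPU back end would map it to the
    // NCCL-based aggregation operation (ncclAllReduce).
    auto summed = std::make_shared<op::AllReduce>(grad);
    // The function computation graph only records this execution flow;
    // no data is processed until the graph is compiled and called.
    return std::make_shared<Function>(NodeVector{summed},
                                      ParameterVector{grad});
}
```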
In the related art, since the NCCL library is not integrated into the nGraph framework, the NCCL library file and the NCCL communication interface simply cannot be called in the process of generating the training model, and deep learning distributed parallel training on the nGraph GPU back end based on the NCCL library naturally cannot be realized.
As a preferred implementation of this step, the distributed training type of the training model may also be determined according to the training request while this step is executed; the distributed training type includes multi-machine distributed and single-machine distributed. Either distributed training type involves four processes: environment initialization, GPU device allocation, communication operation implementation, and device resource release. Among them, the environment initialization of multi-machine distributed training includes both MPI (Message Passing Interface) initialization and NCCL initialization, while single-machine distributed training includes only NCCL initialization. The GPU device allocation process mainly allocates tasks to different GPUs according to the parallel count and rank of the distributed computation. The communication operation implementation process needs to complete the mapping from the communication-related operations customized by the nGraph GPU back end to the communication operations configured in the NCCL library; this module includes operations such as data reading and data type processing.
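To make the two initialization paths concrete, the following sketch shows one common way such an environment could be set up, assuming one GPU per MPI rank in the multi-machine case; the function names and the GPUs-per-node constant are illustrative, while the MPI and NCCL calls are the standard library APIs.

```cpp
// Illustrative sketch of the two environment-initialization paths.
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>
#include <vector>

// Multi-machine distributed: MPI initialization plus NCCL initialization,
// assuming one GPU per MPI rank.
ncclComm_t init_multi_machine(int* argc, char*** argv) {
    MPI_Init(argc, argv);                      // MPI initialization
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    cudaSetDevice(rank % 8);                   // GPU device allocation by rank
                                               // (8 GPUs per node is assumed)

    // NCCL initialization: rank 0 creates the unique id, MPI broadcasts it.
    ncclUniqueId id;
    if (rank == 0) ncclGetUniqueId(&id);
    MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);

    ncclComm_t comm;
    ncclCommInitRank(&comm, size, id, rank);
    return comm;
}

// Single-machine distributed: NCCL initialization only, one communicator
// per local GPU.
std::vector<ncclComm_t> init_single_machine(int num_gpus) {
    std::vector<ncclComm_t> comms(num_gpus);
    ncclCommInitAll(comms.data(), num_gpus, /*devlist=*/nullptr);
    return comms;
}
```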
S104: performing GPU back-end training on the training data by using the training model.
After the training model is obtained in step S103, the training model can be used to perform GPU back-end training on the training data. In the actual application of the present application, communication interface support from the NCCL library can be added to the GPU back end of the nGraph framework on the above basis, so that the GPU back end in the distributed training process can directly support communication operations such as ncclAllReduce.
The specific execution process of the GPU back-end training is not specifically limited here; it usually includes processes such as creating a GPU back end and initializing the environment.
As a preferred implementation, after the GPU back-end training of the training model, the occupied memory resources and process resources may also be released, and the calling step of the NCCL communication interface is ended. After the corresponding communication operations are completed, resources such as the occupied device memory and MPI processes are released, and the calling step of the NCCL communication interface is ended, which helps reduce the occupation of system resources and improve system performance.
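A matching teardown could look like the short sketch below, which simply mirrors the initialization sketch above; it is illustrative only and releases the NCCL communicator, the occupied device memory, and the MPI process resources.

```cpp
// Illustrative teardown matching the initialization sketch above.
#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>

void release_resources(ncclComm_t comm, void* device_buf, bool multi_machine) {
    cudaFree(device_buf);    // release the occupied device memory
    ncclCommDestroy(comm);   // end the use of the NCCL communication interface
    if (multi_machine) {
        MPI_Finalize();      // release the MPI process resources
    }
}
```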
The embodiment of the present application integrates the NCCL library in the server system into the nGraph framework, so that it can not only support using the communication interface functions in the NCCL library to optimize the communication operations of the nGraph GPU back end, but also support users in independently choosing NCCL as the distributed training method during the compilation process. Secondly, support for NCCL communication interfaces such as Allreduce is implemented at the GPU back end. After distributed training on the GPU back end of the nGraph framework is implemented based on this design, nGraph can support distributed training of deep learning networks on the GPU back end, which expands the application scope of the nGraph framework, so that the nGraph framework can not only support a variety of deep learning frameworks, but also meet users' urgent needs for distributed neural network training based on the nGraph GPU back end, further improving the performance of deep learning network training.
The nGraph-based GPU back-end distributed training method disclosed above is illustrated below with a specific GPU back-end distributed training procedure:
Step 1: build a function computation graph;
Step 2: create a GPU back end;
Step 3: input the data;
Step 4: allocate storage space for the input data;
Step 5: write the input data into the model and perform distributed training according to the function computation graph;
Step 6: output the training results.
In the actual training process, the function computation graph needs to be built first. The function computation graph contains the configuration data of the training process, including the training method, that is, whether multi-machine distributed or single-machine distributed training is used, as well as the resource allocation and device allocation methods, and it also covers related processes such as obtaining the NCCL library file and calling the NCCL communication interface. In other words, the function computation graph is equivalent to an "instruction manual" for distributed training that contains the configuration data and the training flow; it is only necessary to input the data and then execute distributed training. The distributed training program will contain communication operations such as Allreduce, which aggregates multi-node gradient data, and users only need to specify GPU in the back-end creation part of their distributed training code to realize GPU back-end distributed training. Of course, the training request in the previous embodiment can be placed in the function computation graph as configuration data, so that the NCCL communication interface can be called according to the information in the function computation graph to configure and obtain the training model and train the input data.
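Mapped onto nGraph's public C++ runtime interface, the six steps could look roughly like the sketch below. The API names reflect the historical nGraph runtime as the author understands it and may differ between versions, and make_allreduce_graph refers back to the earlier illustrative sketch rather than to anything defined in the application.

```cpp
// Rough sketch of the six steps on the nGraph runtime API
// (assumed historical API; details vary between nGraph versions).
#include <memory>
#include <vector>
#include <ngraph/ngraph.hpp>
#include <ngraph/runtime/backend.hpp>

std::shared_ptr<ngraph::Function> make_allreduce_graph();  // earlier sketch

void run_distributed_training(const std::vector<float>& input) {
    using namespace ngraph;

    // Step 1: build the function computation graph (see the earlier sketch).
    auto func = make_allreduce_graph();

    // Step 2: create the GPU back end; naming "GPU" here is all the user
    // changes to obtain GPU back-end distributed training.
    auto backend = runtime::Backend::create("GPU");

    // Steps 3-4: take the input data and allocate storage space for it.
    auto in_t  = backend->create_tensor(element::f32, Shape{1024});
    auto out_t = backend->create_tensor(element::f32, Shape{1024});
    in_t->write(input.data(), input.size() * sizeof(float));

    // Step 5: compile and execute distributed training according to the
    // function computation graph (the AllReduce node is dispatched to NCCL).
    auto exec = backend->compile(func);
    exec->call_with_validate({out_t}, {in_t});

    // Step 6: output the training results.
    std::vector<float> result(1024);
    out_t->read(result.data(), result.size() * sizeof(float));
}
```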
An nGraph-based GPU back-end distributed training system provided by an embodiment of the present application is introduced below. The GPU back-end distributed training system described below and the nGraph-based GPU back-end distributed training method described above may be referred to in correspondence with each other.
Referring to FIG. 2, the present application also provides an nGraph-based GPU back-end distributed training system, including:
a request receiving module 100, used to receive a training request and obtain corresponding training data;
a file acquisition module 200, used to obtain an NCCL library file through the system path of the NCCL library file linked by the nGraph framework;
a model generation module 300, used to call the NCCL communication interface according to the training request to configure and obtain a training model, where the NCCL communication interface is a communication operation interface located at the GPU back end of the nGraph framework and based on the NCCL library file;
a training module 400, used to perform GPU back-end training on the training data by using the training model.
Based on the above embodiment, as a preferred embodiment, the system further includes:
a configuration module, used to add the system path of the NCCL library file to the source code of the nGraph framework, modify the compilation file of the nGraph framework, enable the NCCL function in the distributed functionality of the nGraph framework, and allow entry into the file acquisition module when the NCCL function is enabled.
Based on the above embodiment, as a preferred embodiment, the system further includes:
a type determination module, used to determine the distributed training type of the training model according to the training request, where the distributed training type includes multi-machine distributed and single-machine distributed.
Based on the above embodiment, as a preferred embodiment, the system further includes:
an environment initialization module, used to perform environment training initialization before the training model is used to perform GPU back-end training on the training data;
if the distributed training type of the training model is multi-machine distributed, the environment initialization module is a module for performing MPI initialization and NCCL library initialization;
if the distributed training type of the training model is single-machine distributed, the environment initialization module is a module for performing NCCL library initialization.
Based on the above embodiment, as a preferred embodiment, the system may further include:
a resource release module, used to release the occupied memory resources and process resources and to end the calling step of the NCCL communication interface.
Based on the above embodiment, as a preferred embodiment, the system may further include:
a communication operation interface configuration module, used to obtain a communication operation function, perform parameter parsing on the communication operation function, and establish a function calling relationship with the corresponding operations of the NCCL library for the parameters obtained by parsing, to obtain the NCCL communication interface.
The present application also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed, the steps of the nGraph-based GPU back-end distributed training method provided by the above embodiments can be implemented. The storage medium may include various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The present application also provides an electronic device, which may include a memory and a processor. A computer program is stored in the memory, and when the processor calls the computer program in the memory, the steps of the nGraph-based GPU back-end distributed training method provided by the above embodiments can be implemented. Of course, the electronic device may also include various components such as network interfaces and a power supply.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. As the system disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively simple, and for relevant details reference may be made to the description of the method.
Specific examples are used herein to explain the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. It should be pointed out that those of ordinary skill in the art may make several improvements and modifications to the present application without departing from the principles of the present application, and these improvements and modifications also fall within the protection scope of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.

Claims (10)

  1. An nGraph-based GPU back-end distributed training method, characterized by comprising:
    receiving a training request, and obtaining corresponding training data;
    obtaining an NCCL library file through a system path of the NCCL library file linked by an nGraph framework;
    calling an NCCL communication interface according to the training request to configure and obtain a training model, wherein the NCCL communication interface is a communication operation interface located at a GPU back end of the nGraph framework and based on the NCCL library file;
    performing GPU back-end training on the training data by using the training model.
  2. The GPU back-end distributed training method according to claim 1, characterized in that, before receiving the training request and obtaining the corresponding training data, the method further comprises:
    adding the system path of the NCCL library file to source code of the nGraph framework;
    modifying a compilation file of the nGraph framework, enabling an NCCL function in distributed functionality of the nGraph framework, and, when the NCCL function is enabled, executing the step of obtaining the NCCL library file through the system path of the NCCL library file linked by the nGraph framework.
  3. The GPU back-end distributed training method according to claim 1, characterized in that, when calling the NCCL communication interface according to the training request to configure and obtain the training model, the method further comprises:
    determining a distributed training type of the training model according to the training request, wherein the distributed training type comprises multi-machine distributed and single-machine distributed.
  4. The GPU back-end distributed training method according to claim 3, characterized in that, before performing GPU back-end training on the training data by using the training model, the method further comprises:
    performing environment training initialization;
    wherein, if the distributed training type of the training model is multi-machine distributed, the environment training initialization comprises:
    performing MPI initialization and NCCL library initialization;
    and if the distributed training type of the training model is single-machine distributed, the environment training initialization comprises:
    performing NCCL library initialization.
  5. The GPU back-end distributed training method according to claim 1, characterized in that, after performing GPU back-end training on the training data by using the training model, the method further comprises:
    releasing occupied memory resources and process resources, and ending the calling step of the NCCL communication interface.
  6. The GPU back-end distributed training method according to claim 1, characterized in that, before calling the NCCL communication interface according to the training request to configure and obtain the training model, the method further comprises:
    obtaining a communication operation function;
    performing parameter parsing on the communication operation function;
    establishing, for parameters obtained by the parsing, a function calling relationship with corresponding operations of an NCCL library, to obtain the NCCL communication interface.
  7. The GPU back-end distributed training method according to any one of claims 1 to 6, characterized in that the NCCL communication interface comprises an NCCL-based aggregation operation, an NCCL-based broadcast operation, an NCCL-based send operation, and an NCCL-based receive operation.
  8. An nGraph-based GPU back-end distributed training system, characterized by comprising:
    a request receiving module, configured to receive a training request and obtain corresponding training data;
    a file acquisition module, configured to obtain an NCCL library file through a system path of the NCCL library file linked by an nGraph framework;
    a model generation module, configured to call an NCCL communication interface according to the training request to configure and obtain a training model, wherein the NCCL communication interface is a communication operation interface located at a GPU back end of the nGraph framework and based on the NCCL library file;
    a training module, configured to perform GPU back-end training on the training data by using the training model.
  9. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the GPU back-end distributed training method according to any one of claims 1 to 7.
  10. An electronic device, characterized by comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor, when calling the computer program in the memory, implements the steps of the GPU back-end distributed training method according to any one of claims 1 to 7.
PCT/CN2021/109206 2020-11-19 2021-07-29 基于nGraph的GPU后端分布式训练方法和系统 WO2022105295A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/034,566 US12001960B2 (en) 2020-11-19 2021-07-29 NGraph-based GPU backend distributed training method and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011302180.0 2020-11-19
CN202011302180.0A CN112465112B (zh) 2020-11-19 2020-11-19 基于nGraph的GPU后端分布式训练方法和系统

Publications (1)

Publication Number Publication Date
WO2022105295A1 true WO2022105295A1 (zh) 2022-05-27

Family

ID=74837727

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109206 WO2022105295A1 (zh) 2020-11-19 2021-07-29 基于nGraph的GPU后端分布式训练方法和系统

Country Status (3)

Country Link
US (1) US12001960B2 (zh)
CN (1) CN112465112B (zh)
WO (1) WO2022105295A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112465112B (zh) * 2020-11-19 2022-06-07 苏州浪潮智能科技有限公司 基于nGraph的GPU后端分布式训练方法和系统
CN114358136B (zh) * 2021-12-10 2024-05-17 鹏城实验室 一种图像数据处理方法、装置、智能终端及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200051201A1 (en) * 2018-08-08 2020-02-13 International Business Machines Corporation Graphic processor unit topology-aware all-reduce operation
CN110908799A (zh) * 2019-11-08 2020-03-24 浪潮电子信息产业股份有限公司 一种分布式训练中的通信方法、装置、设备、介质
CN110991614A (zh) * 2019-11-29 2020-04-10 苏州浪潮智能科技有限公司 一种Linux下GPU神经网络深度学习测试方法和系统
CN111124656A (zh) * 2018-10-31 2020-05-08 伊姆西Ip控股有限责任公司 用于向专用计算资源分配任务的方法、设备和计算机程序产品
CN112465112A (zh) * 2020-11-19 2021-03-09 苏州浪潮智能科技有限公司 基于nGraph的GPU后端分布式训练方法和系统

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951926B (zh) * 2017-03-29 2020-11-24 山东英特力数据技术有限公司 一种混合架构的深度学习方法及装置
US11270201B2 (en) * 2017-12-29 2022-03-08 Intel Corporation Communication optimizations for distributed machine learning
US20190378016A1 (en) * 2018-06-07 2019-12-12 International Business Machines Corporation Distributed computing architecture for large model deep learning
CN110969198A (zh) * 2019-11-24 2020-04-07 广东浪潮大数据研究有限公司 深度学习模型的分布式训练方法、装置、设备及存储介质
CN111274018A (zh) * 2020-01-21 2020-06-12 行星算力(深圳)科技有限公司 一种基于dl框架下的分布式训练方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200051201A1 (en) * 2018-08-08 2020-02-13 International Business Machines Corporation Graphic processor unit topology-aware all-reduce operation
CN111124656A (zh) * 2018-10-31 2020-05-08 伊姆西Ip控股有限责任公司 用于向专用计算资源分配任务的方法、设备和计算机程序产品
CN110908799A (zh) * 2019-11-08 2020-03-24 浪潮电子信息产业股份有限公司 一种分布式训练中的通信方法、装置、设备、介质
CN110991614A (zh) * 2019-11-29 2020-04-10 苏州浪潮智能科技有限公司 一种Linux下GPU神经网络深度学习测试方法和系统
CN112465112A (zh) * 2020-11-19 2021-03-09 苏州浪潮智能科技有限公司 基于nGraph的GPU后端分布式训练方法和系统

Also Published As

Publication number Publication date
CN112465112B (zh) 2022-06-07
US12001960B2 (en) 2024-06-04
US20230316089A1 (en) 2023-10-05
CN112465112A (zh) 2021-03-09

Similar Documents

Publication Publication Date Title
EP3754495B1 (en) Data processing method and related products
WO2022262167A1 (zh) 集群资源调度方法及装置、电子设备和存储介质
US9983857B2 (en) Dynamic computational acceleration using a heterogeneous hardware infrastructure
WO2020233369A1 (zh) 基于模拟端口改进软件集成系统的方法及相关设备
WO2022105295A1 (zh) 基于nGraph的GPU后端分布式训练方法和系统
WO2022002030A1 (zh) 数据处理方法、装置、设备及计算机可读存储介质
JP2022013649A (ja) Dagインタラクションに基づくストリーミングコンピューティング方法及び装置
US9501304B1 (en) Lightweight application virtualization architecture
WO2016010830A1 (en) Composing and executing workflows made up of functional pluggable building blocks
CN111507476A (zh) 部署机器学习模型的方法、设备和计算机程序产品
US8938712B2 (en) Cross-platform virtual machine and method
WO2021000971A1 (zh) 操作数据的生成方法、装置及相关产品
CN112363913B (zh) 一种并行测试任务调度寻优的方法、装置和计算设备
CN114363170A (zh) 容器服务网络配置方法及相关产品
US20210158131A1 (en) Hierarchical partitioning of operators
KR20210141704A (ko) 네트워크 기반 미디어 처리(nbmp)에서의 미디어 처리 함수를 대한 구성 파라미터의 그래프 표현 및 설명
CN108595331A (zh) 异步接口的测试方法、介质、装置和计算设备
WO2023232006A1 (zh) 仿真装置、仿真系统及其仿真方法、存储介质
KR20160046223A (ko) 멀티 쓰레딩 기반 멀티 코어 에뮬레이션 장치 및 방법
US20220172044A1 (en) Method, electronic device, and computer program product for deploying machine learning model
CN111679859B (zh) 一种面向i/o密集型高性能应用的自动化并行mpi-i/o加速方法
US8918767B2 (en) Pattern-based compilation of asynchronous consumption
US8276165B2 (en) Continuation-based runtime callback invocation
WO2021168711A1 (zh) 编译控制方法、编译控制装置和存储介质
Wu et al. An automatic artificial intelligence training platform based on kubernetes

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21893447

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21893447

Country of ref document: EP

Kind code of ref document: A1