WO2023207035A1 - Data synchronization method, apparatus, device and storage medium - Google Patents

Data synchronization method, apparatus, device and storage medium

Info

Publication number
WO2023207035A1
Authority
WO
WIPO (PCT)
Prior art keywords
acceleration
data
acceleration devices
physical topology
devices
Prior art date
Application number
PCT/CN2022/132053
Other languages
English (en)
French (fr)
Inventor
曹芳
郭振华
王丽
高开
赵雅倩
李仁刚
Original Assignee
浪潮电子信息产业股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮电子信息产业股份有限公司
Publication of WO2023207035A1 publication Critical patent/WO2023207035A1/zh

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L49/00Packet switching elements
    • H04L49/90Buffering arrangements
    • H04L49/9084Reactions to storage capacity overflow
    • H04L49/9089Reactions to storage capacity overflow replacing packets in a storage arrangement, e.g. pushout
    • H04L49/9094Arrangements for simultaneous transmit and receive, e.g. simultaneous reading/writing from/to the storage element
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/12Discovery or management of network topologies

Definitions

  • This application relates to the technical field of model training, and in particular to a data synchronization method, apparatus, device and storage medium.
  • Current distributed model training methods include data parallelism and model parallelism.
  • The data parallel method partitions the input data to be trained and trains multiple batches of data simultaneously on multiple acceleration devices during each training iteration.
  • Data parallelism is divided into two methods: synchronous data parallelism and asynchronous data parallelism.
  • In the synchronous data parallel method, after all acceleration devices have calculated the gradients of their batch data, the gradients are combined together and the shared model parameters are updated.
  • Allreduce (collective communication)
  • Allreduce is a collective communication operator. Its goal is to integrate the data held by different compute nodes and then distribute the result to each node, so that every compute node ultimately holds the integrated data.
  • The devices used in synchronous data parallel training are required to be of the same kind, such as all GPU (Graphics Processing Unit) devices or all FPGA (Field Programmable Gate Array) devices.
  • The Allreduce process requires devices to communicate and exchange data. Communication between devices of the same kind usually has higher bandwidth and lower latency, while communication between heterogeneous devices usually comes at a higher cost.
  • For example, GPU devices can communicate at high speed through NVLink (NVIDIA Link, a bus and communication protocol developed and launched by NVIDIA), whereas communication between a GPU and an FPGA often requires the CPU as an intermediate transmission medium and is therefore very inefficient.
  • The purpose of this application is to provide a data synchronization method, apparatus, device and storage medium that can realize deep learning data parallelism based on multiple kinds of heterogeneous acceleration devices and improve hardware resource utilization and data communication efficiency.
  • The specific solution is as follows:
  • A first aspect of this application provides a data synchronization method, including:
  • constructing, among acceleration devices of the same type on a target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and constructing, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; wherein the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol;
  • performing, according to the first-level physical topology, first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and performing, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types.
  • The acceleration devices corresponding to each first-level physical topology are independent of one another; when performing the first processing and the fourth processing, the acceleration devices corresponding to different first-level physical topologies execute concurrently.
  • The acceleration devices corresponding to each second-level physical topology are independent of one another; when performing the second processing and the third processing, the acceleration devices corresponding to different second-level physical topologies execute concurrently.
  • The data synchronization method further includes:
  • when there are multiple target servers, constructing, among acceleration devices of the same type on different target servers, third-level physical topologies with a ring structure whose number matches the number of same-type acceleration devices in each target server; wherein each third-level physical topology contains as many acceleration devices as there are target servers, and each of those acceleration devices is located on a different target server.
  • Performing, according to the third-level physical topology, a reduce operation on the second-processed data in the acceleration devices of different target servers and broadcasting the reduced data to each acceleration device includes:
  • using a programmable switch to receive the second-processed data from the acceleration devices of different target servers and perform the reduce operation on the received data according to the third-level physical topology, and using the programmable switch to broadcast the reduced data to each acceleration device; wherein each target server is connected to one programmable switch.
  • Constructing the first-level physical topologies among acceleration devices of the same type on the target server and the second-level physical topologies among acceleration devices of different types on the target server includes:
  • constructing, among the acceleration devices of the same type in each target server, the first-level physical topology corresponding to that target server, and constructing, among the acceleration devices of different types in each target server, the second-level physical topology corresponding to that target server.
  • Performing, according to the second-level physical topology, third processing through all_gather communication on the second-processed data in acceleration devices of different types, and performing, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type, includes:
  • performing, according to the second-level physical topology corresponding to each target server, third processing through all_gather communication on the second-processed data in the different types of acceleration devices of that target server, and performing, according to the first-level physical topology corresponding to each target server, fourth processing through all_gather communication on the third-processed data in the same type of acceleration devices of that target server.
  • The acceleration devices corresponding to each third-level physical topology are independent of one another; when performing the reduce operation, the acceleration devices corresponding to different third-level physical topologies execute concurrently.
  • Before building a physical topology among acceleration devices of the same type, the method further includes: determining whether the bandwidth of data transmission between acceleration devices of the same type through other available connection methods is higher than the bandwidth of data transmission through the cache coherence protocol connection, and if so, building the physical topology among the acceleration devices of the same type using the other available connection methods.
  • A second aspect of this application provides a data synchronization apparatus, including:
  • a topology construction module, used to construct, among acceleration devices of the same type on a target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and to construct, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; wherein the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol;
  • a first synchronization module, used to perform, according to the first-level physical topology, first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and to perform, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types;
  • a second synchronization module, used to perform, according to the second-level physical topology, third processing through all_gather communication on the second-processed data in acceleration devices of different types, and to perform, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type.
  • A third aspect of this application provides an electronic device, which includes a processor and a memory; the memory is used to store a computer program, and the computer program is loaded and executed by the processor to implement the aforementioned data synchronization method.
  • A fourth aspect of this application provides a computer non-volatile readable storage medium. Computer-executable instructions are stored in the computer non-volatile readable storage medium, and when the computer-executable instructions are loaded and executed by a processor, the aforementioned data synchronization method is implemented.
  • In this application, first-level physical topologies with a ring structure whose number matches the number of acceleration device types are first constructed among acceleration devices of the same type on a target server, and second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type are constructed among acceleration devices of different types on the target server; the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol.
  • Then, according to the first-level physical topology, the data to be synchronized in acceleration devices of the same type is subjected to first processing through scatter_reduce communication, and according to the second-level physical topology, the first-processed data in acceleration devices of different types is subjected to second processing through scatter_reduce communication; finally, according to the second-level physical topology, the second-processed data in acceleration devices of different types is subjected to third processing through all_gather communication, and according to the first-level physical topology, the third-processed data in acceleration devices of the same type is subjected to fourth processing through all_gather communication.
  • This application constructs a physical topology based on the cache coherence protocol connections between different types of acceleration devices and performs scatter_reduce and all_gather communication in combination with the physical topologies constructed from acceleration devices of the same type. It can synchronize data between different types of acceleration devices, that is, heterogeneous acceleration devices, and realize deep learning data parallelism based on multiple kinds of heterogeneous acceleration devices, while improving hardware resource utilization so that data communication during deep learning synchronous data-parallel training is more efficient.
  • Figure 1 is a flow chart of a data synchronization method provided by this application.
  • Figure 2 is a schematic diagram of a specific data synchronization method provided by this application.
  • Figure 3 is a schematic diagram of a CXL heterogeneous device cluster provided by this application.
  • Figure 4 is a structural diagram of a first-level physical topology provided by this application.
  • Figure 5 is a structural diagram of a second-level physical topology provided by this application.
  • Figure 6 is a structural diagram of a third-level physical topology provided by this application.
  • Figure 7 is a schematic structural diagram of a data synchronization apparatus provided by this application.
  • Figure 8 is a structural diagram of a data synchronization electronic device provided by this application.
  • This application provides a data synchronization solution that can realize deep learning data parallelism based on multiple kinds of heterogeneous acceleration devices and improve hardware resource utilization and data communication efficiency.
  • Figure 1 is a flow chart of a data synchronization method provided by an embodiment of this application. As shown in Figure 1, the data synchronization method includes:
  • S11: Construct, among acceleration devices of the same type on a target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and construct, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; wherein the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol.
  • The target server is equipped with different types of acceleration devices that support a cache coherence protocol.
  • The acceleration devices include but are not limited to GPUs, FPGAs and other devices.
  • The numbers of acceleration devices of different types are equal and at least two.
  • Acceleration devices under the cache coherence protocol can also be called CXL (Compute Express Link) devices.
  • CXL is an open industry standard proposed by Intel for high-bandwidth, low-latency device interconnection. It can be used to connect a CPU (Central Processing Unit) with devices such as Accelerators, Memory Buffers and Smart NICs (intelligent network cards).
  • CXL solves the problem of inefficient communication between heterogeneous devices and makes it possible to conduct deep learning data-parallel training based on multiple kinds of heterogeneous devices.
  • There are two ways of physically connecting heterogeneous devices: connection using the CPU as an intermediate medium, and connection through the cache coherence protocol. Since the acceleration devices in the target server all support the cache coherence protocol, and the bandwidth of data transmission using the cache coherence protocol is significantly higher than that of transmission using the CPU as an intermediate medium, the CXL connection method is chosen here. One CXL device is obtained in turn from each of the different device types in the same server node, and these heterogeneous devices are connected using CXL. That is, the heterogeneous devices are connected through the cache coherence protocol; in other words, the acceleration devices in the second-level physical topology are connected through the cache coherence protocol.
  • Although devices of the same kind can also be connected through CXL, the bandwidth of data transmission over a CXL connection is not necessarily optimal. Therefore, when building a first-level physical topology, it must first be determined whether the bandwidth of data transmission between acceleration devices of the same type through other available connection methods is higher than the bandwidth of data transmission through the cache coherence protocol connection. If it is, the physical topology is constructed among the acceleration devices of the same type using the other available connection methods, which can be their original connection methods.
  • By comparing the CXL bandwidth between devices of the same kind with the bandwidth of their original connection method, the relatively better connection method is selected and used to connect the same-type acceleration devices within the same target server pairwise. For example, if comparison shows that the data transmission bandwidth of NVLink connections between GPU devices is better than that of CXL connections, NVLink is selected in this topology.
  • S12: According to the first-level physical topology, perform first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and according to the second-level physical topology, perform second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types.
  • S13: According to the second-level physical topology, perform third processing through all_gather communication on the second-processed data in acceleration devices of different types, and according to the first-level physical topology, perform fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type.
  • The AllReduce aggregation operation includes a scatter_reduce (scatter-reduce) phase and an all_gather (all-gather) phase.
  • The specific execution logic of each phase is consistent with the execution logic in the prior art and is not described in detail in this embodiment. The difference is that this embodiment executes on the basis of the constructed first-level and second-level physical topologies.
  • The data to be synchronized related to model training in acceleration devices of the same type is first-processed through scatter_reduce communication according to the first-level physical topology, and the first-processed data in acceleration devices of different types is second-processed through scatter_reduce communication according to the second-level physical topology.
  • After the subsequent all_gather phases, every acceleration device of the target server holds the complete global data aggregation result.
  • The acceleration devices corresponding to each first-level physical topology are independent of one another.
  • When performing the first processing and the fourth processing, the acceleration devices corresponding to different first-level physical topologies execute concurrently.
  • The acceleration devices corresponding to each second-level physical topology are independent of one another.
  • When performing the second processing and the third processing, the acceleration devices corresponding to different second-level physical topologies execute concurrently.
  • In the embodiment of this application, first-level physical topologies with a ring structure whose number matches the number of acceleration device types are first constructed among acceleration devices of the same type on the target server, and second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type are constructed among acceleration devices of different types on the target server; each acceleration device in the second-level physical topology is connected through the cache coherence protocol. Then the data to be synchronized in acceleration devices of the same type is first-processed through scatter_reduce communication according to the first-level physical topology, and the first-processed data in acceleration devices of different types is second-processed through scatter_reduce communication according to the second-level physical topology; finally the second-processed data in acceleration devices of different types is third-processed through all_gather communication according to the second-level physical topology, and the third-processed data in acceleration devices of the same type is fourth-processed through all_gather communication according to the first-level physical topology.
  • The embodiment of this application constructs a physical topology based on the cache coherence protocol connections between different types of acceleration devices and performs scatter_reduce and all_gather communication in combination with the physical topologies constructed from acceleration devices of the same type. It can synchronize data between different types of acceleration devices, that is, heterogeneous acceleration devices, and realize deep learning data parallelism based on multiple kinds of heterogeneous acceleration devices, while improving hardware resource utilization so that data communication during deep learning synchronous data-parallel training is more efficient.
  • Figure 2 is a flow chart of a specific data synchronization method provided by an embodiment of this application. As shown in Figure 2, the data synchronization method includes:
  • The infrastructure for data synchronization is a server cluster; that is, there are multiple target servers, and each target server contains acceleration devices of the same number and types, yielding a server cluster for deep neural network training on which various heterogeneous devices supporting the CXL protocol are deployed.
  • Figure 3 shows a schematic diagram of the customized CXL device cluster of this embodiment. It is assumed that the cluster contains m servers and that the CXL devices on the servers are evenly distributed, i.e. every server carries the same number of CXL heterogeneous devices, and the number of heterogeneous devices of each type is also the same in every server node.
  • For example, if the cluster contains P CXL heterogeneous devices of n types in total, then all n types of devices are deployed on every server, and the number of CXL devices of each type on each server node is P/(mn).
  • The server nodes are connected through programmable switches.
  • The first-level physical topology corresponding to each target server is constructed among the acceleration devices of the same type on that server, and the second-level physical topology corresponding to each target server is constructed among the acceleration devices of different types on that server.
  • The first-level physical topologies include ring_1_1 composed of {CXL_A01, CXL_A02, CXL_A03, ...}, ring_1_2 composed of {CXL_B01, CXL_B02, CXL_B03, ...}, and so on, as shown in Figure 4.
  • The second-level physical topologies include ring_2_1 composed of {CXL_A01, CXL_B01, ...}, ring_2_2 composed of {CXL_A02, CXL_B02, ...}, and so on, as shown in Figure 5.
  • S22: Construct, among acceleration devices of the same type on different target servers, third-level physical topologies with a ring structure whose number matches the number of same-type acceleration devices in each target server; wherein each third-level physical topology contains as many acceleration devices as there are target servers, and each of those acceleration devices is located on a different target server.
  • The third-level physical topologies include ring_3_1 composed of {CXL_A01, CXL_A11, ..., CXL_AM1}, ring_3_2 composed of {CXL_A02, CXL_A12, ..., CXL_AM2}, ring_3_3 composed of {CXL_B01, CXL_B11, ..., CXL_BM1}, and so on, as shown in Figure 6.
  • S23: According to the first-level physical topology corresponding to each target server, perform first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and according to the second-level physical topology corresponding to each target server, perform second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types.
  • S24: According to the third-level physical topology, perform a reduce operation on the second-processed data in the acceleration devices of different target servers and broadcast the reduced data to each acceleration device, so as to determine the broadcast data as the second-processed data in each acceleration device.
  • After the first processing is complete, each CXL device in a first-level physical topology holds a partial aggregation result for a data block that differs from that of any other CXL device in the same topology.
  • The broadcast data is determined as the second-processed data in each acceleration device.
  • Different target servers are connected through programmable switches. That is, a programmable switch is used to receive the second-processed data from the acceleration devices of different target servers and to perform the reduce operation on the received data according to the third-level physical topology, and the programmable switch is then used to broadcast the reduced data to each acceleration device.
  • The acceleration devices corresponding to each third-level physical topology are independent of one another. When performing the reduce operation, the acceleration devices corresponding to different third-level physical topologies execute concurrently.
  • S25: According to the second-level physical topology corresponding to each target server, perform third processing through all_gather communication on the second-processed data in the different types of acceleration devices of that target server, and according to the first-level physical topology corresponding to each target server, perform fourth processing through all_gather communication on the third-processed data in the same type of acceleration devices of that target server.
  • The second-processed data in the different types of acceleration devices of each target server is third-processed through all_gather communication according to the second-level physical topology corresponding to that server; that is, the method returns to the second-level physical topology to perform the all_gather operation.
  • The third-processed data in the same type of acceleration devices of each target server is then fourth-processed through all_gather communication according to the first-level physical topology corresponding to that server; that is, the method returns to the first-level physical topology to perform the all_gather operation.
  • At this point, every acceleration device of each target server holds the complete global data aggregation result.
  • This embodiment first defines a CXL heterogeneous device cluster, that is, a server cluster for deep neural network training on which various heterogeneous devices supporting the CXL protocol are deployed. A hierarchical physical topology is then constructed on the described heterogeneous device cluster and divided into three levels; by performing specific operations at each level, the complete Allreduce aggregation result is finally obtained.
  • This solves the Allreduce data aggregation problem in synchronous data-parallel training on CXL heterogeneous device clusters and improves the utilization of hardware resources in the data center.
  • An embodiment of this application further discloses a data synchronization apparatus, which includes:
  • a topology construction module 11, used to construct, among acceleration devices of the same type on a target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and to construct, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; each acceleration device in the second-level physical topology is connected through the cache coherence protocol;
  • a first synchronization module 12, used to perform, according to the first-level physical topology, first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and to perform, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types;
  • a second synchronization module 13, used to perform, according to the second-level physical topology, third processing through all_gather communication on the second-processed data in acceleration devices of different types, and to perform, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type.
  • As in the method embodiment, first-level physical topologies with a ring structure whose number matches the number of acceleration device types are first constructed among acceleration devices of the same type on the target server, and second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type are constructed among acceleration devices of different types on the target server; each acceleration device in the second-level physical topology is connected through the cache coherence protocol. Then the data to be synchronized in acceleration devices of the same type is first-processed through scatter_reduce communication according to the first-level physical topology, and the first-processed data in acceleration devices of different types is second-processed through scatter_reduce communication according to the second-level physical topology; finally the second-processed data in acceleration devices of different types is third-processed through all_gather communication according to the second-level physical topology, and the third-processed data in acceleration devices of the same type is fourth-processed through all_gather communication according to the first-level physical topology.
  • The embodiment of this application constructs a physical topology based on the cache coherence protocol connections between different types of acceleration devices and performs scatter_reduce and all_gather communication in combination with the physical topologies constructed from acceleration devices of the same type. It can synchronize data between different types of acceleration devices, that is, heterogeneous acceleration devices, and realize deep learning data parallelism based on multiple kinds of heterogeneous acceleration devices, while improving hardware resource utilization so that data communication during deep learning synchronous data-parallel training is more efficient.
  • When there are multiple target servers, the data synchronization apparatus further includes:
  • a cluster topology construction module, used to construct, among acceleration devices of the same type on different target servers, third-level physical topologies with a ring structure whose number matches the number of same-type acceleration devices in each target server; wherein each third-level physical topology contains as many acceleration devices as there are target servers, and each of those acceleration devices is located on a different target server;
  • a reduce-and-broadcast module, used to perform, according to the third-level physical topology, a reduce operation on the second-processed data in the acceleration devices of different target servers and to broadcast the reduced data to each acceleration device, so as to determine the reduced data as the second-processed data in each acceleration device.
  • The data synchronization apparatus further includes:
  • a judgment module, used to determine whether the bandwidth of data transmission between acceleration devices of the same type through other available connection methods is higher than the bandwidth of data transmission through the cache coherence protocol connection and, if so, to construct the physical topology among the acceleration devices of the same type using the other available connection methods.
  • Figure 8 is a structural diagram of an electronic device 20 according to an exemplary embodiment; the content of the figure should not be regarded as any limitation on the scope of this application.
  • Figure 8 is a schematic structural diagram of an electronic device 20 provided by an embodiment of this application.
  • The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25 and a communication bus 26.
  • The memory 22 is used to store a computer program, and the computer program is loaded and executed by the processor 21 to implement the relevant steps of the data synchronization method disclosed in any of the foregoing embodiments.
  • The power supply 23 is used to provide the working voltage for each hardware device on the electronic device 20.
  • The communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol it follows can be any communication protocol applicable to the technical solution of this application, which is not specifically limited here.
  • The input/output interface 25 is used to obtain external input data or to output data to the outside, and its specific interface type can be selected according to the needs of the application and is not specifically limited here.
  • The memory 22, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk or an optical disk, etc.
  • The resources stored on it can include an operating system 221, a computer program 222 and data 223, etc., and the storage can be transient or persistent.
  • The operating system 221 is used to manage and control each hardware device on the electronic device 20 and the computer program 222, so that the processor 21 can perform the computation and processing of the massive data 223 in the memory 22.
  • It can be Windows Server, Netware, Unix, Linux, etc.
  • The computer program 222 may further include computer programs that can be used to complete other specific tasks.
  • The data 223 may include topology data collected by the electronic device 20, and the like.
  • An embodiment of this application also discloses a non-volatile readable storage medium.
  • A computer program is stored in the non-volatile readable storage medium; when the computer program is loaded and executed by a processor, the steps of the data synchronization method disclosed in any of the foregoing embodiments are implemented.

Abstract

A data synchronization method, apparatus, device and storage medium. The method includes: constructing first-level physical topologies among acceleration devices of the same type, and constructing second-level physical topologies among acceleration devices of different types, where the acceleration devices in the second-level physical topology are connected through a cache coherence protocol; performing, according to the first-level physical topology, first processing through scatter_reduce communication on the data to be synchronized in the acceleration devices, and performing, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in the acceleration devices; performing, according to the second-level physical topology, third processing through all_gather communication on the second-processed data in the acceleration devices, and performing, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in the acceleration devices.

Description

Data synchronization method, apparatus, device and storage medium
CROSS-REFERENCE TO RELATED APPLICATION
This application claims priority to Chinese patent application No. 202210468218.4, entitled "Data synchronization method, apparatus, device and storage medium" and filed with the China National Intellectual Property Administration on April 29, 2022, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application relates to the technical field of model training, and in particular to a data synchronization method, apparatus, device and storage medium.
BACKGROUND
With the widespread application of deep neural networks, model sizes have grown ever larger; this growth makes efficient model training more important, and distributed training has emerged in response. Current distributed model training methods include data parallelism and model parallelism, of which data parallelism is the most common and widely used. The data parallel method partitions the input data to be trained and, in each training iteration, trains multiple batches of data simultaneously on multiple acceleration devices. Data parallelism is further divided into synchronous data parallelism and asynchronous data parallelism. In the synchronous data parallel method, after all acceleration devices have computed the gradients of their batch data, the gradients are combined together and the shared model parameters are updated. This approach reduces the staleness of the weights used to compute gradients, enables the model to finally reach high convergence accuracy, and has good statistical efficiency, so it is widely applied. In synchronous data-parallel distributed algorithms, the Allreduce collective communication operator plays an important role. Allreduce is a collective communication operator whose goal is to integrate the data held by different compute nodes and then distribute the result to each node, so that every compute node ultimately holds the integrated data.
At present, the devices used in synchronous data-parallel training are all required to be of the same kind, for example all GPU (Graphics Processing Unit) devices or all FPGA (Field Programmable Gate Array) devices. One main reason for using devices of the same kind is that the Allreduce process requires devices to communicate and exchange data: communication between devices of the same kind usually has higher bandwidth and lower latency, whereas communication between heterogeneous devices usually comes at a considerable cost. For example, GPU devices can interconnect at high speed through NVLink (NVIDIA Link, a bus and communication protocol developed and launched by NVIDIA), but communication between a GPU and an FPGA often requires the CPU as an intermediate transmission medium, which is very inefficient. Hence, forcing various heterogeneous devices into the same cluster for unified synchronous data-parallel training would inevitably be inefficient. Yet in modern data centers, acceleration devices such as GPUs and FPGAs are widely deployed; if each data-parallel training run can use only one class of device, resources will inevitably sit idle and be wasted.
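As an editor's illustration (not part of the original patent text), the goal of the Allreduce operator described above can be sketched in a few lines; the allreduce helper and the three-node gradient values below are assumptions chosen for demonstration.

```python
# Editor's sketch of Allreduce semantics: integrate the data held by
# different compute nodes, then distribute the result so that every node
# ends up holding the integrated data.

def allreduce(node_buffers):
    total = [sum(vals) for vals in zip(*node_buffers)]  # integrate
    return [list(total) for _ in node_buffers]          # distribute

grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # gradients on three nodes
print(allreduce(grads))  # every node now holds [9.0, 12.0]
```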
SUMMARY
In view of this, the purpose of this application is to provide a data synchronization method, apparatus, device and storage medium that can realize deep learning data parallelism based on multiple kinds of heterogeneous acceleration devices and improve hardware resource utilization and data communication efficiency. The specific solution is as follows:
A first aspect of this application provides a data synchronization method, including:
constructing, among acceleration devices of the same type on a target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and constructing, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; wherein the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol;
performing, according to the first-level physical topology, first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and performing, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types;
performing, according to the second-level physical topology, third processing through all_gather communication on the second-processed data in acceleration devices of different types, and performing, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type.
Optionally, the acceleration devices corresponding to each first-level physical topology are independent of one another, and when performing the first processing and the fourth processing, the acceleration devices corresponding to different first-level physical topologies execute concurrently;
the acceleration devices corresponding to each second-level physical topology are independent of one another, and when performing the second processing and the third processing, the acceleration devices corresponding to different second-level physical topologies execute concurrently.
Optionally, the data synchronization method further includes:
when there are multiple target servers, constructing, among acceleration devices of the same type on different target servers, third-level physical topologies with a ring structure whose number matches the number of same-type acceleration devices in each target server; wherein each third-level physical topology contains as many acceleration devices as there are target servers, and each of those acceleration devices is located on a different target server;
after performing, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types, the method further includes:
performing, according to the third-level physical topology, a reduce operation on the second-processed data in the acceleration devices of different target servers and broadcasting the reduced data to each acceleration device, so as to determine the broadcast data as the second-processed data in each acceleration device.
Optionally, performing, according to the third-level physical topology, the reduce operation on the second-processed data in the acceleration devices of different target servers and broadcasting the reduced data to each acceleration device includes:
receiving, by a programmable switch, the second-processed data from the acceleration devices of different target servers and performing the reduce operation on the received data according to the third-level physical topology, and broadcasting, by the programmable switch, the reduced data to each acceleration device; wherein each target server is connected to one programmable switch.
Optionally, constructing, among acceleration devices of the same type on the target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and constructing, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type, includes:
constructing, among the acceleration devices of the same type in each target server, the first-level physical topology corresponding to that target server, and constructing, among the acceleration devices of different types in each target server, the second-level physical topology corresponding to that target server;
correspondingly, performing, according to the second-level physical topology, third processing through all_gather communication on the second-processed data in acceleration devices of different types, and performing, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type, includes:
performing, according to the second-level physical topology corresponding to each target server, third processing through all_gather communication on the second-processed data in the different types of acceleration devices of that target server, and performing, according to the first-level physical topology corresponding to each target server, fourth processing through all_gather communication on the third-processed data in the same type of acceleration devices of that target server.
Optionally, the acceleration devices corresponding to each third-level physical topology are independent of one another, and when performing the reduce operation, the acceleration devices corresponding to different third-level physical topologies execute concurrently.
Optionally, before constructing a physical topology among acceleration devices of the same type, the method further includes:
determining whether the bandwidth of data transmission between acceleration devices of the same type through other available connection methods is higher than the bandwidth of data transmission through the cache coherence protocol connection;
if the bandwidth of data transmission between acceleration devices of the same type through other available connection methods is higher than the bandwidth of data transmission through the cache coherence protocol connection, constructing the physical topology among the acceleration devices of the same type using the other available connection methods.
A second aspect of this application provides a data synchronization apparatus, including:
a topology construction module, used to construct, among acceleration devices of the same type on a target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and to construct, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; wherein the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol;
a first synchronization module, used to perform, according to the first-level physical topology, first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and to perform, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types;
a second synchronization module, used to perform, according to the second-level physical topology, third processing through all_gather communication on the second-processed data in acceleration devices of different types, and to perform, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type.
A third aspect of this application provides an electronic device, which includes a processor and a memory; the memory is used to store a computer program, and the computer program is loaded and executed by the processor to implement the aforementioned data synchronization method.
A fourth aspect of this application provides a computer non-volatile readable storage medium in which computer-executable instructions are stored; when the computer-executable instructions are loaded and executed by a processor, the aforementioned data synchronization method is implemented.
In this application, first-level physical topologies with a ring structure whose number matches the number of acceleration device types are first constructed among acceleration devices of the same type on a target server, and second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type are constructed among acceleration devices of different types on the target server; the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol. Then, according to the first-level physical topology, the data to be synchronized in acceleration devices of the same type is subjected to first processing through scatter_reduce communication, and according to the second-level physical topology, the first-processed data in acceleration devices of different types is subjected to second processing through scatter_reduce communication. Finally, according to the second-level physical topology, the second-processed data in acceleration devices of different types is subjected to third processing through all_gather communication, and according to the first-level physical topology, the third-processed data in acceleration devices of the same type is subjected to fourth processing through all_gather communication. It can be seen that this application constructs a physical topology based on the cache coherence protocol connections between different types of acceleration devices and performs scatter_reduce and all_gather communication in combination with the physical topologies constructed from acceleration devices of the same type; it can synchronize data between different types of acceleration devices, that is, heterogeneous acceleration devices, realizing deep learning data parallelism based on multiple kinds of heterogeneous acceleration devices while improving hardware resource utilization, so that data communication during deep learning synchronous data-parallel training is more efficient.
BRIEF DESCRIPTION OF THE DRAWINGS
The drawings here are incorporated into and constitute a part of this specification; they show embodiments consistent with this application and are used together with the specification to explain the principles of this application.
To describe the technical solutions in the embodiments of this application or the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of this application, and a person of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.
Figure 1 is a flow chart of a data synchronization method provided by this application;
Figure 2 is a schematic diagram of a specific data synchronization method provided by this application;
Figure 3 is a schematic diagram of a CXL heterogeneous device cluster provided by this application;
Figure 4 is a structural diagram of a first-level physical topology provided by this application;
Figure 5 is a structural diagram of a second-level physical topology provided by this application;
Figure 6 is a structural diagram of a third-level physical topology provided by this application;
Figure 7 is a schematic structural diagram of a data synchronization apparatus provided by this application;
Figure 8 is a structural diagram of a data synchronization electronic device provided by this application.
DETAILED DESCRIPTION
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the drawings in the embodiments of this application. Obviously, the described embodiments are only some of the embodiments of this application rather than all of them. Based on the embodiments in this application, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of this application.
In the prior art, communication between devices of the same kind usually has the advantages of higher bandwidth and lower latency, while communication between heterogeneous devices usually comes at a considerable cost; for this reason, the devices used in synchronous data-parallel training are all required to be of the same kind. If various heterogeneous devices were forced into the same cluster for unified synchronous data-parallel training, efficiency would inevitably be low. In view of this technical defect, this application provides a data synchronization solution that can realize deep learning data parallelism based on multiple kinds of heterogeneous acceleration devices and improve hardware resource utilization and data communication efficiency.
Figure 1 is a flow chart of a data synchronization method provided by an embodiment of this application. As shown in Figure 1, the data synchronization method includes:
S11: Construct, among acceleration devices of the same type on a target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and construct, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; wherein the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol.
In this embodiment, the target server and the acceleration devices it carries are first constrained. The target server is provided with different types of acceleration devices that support a cache coherence protocol; the acceleration devices include but are not limited to GPUs, FPGAs and other devices, and the numbers of acceleration devices of different types are equal and at least two. Acceleration devices under the cache coherence protocol can also be called CXL (Compute Express Link) devices. CXL is an open industry standard proposed by Intel for high-bandwidth, low-latency device interconnection; it can be used to connect a CPU (Central Processing Unit) with devices such as Accelerators, Memory Buffers and Smart NICs (intelligent network cards). CXL solves the problem of inefficient communication between heterogeneous devices and makes it possible to conduct deep learning data-parallel training based on multiple kinds of heterogeneous devices.
In this embodiment, there are two ways of physically connecting heterogeneous devices: connection using the CPU as an intermediate medium, and connection through the cache coherence protocol. Since the acceleration devices in the target server all support the cache coherence protocol, and the bandwidth of data transmission using the cache coherence protocol is significantly higher than that of transmission using the CPU as an intermediate medium, the CXL connection method is chosen here. One CXL device is obtained in turn from each of the different device types in the same server node, and these heterogeneous devices are connected using CXL; that is, the heterogeneous devices are connected through the cache coherence protocol, in other words the acceleration devices in the second-level physical topology are connected through the cache coherence protocol.
It should be noted that although devices of the same kind can also be connected through CXL, the bandwidth of data transmission over a CXL connection is not necessarily optimal. Therefore, when building a first-level physical topology, it must first be determined whether the bandwidth of data transmission between acceleration devices of the same type through other available connection methods is higher than the bandwidth of data transmission through the cache coherence protocol connection; if it is, the physical topology is constructed among the acceleration devices of the same type using the other available connection methods. The other available connection methods can be the devices' original connection methods: by comparing the CXL bandwidth between devices of the same kind with the bandwidth of their original connection method, the relatively better connection method is selected and used to connect the same-type acceleration devices within the same target server pairwise. For example, if comparison shows that the data transmission bandwidth of NVLink connections between GPU devices is better than that of CXL connections, NVLink is selected in this topology.
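The bandwidth comparison described above can be pictured with the following editor-added sketch; the pick_link helper and all bandwidth figures are illustrative assumptions rather than measurements from this application.

```python
# Editor's sketch of the connection-method selection for same-type devices:
# compare the original link against CXL and keep the faster one. All
# figures (Gbps) and link names are assumed for illustration.

def pick_link(bandwidth_gbps):
    """Return the connection method with the higher measured bandwidth."""
    return max(bandwidth_gbps, key=bandwidth_gbps.get)

gpu_links = {"NVLink": 300.0, "CXL": 64.0}   # GPU-to-GPU options
fpga_links = {"CXL": 64.0, "PCIe": 32.0}     # FPGA-to-FPGA options
print(pick_link(gpu_links))   # -> NVLink, used for the GPU level-1 ring
print(pick_link(fpga_links))  # -> CXL, used for the FPGA level-1 ring
```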
S12: According to the first-level physical topology, perform first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and according to the second-level physical topology, perform second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types.
S13: According to the second-level physical topology, perform third processing through all_gather communication on the second-processed data in acceleration devices of different types, and according to the first-level physical topology, perform fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type.
In this embodiment, the AllReduce aggregation operation includes a scatter_reduce (scatter-reduce) phase and an all_gather (all-gather) phase. The specific execution logic of each phase is consistent with that in the prior art and is not described in detail in this embodiment; the difference is that this embodiment executes on the basis of the constructed first-level and second-level physical topologies. Specifically, the data to be synchronized related to model training in acceleration devices of the same type is first subjected to the first processing through scatter_reduce communication according to the first-level physical topology, and the first-processed data in acceleration devices of different types is subjected to the second processing through scatter_reduce communication according to the second-level physical topology. Then the second-processed data in acceleration devices of different types is subjected to the third processing through all_gather communication according to the second-level physical topology, and the third-processed data in acceleration devices of the same type is subjected to the fourth processing through all_gather communication according to the first-level physical topology. At this point, every acceleration device of the target server holds the complete global data aggregation result.
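To make the two phases named above concrete, here is a minimal editor-added simulation of one ring performing scatter_reduce and then all_gather; the sequential loops, chunk layout and example values are assumptions for illustration and do not reproduce the device-level implementation of this application.

```python
# Editor's simulation of ring AllReduce. data[i] holds k chunks on device
# i; scatter_reduce leaves each device with one fully reduced chunk, and
# all_gather then circulates the reduced chunks to every device.

def ring_allreduce(data):
    k = len(data)
    for step in range(k - 1):                    # scatter_reduce phase
        for i in range(k):                       # device i sends to i+1
            c = (i - step) % k                   # chunk forwarded this step
            data[(i + 1) % k][c] += data[i][c]   # neighbor reduces it
    for step in range(k - 1):                    # all_gather phase
        for i in range(k):
            c = (i + 1 - step) % k               # fully reduced chunk
            data[(i + 1) % k][c] = data[i][c]    # neighbor overwrites
    return data

ring = [[1, 10, 100], [2, 20, 200], [3, 30, 300]]  # 3 devices, 3 chunks
print(ring_allreduce(ring))  # every device ends with [6, 60, 600]
```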
It can be understood that in this embodiment the acceleration devices corresponding to each first-level physical topology are independent of one another, and when performing the first processing and the fourth processing, the acceleration devices corresponding to different first-level physical topologies execute concurrently. Likewise, the acceleration devices corresponding to each second-level physical topology are independent of one another, and when performing the second processing and the third processing, the acceleration devices corresponding to different second-level physical topologies execute concurrently.
It can be seen that in the embodiment of this application, first-level physical topologies with a ring structure whose number matches the number of acceleration device types are first constructed among acceleration devices of the same type on the target server, and second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type are constructed among acceleration devices of different types on the target server; the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol. Then, according to the first-level physical topology, the data to be synchronized in acceleration devices of the same type is subjected to the first processing through scatter_reduce communication, and according to the second-level physical topology, the first-processed data in acceleration devices of different types is subjected to the second processing through scatter_reduce communication; finally, according to the second-level physical topology, the second-processed data in acceleration devices of different types is subjected to the third processing through all_gather communication, and according to the first-level physical topology, the third-processed data in acceleration devices of the same type is subjected to the fourth processing through all_gather communication. The embodiment of this application constructs a physical topology based on the cache coherence protocol connections between different types of acceleration devices and performs scatter_reduce and all_gather communication in combination with the physical topologies constructed from acceleration devices of the same type; it can synchronize data between different types of acceleration devices, that is, heterogeneous acceleration devices, realizing deep learning data parallelism based on multiple kinds of heterogeneous acceleration devices while improving hardware resource utilization, so that data communication during deep learning synchronous data-parallel training is more efficient.
Figure 2 is a flow chart of a specific data synchronization method provided by an embodiment of this application. As shown in Figure 2, the data synchronization method includes:
S21: When there are multiple target servers, construct, among the acceleration devices of the same type in each target server, the first-level physical topology corresponding to that target server, and construct, among the acceleration devices of different types in each target server, the second-level physical topology corresponding to that target server.
In this embodiment, the infrastructure for data synchronization is a server cluster; that is, there are multiple target servers, and each target server contains acceleration devices of the same number and types, yielding a server cluster for deep neural network training on which various heterogeneous devices supporting the CXL protocol are deployed. Figure 3 is a schematic diagram of the customized CXL device cluster of this embodiment. Assume the cluster contains m servers and the CXL devices on the servers are evenly distributed, i.e. every server carries the same number of CXL heterogeneous devices, and the number of heterogeneous devices of each type is also the same in every server node. For example, if the cluster contains P CXL heterogeneous devices of n types in total, then all n types of devices are deployed on every server, and the number of CXL devices of each type on each server node is P/(mn). The server nodes are connected through programmable switches.
On this basis, the first-level physical topology corresponding to each target server is constructed among the acceleration devices of the same type on that server, and the second-level physical topology corresponding to each target server is constructed among the acceleration devices of different types on that server. It can be understood that the first-level physical topologies include ring_1_1 composed of {CXL_A01, CXL_A02, CXL_A03, ...}, ring_1_2 composed of {CXL_B01, CXL_B02, CXL_B03, ...}, and so on, as shown in Figure 4. The second-level physical topologies include ring_2_1 composed of {CXL_A01, CXL_B01, ...}, ring_2_2 composed of {CXL_A02, CXL_B02, ...}, and so on, as shown in Figure 5.
S22: Construct, among acceleration devices of the same type on different target servers, third-level physical topologies with a ring structure whose number matches the number of same-type acceleration devices in each target server; wherein each third-level physical topology contains as many acceleration devices as there are target servers, and each of those acceleration devices is located on a different target server.
In this embodiment, since the data on acceleration devices in different target servers must also participate in the synchronization, third-level physical topologies with a ring structure whose number matches the number of same-type acceleration devices in each target server are also constructed among acceleration devices of the same type on different target servers. Each third-level physical topology contains as many acceleration devices as there are target servers, and each of those acceleration devices is located on a different target server. The third-level physical topologies include ring_3_1 composed of {CXL_A01, CXL_A11, ..., CXL_AM1}, ring_3_2 composed of {CXL_A02, CXL_A12, ..., CXL_AM2}, ring_3_3 composed of {CXL_B01, CXL_B11, ..., CXL_BM1}, and so on, as shown in Figure 6.
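The grouping behind the three topology levels can be sketched as follows; the build_rings helper and the small cluster dimensions are editor-added assumptions, while the CXL_&lt;type&gt;&lt;server&gt;&lt;index&gt; naming mirrors Figures 4 to 6.

```python
# Editor's sketch of the hierarchical grouping in S21/S22. Device names
# follow the figures (e.g. CXL_A01 = type A, server 0, device 1); the
# helper itself is illustrative, not patent text.

def build_rings(m_servers, types, per_type):
    def dev(t, s, j):
        return f"CXL_{t}{s}{j}"
    level1 = [[dev(t, s, j) for j in range(1, per_type + 1)]
              for s in range(m_servers) for t in types]   # same type, same server
    level2 = [[dev(t, s, j) for t in types]
              for s in range(m_servers) for j in range(1, per_type + 1)]  # all types, one server
    level3 = [[dev(t, s, j) for s in range(m_servers)]
              for t in types for j in range(1, per_type + 1)]  # same type, across servers
    return level1, level2, level3

l1, l2, l3 = build_rings(m_servers=2, types="AB", per_type=2)
print(l1[0])  # ['CXL_A01', 'CXL_A02']  -> a ring_1_x
print(l2[0])  # ['CXL_A01', 'CXL_B01']  -> a ring_2_x
print(l3[0])  # ['CXL_A01', 'CXL_A11']  -> a ring_3_x
```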
S23: According to the first-level physical topology corresponding to each target server, perform first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and according to the second-level physical topology corresponding to each target server, perform second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types.
S24: According to the third-level physical topology, perform a reduce operation on the second-processed data in the acceleration devices of different target servers and broadcast the reduced data to each acceleration device, so as to determine the broadcast data as the second-processed data in each acceleration device.
In this embodiment, the same scatter_reduce operations are performed for every target server; for details, refer to the content disclosed in the foregoing embodiment, which is not repeated here. After the first processing is complete, each CXL device in a first-level physical topology holds a partial aggregation result for a data block that differs from that of any other CXL device in the same topology.
After that, according to the third-level physical topology, a reduce operation is performed on the second-processed data in the acceleration devices of different target servers, and the reduced data is broadcast to each acceleration device, so that the broadcast data is determined as the second-processed data in each acceleration device. Different target servers are connected through programmable switches; that is, a programmable switch is used to receive the second-processed data from the acceleration devices of different target servers and to perform the reduce operation on the received data according to the third-level physical topology, and the programmable switch is then used to broadcast the reduced data to each acceleration device. Likewise, the acceleration devices corresponding to each third-level physical topology are independent of one another, and when performing the reduce operation, the acceleration devices corresponding to different third-level physical topologies execute concurrently.
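The in-network step of S24 can be illustrated with the following editor-added sketch; the switch_reduce_broadcast helper, the dictionary layout and the numeric chunks are assumptions chosen for demonstration, not the behavior of any particular programmable switch.

```python
# Editor's sketch of S24 for one third-level ring: the programmable switch
# collects the second-processed chunk from the same-type device of every
# server, reduces the chunks element-wise, and broadcasts the result back.

def switch_reduce_broadcast(ring3_chunks):
    """ring3_chunks maps device name -> chunk for one third-level ring."""
    reduced = [sum(vals) for vals in zip(*ring3_chunks.values())]  # reduce
    return {name: list(reduced) for name in ring3_chunks}          # broadcast

ring_3_1 = {"CXL_A01": [6, 60], "CXL_A11": [7, 70]}  # chunks after S23
print(switch_reduce_broadcast(ring_3_1))
# {'CXL_A01': [13, 130], 'CXL_A11': [13, 130]}
```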
S25: According to the second-level physical topology corresponding to each target server, perform third processing through all_gather communication on the second-processed data in the different types of acceleration devices of that target server, and according to the first-level physical topology corresponding to each target server, perform fourth processing through all_gather communication on the third-processed data in the same type of acceleration devices of that target server.
In this embodiment, in the second stage the second-processed data in the different types of acceleration devices of each target server is subjected to the third processing through all_gather communication according to the second-level physical topology corresponding to that server, i.e. the method returns to the second-level physical topology to perform the all_gather operation. Then the third-processed data in the same type of acceleration devices of each target server is subjected to the fourth processing through all_gather communication according to the first-level physical topology corresponding to that server, i.e. the method returns to the first-level physical topology to perform the all_gather operation. At this point, every acceleration device of each target server holds the complete global data aggregation result.
It can be seen that the embodiment of this application first defines a CXL heterogeneous device cluster, i.e. a server cluster for deep neural network training on which various heterogeneous devices supporting the CXL protocol are deployed. A hierarchical physical topology is then constructed on the described heterogeneous device cluster and divided into three levels; by performing specific operations at each level, the complete Allreduce aggregation result is finally obtained. This solves the Allreduce data aggregation problem in synchronous data-parallel training on CXL heterogeneous device clusters and improves the utilization of hardware resources in the data center.
Referring to Figure 7, an embodiment of this application further discloses a data synchronization apparatus, including:
a topology construction module 11, used to construct, among acceleration devices of the same type on a target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and to construct, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; wherein the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol;
a first synchronization module 12, used to perform, according to the first-level physical topology, first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and to perform, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types;
a second synchronization module 13, used to perform, according to the second-level physical topology, third processing through all_gather communication on the second-processed data in acceleration devices of different types, and to perform, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type.
It can be seen that, as in the method embodiment, the embodiment of this application first constructs, among acceleration devices of the same type on the target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and constructs, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol. Then, according to the first-level physical topology, the data to be synchronized in acceleration devices of the same type is subjected to the first processing through scatter_reduce communication, and according to the second-level physical topology, the first-processed data in acceleration devices of different types is subjected to the second processing through scatter_reduce communication; finally, according to the second-level physical topology, the second-processed data in acceleration devices of different types is subjected to the third processing through all_gather communication, and according to the first-level physical topology, the third-processed data in acceleration devices of the same type is subjected to the fourth processing through all_gather communication. The embodiment of this application constructs a physical topology based on the cache coherence protocol connections between different types of acceleration devices and performs scatter_reduce and all_gather communication in combination with the physical topologies constructed from acceleration devices of the same type; it can synchronize data between different types of acceleration devices, that is, heterogeneous acceleration devices, realizing deep learning data parallelism based on multiple kinds of heterogeneous acceleration devices while improving hardware resource utilization, so that data communication during deep learning synchronous data-parallel training is more efficient.
In some specific embodiments, when there are multiple target servers, the data synchronization apparatus further includes:
a cluster topology construction module, used to construct, among acceleration devices of the same type on different target servers, third-level physical topologies with a ring structure whose number matches the number of same-type acceleration devices in each target server; wherein each third-level physical topology contains as many acceleration devices as there are target servers, and each of those acceleration devices is located on a different target server;
a reduce-and-broadcast module, used to perform, according to the third-level physical topology, a reduce operation on the second-processed data in the acceleration devices of different target servers and to broadcast the reduced data to each acceleration device, so as to determine the reduced data as the second-processed data in each acceleration device.
In some specific embodiments, the data synchronization apparatus further includes:
a judgment module, used to determine whether the bandwidth of data transmission between acceleration devices of the same type through other available connection methods is higher than the bandwidth of data transmission through the cache coherence protocol connection and, if the bandwidth of data transmission between acceleration devices of the same type through other available connection methods is higher than the bandwidth of data transmission through the cache coherence protocol connection, to construct the physical topology among the acceleration devices of the same type using the other available connection methods.
Further, an embodiment of this application also provides an electronic device. Figure 8 is a structural diagram of an electronic device 20 according to an exemplary embodiment; the content of the figure should not be regarded as any limitation on the scope of this application.
Figure 8 is a schematic structural diagram of an electronic device 20 provided by an embodiment of this application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25 and a communication bus 26. The memory 22 is used to store a computer program, which is loaded and executed by the processor 21 to implement the relevant steps of the data synchronization method disclosed in any of the foregoing embodiments.
In this embodiment, the power supply 23 is used to provide the working voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol it follows can be any communication protocol applicable to the technical solution of this application, which is not specifically limited here; the input/output interface 25 is used to obtain external input data or to output data to the outside, and its specific interface type can be selected according to the needs of the application and is not specifically limited here.
In addition, the memory 22, as a carrier for resource storage, can be a read-only memory, a random access memory, a magnetic disk, an optical disk or the like; the resources stored on it can include an operating system 221, a computer program 222, data 223 and the like, and the storage can be transient or persistent.
The operating system 221 is used to manage and control each hardware device on the electronic device 20 and the computer program 222, so that the processor 21 can perform the computation and processing of the massive data 223 in the memory 22; it can be Windows Server, Netware, Unix, Linux, etc. Besides the computer program capable of completing the data synchronization method executed by the electronic device 20 disclosed in any of the foregoing embodiments, the computer program 222 may further include computer programs capable of completing other specific tasks. The data 223 may include topology data collected by the electronic device 20, and the like.
Further, an embodiment of this application also discloses a non-volatile readable storage medium storing a computer program; when the computer program is loaded and executed by a processor, the steps of the data synchronization method disclosed in any of the foregoing embodiments are implemented.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be referred to each other. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.
Finally, it should also be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a list of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.
The data synchronization method, apparatus, device and storage medium provided by this application have been introduced in detail above. Specific examples are used herein to illustrate the principles and implementations of this application, and the description of the above embodiments is only intended to help understand the method of this application and its core idea. Meanwhile, for a person of ordinary skill in the art, there will be changes in the specific implementation and application scope in accordance with the idea of this application. In summary, the contents of this specification should not be construed as limiting this application.

Claims (20)

  1. A data synchronization method, characterized by comprising:
    constructing, among acceleration devices of the same type on a target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and constructing, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; wherein the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol;
    performing, according to the first-level physical topology, first processing through scatter_reduce (scatter-reduce) communication on the data to be synchronized related to model training in acceleration devices of the same type, and performing, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types;
    performing, according to the second-level physical topology, third processing through all_gather (all-gather) communication on the second-processed data in acceleration devices of different types, and performing, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type.
  2. The data synchronization method according to claim 1, characterized in that the acceleration devices corresponding to each first-level physical topology are independent of one another, and when performing the first processing and the fourth processing, the acceleration devices corresponding to different first-level physical topologies execute concurrently;
    the acceleration devices corresponding to each second-level physical topology are independent of one another, and when performing the second processing and the third processing, the acceleration devices corresponding to different second-level physical topologies execute concurrently.
  3. The data synchronization method according to claim 1, characterized in that the physical connection modes between acceleration devices of different types include connection using a CPU as an intermediate medium and connection using a cache coherence protocol;
    before the step of constructing, among acceleration devices of the same type on the target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and constructing, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type, the method further comprises:
    determining whether the bandwidth at which different types of acceleration devices connected using the CPU as an intermediate medium transmit data is higher than the bandwidth at which different types of acceleration devices connected using the cache coherence protocol transmit data;
    if the bandwidth at which different types of acceleration devices connected using the CPU as an intermediate medium transmit data is higher than the bandwidth at which different types of acceleration devices connected using the cache coherence protocol transmit data, taking the cache coherence protocol as the physical connection mode of the different types of acceleration devices.
  4. The data synchronization method according to claim 3, characterized in that before the step of constructing, among acceleration devices of the same type on the target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and constructing, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type, the method further comprises:
    providing, in the target server, equal numbers, at least two of each, of different types of acceleration devices, and connecting the different types of acceleration devices using the cache coherence protocol.
  5. The data synchronization method according to claim 4, characterized in that the step of providing, in the target server, equal numbers, at least two of each, of different types of acceleration devices and connecting the different types of acceleration devices using the cache coherence protocol comprises:
    providing, in the target server, at least two GPU devices and FPGA devices equal in number to the GPU devices, and connecting the GPU devices and the FPGA devices using the cache coherence protocol.
  6. The data synchronization method according to claim 4, characterized in that the step of providing, in the target server, equal numbers, at least two of each, of different types of acceleration devices and connecting the different types of acceleration devices using the cache coherence protocol comprises:
    obtaining, in turn, one acceleration device from each of the different types of acceleration devices of the target server, and connecting the acceleration device obtained in each pass with the previously obtained acceleration device using the cache coherence protocol.
  7. The data synchronization method according to claim 1, characterized in that performing, according to the first-level physical topology, first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type comprises:
    performing, through scatter_reduce communication, first processing on the data to be synchronized related to model training in every acceleration device in the first-level physical topology to obtain at least one partial aggregation result of the data to be synchronized related to model training; the partial aggregation result in each acceleration device differs from the partial aggregation results of the other acceleration devices in the first-level physical topology.
  8. The data synchronization method according to claim 1, characterized by further comprising:
    when there are multiple target servers, performing, according to the first-level physical topology corresponding to each target server, first processing through scatter_reduce communication on the data to be synchronized related to model training in the same type of acceleration devices in each target server.
  9. The data synchronization method according to claim 1, characterized by further comprising:
    when there are multiple target servers, constructing, among acceleration devices of the same type on different target servers, third-level physical topologies with a ring structure whose number matches the number of same-type acceleration devices in each target server; wherein each third-level physical topology contains as many acceleration devices as there are target servers, and each of those acceleration devices is located on a different target server;
    after performing, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types, the method further comprises:
    performing, according to the third-level physical topology, a reduce operation on the second-processed data in the acceleration devices of different target servers and broadcasting the reduced data to each acceleration device, so as to determine the broadcast data as the second-processed data in each acceleration device.
  10. The data synchronization method according to claim 9, characterized in that performing, according to the third-level physical topology, the reduce operation on the second-processed data in the acceleration devices of different target servers and broadcasting the reduced data to each acceleration device comprises:
    receiving, by a programmable switch, the second-processed data from the acceleration devices of different target servers and performing the reduce operation on the received data according to the third-level physical topology, and broadcasting, by the programmable switch, the reduced data to each acceleration device; wherein each target server is connected to one programmable switch.
  11. The data synchronization method according to claim 9, characterized in that constructing, among acceleration devices of the same type on the target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and constructing, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type, comprises:
    constructing, among the acceleration devices of the same type in each target server, the first-level physical topology corresponding to that target server, and constructing, among the acceleration devices of different types in each target server, the second-level physical topology corresponding to that target server;
    correspondingly, performing, according to the second-level physical topology, third processing through all_gather communication on the second-processed data in acceleration devices of different types, and performing, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type, comprises:
    performing, according to the second-level physical topology corresponding to each target server, third processing through all_gather communication on the second-processed data in the different types of acceleration devices of that target server, and performing, according to the first-level physical topology corresponding to each target server, fourth processing through all_gather communication on the third-processed data in the same type of acceleration devices of that target server.
  12. The data synchronization method according to claim 9, characterized in that the acceleration devices corresponding to each third-level physical topology are independent of one another, and when performing the reduce operation, the acceleration devices corresponding to different third-level physical topologies execute concurrently.
  13. The data synchronization method according to any one of claims 1 to 12, characterized in that before constructing a physical topology among acceleration devices of the same type, the method further comprises:
    determining whether the bandwidth of data transmission between acceleration devices of the same type through other available connection modes is higher than the bandwidth of data transmission through the cache coherence protocol connection;
    if the bandwidth of data transmission between acceleration devices of the same type through other available connection modes is higher than the bandwidth of data transmission through the cache coherence protocol connection, constructing the physical topology among the acceleration devices of the same type using the other available connection modes.
  14. The data synchronization method according to claim 13, characterized in that the other available connection modes include the original connection mode;
    determining whether the bandwidth of data transmission between acceleration devices of the same type through other available connection modes is higher than the bandwidth of data transmission through the cache coherence protocol connection, and if the bandwidth of data transmission between acceleration devices of the same type through other available connection modes is higher than the bandwidth of data transmission through the cache coherence protocol connection, constructing the physical topology among the acceleration devices of the same type using the other available connection modes, comprises:
    determining whether the bandwidth of data transmission between acceleration devices of the same type through the original connection mode is higher than the bandwidth of data transmission through the cache coherence protocol connection;
    if the bandwidth of data transmission between acceleration devices of the same type through the original connection mode is higher than the bandwidth of data transmission through the cache coherence protocol connection, connecting the acceleration devices of the same type pairwise using the original connection mode.
  15. The data synchronization method according to claim 14, characterized in that the original connection mode includes NVLink (NVIDIA Link, a bus and communication protocol developed and launched by NVIDIA);
    determining whether the bandwidth of data transmission between acceleration devices of the same type through the original connection mode is higher than the bandwidth of data transmission through the cache coherence protocol connection, and if the bandwidth of data transmission between acceleration devices of the same type through the original connection mode is higher than the bandwidth of data transmission through the cache coherence protocol connection, connecting the acceleration devices of the same type pairwise using the original connection mode, comprises:
    determining whether the bandwidth of data transmission between acceleration devices of the same type through NVLink is higher than the bandwidth of data transmission through the cache coherence protocol connection;
    if the bandwidth of data transmission between acceleration devices of the same type through NVLink is higher than the bandwidth of data transmission through the cache coherence protocol connection, connecting the acceleration devices of the same type pairwise using NVLink.
  16. A data synchronization apparatus, characterized by comprising:
    a topology construction module, used to construct, among acceleration devices of the same type on a target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and to construct, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; wherein the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol;
    a first synchronization module, used to perform, according to the first-level physical topology, first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and to perform, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types;
    a second synchronization module, used to perform, according to the second-level physical topology, third processing through all_gather communication on the second-processed data in acceleration devices of different types, and to perform, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type.
  17. The data synchronization apparatus according to claim 16, characterized in that the apparatus further comprises:
    a cluster topology construction module, used to construct, when there are multiple target servers, third-level physical topologies with a ring structure whose number matches the number of same-type acceleration devices in each target server among acceleration devices of the same type on different target servers; wherein each third-level physical topology contains as many acceleration devices as there are target servers, and each of those acceleration devices is located on a different target server;
    a reduce-and-broadcast module, used to perform, according to the third-level physical topology, a reduce operation on the second-processed data in the acceleration devices of different target servers and to broadcast the reduced data to each acceleration device, so as to determine the reduced data as the second-processed data in each acceleration device.
  18. A data synchronization system, characterized by comprising a server cluster, wherein the servers in the server cluster contain different types of acceleration devices supporting the CXL protocol, and the different types of acceleration devices are used for deep neural network training;
    the data synchronization system is configured to:
    construct, among acceleration devices of the same type on a target server, first-level physical topologies with a ring structure whose number matches the number of acceleration device types, and construct, among acceleration devices of different types on the target server, second-level physical topologies with a ring structure whose number matches the number of acceleration devices of the same type; wherein the target server is provided with different types of acceleration devices that support a cache coherence protocol, the numbers of the different types of acceleration devices are equal and at least two, and the acceleration devices in the second-level physical topology are connected through the cache coherence protocol;
    perform, according to the first-level physical topology, first processing through scatter_reduce communication on the data to be synchronized related to model training in acceleration devices of the same type, and perform, according to the second-level physical topology, second processing through scatter_reduce communication on the first-processed data in acceleration devices of different types;
    perform, according to the second-level physical topology, third processing through all_gather communication on the second-processed data in acceleration devices of different types, and perform, according to the first-level physical topology, fourth processing through all_gather communication on the third-processed data in acceleration devices of the same type.
  19. An electronic device, characterized in that the electronic device comprises a processor and a memory, wherein the memory is used to store a computer program, and the computer program is loaded and executed by the processor to implement the data synchronization method according to any one of claims 1 to 15.
  20. A computer non-volatile readable storage medium, characterized by being used to store computer-executable instructions which, when loaded and executed by a processor, implement the data synchronization method according to any one of claims 1 to 15.
PCT/CN2022/132053 2022-04-29 2022-11-15 Data synchronization method, apparatus, device and storage medium WO2023207035A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210468218.4 2022-04-29
CN202210468218.4A CN114884908B (zh) 2022-04-29 2022-04-29 Data synchronization method, apparatus, device and storage medium

Publications (1)

Publication Number Publication Date
WO2023207035A1 (zh)

Family

ID=82673258

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/132053 WO2023207035A1 (zh) 2022-04-29 2022-11-15 Data synchronization method, apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN114884908B (zh)
WO (1) WO2023207035A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114884908B (zh) * 2022-04-29 2024-02-13 浪潮电子信息产业股份有限公司 Data synchronization method, apparatus, device and storage medium
CN116962438B (zh) * 2023-09-21 2024-01-23 浪潮电子信息产业股份有限公司 Gradient data synchronization method and system, electronic device, and readable storage medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10325343B1 (en) * 2017-08-04 2019-06-18 EMC IP Holding Company LLC Topology aware grouping and provisioning of GPU resources in GPU-as-a-Service platform
US10728091B2 (en) * 2018-04-04 2020-07-28 EMC IP Holding Company LLC Topology-aware provisioning of hardware accelerator resources in a distributed environment
US10884795B2 (en) * 2018-04-26 2021-01-05 International Business Machines Corporation Dynamic accelerator scheduling and grouping for deep learning jobs in a computing cluster
CN110222005A (zh) * 2019-07-15 2019-09-10 北京一流科技有限公司 Data processing system for heterogeneous architecture and method thereof
CN110647999A (zh) * 2019-08-23 2020-01-03 苏州浪潮智能科技有限公司 Method and apparatus for improving deep learning training speed based on topological structure
CN111105016B (zh) * 2019-12-06 2023-04-28 浪潮电子信息产业股份有限公司 Data processing method and apparatus, electronic device, and readable storage medium
CN111488987B (zh) * 2020-04-16 2022-12-06 苏州浪潮智能科技有限公司 Method, system, device and medium for training a large deep learning model
CN111597139B (zh) * 2020-05-13 2023-01-06 苏州浪潮智能科技有限公司 GPU communication method, system, device and medium
CN111880911A (zh) * 2020-06-19 2020-11-03 浪潮电子信息产业股份有限公司 Task load scheduling method, apparatus and device, and readable storage medium
CN112333011A (zh) * 2020-10-23 2021-02-05 苏州浪潮智能科技有限公司 Network topology graph generation method and apparatus, electronic device and storage medium
CN113568860B (zh) * 2021-07-23 2022-08-19 北京百度网讯科技有限公司 Deep-learning-based multi-machine cluster topology mapping method and apparatus, and program product
CN114202027B (zh) * 2021-12-10 2023-05-23 北京百度网讯科技有限公司 Method for generating execution configuration information, and model training method and apparatus
CN114050975B (zh) * 2022-01-10 2022-04-19 苏州浪潮智能科技有限公司 Heterogeneous multi-node interconnection topology generation method and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110908799A (zh) * 2019-11-08 2020-03-24 浪潮电子信息产业股份有限公司 Communication method, apparatus, device and medium in distributed training
CN114281521A (zh) * 2021-11-21 2022-04-05 苏州浪潮智能科技有限公司 Method, system, device and medium for optimizing communication efficiency of heterogeneous resources in deep learning
CN114884908A (zh) * 2022-04-29 2022-08-09 浪潮电子信息产业股份有限公司 Data synchronization method, apparatus, device and storage medium

Also Published As

Publication number Publication date
CN114884908A (zh) 2022-08-09
CN114884908B (zh) 2024-02-13

Similar Documents

Publication Publication Date Title
WO2023207035A1 (zh) Data synchronization method, apparatus, device and storage medium
US10698842B1 (en) Domain assist processor-peer for coherent acceleration
CN104820657A (zh) Inter-core communication method and parallel programming model based on an embedded heterogeneous multi-core processor
CN102141951B (zh) Chip simulation system and method
CN104754008B (zh) Network storage node, network storage system, and apparatus and method for a network storage node
CN106776455B (zh) Method and apparatus for single-machine multi-GPU communication
CN110798517A (zh) Decentralized cluster load balancing method and system, mobile terminal and storage medium
CN104023062A (zh) Hardware architecture of a heterogeneous-computing-oriented distributed big data system
WO2021244168A1 (zh) System on chip, data transmission method and broadcast module
CN109829546B (zh) Platform-as-a-service cloud server and machine learning data processing method thereof
CN111459650B (zh) Method, device and medium for managing memory of dedicated processing resources
CN112583941B (zh) Method for supporting access of multiple power terminals, unit node, and power Internet of Things
CN104299170B (zh) Method for processing massive data of intermittent energy sources
WO2023207079A1 (zh) Block state synchronization method in a blockchain, and first node
TW202008172A (zh) Storage system
CN104125292A (zh) Data processing apparatus, cloud server and method of using the same
CN114020454A (zh) Memory management method, apparatus, device and medium
EP4203488A1 (en) Parameter configuration method and related system
CN114567563A (zh) Training method for a network topology model, and network topology reconstruction method and apparatus
CN114579311A (zh) Method, apparatus, device and storage medium for executing a distributed computing task
CN113934767A (zh) Data processing method and apparatus, computer device and storage medium
CN114490458A (zh) Data transmission method, chip, server and storage medium
Yin et al. Edge network model based on double dimension
Zhang et al. Optimising data access latencies of virtual machine placement based on greedy algorithm in datacentre
TWI770860B (zh) Network bandwidth adjustment method and related product

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22939851

Country of ref document: EP

Kind code of ref document: A1