US12314201B2 - Method and apparatus for distributed training of artificial intelligence model in channel-sharing network environment - Google Patents
- Publication number
- US12314201B2 (application US18/345,083)
- Authority
- US
- United States
- Prior art keywords
- time
- computation
- input data
- devices
- distributed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/14—Handling requests for interconnection or transfer
- G06F13/36—Handling requests for interconnection or transfer for access to common bus or bus system
- G06F13/362—Handling requests for interconnection or transfer for access to common bus or bus system with centralised access control
- G06F13/3625—Handling requests for interconnection or transfer for access to common bus or bus system with centralised access control using a time dependent access
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4204—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
- G06F13/4221—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/38—Concurrent instruction execution, e.g. pipeline or look ahead
- G06F9/3885—Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/48—Program initiating; Program switching, e.g. by interrupt
- G06F9/4806—Task transfer initiation or dispatching
- G06F9/4843—Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
- G06F9/4881—Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/50—Allocation of resources, e.g. of the central processing unit [CPU]
- G06F9/5005—Allocation of resources, e.g. of the central processing unit [CPU] to service a request
- G06F9/5027—Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/501—Performance criteria
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2209/00—Indexing scheme relating to G06F9/00
- G06F2209/50—Indexing scheme relating to G06F9/50
- G06F2209/509—Offload
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2213/00—Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F2213/0026—PCI express
Definitions
- the present disclosure relates to technology for improving communication efficiency by unevenly distributing input data across respective devices when an AI model is processed in parallel.
- Data parallelism is a parallelization technique in which the same AI model is replicated to respective computation devices (e.g., GPUs) and the input data is distributed across them so as to be concurrently processed.
- Training of an AI model broadly includes a (forward) step for processing input data and a (backward) step for reflecting the processing result to the model.
- the respective devices need to communicate with each other at the step for reflecting the processing result in order to synchronize the model.
- An object of the present disclosure is to improve communication efficiency by unevenly distributing input data across respective devices when an AI model is processed in parallel.
- Another object of the present disclosure is to alleviate a communication bottleneck occurring in a network environment in which a communication channel is shared.
- a method for distributed training of an Artificial Intelligence (AI) model in a channel-sharing network environment includes determining whether data parallel processing is applied, calculating a computation time and a communication time when input data is evenly distributed across multiple computation devices, and unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time.
- unevenly distributing the input data may comprise distributing the input data such that a difference between the sizes of the pieces of input data distributed to the respective computation devices is constant so as to enable the multiple computation devices to sequentially access a channel.
- the smallest size, among the sizes of the unevenly distributed pieces of input data, may be set to correspond to a target computation time that is calculated by subtracting a value proportional to the communication time from the computation time.
- the smallest size, among the sizes of the unevenly distributed pieces of input data, may be set based on Equation (1) below:
- t_new in Equation (1) above may denote the target computation time
- t_ori may denote the computation time
- c may denote the communication time
- d may denote the number of multiple computation devices.
- the difference between the sizes of the distributed pieces of input data may correspond to the communication time divided by the number of multiple computation devices.
- when the target computation time is calculated to be a negative value, a preset positive value may be used as the target computation time.
- the multiple computation devices may share a shared channel in a time-division manner based on the sizes of the unevenly distributed pieces of input data.
- an apparatus for distributed training of an AI model in a channel-sharing network environment includes a parallelism identification unit for determining whether data parallel processing is applied, a profiling unit for calculating a computation time and a communication time when input data is evenly distributed across multiple computation devices, and a data distribution unit for unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time.
- the data distribution unit may distribute the input data such that a difference between the sizes of the pieces of input data distributed to the respective computation devices is constant so as to enable the multiple computation devices to sequentially access a channel.
- the data distribution unit may set the smallest size, among the sizes of the unevenly distributed pieces of input data, to correspond to a target computation time that is calculated by subtracting a value proportional to the communication time from the computation time.
- the data distribution unit may set the smallest size, among the sizes of the unevenly distributed pieces of input data, based on Equation (1) below:
- t_new in Equation (1) above may denote the target computation time
- t_ori may denote the computation time
- c may denote the communication time
- d may denote the number of multiple computation devices.
- the difference between the sizes of the distributed pieces of input data may correspond to the communication time divided by the number of multiple computation devices.
- when the target computation time is calculated to be a negative value, a preset positive value may be used as the target computation time.
- the multiple computation devices may share a shared channel in a time-division manner based on the sizes of the unevenly distributed pieces of input data.
- FIG. 1 is a view conceptually illustrating an example of application of data parallelism
- FIG. 2 is a view conceptually illustrating a data parallelism method in a mesh network environment
- FIG. 3 conceptually illustrates a data parallelism method in a channel-sharing network environment
- FIG. 4 is a flowchart illustrating a method for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure
- FIG. 5 illustrates the configuration of an apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure
- FIG. 6 is a flowchart illustrating in detail a method for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure
- FIG. 7 is a view comparing the communication time when a method according to an embodiment of the present disclosure is applied with the communication time when an existing method is applied;
- FIG. 8 is a block diagram illustrating an apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure.
- FIG. 9 is a view illustrating the configuration of a computer system according to an embodiment.
- each of expressions such as “A or B”, “at least one of A and B”, “at least one of A or B”, “A, B, or C”, “at least one of A, B, and C”, and “at least one of A, B, or C” may include any one of the items listed in the expression or all possible combinations thereof.
- FIG. 1 is a view conceptually illustrating an example of application of data parallelism.
- FIG. 2 is a view conceptually illustrating a data parallelism method in a mesh network environment.
- data parallelism is applied to four GPUs in a mesh network environment and a channel-sharing network environment.
- a network in which a communication channel is shared, such as PCIe, is used for communication between devices; however, because such a channel-sharing network is used in a time-division manner, communication performance may be degraded when multiple devices simultaneously access the network.
- FIG. 3 conceptually illustrates a data parallelism method in a channel-sharing network environment.
- the present disclosure relates to a distributed training method capable of improving communication efficiency when an AI model is processed in a distributed manner using multiple computation devices in a network environment in which a communication channel is shared.
- Data parallelism is a method of copying an AI model to respective computation devices and dividing input data so as to be processed in a distributed manner.
- after the respective computation devices process the input data in parallel, they communicate with each other in order to synchronize the model.
- the present disclosure provides a method for distributing input data such that the respective computation devices exclusively use the network at different times, in order to alleviate degradation in AI model training performance caused by the communication channel bottleneck.
- FIG. 4 is a flowchart illustrating a method for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure.
- the method for distributed training of an AI model in a channel-sharing network environment includes determining whether data parallel processing is applied at step S 110 , calculating a computation time and a communication time when evenly distributing input data across multiple computation devices at step S 120 , and unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time at step S 130 .
- unevenly distributing the input data at step S 130 may comprise distributing the input data such that a difference between the sizes of the pieces of input data distributed to the respective computation devices is constant so as to enable the multiple computation devices to sequentially access the channel.
- the smallest size, among the sizes of the unevenly distributed pieces of input data, may be set to correspond to a target computation time that is calculated by subtracting a value proportional to the communication time from the computation time.
- the smallest size, among the sizes of the unevenly distributed pieces of input data, may be set by Equation (1) below:
- t_new may be the target computation time
- t_ori may be the computation time
- c may be the communication time
- d may be the number of multiple computation devices.
- the difference between the sizes of the distributed pieces of input data may correspond to the communication time divided by the number of multiple computation devices.
- when the target computation time is calculated to be a negative value, a preset positive value may be used as the target computation time.
- the multiple computation devices may share the shared channel in a time-division manner based on the sizes of the unevenly distributed pieces of input data.
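- Purely as an organizational sketch of steps S 110 to S 130 above (the helper functions named below are hypothetical placeholders, not part of the disclosure), the method could be arranged as follows:

```python
def prepare_data_parallel_training(model, batch, devices):
    # S110: determine whether data parallel processing is applied.
    if not data_parallelism_requested(model):            # hypothetical helper
        return train_with_existing_method(model, batch)  # hypothetical helper

    # S120: calculate the computation time (t_ori) and communication time (c)
    # observed when the input data is evenly distributed across the devices.
    t_ori, c = profile_even_distribution(model, batch, devices)  # hypothetical helper

    # S130: unevenly distribute the input data based on t_ori and c so that
    # the devices finish computing at staggered times and access the shared
    # channel one after another.
    sizes = uneven_split_sizes(len(batch), t_ori, c, len(devices))  # see the sketch further below
    return split_batch(batch, sizes)                      # hypothetical helper
```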
- FIG. 5 illustrates the configuration of an apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure.
- the apparatus for distributed training of an AI model in a channel-sharing network environment includes a data parallelism identification unit 210 , a profiling unit 220 , a data division unit 230 , and a data parallelism control unit 240 .
- the data parallelism identification unit 210 determines whether a data parallelism technique can be applied, and the profiling unit 220 measures the execution time of the AI model to be trained. Also, the data division unit 230 determines division of the data to be input to each of computation devices based on the measured execution time, and the data parallelism control unit 240 transfers the divided data to each of the devices and performs data parallelism.
- FIG. 6 is a flowchart illustrating in detail a method for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure.
- At step S 310, whether the AI model developer applies data parallelism is determined.
- If data parallelism is not applied, training of the AI model is started using an existing method (without parallelism or by applying another parallelism method) at step S 370.
- If data parallelism is applied, whether the current network environment is a channel-sharing network is checked at step S 320.
- If the current network environment is not a channel-sharing network, it is determined that there is no overhead resulting from network channel interference, so the existing data parallelism technique is applied at step S 360.
- If a channel-sharing network is used, application of the present disclosure is started.
- Application of the present disclosure requires information about a computation time and a communication time when the existing data parallelism is used, and the corresponding information may be acquired through a method such as advance profiling or online profiling at step S 330 .
- how to divide the input data to be assigned to each of the devices is determined at step S 340 .
- the method of dividing the input data to be assigned to each of the devices may be performed using Equation (1) below:
- t_new denotes the computation time corresponding to the data (having the smallest size) to be distributed to the first computation device
- d denotes the number of computation devices to be used
- t_ori and c respectively denote the computation time and the communication time measured at the profiling step. That is, t_ori and c can be acquired at the profiling step, and d is a value that can be input in advance. Accordingly, t_new may be acquired.
- data corresponding to the computation time t_new + c/d may be distributed to the second device,
- data corresponding to the computation time t_new + 2c/d may be distributed to the third device, . . . , and
- data corresponding to the computation time t_new + ((d−1)c/d) may be distributed to the last device. That is, the difference in computation time between the devices may correspond to the communication time divided by the number of devices.
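- The text of Equation (1) is not reproduced on this page. A plausible form, reconstructed here only from the surrounding description (a value proportional to the communication time c is subtracted from the computation time t_ori, the per-device computation times differ by c/d, and the total amount of computation is assumed unchanged by the redistribution), is:

$$
\sum_{i=0}^{d-1}\left(t_{\mathrm{new}} + \frac{i\,c}{d}\right) = d\,t_{\mathrm{ori}}
\quad\Longrightarrow\quad
t_{\mathrm{new}} = t_{\mathrm{ori}} - \frac{(d-1)\,c}{2d}.
$$

- Under this reconstruction, the subtracted term grows with the communication time, which is consistent with the note below that t_new may become negative.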
- In some cases, t_new may become a negative value.
- In this case, t_new is set to a minimum value (e.g., 1) that can be distributed, and the data is distributed such that the difference between the data sizes to be transferred to the respective devices remains constant.
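- As an illustration of the division at step S 340, the following Python sketch converts the profiled times into per-device input sizes. The function name, the default minimum value, and the assumption that computation time scales linearly with the number of samples are additions made here for concreteness; the subtraction uses the reconstructed form of Equation (1) shown above.

```python
def uneven_split_sizes(total_samples, t_ori, c, d, t_min=1.0):
    """Sketch of step S340: derive per-device input sizes from the computation
    time (t_ori) and communication time (c) profiled under an even split.

    Assumes computation time is proportional to the number of samples a device
    receives, so target computation times can be converted into sample counts.
    """
    # Target computation time of the first (smallest) share; reconstructed
    # form of Equation (1): subtract a value proportional to c from t_ori.
    t_new = t_ori - (d - 1) * c / (2 * d)
    if t_new <= 0:
        # Fall back to a preset positive minimum value, as described above.
        t_new = t_min

    # Per-device targets differ by c / d so the devices finish one after
    # another and take turns on the shared channel.
    targets = [t_new + i * c / d for i in range(d)]

    # Convert target times into sample counts while preserving the batch size.
    scale = total_samples / sum(targets)
    sizes = [round(t * scale) for t in targets]
    sizes[-1] += total_samples - sum(sizes)  # absorb rounding drift
    return sizes


# Example: 4 devices, profiled computation time 10, communication time 8.
print(uneven_split_sizes(400, t_ori=10.0, c=8.0, d=4))  # -> [70, 90, 110, 130]
```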
- FIG. 7 is a view comparing the communication time when the method according to an embodiment of the present disclosure is applied with the communication time when the existing method is applied.
- When the method according to the present disclosure is applied, the respective devices sequentially access the shared network without interference, whereby the total execution time (AI model training time) may be reduced.
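- A toy numerical illustration (the numbers are chosen here for exposition and do not appear in the patent): with d = 4 devices, a profiled computation time t_ori = 10, and a profiled communication time c = 8, the reconstructed Equation (1) gives t_new = 10 − 3·8/8 = 7, so the devices finish computing at times 7, 9, 11, and 13 and each then uses the channel exclusively for c/d = 2 time units:

$$
\text{even split: } 10 + 8 = 18,
\qquad
\text{staggered uneven split: } 13 + 2 = 15.
$$

- The contention period is removed, so the last device completes its communication earlier than in the even case, shortening the per-iteration training time.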
- FIG. 8 is a block diagram illustrating an apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure.
- the apparatus for distributed training of an AI model in a channel-sharing network environment includes a parallelism identification unit 410 for determining whether data parallel processing is applied, a profiling unit 420 for calculating a computation time and a communication time when input data is evenly distributed across multiple computation devices, and a data distribution unit 430 for unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time.
- the data distribution unit 430 may distribute the input data such that a difference between the sizes of the pieces of input data distributed to the respective computation devices is constant so as to enable the multiple computation devices to sequentially access a channel.
- the data distribution unit 430 may set the smallest size, among the sizes of the unevenly distributed pieces of input data, to correspond to a target computation time that is calculated by subtracting a value proportional to the communication time from the computation time.
- the data distribution unit 430 sets the smallest size, among the sizes of the unevenly distributed pieces of input data, using Equation (1) below:
- t_new may be the target computation time
- t_ori may be the computation time
- c may be the communication time
- d may be the number of multiple computation devices.
- the difference between the sizes of the distributed pieces of input data may correspond to the communication time divided by the number of multiple computation devices.
- when the target computation time is calculated to be a negative value, a preset positive value may be used as the target computation time.
- the multiple computation devices may share the shared channel in a time-division manner based on the sizes of the unevenly distributed pieces of input data.
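- For illustration only, the apparatus of FIG. 8 could be mirrored by three cooperating components; the class names and interfaces below are assumptions made for this sketch, not the patent's implementation:

```python
class ParallelismIdentificationUnit:
    """Determines whether data parallel processing is applied (cf. unit 410)."""
    def is_data_parallel(self, config: dict) -> bool:
        return bool(config.get("data_parallel", False))


class ProfilingUnit:
    """Calculates the computation and communication times of an even split (cf. unit 420)."""
    def profile(self, run_even_split):
        # run_even_split is assumed to return (computation_time, communication_time).
        return run_even_split()


class DataDistributionUnit:
    """Unevenly distributes the input data based on the profiled times (cf. unit 430)."""
    def distribute(self, total_samples: int, t_ori: float, c: float, d: int,
                   t_min: float = 1.0) -> list[int]:
        # Reconstructed form of Equation (1), clamped to a preset positive value.
        t_new = max(t_ori - (d - 1) * c / (2 * d), t_min)
        targets = [t_new + i * c / d for i in range(d)]
        scale = total_samples / sum(targets)
        sizes = [round(t * scale) for t in targets]
        sizes[-1] += total_samples - sum(sizes)
        return sizes
```

- In the configuration of FIG. 5, a data parallelism control unit (240) would then transfer the resulting shards to the respective devices, as described above.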
- FIG. 9 is a view illustrating the configuration of a computer system according to an embodiment.
- the apparatus for distributed training of an AI model in a channel-sharing network environment may be implemented in a computer system 1000 including a computer-readable recording medium.
- the computer system 1000 may include one or more processors 1010 , memory 1030 , a user-interface input device 1040 , a user-interface output device 1050 , and storage 1060 , which communicate with each other via a bus 1020 . Also, the computer system 1000 may further include a network interface 1070 connected to a network 1080 .
- the processor 1010 may be a central processing unit or a semiconductor device for executing a program or processing instructions stored in the memory 1030 or the storage 1060 .
- the memory 1030 and the storage 1060 may be storage media including at least one of a volatile medium, a nonvolatile medium, a detachable medium, a non-detachable medium, a communication medium, or an information delivery medium, or a combination thereof.
- the memory 1030 may include ROM 1031 or RAM 1032 .
- communication efficiency may be improved by unevenly distributing input data across respective devices when an AI model is processed in parallel.
- the present disclosure may alleviate a communication bottleneck occurring in a network environment in which a communication channel is shared.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Artificial Intelligence (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Computing Systems (AREA)
- Medical Informatics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Neurology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computer And Data Communications (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Description
-
- (Patent Document 1) Korean Patent Application Publication No. 10-2022-0098949, titled “System and method for distributed training of deep-learning model”.
Claims (12)
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| KR1020220162976A KR20240079749A (en) | 2022-11-29 | 2022-11-29 | Method and apparatus for distribution learning of artificial intelligence model in channel sharing network environment |
| KR10-2022-0162976 | 2022-11-29 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| US20240176756A1 (en) | 2024-05-30 |
| US12314201B2 (en) | 2025-05-27 |
Family
ID=91191747
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| US18/345,083 Active 2043-09-05 US12314201B2 (en) | 2022-11-29 | 2023-06-30 | Method and apparatus for distributed training of artificial intelligence model in channel-sharing network environment |
Country Status (2)
| Country | Link |
|---|---|
| US (1) | US12314201B2 (en) |
| KR (1) | KR20240079749A (en) |
Citations (20)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20170061329A1 (en) * | 2015-08-31 | 2017-03-02 | Fujitsu Limited | Machine learning management apparatus and method |
| US20180349313A1 (en) | 2017-06-01 | 2018-12-06 | Electronics And Telecommunications Research Institute | Parameter server and method for sharing distributed deep learning parameter using the same |
| US10587776B2 (en) * | 2017-07-24 | 2020-03-10 | Samsung Electronics Co., Ltd. | Electronic device and method for controlling the electronic device |
| US10748555B2 (en) * | 2014-06-30 | 2020-08-18 | Dolby Laboratories Licensing Corporation | Perception based multimedia processing |
| US20210019152A1 (en) | 2019-07-15 | 2021-01-21 | Microsoft Technology Licensing, Llc | Data parallelism in distributed training of artificial intelligence models |
| KR20210073145A (en) | 2019-12-10 | 2021-06-18 | 한국전자통신연구원 | Scheduling-based Training Data/Model Allocation Method and Apparatus for Distributed-Parallel Deep Learning |
| US11049011B2 (en) * | 2016-11-16 | 2021-06-29 | Indian Institute Of Technology Delhi | Neural network classifier |
| KR20210092078A (en) | 2020-01-15 | 2021-07-23 | 삼성전자주식회사 | Memory Device performing parallel calculation process, Operating Method thereof and Operation Method of Memory controller controlling memory device |
| US11100370B2 (en) * | 2017-07-13 | 2021-08-24 | Peking University Shenzhen Graduate School | Method of using deep discriminate network model for person re-identification in image or video |
| US20210406676A1 (en) * | 2020-06-29 | 2021-12-30 | Alibaba Group Holding Limited | Variable input size techniques for neural networks |
| US20220076115A1 (en) * | 2020-09-10 | 2022-03-10 | SK Hynix Inc. | Data processing based on neural network |
| US11308366B2 (en) * | 2019-10-28 | 2022-04-19 | MakinaRocks Co., Ltd. | Method for determining optimal anomaly detection model for processing input data |
| KR20220098949A (en) | 2021-01-05 | 2022-07-12 | 한국과학기술원 | System and method for distributed training of deep learning model |
| US20220344049A1 (en) * | 2019-09-23 | 2022-10-27 | Presagen Pty Ltd | Decentralized artificial intelligence (ai)/machine learning training system |
| US20220357985A1 (en) * | 2021-05-07 | 2022-11-10 | Google Llc | Asynchronous distributed data flow for machine learning workloads |
| US11698863B1 (en) * | 2020-09-04 | 2023-07-11 | Inspur Suzhou Intelligent Technology Co., Ltd. | Data set and node cache-based scheduling method and device |
| US20230351491A1 (en) * | 2022-05-02 | 2023-11-02 | Truist Bank | Accelerated model training for real-time prediction of future events |
| US11863461B2 (en) * | 2019-12-09 | 2024-01-02 | Lynxi Technologies Co., Ltd. | Data processing method, data processing apparatus, electronic device, storage medium, and program product |
| US12035007B2 (en) * | 2020-03-19 | 2024-07-09 | Samsung Electronics Co., Ltd. | Computing device and operating method thereof |
| US12079720B2 (en) * | 2020-10-14 | 2024-09-03 | Samsung Sds Co., Ltd. | Apparatus and method for scheduling data augmentation technique |
-
2022
- 2022-11-29 KR KR1020220162976A patent/KR20240079749A/en active Pending
-
2023
- 2023-06-30 US US18/345,083 patent/US12314201B2/en active Active
Patent Citations (21)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US10748555B2 (en) * | 2014-06-30 | 2020-08-18 | Dolby Laboratories Licensing Corporation | Perception based multimedia processing |
| US20170061329A1 (en) * | 2015-08-31 | 2017-03-02 | Fujitsu Limited | Machine learning management apparatus and method |
| US11049011B2 (en) * | 2016-11-16 | 2021-06-29 | Indian Institute Of Technology Delhi | Neural network classifier |
| US20180349313A1 (en) | 2017-06-01 | 2018-12-06 | Electronics And Telecommunications Research Institute | Parameter server and method for sharing distributed deep learning parameter using the same |
| US11100370B2 (en) * | 2017-07-13 | 2021-08-24 | Peking University Shenzhen Graduate School | Method of using deep discriminate network model for person re-identification in image or video |
| US10587776B2 (en) * | 2017-07-24 | 2020-03-10 | Samsung Electronics Co., Ltd. | Electronic device and method for controlling the electronic device |
| US20210019152A1 (en) | 2019-07-15 | 2021-01-21 | Microsoft Technology Licensing, Llc | Data parallelism in distributed training of artificial intelligence models |
| US20220344049A1 (en) * | 2019-09-23 | 2022-10-27 | Presagen Pty Ltd | Decentralized artificial intelligence (ai)/machine learning training system |
| US11308366B2 (en) * | 2019-10-28 | 2022-04-19 | MakinaRocks Co., Ltd. | Method for determining optimal anomaly detection model for processing input data |
| US11863461B2 (en) * | 2019-12-09 | 2024-01-02 | Lynxi Technologies Co., Ltd. | Data processing method, data processing apparatus, electronic device, storage medium, and program product |
| KR20210073145A (en) | 2019-12-10 | 2021-06-18 | 한국전자통신연구원 | Scheduling-based Training Data/Model Allocation Method and Apparatus for Distributed-Parallel Deep Learning |
| KR20210092078A (en) | 2020-01-15 | 2021-07-23 | 삼성전자주식회사 | Memory Device performing parallel calculation process, Operating Method thereof and Operation Method of Memory controller controlling memory device |
| US11416178B2 (en) | 2020-01-15 | 2022-08-16 | Samsung Electronics Co., Ltd. | Memory device performing parallel calculation processing, operating method thereof, and operating method of memory controller controlling the memory device |
| US12035007B2 (en) * | 2020-03-19 | 2024-07-09 | Samsung Electronics Co., Ltd. | Computing device and operating method thereof |
| US20210406676A1 (en) * | 2020-06-29 | 2021-12-30 | Alibaba Group Holding Limited | Variable input size techniques for neural networks |
| US11698863B1 (en) * | 2020-09-04 | 2023-07-11 | Inspur Suzhou Intelligent Technology Co., Ltd. | Data set and node cache-based scheduling method and device |
| US20220076115A1 (en) * | 2020-09-10 | 2022-03-10 | SK Hynix Inc. | Data processing based on neural network |
| US12079720B2 (en) * | 2020-10-14 | 2024-09-03 | Samsung Sds Co., Ltd. | Apparatus and method for scheduling data augmentation technique |
| KR20220098949A (en) | 2021-01-05 | 2022-07-12 | 한국과학기술원 | System and method for distributed training of deep learning model |
| US20220357985A1 (en) * | 2021-05-07 | 2022-11-10 | Google Llc | Asynchronous distributed data flow for machine learning workloads |
| US20230351491A1 (en) * | 2022-05-02 | 2023-11-02 | Truist Bank | Accelerated model training for real-time prediction of future events |
Non-Patent Citations (1)
| Title |
|---|
| Xianyan Jia et al., "Whale: Efficient Giant Model Training over Heterogeneous GPUs", USENIX Association, Jul. 11, 2022, pp. 673-687. |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20240079749A (en) | 2024-06-05 |
| US20240176756A1 (en) | 2024-05-30 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| EP3407203B1 (en) | Statically schedulable feed and drain structure for systolic array architecture | |
| Allen-Zhu | Natasha: Faster non-convex stochastic optimization via strongly non-convex parameter | |
| US6466946B1 (en) | Computer implemented scalable, incremental and parallel clustering based on divide and conquer | |
| CN110413507B (en) | System test method, device, computer equipment and storage medium | |
| US7689517B2 (en) | Cost management of software application portfolio | |
| US11789711B2 (en) | Using artificial intelligence to optimize software to run on heterogeneous computing resource | |
| CN111971694A (en) | Collaborative heterogeneous processing of training data for deep neural networks | |
| Li et al. | Provable Bregman-divergence based methods for nonconvex and non-Lipschitz problems | |
| CN112232426A (en) | Training method, device and equipment of target detection model and readable storage medium | |
| US20230252299A1 (en) | Detecting and mitigating fault in sparsity computation in deep neural network | |
| CN110599305A (en) | Service processing method, device and storage medium | |
| US20220083838A1 (en) | Method and apparatus with neural network inference optimization implementation | |
| CN111443999A (en) | Data parallel processing method, executor, computer equipment and storage medium | |
| US7181713B2 (en) | Static timing and risk analysis tool | |
| US12314201B2 (en) | Method and apparatus for distributed training of artificial intelligence model in channel-sharing network environment | |
| US7058912B2 (en) | Notifying status of execution of jobs used to characterize cells in an integrated circuit | |
| US20110029982A1 (en) | Network balancing procedure that includes redistributing flows on arcs incident on a batch of vertices | |
| US11372633B2 (en) | Method, device and terminal apparatus for code execution and computer readable storage medium | |
| US20230020929A1 (en) | Write combine buffer (wcb) for deep neural network (dnn) accelerator | |
| CN107526648A (en) | A kind of node device that handles is delayed the method and device of machine | |
| US20240111592A1 (en) | Method, system, and computer readable media for elastic heterogeneous clustering and heterogeneity-aware job configuration | |
| US11899967B2 (en) | Vector processor data storage | |
| US20230140239A1 (en) | Method and apparatus with data loading | |
| US20250200695A1 (en) | Apparatus and method for 3-dimensional parallelization for heterogeneous gpu cluster | |
| US20240193406A1 (en) | Method and apparatus with scheduling neural network |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | AS | Assignment | Owner name: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, KOREA, REPUBLIC OF. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KANG, KI-DONG;KIM, HONG-YEON;AN, BAIK-SONG;AND OTHERS;REEL/FRAME:064127/0969. Effective date: 20230614 |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |