US12314201B2 - Method and apparatus for distributed training of artificial intelligence model in channel-sharing network environment - Google Patents

Method and apparatus for distributed training of artificial intelligence model in channel-sharing network environment Download PDF

Info

Publication number
US12314201B2
US12314201B2 (application US18/345,083)
Authority
US
United States
Prior art keywords
time
computation
input data
devices
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active, expires
Application number
US18/345,083
Other versions
US20240176756A1 (en)
Inventor
Ki-Dong Kang
Hong-Yeon Kim
Baik-Song AN
Myung-Hoon CHA
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Electronics and Telecommunications Research Institute ETRI
Original Assignee
Electronics and Telecommunications Research Institute ETRI
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Electronics and Telecommunications Research Institute ETRI filed Critical Electronics and Telecommunications Research Institute ETRI
Assigned to ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE reassignment ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: AN, BAIK-SONG, CHA, MYUNG-HOON, KANG, KI-DONG, KIM, HONG-YEON
Publication of US20240176756A1 publication Critical patent/US20240176756A1/en
Application granted granted Critical
Publication of US12314201B2 publication Critical patent/US12314201B2/en
Active legal-status Critical Current
Adjusted expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/14Handling requests for interconnection or transfer
    • G06F13/36Handling requests for interconnection or transfer for access to common bus or bus system
    • G06F13/362Handling requests for interconnection or transfer for access to common bus or bus system with centralised access control
    • G06F13/3625Handling requests for interconnection or transfer for access to common bus or bus system with centralised access control using a time dependent access
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38Information transfer, e.g. on bus
    • G06F13/42Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G06F13/4221Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus being an input/output bus, e.g. ISA bus, EISA bus, PCI bus, SCSI bus
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3885Concurrent instruction execution, e.g. pipeline or look ahead using a plurality of independent parallel functional units
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/20Ensemble learning
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/501Performance criteria
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/509Offload
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2213/00Indexing scheme relating to interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F2213/0026PCI express

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Neurology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Disclosed herein is a method for distributed training of an AI model in a channel-sharing network environment. The method includes determining whether data parallel processing is applied, calculating a computation time and a communication time when input data is evenly distributed across multiple computation devices, and unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time.

Description

CROSS REFERENCE TO RELATED APPLICATION
This application claims the benefit of Korean Patent Application No. 10-2022-0162976, filed Nov. 29, 2022, which is hereby incorporated by reference in its entirety into this application.
BACKGROUND OF THE INVENTION 1. Technical Field
The present disclosure relates generally to technology for distributed training of an Artificial Intelligence (AI) model using multiple computation devices in a network environment in which a channel is shared.
More particularly, the present disclosure relates to technology for improving communication efficiency by unevenly distributing input data across respective devices when an AI model is processed in parallel.
2. Description of the Related Art
Currently, the most commonly applied technique for parallel processing of an AI model is data parallelism. ‘Data parallelism’ is a parallelization technique in which the same AI model is replicated on the respective computation devices (e.g., GPUs) and the input data is distributed across them so as to be processed concurrently. Training of an AI model broadly includes a (forward) step for processing input data and a (backward) step for reflecting the processing result to the model. When data parallelism is applied to the training of an AI model, the respective devices need to communicate with each other at the step for reflecting the processing result in order to synchronize the model.
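For readers less familiar with this technique, the following is a minimal, self-contained sketch of one synchronous data-parallel training step in plain Python. It is illustrative only: the one-parameter model, the toy loss, and the names local_gradient, all_reduce_mean, and train_step are hypothetical and are not taken from the patent; real systems would use GPU frameworks and collective-communication libraries.

```python
def local_gradient(weights, shard):
    """Forward + backward on one device's shard of the batch (toy squared-error loss)."""
    w = weights[0]
    grad = sum(2.0 * (w - x) for x in shard) / max(len(shard), 1)
    return [grad]

def all_reduce_mean(grads_per_device):
    """Model-synchronization step: average the gradients of all devices.
    In practice this exchange is the communication that contends for the shared channel."""
    num_devices = len(grads_per_device)
    num_params = len(grads_per_device[0])
    return [sum(g[i] for g in grads_per_device) / num_devices for i in range(num_params)]

def train_step(weights, batch, num_devices, lr=0.1):
    # Baseline data parallelism: the batch is split EVENLY across the devices.
    shards = [batch[i::num_devices] for i in range(num_devices)]
    grads = [local_gradient(weights, shard) for shard in shards]  # forward/backward per device
    avg_grad = all_reduce_mean(grads)                             # synchronize the replicas
    return [w - lr * g for w, g in zip(weights, avg_grad)]

weights = [0.0]
batch = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
weights = train_step(weights, batch, num_devices=4)
print(weights)  # the single toy parameter moves toward the batch mean
```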
Here, when communication between the devices is performed in a network environment such as PCIe in which a communication channel is shared, communication performance may be degraded because multiple devices simultaneously access the channel. Accordingly, technology for remedying such communication inefficiency is urgently required.
DOCUMENTS OF RELATED ART
    • (Patent Document 1) Korean Patent Application Publication No. 10-2022-0098949, titled “System and method for distributed training of deep-learning model”.
SUMMARY OF THE INVENTION
An object of the present disclosure is to improve communication efficiency by unevenly distributing input data across respective devices when an AI model is processed in parallel.
Another object of the present disclosure is to alleviate a communication bottleneck occurring in a network environment in which a communication channel is shared.
In order to accomplish the above objects, a method for distributed training of an Artificial Intelligence (AI) model in a channel-sharing network environment according to an embodiment of the present disclosure includes determining whether data parallel processing is applied, calculating a computation time and a communication time when input data is evenly distributed across multiple computation devices, and unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time.
Here, unevenly distributing the input data may comprise distributing the input data such that a difference between the sizes of the pieces of input data distributed to the respective computation devices is constant so as to enable the multiple computation devices to sequentially access a channel.
Here, the smallest size, among the sizes of the unevenly distributed pieces of input data, may be set to correspond to a target computation time that is calculated by subtracting a value proportional to the communication time from the computation time.
Here, the smallest size, among the sizes of the unevenly distributed pieces of input data, may be set based on Equation (1) below:
$$t_{new} = t_{ori} - \frac{c}{d^{2}} \sum_{n=0}^{d-1} n \qquad (1)$$
In Equation (1) above, tnew may denote the target computation time, tori may denote the computation time, c may denote the communication time, and d may denote the number of multiple computation devices.
Here, the difference between the sizes of the distributed pieces of input data may correspond to the communication time divided by the number of multiple computation devices.
Here, when the target computation time is calculated to be a negative value, a preset positive value may be used as the target computation time.
Here, the multiple computation devices may share a shared channel in a time-division manner based on the sizes of the unevenly distributed pieces of input data.
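As a purely illustrative worked example (the numbers are hypothetical, not taken from the disclosure), suppose profiling yields a computation time tori = 100 ms and a communication time c = 40 ms for d = 4 devices. Equation (1) then gives

$$t_{new} = 100 - \frac{40}{4^{2}}(0 + 1 + 2 + 3) = 100 - 2.5 \times 6 = 85\ \text{ms},$$

so the per-device target computation times are 85, 95, 105, and 115 ms: consecutive devices differ by c/d = 10 ms, and the average remains the original 100 ms, so the total amount of input data is unchanged.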
Also, in order to accomplish the above objects, an apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure includes a parallelism identification unit for determining whether data parallel processing is applied, a profiling unit for calculating a computation time and a communication time when input data is evenly distributed across multiple computation devices, and a data distribution unit for unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time.
Here, the data distribution unit may distribute the input data such that a difference between the sizes of the pieces of input data distributed to the respective computation devices is constant so as to enable the multiple computation devices to sequentially access a channel.
Here, the data distribution unit may set the smallest size, among the sizes of the unevenly distributed pieces of input data, to correspond to a target computation time that is calculated by subtracting a value proportional to the communication time from the computation time.
Here, the data distribution unit may set the smallest size, among the sizes of the unevenly distributed pieces of input data, based on Equation (1) below:
$$t_{new} = t_{ori} - \frac{c}{d^{2}} \sum_{n=0}^{d-1} n \qquad (1)$$
In Equation (1) above, tnew may denote the target computation time, tori may denote the computation time, c may denote the communication time, and d may denote the number of multiple computation devices.
Here, the difference between the sizes of the distributed pieces of input data may correspond to the communication time divided by the number of multiple computation devices.
Here, when the target computation time is calculated to be a negative value, a preset positive value may be used as the target computation time.
Here, the multiple computation devices may share a shared channel in a time-division manner based on the sizes of the unevenly distributed pieces of input data.
BRIEF DESCRIPTION OF THE DRAWINGS
The above and other objects, features, and advantages of the present disclosure will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a view conceptually illustrating an example of application of data parallelism;
FIG. 2 is a view conceptually illustrating a data parallelism method in a mesh network environment;
FIG. 3 conceptually illustrates a data parallelism method in a channel-sharing network environment;
FIG. 4 is a flowchart illustrating a method for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure;
FIG. 5 illustrates the configuration of an apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating in detail a method for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure;
FIG. 7 is a view of comparison of a communication time when a method according to an embodiment of the present disclosure is applied and a communication time when an existing method is applied;
FIG. 8 is a block diagram illustrating an apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure; and
FIG. 9 is a view illustrating the configuration of a computer system according to an embodiment.
DESCRIPTION OF THE PREFERRED EMBODIMENTS
The advantages and features of the present disclosure and methods of achieving them will be apparent from the following exemplary embodiments to be described in more detail with reference to the accompanying drawings. However, it should be noted that the present disclosure is not limited to the following exemplary embodiments, and may be implemented in various forms. Accordingly, the exemplary embodiments are provided only to disclose the present disclosure and to let those skilled in the art know the category of the present disclosure, and the present disclosure is to be defined based only on the claims. The same reference numerals or the same reference designators denote the same elements throughout the specification.
It will be understood that, although the terms “first,” “second,” etc. may be used herein to describe various elements, these elements are not intended to be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element discussed below could be referred to as a second element without departing from the technical spirit of the present disclosure.
The terms used herein are for the purpose of describing particular embodiments only and are not intended to limit the present disclosure. As used herein, the singular forms are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises,” “comprising,” “includes,” and/or “including,” when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the present specification, each of expressions such as “A or B”, “at least one of A and B”, “at least one of A or B”, “A, B, or C”, “at least one of A, B, and C”, and “at least one of A, B, or C” may include any one of the items listed in the expression or all possible combinations thereof.
Unless differently defined, all terms used herein, including technical or scientific terms, have the same meanings as terms generally understood by those skilled in the art to which the present disclosure pertains. Terms identical to those defined in generally used dictionaries should be interpreted as having meanings identical to contextual meanings of the related art, and are not to be interpreted as having ideal or excessively formal meanings unless they are definitively defined in the present specification.
Hereinafter, embodiments of the present disclosure will be described in detail with reference to the accompanying drawings. In the following description of the present disclosure, the same reference numerals are used to designate the same or similar elements throughout the drawings, and repeated descriptions of the same components will be omitted.
FIG. 1 is a view conceptually illustrating an example of application of data parallelism.
In FIG. 1 , because computation occurring at the (backward) step for reflecting a processing result to a model coincides with communication, ‘computation’ is omitted.
Referring to FIG. 1 , it can be seen that communication occurs in order to synchronize the input-data-processing result between two GPUs.
FIG. 2 is a view conceptually illustrating a data parallelism method in a mesh network environment.
Referring to FIG. 2 , it can be seen that data parallelism is applied to four GPUs in a mesh network environment and a channel-sharing network environment.
When dedicated hardware, such as Nvidia's NVLink and NVSwitch, is used, no interference occurs in the channel even though the number of devices communicating with each other is increased by constructing a mesh network. However, because this technology requires expensive dedicated hardware and is applicable only to Nvidia GPUs, it does not support other accelerators, such as GPUs from other manufacturers, FPGAs, and the like.
When dedicated hardware is not supported, a network in which a communication channel is shared, such as PCIe, is used for communication between devices, but because such a channel-sharing network is used in a time-division manner, communication performance may be degraded when multiple devices simultaneously access the network.
FIG. 3 conceptually illustrates a data parallelism method in a channel-sharing network environment.
Referring to FIG. 3, it can be seen that communication takes more time than in the case of FIG. 2.
The present disclosure relates to a distributed training method capable of improving communication efficiency when an AI model is processed in a distributed manner using multiple computation devices in a network environment in which a communication channel is shared.
The most common method for training an AI model in a distributed manner is data parallelism. Data parallelism is a method of copying an AI model to respective computation devices and dividing input data so as to be processed in a distributed manner. Here, after the respective computation devices process the input data in parallel, they communicate with each other in order to synchronize the model. Here, if there is no hardware support, all of the computation devices simultaneously attempt communication, so a communication channel bottleneck may result in degradation in training performance. The present disclosure provides a method for distributing input data such that the respective computation devices exclusively use the network at different times, in order to alleviate the degradation in AI model training performance caused by the communication channel bottleneck.
FIG. 4 is a flowchart illustrating a method for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure.
Referring to FIG. 4 , the method for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure includes determining whether data parallel processing is applied at step S110, calculating a computation time and a communication time when evenly distributing input data across multiple computation devices at step S120, and unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time at step S130.
Here, unevenly distributing the input data at step S130 may comprise distributing the input data such that a difference between the sizes of the pieces of input data distributed to the respective computation devices is constant so as to enable the multiple computation devices to sequentially access the channel.
Here, the smallest size, among the sizes of the unevenly distributed pieces of input data, may be set to correspond to a target computation time that is calculated by subtracting a value proportional to the communication time from the computation time.
Here, the smallest size, among the sizes of the unevenly distributed pieces of input data, is set by Equation (1) below:
$$t_{new} = t_{ori} - \frac{c}{d^{2}} \sum_{n=0}^{d-1} n \qquad (1)$$
In Equation (1) above, tnew may be the target computation time, tori may be the computation time, c may be the communication time, and d may be the number of multiple computation devices.
Here, the difference between the sizes of the distributed pieces of input data may correspond to the communication time divided by the number of multiple computation devices.
Here, when the target computation time is calculated to be a negative value, a preset positive value may be used as the target computation time.
Here, the multiple computation devices may share the shared channel in a time-division manner based on the sizes of the unevenly distributed pieces of input data.
FIG. 5 illustrates the configuration of an apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure.
Referring to FIG. 5 , the apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure includes a data parallelism identification unit 210, a profiling unit 220, a data division unit 230, and a data parallelism control unit 240.
The data parallelism identification unit 210 determines whether a data parallelism technique can be applied, and the profiling unit 220 measures the execution time of the AI model to be trained. Also, the data division unit 230 determines division of the data to be input to each of computation devices based on the measured execution time, and the data parallelism control unit 240 transfers the divided data to each of the devices and performs data parallelism.
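The division of responsibilities among these units can be pictured with the following Python skeleton. It is only an organizational sketch under the assumption that profiling returns a computation time and a communication time; the class names, method names, and placeholder numbers are hypothetical and are not prescribed by the patent.

```python
from dataclasses import dataclass

@dataclass
class Profile:
    computation_time: float    # t_ori, measured with an even split of the input data
    communication_time: float  # c, measured with an even split of the input data

class DataParallelismIdentificationUnit:
    """Decides whether the proposed scheme applies (data parallelism requested,
    and the devices communicate over a shared channel such as PCIe)."""
    def is_applicable(self, data_parallelism_requested: bool, channel_shared: bool) -> bool:
        return data_parallelism_requested and channel_shared

class ProfilingUnit:
    """Measures the execution time of the AI model to be trained
    (advance or online profiling); placeholder values are returned here."""
    def measure(self) -> Profile:
        return Profile(computation_time=100.0, communication_time=40.0)

class DataDivisionUnit:
    """Determines how the input data is divided for each computation device;
    one possible size computation is sketched after the description of FIG. 6."""
    def divide(self, profile: Profile, num_devices: int, batch_size: int) -> list:
        raise NotImplementedError

class DataParallelismControlUnit:
    """Transfers each divided piece of data to its device and starts data-parallel training."""
    def launch(self, shard_sizes: list) -> None:
        pass  # device transfer and training launch omitted in this sketch
```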
FIG. 6 is a flowchart illustrating in detail a method for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure.
Referring to FIG. 6 , in the method for distributed training of an AI model according to an embodiment of the present disclosure, whether an AI model developer applies data parallelism is determined at step S310. When data parallelism is not applied, training of an AI model is started using an existing method (without parallelism or by applying another parallelism method) at step S370. When data parallelism is applied, whether the current network environment is a channel-sharing network is checked at step S320. When the current network environment is not a channel-sharing network, it is determined that there is no overhead resulting from network channel interference, so the existing data parallelism technique is applied at step S360. When the channel-sharing network is used, application of the present disclosure is started.
Application of the present disclosure requires information about a computation time and a communication time when the existing data parallelism is used, and the corresponding information may be acquired through a method such as advance profiling or online profiling at step S330. When the information about the time consumed for computation and communication is acquired, how to divide the input data to be assigned to each of the devices is determined at step S340. The method of dividing the input data to be assigned to each of the devices may be performed using Equation (1) below:
$$t_{new} = t_{ori} - \frac{c}{d^{2}} \sum_{n=0}^{d-1} n \qquad (1)$$
Here, tnew denotes the computation time corresponding to the data (having the smallest size) to be distributed to the first computation device, d denotes the number of computation devices to be used, and tori and c respectively denote the computation time and the communication time measured at the profiling step. That is, tori and c can be acquired at the profiling step, and d is a value that can be input in advance. Accordingly, tnew may be acquired.
When tnew is calculated, data corresponding to the computation time, tnew+c/d, may be distributed to the second device, data corresponding to the computation time, tnew+2c/d, may be distributed to the third device, . . . , and data corresponding to the computation time, tnew+((d−1)c/d), may be distributed to the last device. That is, the difference in computation time between the devices may correspond to the communication time divided by the number of devices.
If the value of c is much greater than tori in Equation (1) above, tnew may become a negative value. In this case, tnew is set to a minimum value (e.g., 1) that can be distributed, and data is distributed such that the difference between the data sizes to be transferred to the respective devices is constant. When the data to be transferred to each of the computation devices is set based on the corresponding equation, the actual input data is divided and transferred to the respective devices, data parallelism is applied, and training of an AI model is started.
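Putting Equation (1) and the constant-difference rule together, the data division of step S340 could be computed roughly as follows. This is a minimal sketch assuming that a device's computation time is proportional to the amount of input data it receives; the function name split_batch and the rounding policy are illustrative choices, not requirements of the patent.

```python
def split_batch(batch_size: int, t_ori: float, c: float, d: int) -> list:
    """Return per-device shard sizes whose computation times are staggered by c/d."""
    # Equation (1): target computation time for the first (smallest) shard.
    t_new = t_ori - (c / d**2) * sum(range(d))
    if t_new <= 0:
        # If communication dominates, fall back to a preset positive value
        # (the patent gives 1 as an example) while keeping the constant difference.
        t_new = 1.0

    # Target computation times: t_new, t_new + c/d, t_new + 2c/d, ..., t_new + (d-1)c/d.
    targets = [t_new + i * (c / d) for i in range(d)]

    # Convert target times to shard sizes, assuming computation time is proportional
    # to the amount of input data assigned to a device.
    total = sum(targets)
    sizes = [max(1, round(batch_size * t / total)) for t in targets]
    sizes[-1] += batch_size - sum(sizes)  # absorb rounding drift in the last shard
    return sizes

# With the hypothetical profile used earlier (t_ori = 100 ms, c = 40 ms, d = 4):
print(split_batch(batch_size=400, t_ori=100.0, c=40.0, d=4))  # [85, 95, 105, 115]
```

When c is large relative to tori, the first shards become very small, which is the intended behavior: early devices finish quickly and begin communicating while later devices are still computing.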
FIG. 7 is a view of comparison of a communication time when the method according to an embodiment of the present disclosure is applied and a communication time when the existing method is applied.
Referring to FIG. 7 , when the method according to an embodiment of the present disclosure is applied, respective devices sequentially access a shared network without interference, whereby the total execution time (AI model training time) may be reduced.
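The reduction illustrated in FIG. 7 can be reproduced with a toy timeline model. The sketch below rests on two simplifying assumptions that the patent does not state numerically: each device's exclusive use of the shared channel lasts c/d, and a device starts communicating as soon as both its computation is finished and the channel is free. It is a back-of-the-envelope illustration, not a model of a real PCIe fabric.

```python
def last_finish_time(compute_times, slot):
    """Each device occupies the shared channel exclusively for `slot` once it has
    finished computing and the channel is free; return when the last device is done."""
    channel_free = 0.0
    for t_compute in sorted(compute_times):
        start = max(t_compute, channel_free)
        channel_free = start + slot
    return channel_free

t_ori, c, d = 100.0, 40.0, 4
slot = c / d  # assumed exclusive-communication time per device

even = [t_ori] * d                   # even split: all devices finish computing together
uneven = [85.0, 95.0, 105.0, 115.0]  # staggered by c/d (from the earlier worked example)

print(last_finish_time(even, slot))    # 140.0 -> devices queue up for the channel
print(last_finish_time(uneven, slot))  # 125.0 -> communication overlaps later computation
```

Even in this idealized model, where the even split pays only for serialized channel access, the staggered distribution finishes earlier; in practice the even split additionally suffers from the interference of simultaneous channel access, which is the overhead the disclosure avoids.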
FIG. 8 is a block diagram illustrating an apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure.
Referring to FIG. 8 , the apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment of the present disclosure includes a parallelism identification unit 410 for determining whether data parallel processing is applied, a profiling unit 420 for calculating a computation time and a communication time when input data is evenly distributed across multiple computation devices, and a data distribution unit 430 for unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time.
Here, the data distribution unit 430 may distribute the input data such that a difference between the sizes of the pieces of input data distributed to the respective computation devices is constant so as to enable the multiple computation devices to sequentially access a channel.
Here, the data distribution unit 430 may set the smallest size, among the sizes of the unevenly distributed pieces of input data, to correspond to a target computation time that is calculated by subtracting a value proportional to the communication time from the computation time.
Here, the data distribution unit 430 sets the smallest size, among the sizes of the unevenly distributed pieces of input data, using Equation (1) below:
$$t_{new} = t_{ori} - \frac{c}{d^{2}} \sum_{n=0}^{d-1} n \qquad (1)$$
In Equation (1) above, tnew may be the target computation time, tori may be the computation time, c may be the communication time, and d may be the number of multiple computation devices.
Here, the difference between the sizes of the distributed pieces of input data may correspond to the communication time divided by the number of multiple computation devices.
Here, when the target computation time is calculated to be a negative value, a preset positive value may be used as the target computation time.
Here, the multiple computation devices may share the shared channel in a time-division manner based on the sizes of the unevenly distributed pieces of input data.
FIG. 9 is a view illustrating the configuration of a computer system according to an embodiment.
The apparatus for distributed training of an AI model in a channel-sharing network environment according to an embodiment may be implemented in a computer system 1000 including a computer-readable recording medium.
The computer system 1000 may include one or more processors 1010, memory 1030, a user-interface input device 1040, a user-interface output device 1050, and storage 1060, which communicate with each other via a bus 1020. Also, the computer system 1000 may further include a network interface 1070 connected to a network 1080. The processor 1010 may be a central processing unit or a semiconductor device for executing a program or processing instructions stored in the memory 1030 or the storage 1060. The memory 1030 and the storage 1060 may be storage media including at least one of a volatile medium, a nonvolatile medium, a detachable medium, a non-detachable medium, a communication medium, or an information delivery medium, or a combination thereof. For example, the memory 1030 may include ROM 1031 or RAM 1032.
According to the present disclosure, communication efficiency may be improved by unevenly distributing input data across respective devices when an AI model is processed in parallel.
Also, the present disclosure may alleviate a communication bottleneck occurring in a network environment in which a communication channel is shared.
Specific implementations described in the present disclosure are embodiments and are not intended to limit the scope of the present disclosure. For conciseness of the specification, descriptions of conventional electronic components, control systems, software, and other functional aspects thereof may be omitted. Also, lines connecting components or connecting members illustrated in the drawings show functional connections and/or physical or circuit connections, and may be represented as various functional connections, physical connections, or circuit connections that are capable of replacing or being added to an actual device. Also, unless specific terms, such as “essential”, “important”, or the like, are used, the corresponding components may not be absolutely necessary.
Accordingly, the spirit of the present disclosure should not be construed as being limited to the above-described embodiments, and the entire scope of the appended claims and their equivalents should be understood as defining the scope and spirit of the present disclosure.

Claims (12)

What is claimed is:
1. A method for distributed training of an Artificial Intelligence (AI) model in a channel-sharing network environment including multiple computation devices, comprising:
determining whether data parallel processing is applied;
calculating a computation time and a communication time when input data is evenly distributed across the multiple computation devices; and
unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time,
wherein unevenly distributing the input data comprises distributing the input data such that a difference between sizes of the pieces of input data distributed to the respective computation devices is constant so as to enable the multiple computation devices to sequentially access a channel.
2. The method of claim 1, wherein a smallest size, among the sizes of the unevenly distributed pieces of input data, is set to correspond to a target computation time that is calculated by subtracting a value proportional to the communication time from the computation time.
3. The method of claim 1, wherein a smallest size, among the sizes of the unevenly distributed pieces of input data, is set based on Equation (1) below:
$$t_{new} = t_{ori} - \frac{c}{d^{2}} \sum_{n=0}^{d-1} n \qquad (1)$$
In Equation (1) above, tnew denotes a target computation time, tori denotes the computation time, c denotes the communication time, and d denotes a number of multiple computation devices.
4. The method of claim 3, wherein the difference between the sizes of the distributed pieces of input data corresponds to the communication time divided by the number of multiple computation devices.
5. The method of claim 4, wherein, when the target computation time is calculated to be a negative value, a preset positive value is used as the target computation time.
6. The method of claim 1, wherein the multiple computation devices share a shared channel in a time-division manner based on sizes of the unevenly distributed pieces of input data.
7. An apparatus for distributed training of an Artificial Intelligence (AI) model in a channel-sharing network environment including multiple computation devices, comprising:
a parallelism identification unit for determining whether data parallel processing is applied;
a profiling unit for calculating a computation time and a communication time when input data is evenly distributed across the multiple computation devices; and
a data distribution unit for unevenly distributing the input data across the multiple computation devices based on the computation time and the communication time,
wherein the data distribution unit distributes the input data such that a difference between sizes of the pieces of input data distributed to the respective computation devices is constant so as to enable the multiple computation devices to sequentially access a channel.
8. The apparatus of claim 7, wherein the data distribution unit sets a smallest size, among the sizes of the unevenly distributed pieces of input data, to correspond to a target computation time that is calculated by subtracting a value proportional to the communication time from the computation time.
9. The apparatus of claim 7, wherein the data distribution unit sets a smallest size, among the sizes of the unevenly distributed pieces of input data, based on Equation (1) below:
$$t_{new} = t_{ori} - \frac{c}{d^{2}} \sum_{n=0}^{d-1} n \qquad (1)$$
In Equation (1) above, tnew denotes a target computation time, tori denotes the computation time, c denotes the communication time, and d denotes a number of multiple computation devices.
10. The apparatus of claim 9, wherein the difference between the sizes of the distributed pieces of input data corresponds to the communication time divided by the number of multiple computation devices.
11. The apparatus of claim 10, wherein, when the target computation time is calculated to be a negative value, a preset positive value is used as the target computation time.
12. The apparatus of claim 7, wherein the multiple computation devices share a shared channel in a time-division manner based on sizes of the unevenly distributed pieces of input data.
US18/345,083 2022-11-29 2023-06-30 Method and apparatus for distributed training of artificial intelligence model in channel-sharing network environment Active 2043-09-05 US12314201B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020220162976A KR20240079749A (en) 2022-11-29 2022-11-29 Method and apparatus for distribution learning of artificial intelligence model in channel sharing network environment
KR10-2022-0162976 2022-11-29

Publications (2)

Publication Number Publication Date
US20240176756A1 (en) 2024-05-30
US12314201B2 (en) 2025-05-27

Family

ID=91191747

Family Applications (1)

Application Number Title Priority Date Filing Date
US18/345,083 Active 2043-09-05 US12314201B2 (en) 2022-11-29 2023-06-30 Method and apparatus for distributed training of artificial intelligence model in channel-sharing network environment

Country Status (2)

Country Link
US (1) US12314201B2 (en)
KR (1) KR20240079749A (en)

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10748555B2 (en) * 2014-06-30 2020-08-18 Dolby Laboratories Licensing Corporation Perception based multimedia processing
US20170061329A1 (en) * 2015-08-31 2017-03-02 Fujitsu Limited Machine learning management apparatus and method
US11049011B2 (en) * 2016-11-16 2021-06-29 Indian Institute Of Technology Delhi Neural network classifier
US20180349313A1 (en) 2017-06-01 2018-12-06 Electronics And Telecommunications Research Institute Parameter server and method for sharing distributed deep learning parameter using the same
US11100370B2 (en) * 2017-07-13 2021-08-24 Peking University Shenzhen Graduate School Method of using deep discriminate network model for person re-identification in image or video
US10587776B2 (en) * 2017-07-24 2020-03-10 Samsung Electronics Co., Ltd. Electronic device and method for controlling the electronic device
US20210019152A1 (en) 2019-07-15 2021-01-21 Microsoft Technology Licensing, Llc Data parallelism in distributed training of artificial intelligence models
US20220344049A1 (en) * 2019-09-23 2022-10-27 Presagen Pty Ltd Decentralized artificial intelligence (ai)/machine learning training system
US11308366B2 (en) * 2019-10-28 2022-04-19 MakinaRocks Co., Ltd. Method for determining optimal anomaly detection model for processing input data
US11863461B2 (en) * 2019-12-09 2024-01-02 Lynxi Technologies Co., Ltd. Data processing method, data processing apparatus, electronic device, storage medium, and program product
KR20210073145A (en) 2019-12-10 2021-06-18 한국전자통신연구원 Scheduling-based Training Data/Model Allocation Method and Apparatus for Distributed-Parallel Deep Learning
KR20210092078A (en) 2020-01-15 2021-07-23 삼성전자주식회사 Memory Device performing parallel calculation process, Operating Method thereof and Operation Method of Memory controller controlling memory device
US11416178B2 (en) 2020-01-15 2022-08-16 Samsung Electronics Co., Ltd. Memory device performing parallel calculation processing, operating method thereof, and operating method of memory controller controlling the memory device
US12035007B2 (en) * 2020-03-19 2024-07-09 Samsung Electronics Co., Ltd. Computing device and operating method thereof
US20210406676A1 (en) * 2020-06-29 2021-12-30 Alibaba Group Holding Limited Variable input size techniques for neural networks
US11698863B1 (en) * 2020-09-04 2023-07-11 Inspur Suzhou Intelligent Technology Co., Ltd. Data set and node cache-based scheduling method and device
US20220076115A1 (en) * 2020-09-10 2022-03-10 SK Hynix Inc. Data processing based on neural network
US12079720B2 (en) * 2020-10-14 2024-09-03 Samsung Sds Co., Ltd. Apparatus and method for scheduling data augmentation technique
KR20220098949A (en) 2021-01-05 2022-07-12 한국과학기술원 System and method for distributed training of deep learning model
US20220357985A1 (en) * 2021-05-07 2022-11-10 Google Llc Asynchronous distributed data flow for machine learning workloads
US20230351491A1 (en) * 2022-05-02 2023-11-02 Truist Bank Accelerated model training for real-time prediction of future events

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xianyan Jia et al., "Whale: Efficient Giant Model Training over Heterogeneous GPUs", USENIX Association, Jul. 11, 2022, pp. 673-687.

Also Published As

Publication number Publication date
KR20240079749A (en) 2024-06-05
US20240176756A1 (en) 2024-05-30

Similar Documents

Publication Publication Date Title
EP3407203B1 (en) Statically schedulable feed and drain structure for systolic array architecture
Allen-Zhu Natasha: Faster non-convex stochastic optimization via strongly non-convex parameter
US6466946B1 (en) Computer implemented scalable, incremental and parallel clustering based on divide and conquer
CN110413507B (en) System test method, device, computer equipment and storage medium
US7689517B2 (en) Cost management of software application portfolio
US11789711B2 (en) Using artificial intelligence to optimize software to run on heterogeneous computing resource
CN111971694A (en) Collaborative heterogeneous processing of training data for deep neural networks
Li et al. Provable Bregman-divergence based methods for nonconvex and non-Lipschitz problems
CN112232426A (en) Training method, device and equipment of target detection model and readable storage medium
US20230252299A1 (en) Detecting and mitigating fault in sparsity computation in deep neural network
CN110599305A (en) Service processing method, device and storage medium
US20220083838A1 (en) Method and apparatus with neural network inference optimization implementation
CN111443999A (en) Data parallel processing method, executor, computer equipment and storage medium
US7181713B2 (en) Static timing and risk analysis tool
US12314201B2 (en) Method and apparatus for distributed training of artificial intelligence model in channel-sharing network environment
US7058912B2 (en) Notifying status of execution of jobs used to characterize cells in an integrated circuit
US20110029982A1 (en) Network balancing procedure that includes redistributing flows on arcs incident on a batch of vertices
US11372633B2 (en) Method, device and terminal apparatus for code execution and computer readable storage medium
US20230020929A1 (en) Write combine buffer (wcb) for deep neural network (dnn) accelerator
CN107526648A (en) A kind of node device that handles is delayed the method and device of machine
US20240111592A1 (en) Method, system, and computer readable media for elastic heterogeneous clustering and heterogeneity-aware job configuration
US11899967B2 (en) Vector processor data storage
US20230140239A1 (en) Method and apparatus with data loading
US20250200695A1 (en) Apparatus and method for 3-dimensional parallelization for heterogeneous gpu cluster
US20240193406A1 (en) Method and apparatus with scheduling neural network

Legal Events

Date Code Title Description
AS Assignment

Owner name: ELECTRONICS AND TELECOMMUNICATIONS RESEARCH INSTITUTE, KOREA, REPUBLIC OF

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:KANG, KI-DONG;KIM, HONG-YEON;AN, BAIK-SONG;AND OTHERS;REEL/FRAME:064127/0969

Effective date: 20230614

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO SMALL (ORIGINAL EVENT CODE: SMAL); ENTITY STATUS OF PATENT OWNER: SMALL ENTITY

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION

STPP Information on status: patent application and granting procedure in general

Free format text: NON FINAL ACTION MAILED

STPP Information on status: patent application and granting procedure in general

Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER

STPP Information on status: patent application and granting procedure in general

Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS

STCF Information on status: patent grant

Free format text: PATENTED CASE