WO2021049828A1 - Computing device having a plurality of cores - Google Patents
- Publication number
- WO2021049828A1 (PCT/KR2020/012023)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
Definitions
- the present invention relates to a computing device having a plurality of cores.
- Artificial Neural Network is a generic term for computing systems modeled on the neural networks of human or animal brains, and is one of the major methodologies of machine learning. An artificial neural network takes the form of a network in which many neurons, i.e., nerve cells, are interconnected as in the brain. Artificial neural networks are classified into several types according to their structure and function; the most common is the multilayer perceptron, which has multiple hidden layers between a single input layer and an output layer. An artificial neural network consists of many neurons, the basic computing units, connected by weighted links, and the weights of these links can be adjusted to adapt to a given environment.
- Through this specification, the inventor(s) propose a computing device including a plurality of cores that is arranged to increase the scalability of the computing device, and that is particularly optimized for the calculations required to execute the artificial neural network algorithms described above.
- An object of the present invention is to provide a computing device including a plurality of cores arranged to increase the scalability of the computing device.
- Another object of the present invention is to provide a computing device capable of minimizing the bottleneck caused by the movement of the large volumes of data that accompany artificial neural network processing.
- A typical configuration of the present invention for achieving the above objects is as follows.
- The computing device includes n cores, and the n cores are arranged in a rotational structure in which the first to nth cores are cyclically connected in one direction.
- Each of the n cores includes an independent memory unit, an operation unit, and an accumulation register unit. The outputs of the accumulation register units of the first to (n-1)th cores are connected to the inputs of the accumulation register units of the second to nth cores, respectively.
- The output of the accumulation register unit of the nth core is connected to the input of the accumulation register unit of the first core, closing the ring.
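- The cyclic register connection described above can be sketched as follows. This is a behavioural sketch only; the `Core` class and `rotate` function are illustrative names, not taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Core:
    """One core of the device: an independent memory unit and an
    accumulation register (acc)."""
    memory: list = field(default_factory=list)
    acc: int = 0

def rotate(cores):
    """One step of the cyclic connection: the accumulation register of
    core i feeds core i+1, and the last core feeds the first core,
    closing the one-directional ring."""
    snapshot = [c.acc for c in cores]
    for i, c in enumerate(cores):
        c.acc = snapshot[i - 1]   # index -1 wraps to the last core: ring closure

cores = [Core(acc=k) for k in range(4)]   # n = 4 cores holding values 0..3
rotate(cores)
print([c.acc for c in cores])             # → [3, 0, 1, 2]
```

Each value has moved one position in the one-directional cycle, which is exactly the register-to-register connectivity the claim describes.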
- A computing device including a plurality of cores arranged to increase the scalability of the computing device can thus be provided.
- A computing device capable of minimizing the bottleneck caused by the movement of the large volumes of data accompanying artificial neural network processing can also be provided.
- FIG. 1 is a diagram showing a plurality of cores having an arrangement structure contrasted with the present invention.
- FIG. 2 is a diagram showing a plurality of cores arranged in a rotational structure according to an embodiment of the present invention.
- FIG. 3 is a diagram showing a plurality of cores arranged in a rotational structure according to another embodiment of the present invention.
- The core arrangement structure peculiar to the present invention can be referred to as a rotational structure. That this rotational structure is advantageous for expanding the number of cores can be understood by contrasting it with a non-rotational core arrangement structure such as the one shown in FIG. 1.
- FIG. 1 is a diagram showing a plurality of cores having an arrangement structure contrasted with the present invention. As shown in FIG. 1, assume for example that eight cores are provided as a computing device mobilized for artificial neural network-related calculations. In this computing device, the eight cores can be arranged in two columns of four rows each. Suppose the number of cores then needs to be expanded while this computing device is performing calculations. If a ninth core is installed in the first row of a third column, the core placed in this third column is far from most of the existing cores, resulting in poor connectivity, and the flexibility that is important in artificial neural network-related calculations is degraded.
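- The connectivity argument above can be made concrete with a small sketch. The grid coordinates, the Manhattan distance metric, and the neighbour counts are illustrative assumptions, not figures from the patent.

```python
# Existing eight cores in two columns of four rows: (column, row) coordinates.
grid = [(x, y) for x in range(2) for y in range(4)]
ninth = (2, 0)                  # ninth core added in the first row of a third column

def manhattan(a, b):
    """Grid distance, used here as a stand-in for wiring distance."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

# In the two-column grid, only one existing core ends up directly adjacent
# to the newly added ninth core.
near_grid = sum(manhattan(ninth, c) <= 1 for c in grid)
print("grid layout: adjacent existing cores =", near_grid)      # → 1

# In the rotational arrangement of FIG. 2, the added core is inserted into the
# ring next to the first and eighth cores and also sits near the second core.
ring_neighbours = 2
print("ring layout: immediate ring neighbours =", ring_neighbours)
```

The grid placement leaves the new core with a single close neighbour, while the ring placement always gives it two immediate neighbours regardless of n.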
- In a computing device, the division of functions is determined by logically or physically connecting a portion serving as a calculator and a portion serving as a memory (i.e., a register).
- Some calculators share memory with other calculators, and some calculators use multiple memories, performing calculate, load (read from memory), and store (write to memory) operations in a predetermined manner.
- The match between a calculator and a memory may change during the execution of an algorithm. For example, even if core A mainly uses memories 1 and 2, it may need to access other memories as well.
- Such a flexible match between cores and memories is implemented in software.
- The collision problem caused by the flexible matching described above (for example, while core B is writing its calculation result to the first memory, the other cores must not store their calculation results in the first memory) can also be controlled by software.
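- The software-controlled collision avoidance described above can be sketched in Python. This is a minimal illustration assuming one lock per memory bank; the `Bank` class and its names are hypothetical, not taken from the patent.

```python
import threading

class Bank:
    """A shared memory bank guarded by a software lock: while one core is
    storing its result, the other cores must wait (no collision)."""
    def __init__(self):
        self._lock = threading.Lock()
        self.data = []

    def store(self, core_id, value):
        with self._lock:          # software-controlled exclusive access
            self.data.append((core_id, value))

bank = Bank()                     # the "first memory" shared by several cores
cores = [threading.Thread(target=bank.store, args=(i, i * i)) for i in range(4)]
for t in cores:
    t.start()
for t in cores:
    t.join()
print(sorted(bank.data))          # → [(0, 0), (1, 1), (2, 4), (3, 9)]
```

Whatever order the four "cores" run in, every result lands in the bank exactly once, because the lock serializes the stores.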
- If the ninth core is installed in the first row of a third column while the existing cores are arranged as illustrated in FIG. 1, that is, in two columns of four rows, the ninth core is far from most of the other cores. Since few cores are placed close to it, the flexibility that is advantageous for artificial neural network computation is degraded.
- FIG. 2 is a diagram showing a plurality of cores arranged in a rotational structure according to an embodiment of the present invention.
- FIG. 2 shows a case in which a ninth core is added when eight cores, arranged clockwise starting from the first core, are already installed on the board.
- The newly added ninth core is placed very close to the first, second, and eighth cores.
- When the positional relationship between the new ninth core and the existing cores is compared with that in FIG. 1, the characteristics of the rotational arrangement according to FIG. 2 become apparent.
- Because the rotational structure is arranged in this way, a flexible connection between an added core and the existing cores can be implemented advantageously, thereby increasing the scalability of the cores.
- FIG. 3 is a diagram showing a plurality of cores arranged in a rotational structure according to another embodiment of the present invention.
- The rotational structure shown in FIG. 3 is a more specific form of the one shown in FIG. 2: the connection of the cores starts at the outer edge, continues by rolling inward, then rolls back outward, and finally returns to the core from which the connection started. This structure, in particular, makes it possible to minimize bottlenecks in data movement.
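- The "roll in" leg of the connection path described above resembles a spiral traversal of a grid. The sketch below generates only the inward leg for a hypothetical 3x3 board; the outward, interleaved return leg that closes the loop is not modelled.

```python
def spiral(rows, cols):
    """Connection order that starts at the outer edge and rolls inward."""
    top, bottom, left, right = 0, rows - 1, 0, cols - 1
    order = []
    while top <= bottom and left <= right:
        order += [(top, c) for c in range(left, right + 1)]          # top edge
        order += [(r, right) for r in range(top + 1, bottom + 1)]    # right edge
        if top < bottom:                                             # bottom edge
            order += [(bottom, c) for c in range(right - 1, left - 1, -1)]
        if left < right:                                             # left edge
            order += [(r, left) for r in range(bottom - 1, top, -1)]
        top, bottom, left, right = top + 1, bottom - 1, left + 1, right - 1
    return order

# Inward leg for a 3x3 board; in the structure of FIG. 3 an outward leg
# would then lead back to the starting core, closing the loop.
print(spiral(3, 3))
# → [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0), (1, 1)]
```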
- Each core includes an independent memory unit and an operation unit. Unlike conventional artificial neural network operation devices, which are generally divided into an input memory unit, an operation unit, and an output memory unit that are configured independently of one another, here the memory units and operation units are distributed over the entire area of the device.
- In the structure of an existing computing device, the data accumulated in the output memory unit must be moved to the input memory unit, whereas in the rotational structure according to the present embodiment, data has been delivered to each core through the rotation path by the time the operation is completed, so this copying step can be omitted. This makes it possible to minimize the bottleneck in data movement compared to a conventional structure in which data movement and operations are concentrated in a certain area. Even if the number of cores increases in the future, the rotational structure can be applied in the same manner, and since the memory units and operation units are distributed over all areas, the structure is advantageous for increasing the scalability of the artificial neural network operation device.
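- Why the rotation path removes the output-to-input copy can be pictured with a ring matrix-vector product: each partial sum travels once around the ring through the accumulation registers and arrives back at its home core already complete. The mapping of weights to cores below is an assumption for illustration, not taken from the patent.

```python
def ring_matvec(W, x):
    """Compute y = W @ x on a ring of n cores.

    acc[c] is the accumulation register of core c; owner[c] records which
    output row that register currently holds. Each round, every core adds
    its local product, then all registers rotate one step around the ring,
    so no separate output-memory -> input-memory transfer is needed.
    """
    n = len(x)
    acc = [0.0] * n
    owner = list(range(n))
    for _ in range(n):
        for c in range(n):                  # all cores work in parallel
            acc[c] += W[owner[c]][c] * x[c]
        acc = acc[-1:] + acc[:-1]           # one rotation step of the ring
        owner = owner[-1:] + owner[:-1]
    return acc                              # after n steps every sum is home

W = [[1, 2], [3, 4]]
x = [10, 20]
print(ring_matvec(W, x))                    # → [50.0, 110.0]
```

After n rotations each register has visited every core exactly once and is back at its starting position holding the finished sum, which is the property the rotational structure exploits.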
- The computing device according to the present invention is not necessarily limited to use in calculations related to artificial neural networks, but it is particularly suitable for the vast number of calculations that artificial neural network processing involves.
- The embodiments according to the present invention described above may be implemented in the form of program instructions that can be executed by various computer components and recorded on a computer-readable recording medium.
- The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.
- The program instructions recorded on the computer-readable recording medium may be specially designed and constructed for the present invention, or they may be known and available to those skilled in the computer software field.
- Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
- Examples of program instructions include not only machine language code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.
- The hardware device can be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.
Abstract
Description
Claims (1)
- A computing device having a plurality of cores, wherein the computing device includes n cores; the n cores are arranged in a rotational structure in which the first to nth cores are cyclically connected in one direction; each of the n cores includes an independent memory unit, an operation unit, and an accumulation register unit; the outputs of the accumulation register units of the first to (n-1)th cores are connected to the inputs of the accumulation register units of the second to nth cores, respectively; and the output of the accumulation register unit of the nth core is connected to the input of the accumulation register unit of the first core.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2019-0112053 | 2019-09-10 | ||
KR1020190112053A KR20210030653A (ko) | 2019-09-10 | 2019-09-10 | 복수 개의 코어를 갖는 연산 장치 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021049828A1 true WO2021049828A1 (ko) | 2021-03-18 |
Family
ID=74866731
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2020/012023 WO2021049828A1 (ko) | 2019-09-10 | 2020-09-07 | Computing device having a plurality of cores |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR20210030653A (ko) |
WO (1) | WO2021049828A1 (ko) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20090027184A (ko) * | 2007-09-11 | 2009-03-16 | Seoul National University Industry-Academic Cooperation Foundation | Floating point unit-processing element (FPU-PE) structure supporting floating point operations, reconfigurable array processor (RAP) including the FPU-PE structure, and multimedia platform including the RAP |
US20170103317A1 (en) * | 2015-05-21 | 2017-04-13 | Google Inc. | Batch processing in a neural network processor |
KR20190048347A (ko) * | 2017-10-31 | 2019-05-09 | Samsung Electronics Co., Ltd. | Processor and control method thereof |
KR20190063383A (ko) * | 2017-11-29 | 2019-06-07 | Electronics and Telecommunications Research Institute | Reorganizable neural network computing device |
JP2019095861A (ja) * | 2017-11-17 | 2019-06-20 | Toshiba Corporation | Neural network device |
- 2019
  - 2019-09-10 KR KR1020190112053A patent/KR20210030653A/ko not_active Application Discontinuation
- 2020
  - 2020-09-07 WO PCT/KR2020/012023 patent/WO2021049828A1/ko active Application Filing
Also Published As
Publication number | Publication date |
---|---|
KR20210030653A (ko) | 2021-03-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20863652; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 20863652; Country of ref document: EP; Kind code of ref document: A1 |
32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205 DATED 08/09/2022) |