WO2022099573A1 - Chip expansion method for a billion-level brain-like computer - Google Patents

Chip expansion method for a billion-level brain-like computer

Info

Publication number
WO2022099573A1
WO2022099573A1 (PCT/CN2020/128505, CN2020128505W)
Authority
WO
WIPO (PCT)
Prior art keywords: chip, data, computing, brain, address
Prior art date
Application number
PCT/CN2020/128505
Other languages
English (en)
French (fr)
Inventor
马德
戴书画
李一涛
潘纲
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学 (Zhejiang University)
Publication of WO2022099573A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 - Information transfer, e.g. on bus
    • G06F13/40 - Bus structure
    • G06F13/4004 - Coupling between buses
    • G06F13/4027 - Coupling between buses using bus bridges
    • G06F13/4031 - Coupling between buses using bus bridges with arbitration
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 - Information transfer, e.g. on bus
    • G06F13/42 - Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4204 - Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F13/00 - Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
    • G06F13/38 - Information transfer, e.g. on bus
    • G06F13/42 - Bus transfer protocol, e.g. handshake; Synchronisation
    • G06F13/4282 - Bus transfer protocol, e.g. handshake; Synchronisation on a serial bus, e.g. I2C bus, SPI bus
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/061 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the invention belongs to the field of artificial intelligence computing chips, in particular to a chip expansion method for a billion-level brain-like computer.
  • the artificial neural network that has emerged in recent years is an imitation of the structure of the human brain, abstracting its hierarchical structure and the characteristics of neuron interconnection. Although artificial neural networks achieve good computing performance, they consume a great deal of energy. Therefore, people have imitated the human brain at the biological level, producing brain-like computing chips.
  • Brain-like computing chips fundamentally solve the problem of the "memory wall" of the traditional von Neumann architecture.
  • the brain-like computing chip uses a network on chip (NoC) as its communication architecture, uses a mesh topology, and mounts a computing unit on each router.
  • Each computing unit has its own local storage.
  • This integrated structure of storage and computing greatly reduces the time and power consumption consumed by data transportation, and distributes the calculation on each node for large-scale parallel computing, which further improves the computing efficiency.
  • the biggest advantage of brain-like computing hardware devices is low power consumption, so it can be applied to fields that require high energy efficiency, such as smart wearable devices and Internet of Things technology.
  • the spiking neural network is the algorithmic cornerstone of brain-like computing chips. Neuroscientists believe that the brain's excellent performance rests on three properties: massive and wide-ranging connections, an information transfer mode with both temporal and spatial characteristics, and locally stored synaptic structures. The spiking neural network is the third-generation neural network born from applying these three properties; compared with current deep neural networks, it uses timed spikes as the medium of information transfer, and its algorithm is inherently event-driven, which fits the idea of low-power hardware design and makes it easy to implement in hardware. Most spiking neural networks use small-sample, unsupervised learning methods; compared with deep neural networks, they require less training data, have shorter computation flows, and offer higher fault tolerance and robustness. Spiking neural networks have unique advantages for cognitive tasks, and realizing spiking neural network computing hardware is also a supplement and breakthrough relative to the traditional computer.
  • a single neuron in the human brain has only a simple function, but hundreds of millions of neurons form a huge neuron computing cluster, which can complete a variety of complex tasks through simple learning. Therefore, the large-scale expansion of brain-like computing chips is still a key issue in the development process of this field.
  • the communication efficiency between chips and the coordination and management of chip groups are the bottlenecks of scale expansion.
  • the purpose of the present invention is to provide a chip expansion method for a billion-level brain-like computer that is efficient, flexible, and hierarchical, and can raise the neuron scale of brain-like computing chips to over one hundred million.
  • a chip expansion method for a billion-level brain-like computer, comprising the following steps: connecting a plurality of chip clusters for the billion-level-neuron brain-like computer through an Ethernet communication module to form a computing cluster; connecting a plurality of chip arrays to each chip cluster through a data transfer station; and connecting to each chip array, through asynchronous data communication modules, a plurality of brain-like computing chips arranged in a matrix, each brain-like computing chip containing a plurality of computing neuron nodes arranged in a matrix.
  • the asynchronous data communication module serves as a communication bridge for each brain-like computing chip, including an asynchronous transceiver interface, a parallel distribution unit, and a serial arbitration unit;
  • the asynchronous transceiver interface asynchronously receives and sends transmission data
  • the parallel distribution unit parses the asynchronously received transmission data, requests data injection permission from the corresponding computing neuron node, and then injects the transmission data in parallel into the computing neuron nodes of the brain-like computing chip;
  • the serial arbitration unit merges the result data output in parallel by a plurality of computing neuron nodes into one serial output queue as transmission data.
  • the parallel distribution unit parses the packet header of the asynchronously received transmission data packet, extracts the destination address from the header, requests permission from the virtual channel of the computing neuron node corresponding to the destination address, and injects the transmission data into that computing neuron node of the brain-like computing chip.
  • the serial arbitration unit adopts a polling arbitration algorithm to merge the result data of the computing neuron nodes into a serial output queue as the transmission data.
  • the transmission data is sent out through the asynchronous transceiver interface, and then transmitted to other brain-like computing chips through the asynchronous four-phase handshake protocol.
  • an asynchronous data communication module is configured for each rectangular boundary of each brain-like computing chip, enabling communication of transmission data in four directions.
  • in this scheme, the result data of the boundary computing neuron nodes are merged into the same serial output queue according to the polling arbitration algorithm, sent out through the asynchronous transceiver interface, and then transmitted to other brain-like computing chips through the asynchronous four-phase handshake protocol. This saves chip I/O pins.
  • the data transfer station includes a sending distribution module, a receiving arbitration module, and a plurality of asynchronous communication modules, and each asynchronous communication module corresponds to a chip array;
  • the asynchronous communication module includes a receiving queue, a sending queue, an inter-chip data queue, an asynchronous communication interface and an address mapper, wherein the asynchronous communication interface receives transmission data to form a receiving queue, and simultaneously sends the transmission data in the sending queue,
  • the address mapper maps the transmission data in the receive queue to other chip arrays;
  • the sending distribution module coordinates and manages switches of the sending queue, the receiving queue and the data path of the inter-chip data queue in each asynchronous communication module;
  • the receiving arbitration module cooperatively manages and stores the data transmitted to other chip clusters in the sending queue in an orderly manner.
  • the address mapper includes two address mapping schemes;
  • Address mapping scheme 1: when mapping transmission data, part of the virtual address space of the current chip array is mapped directly onto an address area of the same shape in another chip array, so that the computing neuron nodes of the current chip array correspond one-to-one with those of the other chip array, realizing the mapping of the transmission data;
  • Address mapping scheme 2: an address mapping table is configured, and the transmission data is mapped to the corresponding computing neuron nodes in other chip arrays according to the mapping information in the table.
  • the address mapping scheme is used to solve the problem that one chip array cannot access the computing neuron nodes of another chip array due to the limited address space.
  • the first address mapping scheme is direct mapping, which maps a partial region of one chip array onto an address area of the same shape in another chip array; the nodes correspond one-to-one, so data sent to a certain computing neuron node of one chip array is treated as sent to the corresponding computing neuron node of the other chip array. This scheme is simple and reliable.
  • the second address mapping scheme is free mapping, which requires an additional address mapping table. The correspondence between the computing neuron nodes of the two chip arrays is fixed in the address mapping table; the table is queried with the destination node information parsed from the packet header to determine the target chip array and the specific address, and the data is then injected into the sending queue of the corresponding interface.
  • this scheme can scatter the forwarding nodes across all areas of the other computing chips and is relatively friendly to the connection relationships. In actual use, users can choose flexibly according to connection scale and mapping efficiency.
  • the mapping process of the address mapper for the transmission data is as follows:
  • when the packet header of the transmission data arrives, the header is parsed and the destination address of the transmission data is determined according to the address mapping scheme; the virtual address in the packet header is modified to the corresponding destination address and injected into the sending queue, and the destination address is recorded at the same time.
  • when the data payload and the packet tail arrive, they are forwarded to the recorded destination address.
  • the destination address is recorded using the boundary node port number and virtual channel number of the packet header as an identifier; subsequent payloads and packet tails are forwarded directly according to the recorded destination address until the next packet header updates it.
  • the address mapper of the present invention can realize not only the mapping of data transmission among multiple chip arrays belonging to the same chip cluster, but also the mapping of transmission data between multiple chip arrays belonging to different chip clusters.
  • when mapping transmission data among chip arrays of the same chip cluster, the transmission data received in the receiving queue is transferred into the inter-chip data queue and mapped through it to the computing neuron nodes of the other chip arrays.
  • when mapping transmission data among chip arrays of different chip clusters, the transmission data is injected into the sending queue, sent out through the asynchronous handshake interface, and transmitted to the other chip cluster through the Ethernet communication module; the data transfer station of the other chip cluster relays the received transmission data and maps it to the computing neuron nodes of its internal chip arrays.
  • the Ethernet communication module configures an IP address for each chip cluster, and interconnects all the chip clusters through the TCP protocol for data exchange and management.
  • when transmission data is distributed to a chip cluster, it is stored dynamically in the Ethernet communication module using ping-pong buffering to improve data throughput, and is then transmitted to the data transfer station.
  • at runtime, one chip cluster in the computing cluster is selected as the server, and the remaining chip clusters act as clients.
  • the clients and the server exchange data between chip clusters through the Ethernet communication module; the server is responsible for data coordination and task management, and it also interacts with the clients.
  • the beneficial effects of the present invention at least include:
  • the hierarchical expansion method provided in the chip expansion method for the billion-level brain-like computer of the present invention can be chosen according to the actual neuron scale required; the layers are designed relatively independently, and each layer's design can be adjusted while the interfaces remain unchanged, which makes the system easy to maintain and highly scalable, reaching a scale of 100 million neurons.
  • the inter-chip asynchronous data communication scheme provided in the chip expansion method for the billion-level brain-like computer of the present invention greatly reduces the demand for chip pins while ensuring efficient transmission.
  • the address mapping scheme provided in the chip expansion method for the billion-level brain-like computer of the present invention breaks the constraint of address storage length, greatly reduces the memory required for storing addresses in the chip, and enables effective large-scale cascading of brain-like computing chips.
  • in the chip expansion method for the billion-level brain-like computer of the present invention, the brain-like computing chip cluster provides management of chips and tasks while expanding the chip scale, laying the foundation for a billion-level-neuron brain-like computer.
  • FIG. 1 is a schematic diagram of an expansion example of a chip expansion method for a billion-level brain-like computer provided by an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of an asynchronous data communication module provided by an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a data transfer station provided by an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an address mapping solution provided by an embodiment of the present invention.
  • the embodiment of the present invention provides a chip expansion method for a billion-level brain-like computer.
  • the chip expansion method consists of a three-level chip expansion scheme: the first level is the inter-chip asynchronous data communication module scheme, which is responsible for communication between brain-like computing chips and connects multiple brain-like computing chips into a chip array; the second level is the chip array data transfer station, which is responsible for data exchange between chip arrays, completes chip array cascading through address mapping, and expands the chips into a chip cluster;
  • the third level is the brain-like computing cluster, which uses the Ethernet communication module to organize the chip clusters into a computing cluster and is responsible for data exchange between chip clusters and for chip task management.
  • FIG. 1 is a schematic diagram of an expansion example of a chip expansion method for a billion-level brain-like computer provided by an embodiment of the present invention.
  • four brain-like computing chips form a chip array.
  • the chips in the chip array can be directly connected through an asynchronous data communication module.
  • the expanded chip array is still a regular grid topology, which is convenient for further expansion.
  • 3 chip arrays can form a chip cluster, and data exchange is performed between the arrays through the data transfer station.
  • Each chip array has only one boundary connected to the data transfer station.
  • the other end of the data transfer station is responsible for communicating with other chip clusters, returning data, and injecting external spike information and configuration information.
  • Multiple chip clusters can form a brain-like computing cluster, and data is transmitted between them through TCP/IP.
  • the asynchronous data communication module serves as the communication bridge of each brain-like computing chip and includes an asynchronous transceiver interface, a parallel distribution unit, and a serial arbitration unit. The asynchronous transceiver interface asynchronously receives and sends transmission data; the parallel distribution unit parses the asynchronously received transmission data, requests data injection permission from the corresponding computing neuron node, and then injects the transmission data in parallel into the computing neuron nodes of the brain-like computing chip; the serial arbitration unit merges the result data output in parallel by multiple computing neuron nodes into one serial output queue as transmission data.
  • FIG. 2 is a schematic structural diagram of an asynchronous data communication module provided by an embodiment of the present invention.
  • a single brain-like computing chip is composed of 24×24 neuron computing nodes, and each boundary is configured with an asynchronous data communication module.
  • when data is injected into the brain-like computing chip, it first passes through the asynchronous transceiver interface, which ensures accurate input; at this point the data entering the chip is serial.
  • the parallel distribution unit needs to request permission from the virtual channel of the corresponding node according to its destination address. When the node is idle, it will inject the data into the network.
  • when the brain-like computing chip sends data outward, any of the 24 boundary nodes may generate data.
  • the serial arbitration unit puts the boundary data into the output queue in turn through the polling arbitration algorithm, and then sends the data through the asynchronous transceiver interface.
  • the data transfer station includes a sending distribution module, a receiving arbitration module, and a plurality of asynchronous communication modules, and each asynchronous communication module corresponds to a chip array;
  • the asynchronous communication module includes a receiving queue, a sending queue, an inter-chip data queue, an asynchronous handshake interface and an address mapper, wherein the asynchronous handshake interface receives transmission data to form a receiving queue, and at the same time sends the transmission data in the sending queue,
  • the address mapper maps the transmission data in the receive queue to other chip arrays;
  • the sending distribution module coordinates and manages the switches of the sending queues, the receiving queues and the data paths of the inter-chip data queues in each asynchronous communication module; the receiving arbitration module coordinates and manages the data transmitted to other chip clusters and stores them in the sending queues in an orderly manner.
  • FIG. 3 is a schematic structural diagram of a data transfer station provided by an embodiment of the present invention.
  • the data transfer station consists of three asynchronous communication modules, each equipped with a sending data queue, a receiving data queue, and an inter-chip data queue, which temporarily store, respectively, the data sent to the array, the data received from the array, and the data exchanged between chips in different arrays.
  • the asynchronous communication module implemented on an FPGA is responsible for sending and receiving data; received data is temporarily stored in the receiving queue, and the address mapper queries the address mapping table according to the packet header to determine whether the data is destined for another chip array or for another chip cluster.
  • for data destined for another chip array, the packet header address is modified according to the mapping table so that it is configured as the address of the destination chip array, and the data is temporarily stored in the inter-chip data queue.
  • the inter-chip data queues of the various interfaces may request the same sending interface, so a sending distribution module is required to manage the order of queue requests; data is transferred between the two granted queues through a data selector.
  • the data transmitted to other chip clusters are sequentially transferred to the queue storing the data between clusters through the receiving arbitration module, and then handed over to the cluster module for processing.
  • the address mapper includes two address mapping schemes
  • Address mapping scheme 1: when mapping transmission data, part of the virtual address space of the current chip array is mapped directly onto an address area of the same shape in another chip array, so that the computing neuron nodes of the current chip array correspond one-to-one with those of the other chip array, realizing the mapping of the transmission data;
  • Address mapping scheme 2: an address mapping table is configured, and the transmission data is mapped to the corresponding computing neuron nodes in other chip arrays according to the mapping information in the table.
  • FIG. 4 is a schematic diagram of an address mapping solution provided by an embodiment of the present invention. As shown in Figure 4, each chip has 24×24 computing nodes, four chips form a 2×2 chip array, and three chip arrays form a chip cluster. The addressable range of each chip is a 64×64 matrix, so 48×48 of the addresses in the chip correspond to actual physical nodes, and the rest can serve as virtual forwarding nodes for address mapping.
  • in one embodiment, the computing neuron node (47, 24) is connected to the virtual address (48, 25); the computing neuron node therefore sets the destination address of the packet header to (48, 25), and data sent to this address actually leaves the chip and is received by the chip array data transfer station.
  • in this embodiment, the address mapping scheme adopts direct mapping: the 16×24 matrix of virtual addresses is mapped directly onto the matrix region of another chip array with X coordinates from 0 to 15 and Y coordinates from 24 to 47, the nodes of the two matrices corresponding one-to-one.
  • for data sent to the virtual node, the address mapper changes the coordinates in its packet header to (0, 25), the coordinates of the corresponding destination node in the target chip array, and the data is then sent to that chip array through the data transfer station.
  • the upper half of the virtual node addresses can be mapped to another chip array; with reasonable allocation of the virtual node addresses, the whole chip cluster can be interconnected so that the entire cluster works as a single whole.
  • the Ethernet communication module configures an IP address for each chip cluster, and interconnects all the chip clusters through the TCP protocol for data exchange and management.
  • One chip cluster in the computing cluster is selected as the server, the rest of the chip clusters are used as the client, and the client and the server exchange data between the chip clusters through the Ethernet communication module.
  • the Ethernet communication module and the data transfer station are built on a Zynq chip: the ARM side of the Zynq is used to build the Ethernet communication module, and the FPGA side of the Zynq is used to build the data transfer station.
  • the ARM side implements an lwIP protocol stack.
  • when data is distributed to a chip cluster, it is first stored in dynamic memory and then passed to the FPGA side through the AXI4 protocol for the next stage of distribution; ping-pong buffering is implemented on the ARM side, which improves data throughput.
  • a chip cluster is selected as the host, which is responsible for data coordination and task management. At the same time, the chip cluster needs to interact with the PC.
  • the hierarchical billion-level-neuron brain-like computing chip expansion method proposed by the invention can efficiently and flexibly expand chips into a complete brain-like computer system, solves the problem of data transmission address access through the address mapping scheme, and completes large-scale cascading and chip cluster management through the computing cluster scheme.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Neurology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Multi Processors (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention discloses a chip expansion method for a billion-level brain-like computer, comprising the following steps: connecting a plurality of chip clusters for the billion-level-neuron brain-like computer through an Ethernet communication module to form a computing cluster; connecting a plurality of chip arrays to each chip cluster through a data transfer station; and connecting to each chip array, through asynchronous data communication modules, a plurality of brain-like computing chips arranged in a matrix, each brain-like computing chip containing a plurality of computing neuron nodes arranged in a matrix. The chip expansion method is efficient, flexible, and hierarchical, and can raise the neuron scale of brain-like computing chips to over one hundred million.

Description

A chip expansion method for a billion-level brain-like computer

Technical Field

The present invention belongs to the field of artificial intelligence computing chips, and in particular relates to a chip expansion method for a billion-level brain-like computer.

Background Art

As Moore's law reaches the physical limits of devices, the computing performance of computers with the traditional von Neumann architecture can no longer sustain rapid growth because of the "memory wall", the "power wall", and similar problems. How to improve computing performance while reducing power consumption has become an increasingly serious issue. Attention has therefore turned to the human brain, a highly developed computing architecture that completes high-performance computation while consuming less than 20 W of power. The human brain also has unique advantages in perceptual cognition, as well as robustness and fault tolerance unmatched by traditional computer architectures. The human brain is composed of a vast number of neurons, with structures such as synapses, axons, and somata; the artificial neural network that has emerged in recent years imitates the structure of the human brain, abstracting its hierarchical structure and the interconnection of neurons. Although artificial neural networks achieve good computing performance, they consume a great deal of energy. People have therefore imitated the human brain at the biological level, producing brain-like computing chips.

Brain-like computing chips fundamentally solve the "memory wall" problem of the traditional von Neumann architecture. A brain-like computing chip uses a network on chip (NoC) as its communication architecture, adopts a mesh topology, and mounts a computing unit on each router. Each computing unit has its own local storage. This integrated storage-and-computation structure greatly reduces the time and power consumed by moving data around, and distributes the computation over the nodes for large-scale parallel computing, further improving computing efficiency. The biggest advantage of brain-like computing hardware is its low power consumption, so it can be applied in fields with high energy-efficiency requirements, such as smart wearable devices and Internet of Things technology.

The spiking neural network is the algorithmic cornerstone of brain-like computing chips. Neuroscientists believe that the brain's outstanding performance rests mainly on three properties: massive and wide-ranging connections, an information transfer mode with both temporal and spatial characteristics, and locally stored synaptic structures. The spiking neural network is the third-generation neural network born from applying these three properties. Compared with current deep neural networks, it uses timed spikes as the medium of information transfer, and its algorithm is inherently event-driven, which fits the idea of low-power hardware design and makes it easy to implement in hardware. Most spiking neural networks use small-sample, unsupervised learning methods; compared with deep neural networks, they need less training data, have shorter computation flows, and offer higher fault tolerance and robustness. Spiking neural networks have unique advantages for cognitive tasks, and realizing spiking neural network computing hardware is also a supplement and a breakthrough relative to the traditional computer.

A single neuron in the human brain has only a simple function, but hundreds of millions of neurons form a huge neuron computing cluster that can complete all kinds of complex tasks through simple learning. The large-scale expansion of brain-like computing chips therefore remains a key issue in the development of this field; the communication efficiency between chips and the coordination and management of chip groups are the bottlenecks of scale expansion.
Summary of the Invention

The purpose of the present invention is to provide a chip expansion method for a billion-level brain-like computer that is efficient, flexible, and hierarchical, and can raise the scale of brain-like computing chips to over one hundred million neurons.

To achieve the above purpose, the present invention provides the following technical solution:

A chip expansion method for a billion-level brain-like computer, comprising the following steps:

connecting a plurality of chip clusters for the billion-level-neuron brain-like computer through an Ethernet communication module to form a computing cluster;

connecting a plurality of chip arrays to each chip cluster through a data transfer station;

connecting to each chip array, through asynchronous data communication modules, a plurality of brain-like computing chips arranged in a matrix, each brain-like computing chip containing a plurality of computing neuron nodes arranged in a matrix.

The asynchronous data communication module serves as the communication bridge of each brain-like computing chip and includes an asynchronous transceiver interface, a parallel distribution unit, and a serial arbitration unit;

the asynchronous transceiver interface asynchronously receives and sends transmission data;

the parallel distribution unit parses the asynchronously received transmission data, requests data injection permission from the corresponding computing neuron node, and then injects the transmission data in parallel into the computing neuron nodes of the brain-like computing chip;

the serial arbitration unit merges the result data output in parallel by multiple computing neuron nodes into one serial queue as transmission data.
Preferably, the asynchronous data communication module serves as the communication bridge of each brain-like computing chip and includes an asynchronous transceiver interface, a parallel distribution unit, and a serial arbitration unit;

the asynchronous transceiver interface asynchronously receives and sends transmission data;

the parallel distribution unit parses the asynchronously received transmission data, requests data injection permission from the corresponding computing neuron node, and then injects the transmission data in parallel into the computing neuron nodes of the brain-like computing chip;

the serial arbitration unit merges the result data output in parallel by multiple computing neuron nodes into one serial output queue as transmission data.
Wherein, the parallel distribution unit parses the packet header of the asynchronously received transmission data packet, extracts the destination address from the header, requests permission from the virtual channel of the computing neuron node corresponding to the destination address, and injects the transmission data into that computing neuron node of the brain-like computing chip; a sketch of this step follows.
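To make the injection step concrete, here is a minimal Python sketch of the parallel distribution unit under stated assumptions: the flit layout (a dict with a "dest" field parsed from the header), the virtual-channel depth, and all identifiers are illustrative, since the patent does not fix a packet format.

```python
from collections import deque

class ComputeNode:
    """One computing neuron node with a virtual-channel input buffer."""
    def __init__(self, depth=4):
        self.vc = deque()
        self.depth = depth

    def grant(self):
        # Injection permission is granted while the virtual channel has space.
        return len(self.vc) < self.depth

class ParallelDistributor:
    """Parses serially received flits and injects them into nodes in parallel."""
    def __init__(self, nodes):
        self.nodes = nodes        # dict mapping (x, y) -> ComputeNode
        self.pending = deque()    # serial input waiting for an injection grant

    def receive(self, flit):
        # flit = {"dest": (x, y), "payload": ...}; dest was parsed from the header
        self.pending.append(flit)

    def step(self):
        # One distribution cycle: inject every flit whose target node grants it.
        for _ in range(len(self.pending)):
            flit = self.pending.popleft()
            node = self.nodes[flit["dest"]]
            if node.grant():
                node.vc.append(flit)       # injected into the on-chip network
            else:
                self.pending.append(flit)  # node busy: retry on a later cycle

# Usage: a 24x24 grid of nodes, one incoming flit.
grid = {(x, y): ComputeNode() for x in range(24) for y in range(24)}
dist = ParallelDistributor(grid)
dist.receive({"dest": (3, 7), "payload": 0xAB})
dist.step()
```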
Preferably, the serial arbitration unit uses a polling arbitration algorithm to merge the result data of the computing neuron nodes into one serial output queue as transmission data. The transmission data is sent out through the asynchronous transceiver interface and then transmitted to other brain-like computing chips through the asynchronous four-phase handshake protocol.

Preferably, an asynchronous data communication module is configured for each rectangular boundary of each brain-like computing chip, enabling communication of transmission data in four directions. In this scheme, the result data of the boundary computing neuron nodes are merged into the same serial output queue according to the polling arbitration algorithm, sent out through the asynchronous transceiver interface, and then transmitted to other brain-like computing chips through the asynchronous four-phase handshake protocol, sketched below. This saves chip I/O pins.
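The inter-chip transfer relies on the asynchronous four-phase (return-to-zero) handshake. The following behavioural sketch shows one complete transfer; the req/ack signal names are the protocol's conventional ones, assumed here, as the patent does not specify a pin-level interface.

```python
class FourPhaseLink:
    """One direction of an asynchronous four-phase (return-to-zero) link."""
    def __init__(self):
        self.req = 0
        self.ack = 0
        self.data = None
        self.received = []

    def send(self, word):
        assert self.req == 0 and self.ack == 0   # link must be idle
        self.data = word
        self.req = 1               # phase 1: sender raises req with valid data
        self._receiver_latch()     # phase 2: receiver latches data, raises ack
        self.req = 0               # phase 3: sender lowers req
        self._receiver_release()   # phase 4: receiver lowers ack; link idle again

    def _receiver_latch(self):
        if self.req and not self.ack:
            self.received.append(self.data)
            self.ack = 1

    def _receiver_release(self):
        if not self.req and self.ack:
            self.ack = 0

link = FourPhaseLink()
link.send(0x2F19)                  # one complete return-to-zero transfer
assert link.received == [0x2F19]
```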
Preferably, the data transfer station includes a sending distribution module, a receiving arbitration module, and a plurality of asynchronous communication modules, each asynchronous communication module corresponding to one chip array;

the asynchronous communication module includes a receiving queue, a sending queue, an inter-chip data queue, an asynchronous communication interface, and an address mapper; the asynchronous communication interface receives transmission data to form the receiving queue and at the same time sends out the transmission data in the sending queue, and the address mapper maps the transmission data in the receiving queue to other chip arrays;

the sending distribution module coordinates and manages the switching of the data paths of the sending queue, the receiving queue, and the inter-chip data queue in each asynchronous communication module;

the receiving arbitration module cooperatively manages the orderly storage of data destined for other chip clusters into the sending queue.
Preferably, the address mapper includes two address mapping schemes;

Address mapping scheme 1: when mapping transmission data, part of the virtual address space of the current chip array is mapped directly onto an address area of the same shape in another chip array, so that the computing neuron nodes of the current chip array correspond one-to-one with those of the other chip array, realizing the mapping of the transmission data;

Address mapping scheme 2: an address mapping table is configured, and the transmission data is mapped to the corresponding computing neuron nodes in other chip arrays according to the mapping information in the table.

In the present invention, the address mapping schemes solve the problem that one chip array cannot access the computing neuron nodes of another chip array because the address space is limited. Address mapping scheme 1 is direct mapping: a partial region of one chip array is mapped onto an address area of the same shape in another chip array, the nodes correspond one-to-one, and data sent to a certain computing neuron node of one chip array is treated as sent to the corresponding computing neuron node of the other array; this scheme is simple and reliable. Address mapping scheme 2 is free mapping, which requires an additional address mapping table: the correspondence between the computing neuron nodes of the two chip arrays is fixed in the table, and the table is queried with the destination node information parsed from the packet header to determine the target chip array and the specific address, after which the data is injected into the sending queue of the corresponding interface; this scheme can scatter the forwarding nodes across all areas of the other computing chips and is relatively friendly to connection relationships. In actual use, users can choose flexibly according to connection scale and mapping efficiency; both schemes are sketched below.
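A minimal sketch of the two schemes, assuming for concreteness the geometry used later in the FIG. 4 embodiment (a 64×64 addressable space with 48×48 physical nodes); the window bounds of the direct mapping and the example table entries are illustrative assumptions.

```python
def direct_map(x, y, window=(48, 63, 24, 47), base=(0, 24)):
    """Scheme 1: map a rectangular virtual window one-to-one onto an
    equally shaped region of the target chip array."""
    x0, x1, y0, y1 = window
    assert x0 <= x <= x1 and y0 <= y <= y1, "address outside the mapped window"
    return (base[0] + (x - x0), base[1] + (y - y0))

# Scheme 2: free mapping through an explicit table, so forwarding nodes can be
# scattered anywhere in the target arrays (entries below are invented examples).
MAP_TABLE = {
    (48, 25): ("array_B", (0, 25)),
    (50, 30): ("array_C", (17, 3)),
}

def table_map(x, y):
    return MAP_TABLE[(x, y)]

assert direct_map(48, 25) == (0, 25)
```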
Preferably, the mapping process of the address mapper for transmission data is as follows:

when the packet header of the transmission data arrives, the header is parsed and the destination address of the transmission data is determined according to the address mapping scheme; the virtual address in the packet header is modified to the corresponding destination address and injected into the sending queue, and the destination address is recorded at the same time. When the data payload and the packet tail arrive, they are forwarded to the destination address. In the present invention, the destination address is recorded using the node port number and virtual channel number with which the boundary emitted the packet header as an identifier; the subsequent data payload and packet tail are forwarded directly according to the recorded destination address until the next packet header updates it.
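Sketched below is the per-packet bookkeeping just described: the destination resolved for a header flit is recorded under its (boundary port, virtual channel) pair and reused for the payload and tail flits until the next header replaces it. The flit field names and the resolver callback are assumptions.

```python
class HeaderForwardState:
    """Remembers, per (boundary port, virtual channel), where the current
    packet is going, so payload and tail flits need no table lookup."""
    def __init__(self, resolve):
        self.resolve = resolve    # callback: virtual destination -> real one
        self.route = {}           # (port, vc) -> resolved destination

    def forward(self, flit):
        key = (flit["port"], flit["vc"])
        if flit["kind"] == "header":
            dest = self.resolve(flit["dest"])
            self.route[key] = dest         # record for the rest of the packet
            return dict(flit, dest=dest)   # header leaves with rewritten address
        # Payload or tail: reuse the destination recorded by the last header.
        return dict(flit, dest=self.route[key])

fwd = HeaderForwardState(lambda v: (v[0] - 48, v[1]))   # e.g. the FIG. 4 rewrite
hdr = fwd.forward({"port": 0, "vc": 1, "kind": "header", "dest": (48, 25)})
body = fwd.forward({"port": 0, "vc": 1, "kind": "payload"})
assert hdr["dest"] == body["dest"] == (0, 25)
```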
The address mapper of the present invention can realize the mapping of transmission data not only among multiple chip arrays belonging to the same chip cluster but also among multiple chip arrays belonging to different chip clusters. When mapping transmission data among chip arrays of the same chip cluster, the transmission data received in the receiving queue is transferred into the inter-chip data queue and mapped through it to the computing neuron nodes of the other chip arrays. When mapping transmission data among chip arrays of different chip clusters, the transmission data is injected into the sending queue, sent out through the asynchronous handshake interface, and transmitted to the other chip cluster through the Ethernet communication module; the data transfer station of the other chip cluster relays the received transmission data and maps it to the computing neuron nodes of its internal chip arrays.

Preferably, the Ethernet communication module configures an IP address for each chip cluster and interconnects all chip clusters through the TCP protocol for data exchange and management. When transmission data is distributed to a chip cluster, it is stored dynamically in the Ethernet communication module using ping-pong buffering to improve data throughput, and is then transmitted to the data transfer station. At runtime, one chip cluster in the computing cluster is selected as the server and the remaining chip clusters act as clients; the clients and the server exchange data between chip clusters through the Ethernet communication module, with the server responsible for data coordination and task management while also interacting with the clients.
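The cluster level can be pictured with a minimal TCP sketch in which one cluster listens as the server and another connects as a client. The port number and the 4-byte length-prefixed framing are assumptions; the patent specifies only that the clusters are interconnected over TCP with one cluster acting as the server.

```python
import socket
import struct

def recv_exact(conn, n):
    """Read exactly n bytes (TCP recv may return short reads)."""
    buf = b""
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed early")
        buf += chunk
    return buf

def server_receive(host="0.0.0.0", port=5050):
    """Server cluster: accept one client and read one framed message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, port))
        srv.listen()
        conn, _ = srv.accept()
        with conn:
            size, = struct.unpack("!I", recv_exact(conn, 4))
            return recv_exact(conn, size)   # spike/config data from the client

def client_send(payload, host="127.0.0.1", port=5050):
    """Client cluster: frame the payload with a length prefix and send it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((host, port))
        cli.sendall(struct.pack("!I", len(payload)) + payload)
```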
Compared with the prior art, the beneficial effects of the present invention at least include the following:

The hierarchical expansion method provided in the chip expansion method for the billion-level brain-like computer of the present invention can be chosen according to the actual neuron scale required; the layers are designed relatively independently, and each layer's design can be adjusted while the interfaces remain unchanged, which makes the system easy to maintain and highly scalable, reaching a scale of one hundred million neurons.

The inter-chip asynchronous data communication scheme provided in the chip expansion method greatly reduces the demand for chip pins while ensuring efficient transmission.

The address mapping schemes provided in the chip expansion method break the constraint of address storage length, greatly reduce the memory required for storing addresses in the chip, and enable effective large-scale cascading of brain-like computing chips.

In the chip expansion method for the billion-level brain-like computer of the present invention, the brain-like computing chip cluster provides management of chips and tasks while expanding the chip scale, laying the foundation for a billion-level-neuron brain-like computer.
Brief Description of the Drawings

To explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1 is a schematic diagram of an expansion example of the chip expansion method for a billion-level brain-like computer provided by an embodiment of the present invention;

FIG. 2 is a schematic structural diagram of the asynchronous data communication module provided by an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of the data transfer station provided by an embodiment of the present invention;

FIG. 4 is a schematic diagram of the address mapping scheme provided by an embodiment of the present invention.
Detailed Description of Embodiments

To make the purpose, technical solution, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and do not limit its scope of protection.

An embodiment of the present invention provides a chip expansion method for a billion-level brain-like computer. The chip expansion method consists of a three-level chip expansion scheme: the first level is the inter-chip asynchronous data communication module scheme, which is responsible for communication between brain-like computing chips and connects multiple brain-like computing chips into a chip array; the second level is the chip array data transfer station, which is responsible for data exchange between chip arrays, completes chip array cascading through address mapping, and expands the chips into a chip cluster; the third level is the brain-like computing cluster, which uses the Ethernet communication module to organize the chip clusters into a computing cluster and is responsible for data exchange between chip clusters and for chip task management.

FIG. 1 is a schematic diagram of an expansion example of the chip expansion method for a billion-level brain-like computer provided by an embodiment of the present invention. As shown in FIG. 1, four brain-like computing chips form a chip array; the chips in the array can be directly connected through the asynchronous data communication modules, and the expanded chip array remains a regular grid topology, which is convenient for further expansion. Three chip arrays can form a chip cluster, with data exchanged between the arrays through the data transfer station; each chip array has only one boundary connected to the data transfer station, and the other end of the data transfer station is responsible for communicating with other chip clusters, returning data, and injecting external spike information, configuration information, and so on. Multiple chip clusters can form a brain-like computing cluster, and data is transmitted between them over TCP/IP.

In the embodiment, the asynchronous data communication module serves as the communication bridge of each brain-like computing chip and includes an asynchronous transceiver interface, a parallel distribution unit, and a serial arbitration unit. The asynchronous transceiver interface asynchronously receives and sends transmission data; the parallel distribution unit parses the asynchronously received transmission data, requests data injection permission from the corresponding computing neuron node, and then injects the transmission data in parallel into the computing neuron nodes of the brain-like computing chip; the serial arbitration unit merges the result data output in parallel by multiple computing neuron nodes into one serial output queue as transmission data.
FIG. 2 is a schematic structural diagram of the asynchronous data communication module provided by an embodiment of the present invention. As shown in FIG. 2, a single brain-like computing chip is composed of 24×24 neuron computing nodes, and each boundary is configured with an asynchronous data communication module. When data is injected into the brain-like computing chip, it first passes through the asynchronous transceiver interface, which ensures accurate input; at this point the incoming data is serial, and the parallel distribution unit requests permission from the virtual channel of the corresponding node according to the destination address, injecting the data into the network when that node is idle. When the brain-like computing chip sends data outward, any of the 24 boundary nodes may generate data; the serial arbitration unit then uses the polling arbitration algorithm to place the boundary data into the output queue in turn, and the data is sent out through the asynchronous transceiver interface, as sketched below.
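A minimal sketch of the polling arbitration just described, merging the outputs of the 24 boundary nodes into one serial output queue; representing the nodes as simple queues is an assumption for illustration.

```python
from collections import deque

def poll_merge(boundary_queues, output):
    """Visit every boundary node in a fixed circular order; each node with
    pending result data contributes one flit per polling round."""
    while any(boundary_queues):
        for q in boundary_queues:          # one polling round over all nodes
            if q:
                output.append(q.popleft())

serial_out = deque()
nodes = [deque([f"flit-from-node-{i}"]) for i in range(24)]
poll_merge(nodes, serial_out)
assert len(serial_out) == 24
```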
In the embodiment, the data transfer station includes a sending distribution module, a receiving arbitration module, and a plurality of asynchronous communication modules, each asynchronous communication module corresponding to one chip array;

the asynchronous communication module includes a receiving queue, a sending queue, an inter-chip data queue, an asynchronous handshake interface, and an address mapper; the asynchronous handshake interface receives transmission data to form the receiving queue and at the same time sends out the transmission data in the sending queue, and the address mapper maps the transmission data in the receiving queue to other chip arrays;

the sending distribution module coordinates and manages the switching of the data paths of the sending queue, the receiving queue, and the inter-chip data queue in each asynchronous communication module; the receiving arbitration module cooperatively manages the orderly storage of data destined for other chip clusters into the sending queue.

FIG. 3 is a schematic structural diagram of the data transfer station provided by an embodiment of the present invention. As shown in FIG. 3, the data transfer station consists of three asynchronous communication modules, each equipped with a sending data queue, a receiving data queue, and an inter-chip data queue, which temporarily store, respectively, the data sent to the array, the data received from the array, and the data exchanged between chips in different arrays. The asynchronous communication module, implemented on an FPGA, is responsible for sending and receiving data: received data is temporarily stored in the receiving queue, and the address mapper queries the address mapping table according to the packet header to determine whether the data is destined for another chip array or for another chip cluster. For data destined for another chip array, the packet header address is modified according to the mapping table so that it is configured as the address of the destination chip array, and the data is temporarily stored in the inter-chip data queue. The inter-chip data queues of the various interfaces may request the same sending interface, so the sending distribution module manages the order of queue requests, and data is transferred between the two granted queues through a data selector. Data destined for other chip clusters is passed in order by the receiving arbitration module to the queue that stores inter-cluster data, and is then handed over to the cluster module for processing.
In the embodiment, the address mapper includes two address mapping schemes;

Address mapping scheme 1: when mapping transmission data, part of the virtual address space of the current chip array is mapped directly onto an address area of the same shape in another chip array, so that the computing neuron nodes of the current chip array correspond one-to-one with those of the other chip array, realizing the mapping of the transmission data;

Address mapping scheme 2: an address mapping table is configured, and the transmission data is mapped to the corresponding computing neuron nodes in other chip arrays according to the mapping information in the table.

FIG. 4 is a schematic diagram of the address mapping scheme provided by an embodiment of the present invention. As shown in FIG. 4, each chip has 24×24 computing nodes, four chips form a 2×2 chip array, and three chip arrays form a chip cluster. The addressable range of each chip is a 64×64 matrix, so 48×48 of the addresses in the chip correspond to actual physical nodes, and the rest can serve as virtual forwarding nodes for address mapping.

As shown in step ① of FIG. 4, in one embodiment the computing neuron node (47, 24) is connected to the virtual address (48, 25); the computing neuron node therefore sets the destination address of the packet header to (48, 25), and data sent to this address actually leaves the chip and is received by the chip array data transfer station. In this embodiment, the address mapping scheme adopts direct mapping: the 16×24 matrix of virtual addresses is mapped directly onto the matrix region of another chip array with X coordinates from 0 to 15 and Y coordinates from 24 to 47, the nodes of the two matrices corresponding one-to-one. As shown in step ② of FIG. 4, for data sent to the virtual node, the address mapper changes the coordinates in its packet header to (0, 25), the coordinates of the corresponding destination node in the target chip array, and the data is then sent to that chip array through the data transfer station; this arithmetic is worked in code below.
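The two numbered steps of FIG. 4 reduce to simple offset arithmetic, worked below; the function name is illustrative.

```python
def fig4_direct_map(x, y):
    """Step 2 of FIG. 4: rewrite a virtual address in the 16x24 window
    (x in 48..63, y in 24..47) to the same-shaped region of the target
    array (x in 0..15, y in 24..47)."""
    assert 48 <= x <= 63 and 24 <= y <= 47
    return (x - 48, y)   # equal shapes, so only the X offset changes

# Step 1: node (47, 24) addresses virtual node (48, 25), which leaves the chip;
# Step 2: the data transfer station rewrites the header before relaying it.
assert fig4_direct_map(48, 25) == (0, 25)
```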
As shown in FIG. 4, the upper half of the virtual node addresses can be mapped to another chip array; with reasonable allocation of the virtual node addresses, the whole chip cluster can be interconnected so that the entire cluster works as a single whole.

In the embodiment, the Ethernet communication module configures an IP address for each chip cluster and interconnects all chip clusters through the TCP protocol for data exchange and management. One chip cluster in the computing cluster is selected as the server, and the remaining chip clusters act as clients; the clients and the server exchange data between chip clusters through the Ethernet communication module.
In the embodiment, the Ethernet communication module and the data transfer station are built on a Zynq chip: the ARM side of the Zynq is used to build the Ethernet communication module, and the FPGA side of the Zynq is used to build the data transfer station. The ARM side implements an lwIP protocol stack; when data is distributed to a chip cluster, it is first stored in dynamic memory and then passed to the FPGA side through the AXI4 protocol for the next stage of distribution. Ping-pong buffering is implemented on the ARM side, which improves data throughput; a sketch follows. At runtime, one chip cluster is selected as the host, responsible for data coordination and task management; this chip cluster also interacts with the PC.
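The ping-pong buffering can be sketched as two alternating buffers, so that Ethernet reception into one buffer overlaps with the AXI4 hand-off of the other to the FPGA side; the buffer size and the hand-off API are assumptions.

```python
class PingPongBuffer:
    """Two alternating buffers: one fills from Ethernet while the other
    is handed to the FPGA path, so reception and forwarding overlap."""
    def __init__(self, size=4096):
        self.bufs = [bytearray(size), bytearray(size)]
        self.fill = [0, 0]
        self.active = 0            # index of the buffer currently being filled

    def write(self, chunk):
        i, n = self.active, self.fill[self.active]
        self.bufs[i][n:n + len(chunk)] = chunk   # no overflow guard: sketch only
        self.fill[i] = n + len(chunk)

    def swap(self):
        """Hand the filled buffer over (e.g. to the AXI4 DMA) and start
        filling the other one."""
        ready = self.active
        self.active ^= 1
        self.fill[self.active] = 0
        return bytes(self.bufs[ready][:self.fill[ready]])

pp = PingPongBuffer()
pp.write(b"\x01\x02\x03")
assert pp.swap() == b"\x01\x02\x03"   # first buffer drained; second now filling
```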
The hierarchical billion-level-neuron brain-like computing chip expansion method proposed by the present invention can efficiently and flexibly expand chips into a complete brain-like computer system, solves the problem of data transmission address access through the address mapping scheme, and completes large-scale cascading and chip cluster management through the computing cluster scheme.

The specific embodiments described above explain the technical solution and beneficial effects of the present invention in detail. It should be understood that the above are only the most preferred embodiments of the present invention and are not intended to limit it; any modification, supplement, or equivalent replacement made within the scope of the principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

  1. A chip expansion method for a billion-level brain-like computer, characterized by comprising the following steps:
    connecting a plurality of chip clusters for the billion-level-neuron brain-like computer through an Ethernet communication module to form a computing cluster;
    connecting a plurality of chip arrays to each chip cluster through a data transfer station;
    connecting to each chip array, through asynchronous data communication modules, a plurality of brain-like computing chips arranged in a matrix, each brain-like computing chip containing a plurality of computing neuron nodes arranged in a matrix.
  2. The chip expansion method for a billion-level brain-like computer according to claim 1, characterized in that the asynchronous data communication module serves as the communication bridge of each brain-like computing chip and includes an asynchronous transceiver interface, a parallel distribution unit, and a serial arbitration unit;
    the asynchronous transceiver interface asynchronously receives and sends transmission data;
    the parallel distribution unit parses the asynchronously received transmission data, requests data injection permission from the corresponding computing neuron node, and then injects the transmission data in parallel into the computing neuron nodes of the brain-like computing chip;
    the serial arbitration unit merges the result data output in parallel by multiple computing neuron nodes into one serial output queue as transmission data.
  3. The chip expansion method for a billion-level brain-like computer according to claim 2, characterized in that the parallel distribution unit parses the packet header of the asynchronously received transmission data packet, extracts the destination address from the header, requests permission from the virtual channel of the computing neuron node corresponding to the destination address, and injects the transmission data into that computing neuron node.
  4. The chip expansion method for a billion-level brain-like computer according to claim 2, characterized in that the serial arbitration unit uses a polling arbitration algorithm to merge the node data of the computing neuron nodes into one serial output queue as transmission data.
  5. The chip expansion method for a billion-level brain-like computer according to any one of claims 1 to 4, characterized in that an asynchronous data communication module is configured for each rectangular boundary of each brain-like computing chip, enabling data communication in four directions.
  6. The chip expansion method for a billion-level brain-like computer according to claim 1, characterized in that the data transfer station includes a sending distribution module, a receiving arbitration module, and a plurality of asynchronous communication modules, each asynchronous communication module corresponding to one chip array;
    the asynchronous communication module includes a receiving queue, a sending queue, an inter-chip data queue, an asynchronous handshake interface, and an address mapper; the asynchronous handshake interface receives transmission data to form the receiving queue and at the same time sends out the transmission data in the sending queue, and the address mapper maps the transmission data in the receiving queue to other chip arrays;
    the sending distribution module coordinates and manages the switching of the data paths of the sending queue, the receiving queue, and the inter-chip data queue in each asynchronous communication module;
    the receiving arbitration module cooperatively manages the orderly storage of data destined for other chip clusters into the inter-cluster sending queue.
  7. The chip expansion method for a billion-level brain-like computer according to claim 6, characterized in that the address mapper includes two address mapping schemes;
    Address mapping scheme 1: when mapping transmission data, part of the virtual address space of the current chip array is mapped directly onto an address area of the same shape in another chip array, so that the computing neuron nodes of the current chip array correspond one-to-one with those of the other chip array, realizing the mapping of the transmission data;
    Address mapping scheme 2: an address mapping table is configured, and the transmission data is mapped to the corresponding computing neuron nodes in other chip arrays according to the mapping information in the table.
  8. The chip expansion method for a billion-level brain-like computer according to claim 7, characterized in that the mapping process of the address mapper for transmission data is: when the packet header of the transmission data arrives, the header is parsed and the destination address of the transmission data is determined according to the mapping scheme; after the virtual address in the packet header is modified to the corresponding destination address, it is injected into the sending queue and the destination address is recorded; when the data payload and the packet tail arrive, they are forwarded to the destination address.
  9. The chip expansion method for a billion-level brain-like computer according to claim 1, characterized in that the Ethernet communication module configures an IP address for each chip cluster and interconnects all chip clusters through the TCP protocol for data exchange and management.
  10. The chip expansion method for a billion-level brain-like computer according to any one of claims 1 to 9, characterized in that one chip cluster in the computing cluster is selected as the server and the remaining chip clusters act as clients, the clients and the server exchanging data between chip clusters through the Ethernet communication module.
PCT/CN2020/128505 2020-11-12 2020-11-13 Chip expansion method for a billion-level brain-like computer WO2022099573A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011261807.2A CN112269751B (zh) 2020-11-12 2020-11-12 Chip expansion method for a billion-level-neuron brain-like computer
CN202011261807.2 2020-11-12

Publications (1)

Publication Number Publication Date
WO2022099573A1 true WO2022099573A1 (zh) 2022-05-19

Family

ID=74339102

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/128505 WO2022099573A1 (zh) 2020-11-13 Chip expansion method for a billion-level brain-like computer

Country Status (2)

Country Link
CN (1) CN112269751B (zh)
WO (1) WO2022099573A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113218437B (zh) * 2021-04-30 2022-05-13 华中师范大学 Fault-tolerant network readout device and method for large-area arrays of high-density charge sensor chips
CN113312304B (zh) * 2021-06-04 2023-04-21 海光信息技术股份有限公司 Interconnection device, motherboard and server
CN113709039B (zh) * 2021-08-26 2022-11-11 上海新氦类脑智能科技有限公司 Communication method, apparatus, device and medium for managing chips and chip grid arrays
CN114399033B (zh) * 2022-03-25 2022-07-19 浙江大学 Brain-like computing system and computing method based on neuron instruction coding
CN115102896B (zh) * 2022-07-22 2022-11-15 北京象帝先计算技术有限公司 Data broadcasting method, broadcast accelerator, NoC, SoC and electronic device
CN115392443B (zh) * 2022-10-27 2023-03-10 之江实验室 Spiking neural network application representation method and device for a brain-like computer operating system
CN115576889B (zh) * 2022-11-15 2023-03-03 南京芯驰半导体科技有限公司 Chained multi-chip system and communication method
CN117634550B (zh) * 2024-01-25 2024-06-04 之江实验室 Time synchronization method and device for multi-brain-like-chip cascade systems

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106201651A * 2016-06-27 2016-12-07 鄞州浙江清华长三角研究院创新中心 Simulator for a neuromorphic chip
US20180078193A1 (en) * 2016-09-16 2018-03-22 International Business Machines Corporation Flexible neural probes
CN110163016A * 2019-04-29 2019-08-23 清华大学 Hybrid computing system and hybrid computing method
CN111159093A * 2019-11-25 2020-05-15 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Heterogeneous intelligent computing system

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104809501B (zh) * 2014-01-24 2018-05-01 清华大学 Computer system based on a brain-like coprocessor
CN105740946B (zh) * 2015-07-29 2019-02-12 上海磁宇信息科技有限公司 Method for implementing neural network computation with a cell-array computing system
CN105718996B (zh) * 2015-07-29 2019-02-19 上海磁宇信息科技有限公司 Cell-array computing system and communication method therein
CN105913119B (zh) * 2016-04-06 2018-04-17 中国科学院上海微系统与信息技术研究所 Row-column interconnected heterogeneous multi-core brain-like chip and method of using the same
WO2018058426A1 (zh) * 2016-09-29 2018-04-05 清华大学 Hardware neural network conversion method, computing device, compilation method, and neural network software-hardware collaboration system
US20190349318A1 (en) * 2018-05-08 2019-11-14 The Board Of Trustees Of The Leland Stanford Junior University Methods and apparatus for serialized routing within a fractal node array
CN110568559A (zh) * 2019-07-24 2019-12-13 浙江大学 Chip architecture based on a large-scale optical-switch topology array
CN110705702A (zh) * 2019-09-29 2020-01-17 东南大学 Dynamically scalable convolutional neural network accelerator
CN111082949B (zh) * 2019-10-29 2022-01-28 广东工业大学 Efficient spike data packet transmission method for a brain-like computer
CN110909869B (zh) * 2019-11-21 2022-08-23 浙江大学 Brain-like computing chip based on a spiking neural network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106201651A * 2016-06-27 2016-12-07 鄞州浙江清华长三角研究院创新中心 Simulator for a neuromorphic chip
US20180078193A1 (en) * 2016-09-16 2018-03-22 International Business Machines Corporation Flexible neural probes
CN110163016A * 2019-04-29 2019-08-23 清华大学 Hybrid computing system and hybrid computing method
CN111159093A * 2019-11-25 2020-05-15 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Heterogeneous intelligent computing system

Also Published As

Publication number Publication date
CN112269751A (zh) 2021-01-26
CN112269751B (zh) 2022-08-23

Similar Documents

Publication Publication Date Title
WO2022099573A1 (zh) Chip expansion method for a billion-level brain-like computer
WO2022099559A1 (zh) Brain-like computer supporting billion-level neurons
CN107454003B (zh) Network-on-chip router capable of dynamically switching operating modes, and method
CN113011591A (zh) Quantum measurement and control system for multi-bit quantum feedback control
Wu et al. A multicast routing scheme for a universal spiking neural network architecture
Biswas et al. Accelerating tensorflow with adaptive rdma-based grpc
CN102866980B (zh) Network communication cell for the on-chip interconnection network of a multi-core microprocessor
CN103106173A (zh) Method for inter-core interconnection of a multi-core processor
WO2021244168A1 (zh) System on chip, data transmission method, and broadcast module
CN114564434B (zh) General-purpose multi-core brain-like processor, accelerator card and computer device
JP2023508791A (ja) Quantum measurement and control system for multi-bit quantum feedback control
Yang et al. SwitchAgg: A further step towards in-network computing
Duan et al. Research on double-layer networks-on-chip for inter-chiplet data switching on active interposers
Ouyang et al. URMP: using reconfigurable multicast path for NoC-based deep neural network accelerators
Rahman et al. Dynamic communication performance of a TESH network under the nonuniform traffic patterns
Rengasamy et al. Using packet information for efficient communication in NoCs
Matsumoto et al. Distributed Shared Memory Architecture for JUMP-1 a general-purpose MPP prototype
Diguet Power-gated MRAMs for Memory-Based Computing with improved broadcast capabilities
Jia et al. FACL: A Flexible and High-Performance ACL engine on FPGA-based SmartNIC
Hasan et al. Routing bandwidth model for feed forward neural networks on multicore neuromorphic architectures
CN215186814U (zh) Network on chip
Shen et al. PN-TMS: Pruned Node-fusion Tree-based Multicast Scheme for Efficient Neuromorphic Systems
Reddy et al. An Efficient Fault Tolerant Routing Interconnect System for Neural NOC
Reddy et al. An Efficient Interconnection System for Neural NOC Using Fault Tolerant Routing Method
Fang et al. Exploration on routing configuration of HNoC with reasonable energy consumption

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20961130

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20961130

Country of ref document: EP

Kind code of ref document: A1