WO2020156177A1 - Reconfigurable processor architecture and computing device - Google Patents

Reconfigurable processor architecture and computing device

Info

Publication number
WO2020156177A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage
units
computing
interface
storage unit
Prior art date
Application number
PCT/CN2020/072257
Other languages
English (en)
French (fr)
Inventor
祝夭龙
何伟
冯杰
Original Assignee
北京灵汐科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京灵汐科技有限公司
Publication of WO2020156177A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0631 Configuration or reconfiguration of storage systems by allocating resources to storage systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/455 Emulation; Interpretation; Software simulation, e.g. virtualisation or emulation of application or operating system execution engines

Definitions

  • the present invention relates to the technical field of processors, in particular to a reconfigurable processor architecture and computing equipment.
  • the present invention provides a reconfigurable processor architecture and computing device that overcomes the above problems or at least partially solves the above problems.
  • a reconfigurable processor architecture, characterized in that it includes:
  • a plurality of storage units for storing data, and a plurality of computing units for accessing the data stored in the storage units and performing computation on that data;
  • a control component coupled between the storage units and the computing units, used to control the working mode of the storage units and/or the access mode of the computing units to the storage units. Configuring the working mode of the storage units and the access permissions of the computing units to the storage units effectively improves the access rate of the storage units and the computing-power utilization of the computing units.
  • independent working mode: each of the plurality of storage units has an independent first interface, and a computing unit independently accesses, through a first interface, the storage unit corresponding to that first interface;
  • overall working mode: the multiple storage units as a whole have a unified second interface, and the computing units uniformly access the multiple storage units through the second interface;
  • combined working mode: the multiple storage units are divided into multiple storage groups, each of which includes at least one storage unit; each storage group has a third interface, and the storage group corresponding to a third interface is accessed through that third interface.
  • in the independent working mode, multiple computing units may simultaneously access their respectively corresponding storage units.
  • the interface width of the second interface is the width obtained by splicing the interfaces of the multiple storage units in parallel, or any width not less than the interface width of the storage unit with the smallest interface width among the multiple storage units.
  • the interface width of the third interface is the width obtained by splicing in parallel the interfaces of all storage units in the storage group corresponding to that third interface, or any width not less than the interface width of the storage unit with the smallest interface width among all those storage units.
  • when the multiple storage units and the multiple computing units are equal in bit width and number, the access modes of the computing units to the storage units include:
  • one-to-one mode: the multiple computing units access the multiple storage units in one-to-one correspondence;
  • cross-correspondence mode: the multiple computing units access the multiple storage units in cross correspondence;
  • one-of-many mode: any one of the multiple computing units accesses any one of the multiple storage units.
  • when the bit widths and numbers of the storage units and computing units are equal, different computing units can reuse some storage units, which effectively improves the utilization of the storage units and makes parameters and data easier to access, because the same parameters need not be split or copied into separate storage units.
  • in the one-to-one mode and/or the cross-correspondence mode, multiple computing units may simultaneously access their respectively corresponding storage units.
  • in the one-of-many mode, only one storage unit can be accessed at any given moment.
  • when the multiple storage units and the multiple computing units are unequal in bit width and number, the access modes of the computing units to the storage units include:
  • first access mode: each computing unit accesses at least one storage unit whose bit width equals its own;
  • second access mode: at least one composite storage unit is generated from a first preset number of storage units, and each computing unit accesses a storage unit or composite storage unit whose bit width equals its own;
  • third access mode: at least one composite storage unit is generated from a second preset number of storage units, and a third preset number of computing units are merged to generate at least one merged access interface; each computing unit accesses, through its own access interface or a merged access interface, at least one storage unit or composite storage unit whose bit width equals that of the access interface or merged access interface.
  • that is to say, the control component can reconfigure the storage units and the access interfaces of the computing units according to the bit width of each storage unit and of the computing units' access interfaces, so that multiple storage units can be reconfigured into composite storage units of different sizes and the interfaces of the storage units can be reconfigured to different bit widths. After the access interfaces of the storage units are reconfigured, different storage units can be accessed in parallel, which greatly increases memory access bandwidth and can raise the computing-power utilization of the computing units.
  • a computing device including a processor, characterized in that:
  • the architecture of the processor is the aforementioned reconfigurable processor architecture, which is used to run a computer program.
  • the computing device further includes:
  • the storage device is used to store a computer program, which is loaded and executed by the processor when the computer program is running in the computing device.
  • the embodiment of the present invention provides a local shared storage architecture, which controls the working mode of the local storage unit and the access mode of the computing unit to the storage unit through a control component.
  • a control register can be set in the control component, and then the working mode of the storage unit and the access authority of the computing unit to the storage unit can be reasonably configured, and the access rate of the storage unit and the computing power utilization rate of the computing unit can be effectively improved.
  • Fig. 1 shows a schematic diagram of a traditional many-processing-unit design with integrated storage and processing
  • Fig. 2 shows a schematic diagram of a reconfigurable processor architecture according to an embodiment of the present invention
  • Fig. 3 shows a schematic diagram of a processor architecture according to the first embodiment of the present invention
  • Figs. 4A-C respectively show schematic diagrams of reconfigurable logic according to the first embodiment of the present invention
  • Fig. 5 shows a schematic diagram of a processor architecture according to the second embodiment of the present invention
  • Figs. 6A-C respectively show schematic diagrams of reconfigurable logic according to the second embodiment of the present invention
  • An effective processor design approach for improving the computing power and efficiency of a chip is to adopt a many-core architecture with integrated storage and processing.
  • Storage-processing integration means that the storage and processing functions sit in the same core so that storage is local, which greatly reduces the energy spent on moving data and improves computing efficiency.
  • Fig. 1 shows a schematic diagram of multiple processing units with integrated storage and processing.
  • C_1 and C_N denote processing cores, representative of the many processing cores; each processing core has an independent computing unit (PU, Processing Unit) and storage unit (Mem, Memory).
  • processing core C_N contains computing unit PU_N and storage unit Mem_N.
  • during computation, each core uses the parameters and data in its own independent Mem. This scheme stores and retrieves data locally, which greatly reduces the energy consumed in data movement compared with off-chip storage.
  • the capacity of Mem is fixed and cannot be adjusted to application needs in actual use, which leads to reduced Mem utilization or insufficient capacity;
  • the interface width of Mem is fixed and cannot be adjusted to application needs in actual use, which leads to insufficient or wasted Mem access bandwidth;
  • each core can only read and write its own Mem and cannot share the Mem of other cores;
  • because the Mem of each core is limited, efficiency drops when completing more complex tasks.
  • in actual operation, a multi-core or many-core chip sometimes wants to access a common Mem (for example, a cache) and sometimes wants each core to access a different Mem (for example, in a many-core chip with integrated storage and computing), so that Mem is used efficiently and the number and storage capacity of Mem blocks can be changed through configuration.
  • Fig. 2 shows a schematic diagram of a reconfigurable processor architecture according to an embodiment of the present invention.
  • the reconfigurable processor architecture provided by an embodiment of the present invention may include:
  • a plurality of storage units 210 (that is, storage unit 1 to storage unit N), for storing data;
  • a plurality of calculation units 220 (that is, calculation unit 1 to calculation unit N) are used to access data stored in the storage unit 210 and perform calculation processing on the data;
  • the control component 230 coupled between the storage units 210 and the computing units 220 is used to control the working mode of the multiple storage units 210 and/or the access mode of the multiple computing units 220 to the multiple storage units 210.
  • the number of storage units 210 and the storage capacity and bit width of each storage unit 210 can be set according to different requirements, as can the number and bit width of the computing units 220; the present invention does not limit these.
  • the embodiment of the present invention provides a reconfigurable processor architecture, which controls the operating mode of the local storage unit 210 and the access mode of the storage unit 210 by the computing unit 220 through the control component 230 (Controller).
  • a control register can be set in the control component to configure the working mode of the storage unit 210 and the access authority of the computing unit 220 to the storage unit 210, effectively improving the access rate of the storage unit and the computing power utilization of the computing unit.
  • the processor architecture of the present invention is a many-core architecture, and the processor architecture includes multiple cores.
  • the structure of the multiple cores may be, for example: each of the multiple cores includes computing units 1 to N, storage units 1 to N, and a control component; or: the multiple cores include computing cores and storage cores, where a computing core includes a computing unit and a storage core includes at least one storage unit and a control unit. The present invention does not limit the specific structure of the cores in the processor architecture, as long as the working mode of the storage units and the access mode of the computing units to the storage units can be controlled.
  • the working modes of the multiple storage units 210 may include:
  • independent working mode: each storage unit 210 of the multiple storage units has an independent first interface, and a computing unit 220 can independently access, through each first interface, the storage unit corresponding to it; in this mode, multiple computing units can access their corresponding storage units at the same time.
  • any computing unit 220 can access any storage unit 210 through that storage unit's first interface, and the access correspondence between the multiple computing units 220 and the multiple storage units 210 can be set through the control component 230 according to different requirements; the present invention does not limit this. Note, however, that one storage unit 210 can be accessed by only one computing unit 220 at any given moment.
  • overall working mode: the multiple storage units 210 as a whole have a unified second interface, and the computing units 220 uniformly access the multiple storage units through the second interface; all the storage units 210 form a whole and can share a unified address allocation rule.
  • all the storage units 210 together have only one interface, through which a computing unit 220 can access all of them. Any computing unit 220 can access the multiple storage units 210 through the second interface, as set according to different computing requirements. Only one computing unit 220 can access the multiple storage units 210 at any given moment.
  • the interface width of the second interface is the width obtained by splicing the interfaces of the multiple storage units in parallel, or any width not less than the interface width of the storage unit with the smallest interface width among the multiple storage units.
  • combined working mode: the multiple storage units 210 are divided into multiple storage groups, each of which includes at least one storage unit; each storage group has a third interface, and any computing unit 220 can access a storage group through that group's third interface. That is, all storage units are divided into several storage groups, each composed of at least one storage unit 210. The interface width of the third interface of each storage group is the width obtained by splicing in parallel the interfaces of all storage units in that group, or any width not less than the interface width of the storage unit with the smallest interface width among all those storage units.
  • the storage capacity of each storage group is the sum of the storage capacity of the storage units included in the storage group.
  • besides controlling the working mode of the multiple storage units 210, the control component 230 coupled between the multiple storage units 210 and the multiple computing units 220 can also control the access mode of the multiple computing units 220 to the multiple storage units 210.
  • the number and bit width of the storage unit 210 and the number and bit width of the calculation unit 220 may be equal or unequal. The following description will be based on two cases of equal and unequal.
  • bit width and number of the multiple storage units 210 and the multiple calculation units 220 are equal.
  • Fig. 3 shows a schematic diagram of a processor architecture according to the first embodiment of the present invention.
  • the processor architecture in this embodiment may include computing unit 1, computing unit 2, storage unit 1, storage unit 2, and a control component coupled to all of them.
  • the bus bit widths of computing unit 1 and computing unit 2 are both 64 bits, and the data bit widths of storage unit 1 and storage unit 2 are also both 64 bits.
  • when the control component controls the access mode of the computing units to the multiple storage units, the specific configuration can be as follows:
  • one-to-one mode: each of the multiple computing units corresponds to one storage unit. In this mode the computing units can work at the same time, that is, each can access its corresponding storage unit simultaneously.
  • as shown in Fig. 4A, computing unit 1 only accesses storage unit 1 and computing unit 2 only accesses storage unit 2, and computing unit 2 can access storage unit 2 while computing unit 1 accesses storage unit 1.
  • cross-correspondence mode: the multiple computing units access the multiple storage units in cross correspondence. In this mode, multiple computing units can simultaneously access their corresponding storage units.
  • as shown in Fig. 4B, computing unit 1 only accesses storage unit 2 and computing unit 2 only accesses storage unit 1, and computing unit 2 can access storage unit 1 while computing unit 1 accesses storage unit 2.
  • one-of-many mode: each of the multiple computing units accesses any one of the multiple storage units.
  • as shown in Fig. 4C, computing unit 1 and computing unit 2 can each access both storage unit 1 and storage unit 2, but only one storage unit can be accessed at any given moment.
  • bit widths and numbers of the multiple storage units 210 and the multiple calculation units 220 are not equal.
  • Fig. 5 shows a schematic diagram of a processor architecture according to the second embodiment of the present invention.
  • the processor architecture in this embodiment may include three computing units (computing unit 1, computing unit 2, computing unit 3), four storage units (storage unit 1, storage unit 2, storage unit 3, storage unit 4), and a control component coupled to all of computing units 1-3 and storage units 1-4.
  • the bus bit widths of computing unit 1 and computing unit 2 are both 64 bits, and the bus bit width of computing unit 3 is 128 bits.
  • the data bit widths of storage unit 1, storage unit 2 and storage unit 3 are all 64 bits, and the data bit width of storage unit 4 is 128 bits.
  • when the control component controls the access mode of the computing units to the multiple storage units, the specific configuration can be as follows:
  • first access mode: each computing unit accesses at least one storage unit whose bit width equals its own.
  • as shown in Fig. 6A, computing unit 1 only accesses storage unit 1, computing unit 2 can access storage unit 2 or storage unit 3, and computing unit 3 only accesses storage unit 4; the three computing units can access their corresponding storage units at the same time.
  • the implementation shown in Fig. 6A is only exemplary. In practical applications, computing unit 1 and computing unit 2 can each select any one of storage unit 1, storage unit 2, and storage unit 3 to access.
  • second access mode: at least one composite storage unit is generated from a first preset number of storage units; each computing unit accesses a storage unit or composite storage unit whose bit width equals its own.
  • as shown in Fig. 6B, computing unit 1 and computing unit 2 can both access storage unit 1; storage unit 2 and storage unit 3 are combined into a composite storage unit with an interface width of 128 bits, which computing unit 3 can access. Computing unit 3 can thus access storage unit 4 as well as the composite storage unit formed from storage unit 2 and storage unit 3.
  • third access mode: at least one composite storage unit is generated from a second preset number of storage units, and a third preset number of computing units are merged to generate at least one merged access interface; each computing unit accesses, through its own access interface or a merged access interface, at least one storage unit or composite storage unit whose bit width equals that of the access interface or merged access interface.
  • as shown in Fig. 6C, computing unit 1 and computing unit 2 are merged into a 128-bit-wide access interface, and storage unit 1 and storage unit 2 are combined into a composite storage unit with an interface width of 128 bits. Computing unit 3 can then be configured to access the composite storage unit formed from storage unit 1 and storage unit 2, while the merged computing units 1 and 2 access storage unit 4; the 128-bit data read from storage unit 4 is split into two 64-bit pieces, which are allocated to computing unit 1 and computing unit 2 respectively.
  • an embodiment of the present invention also provides a computing device, including a processor, the architecture of the processor is the aforementioned reconfigurable processor architecture, and is used to run a computer program.
  • the computing device further includes: a storage device, configured to store a computer program, and the computer program is loaded and executed by the processor when the computer program runs in the computing device.
  • the embodiment of the present invention provides a local shared storage architecture, which controls the working mode of the local storage unit and the access mode of the computing unit to the storage unit through a control component.
  • the control component can also reconstruct the access interface of the storage unit and the computing unit according to the bit width of each storage unit and the access interface of the computing unit, so that multiple storage units can be reconstructed into composite storage units of different sizes.
  • the interface of the storage unit can be reconstructed into different bit widths.
  • after the access interfaces are reconfigured, different storage units can be accessed in parallel, which greatly increases memory access bandwidth and can raise the computing-power utilization of the computing units. Moreover, different computing units can reuse some storage units, which effectively improves the utilization of the storage units and makes parameters and data easier to access, because the same parameters need not be split or copied into separate storage units.
  • modules, units, or components in the embodiments can be combined into one module, unit, or component, and can also be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Software Systems (AREA)
  • Logic Circuits (AREA)
  • Advance Control (AREA)

Abstract

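Provided are a reconfigurable processor architecture and a computing device. The reconfigurable processor architecture includes: a plurality of storage units for storing data; a plurality of computing units for accessing the data stored in the storage units and performing computation on that data; and a control component, coupled to both the storage units and the computing units, for controlling the working mode of the plurality of storage units and/or the access mode of the plurality of computing units to the plurality of storage units. With the processor architecture of the present invention, a control register is set in the control component so that the working mode of the storage units and the access permissions of the computing units to the storage units can be configured appropriately, effectively improving the access rate of the storage units and the computing-power utilization of the computing units.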

Description

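Reconfigurable processor architecture and computing device
Technical Field
The present invention relates to the technical field of processors, and in particular to a reconfigurable processor architecture and a computing device.
Background
Artificial intelligence technology is advancing rapidly, affecting people's production and daily life in many ways and driving the development and progress of the world. In recent years, researchers have found that neural network algorithms are very effective at processing unstructured data, for example in tasks such as face recognition, speech recognition, and image classification. As this unstructured data grows exponentially, the demands on processor computing power keep rising. The computing power of traditional central processing units (CPUs) and digital signal processors (DSPs) can no longer meet the demand, so how to improve the computing power and efficiency of processors is a problem that urgently needs to be solved.
Summary of the Invention
In view of the above problems, the present invention provides a reconfigurable processor architecture and a computing device that overcome the above problems or at least partially solve them.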
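According to one aspect of the present invention, a reconfigurable processor architecture is provided, characterized by comprising:
a plurality of storage units for storing data;
a plurality of computing units for accessing the data stored in the storage units and performing computation on that data;
a control component coupled between the storage units and the computing units, for controlling the working mode of the storage units and/or the access mode of the computing units to the storage units. Configuring the working mode of the storage units and the access permissions of the computing units to the storage units effectively improves the access rate of the storage units and the computing-power utilization of the computing units.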
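Optionally, the working modes of the storage units include: an independent working mode, in which each of the plurality of storage units has an independent first interface, and a computing unit independently accesses, through a first interface, the storage unit corresponding to that first interface;
an overall working mode, in which the plurality of storage units as a whole have a unified second interface, and the computing units uniformly access the plurality of storage units through the second interface;
a combined working mode, in which the plurality of storage units are divided into a plurality of storage groups, each of which includes at least one storage unit; each storage group has a third interface, and the storage group corresponding to a third interface is accessed through that third interface.
Optionally, in the independent working mode, multiple computing units can simultaneously access their respectively corresponding storage units.
Optionally, the interface width of the second interface is the width obtained by splicing the interfaces of the plurality of storage units in parallel, or any width not less than the interface width of the storage unit with the smallest interface width among the plurality of storage units.
Optionally, the interface width of the third interface is the width obtained by splicing in parallel the interfaces of all storage units in the storage group corresponding to that third interface, or any width not less than the interface width of the storage unit with the smallest interface width among all those storage units.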
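Optionally, the plurality of storage units and the plurality of computing units are equal in bit width and number, and the access modes of the computing units to the storage units include:
a one-to-one mode, in which the plurality of computing units access the plurality of storage units in one-to-one correspondence;
a cross-correspondence mode, in which the plurality of computing units access the plurality of storage units in cross correspondence;
a one-of-many mode, in which any of the plurality of computing units accesses any one of the plurality of storage units. When the storage units and computing units are equal in bit width and number, different computing units can reuse certain storage units, which effectively improves the utilization of the storage units and makes parameters and data easier to access, because the same parameters need not be split or copied into separate storage units.
Optionally, in the one-to-one mode and/or the cross-correspondence mode, multiple computing units can simultaneously access their respectively corresponding storage units.
Optionally, in the one-of-many mode, only one storage unit can be accessed at any given moment.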
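Optionally, the plurality of storage units and the plurality of computing units are unequal in bit width and number, and the access modes of the computing units to the storage units include:
a first access mode, in which each computing unit accesses at least one storage unit whose bit width equals its own;
a second access mode, in which at least one composite storage unit is generated from a first preset number of storage units, and each computing unit accesses a storage unit or composite storage unit whose bit width equals its own;
a third access mode, in which at least one composite storage unit is generated from a second preset number of storage units and a third preset number of computing units are merged to generate at least one merged access interface; each computing unit accesses, through its own access interface or a merged access interface, at least one storage unit or composite storage unit whose bit width equals that of the access interface or merged access interface. In other words, the control component can reconfigure the storage units and the access interfaces of the computing units according to the bit width of each storage unit and of the computing units' access interfaces: multiple storage units can be reconfigured into composite storage units of different sizes, and the interfaces of the storage units can be reconfigured to different bit widths. After the access interfaces of the storage units are reconfigured, different storage units can be accessed in parallel, which greatly increases memory access bandwidth and can raise the computing-power utilization of the computing units.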
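According to another aspect of the present invention, a computing device is also provided, comprising a processor, characterized in that
the architecture of the processor is the reconfigurable processor architecture described above, and the processor is used to run a computer program.
Optionally, the computing device further comprises:
a storage device for storing a computer program, the computer program being loaded and executed by the processor when it runs in the computing device.
Embodiments of the present invention provide a locally shared storage architecture in which a control component controls the working mode of the local storage units and the access mode of the computing units to the storage units. Specifically, a control register can be set in the control component so that the working mode of the storage units and the access permissions of the computing units to the storage units can be configured appropriately, effectively improving the access rate of the storage units and the computing-power utilization of the computing units.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the content of the specification, and so that the above and other objects, features, and advantages of the invention are more apparent, specific embodiments of the invention are set out below.
From the following detailed description of specific embodiments of the invention in conjunction with the accompanying drawings, those skilled in the art will better understand the above and other objects, advantages, and features of the invention.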
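Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered limiting of the invention. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Fig. 1 shows a schematic diagram of a traditional many-processing-unit design with integrated storage and processing;
Fig. 2 shows a schematic diagram of a reconfigurable processor architecture according to an embodiment of the present invention;
Fig. 3 shows a schematic diagram of a processor architecture according to Embodiment 1 of the present invention;
Figs. 4A-C respectively show schematic diagrams of reconfigurable logic according to Embodiment 1 of the present invention;
Fig. 5 shows a schematic diagram of a processor architecture according to Embodiment 2 of the present invention;
Figs. 6A-C respectively show schematic diagrams of reconfigurable logic according to Embodiment 2 of the present invention.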
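Detailed Description
Exemplary embodiments of the present disclosure are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope fully conveyed to those skilled in the art.
An effective processor design approach for improving chip computing power and efficiency is a many-core architecture with integrated storage and processing. Storage-processing integration means placing the storage function and the processing function in the same core so that storage is local, which greatly reduces the energy spent on moving data and improves computing efficiency.
Fig. 1 shows a schematic diagram of many processing units with integrated storage and processing. As shown in Fig. 1, C_1 and C_N denote processing cores, representative of the many processing cores; each processing core has an independent computing unit (PU, Processing Unit) and storage unit (Mem, Memory), so processing core C_N contains computing unit PU_N and storage unit Mem_N. During computation, each core uses the parameters and data in its own independent Mem. This scheme stores and retrieves data locally, which greatly reduces the energy consumed in data movement compared with off-chip storage.
On the other hand, this scheme has certain limitations, for example:
1. The capacity of Mem is fixed and cannot be adjusted to application needs in actual use, which leads to reduced Mem utilization or insufficient capacity;
2. The interface width of Mem is fixed and cannot be adjusted to application needs in actual use, which leads to insufficient or wasted Mem access bandwidth;
3. During computation, each core can only read and write its own Mem and cannot share the Mem of other cores;
4. Because the Mem of each core is limited, efficiency drops when completing more complex tasks.
In actual operation, when performing certain computations, a multi-core or many-core chip sometimes wants to access a common Mem (for example, a cache) and sometimes wants each core to access a different Mem (for example, in a many-core chip with integrated storage and computing), so that Mem is used efficiently and the number and storage capacity of Mem blocks can be changed through configuration.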
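Fig. 2 shows a schematic diagram of the reconfigurable processor architecture according to an embodiment of the present invention. As Fig. 2 shows, the reconfigurable processor architecture provided by the embodiment may include:
a plurality of storage units 210 (storage unit 1 to storage unit N) for storing data;
a plurality of computing units 220 (computing unit 1 to computing unit N) for accessing the data stored in the storage units 210 and performing computation on that data;
a control component 230 coupled between the storage units 210 and the computing units 220, for controlling the working mode of the plurality of storage units 210 and/or the access mode of the plurality of computing units 220 to the plurality of storage units 210. In this embodiment, the number of storage units 210 and the storage capacity and bit width of each storage unit 210 can be set according to different requirements, as can the number and bit width of the computing units 220; the present invention does not limit these.
The embodiment provides a reconfigurable processor architecture in which the control component 230 (Controller) controls the working mode of the local storage units 210 and the access mode of the computing units 220 to the storage units 210. Specifically, a control register can be set in the control component to configure the working mode of the storage units 210 and the access permissions of the computing units 220 to the storage units 210, effectively improving the access rate of the storage units and the computing-power utilization of the computing units.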
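The text states that a control register in the control component configures the working mode and the access permissions, but it does not give a register layout. The following is a minimal sketch under an assumed layout: the names `StorageMode`, `pack_control_register`, and `may_access`, the 2-bit mode field, and the row-major permission bits are all hypothetical and are not taken from the patent.

```python
from enum import IntEnum


class StorageMode(IntEnum):
    """Working modes of the storage units named in the description."""
    INDEPENDENT = 0  # each storage unit has its own first interface
    OVERALL = 1      # all storage units share one second interface
    COMBINED = 2     # storage units are partitioned into storage groups


MODE_BITS = 2  # hypothetical: the low 2 bits hold the working mode


def pack_control_register(mode: StorageMode, access: list[list[bool]]) -> int:
    """Pack a mode and a permission matrix into one integer register.

    Hypothetical layout: bits [1:0] hold the mode; above them sits one
    permission bit per (computing unit, storage unit) pair, row-major.
    """
    value = int(mode)
    bit = MODE_BITS
    for row in access:          # one row per computing unit
        for allowed in row:     # one column per storage unit
            if allowed:
                value |= 1 << bit
            bit += 1
    return value


def may_access(register: int, cu: int, su: int, num_storage: int) -> bool:
    """Return True if computing unit `cu` may access storage unit `su`."""
    bit = MODE_BITS + cu * num_storage + su
    return bool((register >> bit) & 1)


# Two computing units, two storage units, each unit paired with its own memory.
reg = pack_control_register(
    StorageMode.INDEPENDENT, [[True, False], [False, True]]
)
```

In this sketch, reconfiguring the architecture amounts to writing a new integer into the register; real hardware would likely use separate fields or several registers.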
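The processor architecture of the present invention is a many-core architecture comprising multiple cores. The structure of the cores may be, for example: each of the cores includes computing units 1 to N, storage units 1 to N, and a control component; or: the cores include computing cores and storage cores, where a computing core includes a computing unit and a storage core includes at least one storage unit and a control unit. The present invention does not limit the specific structure of the cores in the processor architecture, as long as the working mode of the storage units and the access mode of the computing units to the storage units can be controlled.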
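Optionally, in the embodiments of the present invention, the working modes of the plurality of storage units 210 may include:
Independent working mode: each storage unit 210 of the plurality has its own independent first interface, and a computing unit 220 can independently access, through each first interface, the storage unit corresponding to it; in this mode, multiple computing units can simultaneously access their respective storage units. Any computing unit 220 can access any storage unit 210 through that storage unit's first interface, and the access correspondence between the computing units 220 and the storage units 210 can be set through the control component 230 according to different requirements; the present invention does not limit this. Note, however, that a storage unit 210 can be accessed by only one computing unit 220 at any given moment.
Overall working mode: the plurality of storage units 210 as a whole have a unified second interface, through which the computing units 220 uniformly access the plurality of storage units; all the storage units 210 form a whole and may share a unified address allocation rule. However, all the storage units 210 together have only one interface, through which a computing unit 220 can access all of them; any computing unit 220 can access the plurality of storage units 210 through the second interface, as set according to different computing requirements. Only one computing unit 220 can access the plurality of storage units 210 at any given moment. The interface width of the second interface is the width obtained by splicing the interfaces of the plurality of storage units in parallel, or any width not less than the interface width of the storage unit with the smallest interface width among them.
Combined working mode: the plurality of storage units 210 are divided into a plurality of storage groups, each of which includes at least one storage unit; each storage group has a third interface, and any computing unit 220 can access a storage group through that group's third interface. That is, all storage units are divided into several storage groups, each composed of at least one storage unit 210. The interface width of the third interface of each storage group is the width obtained by splicing in parallel the interfaces of all storage units in that group, or any width not less than the interface width of the storage unit with the smallest interface width among all those storage units. The storage capacity of each storage group is the sum of the storage capacities of the storage units it includes.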
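The width and capacity rules for the second and third interfaces can be sketched as follows. This is an illustration only: the `StorageUnit` type, the function names, and the capacity figures are assumptions, and "parallel splicing" is read here as summing the interface widths, which matches the 64 + 64 = 128 bit example given later in the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageUnit:
    name: str
    width_bits: int      # interface width of the unit
    capacity_bytes: int  # storage capacity (values below are made up)


def spliced_width(units: list[StorageUnit]) -> int:
    """Width obtained by splicing the unit interfaces in parallel."""
    return sum(u.width_bits for u in units)


def is_allowed_width(units: list[StorageUnit], width_bits: int) -> bool:
    """Check a candidate width for a shared (second or third) interface.

    The text allows the spliced width, or any width not less than the
    smallest unit interface width; the spliced width always satisfies
    the second condition, so one comparison covers both cases.
    """
    return width_bits >= min(u.width_bits for u in units)


def group_capacity(units: list[StorageUnit]) -> int:
    """Capacity of a storage group: the sum of its units' capacities."""
    return sum(u.capacity_bytes for u in units)


# A storage group built from two 64-bit units (capacities are hypothetical).
group = [StorageUnit("mem2", 64, 1024), StorageUnit("mem3", 64, 1024)]
```

For this group, `spliced_width(group)` is 128 bits and `group_capacity(group)` is 2048 bytes, while a 32-bit interface would be rejected because it is narrower than the narrowest member.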
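As mentioned above, besides controlling the working mode of the plurality of storage units 210, the control component 230 coupled between the storage units 210 and the computing units 220 can also control the access mode of the plurality of computing units 220 to the plurality of storage units 210. In this embodiment, the number and bit width of the storage units 210 may or may not be equal to those of the computing units 220; the two cases are described separately below.
I. The plurality of storage units 210 and the plurality of computing units 220 are equal in bit width and number.
Fig. 3 shows a schematic diagram of a processor architecture according to Embodiment 1 of the present invention. As Fig. 3 shows, the processor architecture in this embodiment may include computing unit 1, computing unit 2, storage unit 1, storage unit 2, and a control component coupled to all of them. The bus bit widths of computing unit 1 and computing unit 2 are both 64 bits, and the data bit widths of storage unit 1 and storage unit 2 are also both 64 bits.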
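When the control component controls the access mode of the computing units to the storage units, the specific configuration can be as follows:
1. One-to-one mode: each of the computing units corresponds to one storage unit. In this mode the computing units can work simultaneously, that is, each can access its corresponding storage unit at the same time.
As shown in Fig. 4A, computing unit 1 accesses only storage unit 1 and computing unit 2 accesses only storage unit 2, and computing unit 2 can access storage unit 2 while computing unit 1 accesses storage unit 1.
2. Cross-correspondence mode: the computing units access the storage units in cross correspondence. In this mode, multiple computing units can simultaneously access their corresponding storage units.
As shown in Fig. 4B, computing unit 1 accesses only storage unit 2 and computing unit 2 accesses only storage unit 1, and computing unit 2 can access storage unit 1 while computing unit 1 accesses storage unit 2.
3. One-of-many mode: each of the computing units accesses any one of the storage units.
As shown in Fig. 4C, computing unit 1 and computing unit 2 can each access both storage unit 1 and storage unit 2, but only one storage unit can be accessed at any given moment.
The access modes above can be configured as needed and switched freely during operation; the present invention does not limit this.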
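The three configurations of Figs. 4A-C can be modeled as permission tables plus a check on the requests issued in one cycle. This is a sketch, not the patented control logic: the dictionary encoding, the `requests_allowed` function, and the `single_access` flag are hypothetical, and the one-of-many rule is read here as "at most one access in flight at a time".

```python
# Permission tables: computing unit -> set of storage units it may access.
ONE_TO_ONE = {1: {1}, 2: {2}}          # Fig. 4A
CROSS = {1: {2}, 2: {1}}               # Fig. 4B
ONE_OF_MANY = {1: {1, 2}, 2: {1, 2}}   # Fig. 4C


def requests_allowed(
    mapping: dict[int, set[int]],
    requests: dict[int, int],
    single_access: bool = False,
) -> bool:
    """Check the accesses requested in one cycle against a configuration.

    `requests` maps each active computing unit to its target storage unit.
    `single_access` models the one-of-many rule that only one storage unit
    can be accessed at a given moment.
    """
    targets = list(requests.values())
    # Every request must be permitted by the configured correspondence.
    if any(su not in mapping.get(cu, set()) for cu, su in requests.items()):
        return False
    # A storage unit can serve only one computing unit at a time.
    if len(set(targets)) != len(targets):
        return False
    # In one-of-many mode, at most one access may proceed per cycle.
    if single_access and len(targets) > 1:
        return False
    return True
```

For example, `requests_allowed(CROSS, {1: 2, 2: 1})` accepts the simultaneous crossed accesses of Fig. 4B, while `requests_allowed(ONE_OF_MANY, {1: 1, 2: 2}, single_access=True)` rejects two accesses in the same cycle.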
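II. The plurality of storage units 210 and the plurality of computing units 220 are unequal in bit width and number.
Fig. 5 shows a schematic diagram of a processor architecture according to Embodiment 2 of the present invention. As Fig. 5 shows, the processor architecture in this embodiment may include three computing units (computing unit 1, computing unit 2, computing unit 3), four storage units (storage unit 1, storage unit 2, storage unit 3, storage unit 4), and a control component coupled to all of computing units 1-3 and storage units 1-4. The bus bit widths of computing unit 1 and computing unit 2 are both 64 bits and that of computing unit 3 is 128 bits; the data bit widths of storage unit 1, storage unit 2, and storage unit 3 are all 64 bits and that of storage unit 4 is 128 bits.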
控制部件控制计算单元对多个存储单元的访问模式时,具体配置可以如下:
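1. First access mode: each computing unit accesses at least one storage unit whose bit width equals its own.
As shown in Fig. 6A, computing unit 1 accesses only storage unit 1, computing unit 2 can access storage unit 2 or storage unit 3, and computing unit 3 accesses only storage unit 4; all three computing units can access their corresponding storage units simultaneously. Note that Fig. 6A shows only one exemplary implementation. In practice, computing unit 1 and computing unit 2 may each select any one of storage units 1, 2 and 3 to access.
2. Second access mode: at least one composite storage unit is generated from a first preset number of storage units; each computing unit accesses a storage unit or composite storage unit whose bit width equals its own.
As shown in Fig. 6B, computing unit 1 and computing unit 2 can both access storage unit 1. Storage units 2 and 3 are combined into a composite storage unit with a 128-bit interface that computing unit 3 can access. Computing unit 3 can thus access either storage unit 4 or the composite storage unit formed by storage units 2 and 3.
3. Third access mode: at least one composite storage unit is generated from a second preset number of storage units, and a third preset number of computing units are merged to generate at least one merged access interface; each computing unit accesses, through its own access interface or a merged access interface, at least one storage unit or composite storage unit whose bit width equals that of the access interface or merged access interface.
As shown in Fig. 6C, computing unit 1 and computing unit 2 are merged into a 128-bit-wide access interface, and storage units 1 and 2 are combined into a composite storage unit with a 128-bit interface. Computing unit 3 can then be configured to access the composite storage unit formed by storage units 1 and 2, while the merged computing units 1 and 2 access storage unit 4. The 128-bit data read from storage unit 4 is split into two 64-bit pieces, which are assigned to computing unit 1 and computing unit 2 respectively.
The access modes above can be configured as needed and switched freely during operation; the present invention places no limitation on this.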
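The width matching behind the second and third access modes, and the split of a 128-bit read into two 64-bit pieces, can be illustrated as follows. This is a hedged sketch: the function names are hypothetical, and the choice to hand the low-order half to the first computing unit is an assumption, because the description does not say which half goes to which unit.

```python
from typing import List


def width_matches(interface_widths: List[int], storage_widths: List[int]) -> bool:
    """True when a (possibly merged) access interface is exactly as wide as
    the (possibly composite) storage unit it is paired with."""
    return sum(interface_widths) == sum(storage_widths)


def split_word(value: int, total_bits: int, part_bits: int) -> List[int]:
    """Split a total_bits-wide word into part_bits-wide pieces,
    least significant piece first (assumed ordering)."""
    if total_bits % part_bits != 0:
        raise ValueError("total width must be a multiple of the part width")
    if not 0 <= value < (1 << total_bits):
        raise ValueError("value does not fit in total_bits")
    mask = (1 << part_bits) - 1
    return [(value >> (i * part_bits)) & mask
            for i in range(total_bits // part_bits)]


def join_words(parts: List[int], part_bits: int) -> int:
    """Inverse of split_word: concatenate pieces into one wide word, as a
    composite storage unit would present data from its member units."""
    value = 0
    for i, part in enumerate(parts):
        if not 0 <= part < (1 << part_bits):
            raise ValueError("part does not fit in part_bits")
        value |= part << (i * part_bits)
    return value


if __name__ == "__main__":
    # Fig. 6C pairing: computing units 1 and 2 (64 + 64 bits) with storage unit 4 (128 bits).
    print(width_matches([64, 64], [128]))
    # Computing unit 3 (128 bits) with the composite of storage units 1 and 2.
    print(width_matches([128], [64, 64]))
    word = (0x1111111111111111 << 64) | 0x2222222222222222
    low, high = split_word(word, 128, 64)
    print(hex(low), hex(high))
```

Because `join_words` undoes `split_word`, the same pair of helpers models both directions: a composite storage unit presenting two 64-bit reads as one 128-bit word, and a merged interface distributing one 128-bit word to two 64-bit computing units.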
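Based on the same inventive concept, an embodiment of the present invention further provides a computing device comprising a processor whose architecture is the reconfigurable processor architecture described above and which is used to run computer programs. In an optional embodiment of the present invention, the computing device further comprises a storage device for storing a computer program, which is loaded and executed by the processor when run on the computing device. According to any one of the optional embodiments above, or a combination of several of them, the embodiments of the present invention can achieve the following beneficial effects:
The embodiments of the present invention provide a locally shared storage architecture in which a control component controls the working mode of the local storage units and the mode in which the computing units access them. The control component can also reconfigure the storage units and the computing units' access interfaces according to the bit width of each storage unit and of each access interface: multiple storage units can be reconfigured into composite storage units of different sizes, and the storage-unit interfaces can be reconfigured to different bit widths. After the access interfaces of the storage units are reconfigured, different storage units can be accessed in parallel, which substantially increases memory-access bandwidth and can raise the utilization of the computing units' compute capability. Moreover, different computing units can reuse certain storage units, which effectively improves storage-unit utilization and also makes parameters and data easier to access, since the same parameters need not be split or copied into separate independent storage units.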
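Numerous specific details are set forth in the description provided here. It will be understood, however, that embodiments of the present invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail so as not to obscure the understanding of this description.
Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the present invention are sometimes grouped together in a single embodiment, figure, or description thereof in the above description of exemplary embodiments. This method of disclosure, however, should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single embodiment disclosed above. The claims following the detailed description are therefore expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the present invention.
Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of the embodiments can be combined into one module, unit or component, and can furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings), and all processes or units of any method or device so disclosed, may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described here include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present invention and to form different embodiments. For example, in the claims, any of the claimed embodiments can be used in any combination.
It should be noted that the above embodiments illustrate rather than limit the present invention, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present invention can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.
Thus, those skilled in the art should recognize that, although multiple exemplary embodiments of the present invention have been shown and described in detail here, many other variations or modifications consistent with the principles of the present invention can still be directly determined or derived from the disclosure without departing from its spirit and scope. The scope of the present invention should therefore be understood and deemed to cover all such other variations or modifications.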

Claims (11)

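  1. A reconfigurable processor architecture, characterized by comprising:
    multiple storage units for storing data;
    multiple computing units for accessing the data stored in the storage units and performing computation on the data;
    a control component coupled between the storage units and the computing units, for controlling the working mode of the storage units and/or the mode in which the computing units access the storage units.
  2. The processor architecture according to claim 1, characterized in that the working modes of the storage units include:
    an independent working mode, in which each of the multiple storage units has an independent first interface, and the computing units independently access, through the first interface, the storage unit corresponding to that first interface;
    an overall working mode, in which the multiple storage units, taken as a whole, have a unified second interface, and the computing units access the multiple storage units uniformly through the second interface;
    a combined working mode, in which the multiple storage units are divided into multiple storage groups, each of which includes at least one of the storage units; wherein each storage group has a third interface, and the computing units access, through the third interface, the storage group corresponding to that third interface.
  3. The processor architecture according to claim 2, characterized in that, in the independent working mode, multiple computing units can simultaneously access their respectively corresponding storage units.
  4. The processor architecture according to claim 2, characterized in that the interface width of the second interface is the width obtained by concatenating the interfaces of the multiple storage units in parallel; or any width not less than the interface width of the storage unit with the narrowest interface among the multiple storage units.
  5. The processor architecture according to claim 2, characterized in that the interface width of the third interface is the width obtained by concatenating in parallel the interfaces of all the storage units in the storage group corresponding to the third interface; or any width not less than the interface width of the storage unit with the narrowest interface among all those storage units.
  6. The processor architecture according to any one of claims 1-5, characterized in that the bit widths and numbers of the multiple storage units and the multiple computing units are equal; the modes in which the computing units access the storage units include:
    a one-to-one mode, in which the multiple computing units access the multiple storage units in one-to-one correspondence;
    a cross mode, in which the multiple computing units access the multiple storage units in a crossed correspondence;
    a one-of-many mode, in which any of the multiple computing units accesses any one of the multiple storage units.
  7. The processor architecture according to claim 6, characterized in that, in the one-to-one mode and/or the cross mode, multiple computing units can simultaneously access their respectively corresponding storage units.
  8. The processor architecture according to claim 6, characterized in that, in the one-of-many mode, only one storage unit can be accessed at any given moment.
  9. The processor architecture according to any one of claims 1-5, characterized in that the bit widths and numbers of the multiple storage units and the multiple computing units are unequal; the modes in which the computing units access the storage units include:
    a first access mode, in which each computing unit accesses at least one storage unit whose bit width equals its own;
    a second access mode, in which at least one composite storage unit is generated from a first preset number of storage units, and each computing unit accesses a storage unit or composite storage unit whose bit width equals its own;
    a third access mode, in which at least one composite storage unit is generated from a second preset number of storage units, and a third preset number of computing units are merged to generate at least one merged access interface; each computing unit accesses, through its own access interface or a merged access interface, at least one storage unit or composite storage unit whose bit width equals that of the access interface or merged access interface.
  10. A computing device comprising a processor, characterized in that
    the architecture of the processor is the reconfigurable processor architecture according to any one of claims 1-9, and the processor is used to run computer programs.
  11. The computing device according to claim 10, characterized in that the computing device further comprises:
    a storage device for storing a computer program, the computer program being loaded and executed by the processor when run on the computing device.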
PCT/CN2020/072257 2019-01-28 2020-01-15 Reconfigurable processor architecture and computing device WO2020156177A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910078998.XA CN111488114B (zh) 2019-01-28 2019-01-28 Reconfigurable processor architecture and computing device
CN201910078998.X 2019-01-28

Publications (1)

Publication Number Publication Date
WO2020156177A1 true WO2020156177A1 (zh) 2020-08-06

Family

ID=71791357

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/072257 WO2020156177A1 (zh) 2019-01-28 2020-01-15 Reconfigurable processor architecture and computing device

Country Status (2)

Country Link
CN (1) CN111488114B (zh)
WO (1) WO2020156177A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
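CN112837731A (zh) * 2020-12-31 2021-05-25 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences Static storage cell with multiplexed storage and computing
CN112948300A (zh) * 2021-01-19 2021-06-11 Zhejiang Dahua Technology Co., Ltd. Server, integrated storage-and-computing device, and server system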

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
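CN112380169A (zh) * 2020-11-20 2021-02-19 北京灵汐科技有限公司 Storage apparatus, and data processing method, apparatus, device, medium and system
CN112732202B (zh) * 2021-03-30 2021-06-29 浙江力德仪器有限公司 Data storage system
CN113032329B (zh) * 2021-05-21 2021-09-14 千芯半导体科技(北京)有限公司 Computing structure, hardware architecture and computing method based on a reconfigurable storage-computing chip
CN113656345B (zh) * 2021-09-03 2024-04-12 西安紫光国芯半导体有限公司 Computing device, computing system and computing method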

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140331031A1 (en) * 2013-05-03 2014-11-06 Samsung Electronics Co., Ltd. Reconfigurable processor having constant storage register
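CN104375805A (zh) * 2014-11-17 2015-02-25 Tianjin University Method for simulating the parallel computing process of a reconfigurable processor using a multi-core processor
CN105930201A (zh) * 2016-04-25 2016-09-07 Nanjing University Functional simulator for a reconfigurable special-purpose processor core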

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA2548412C (en) * 2003-12-08 2011-04-19 Qualcomm Incorporated High data rate interface with improved link synchronization
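CN101599808B (zh) * 2008-06-03 2013-04-24 Huawei Technologies Co., Ltd. Cross board testing method and system
US8571350B2 (en) * 2010-08-26 2013-10-29 Sony Corporation Image processing system with image alignment mechanism and method of operation thereof
CN105159611B (zh) * 2015-09-01 2018-04-06 南京伍安信息科技有限公司 Microcontroller chip with data extraction and encryption function
CN105512088B (zh) * 2015-11-27 2018-08-10 The 38th Research Institute of China Electronics Technology Group Corporation Reconfigurable processor architecture and reconfiguration method thereof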
US10649771B2 (en) * 2017-03-31 2020-05-12 Samsung Electronics Co., Ltd. Semiconductor device
US10795836B2 (en) * 2017-04-17 2020-10-06 Microsoft Technology Licensing, Llc Data processing performance enhancement for neural networks using a virtualized data iterator
US10360374B2 (en) * 2017-05-25 2019-07-23 Intel Corporation Techniques for control flow protection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIN GAO : "Prototype Design of Reconfigurable System and Technology Implementation of Dynamic Reconstruction", TECHNOLOGY INNOVATION AND APPLICATION, no. 15, 28 May 2016 (2016-05-28), pages 57 - 59, XP009522521, ISSN: 2095-2945 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
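CN112837731A (zh) * 2020-12-31 2021-05-25 Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences Static storage cell with multiplexed storage and computing
CN112948300A (zh) * 2021-01-19 2021-06-11 Zhejiang Dahua Technology Co., Ltd. Server, integrated storage-and-computing device, and server system
CN112948300B (zh) * 2021-01-19 2023-02-10 Zhejiang Dahua Technology Co., Ltd. Server, integrated storage-and-computing device, and server system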

Also Published As

Publication number Publication date
CN111488114B (zh) 2021-12-21
CN111488114A (zh) 2020-08-04

Similar Documents

Publication Publication Date Title
WO2020156177A1 (zh) 一种可重构的处理器架构及计算设备
US10705960B2 (en) Processors having virtually clustered cores and cache slices
TWI714803B (zh) 處理器及控制工作流的方法
US9244629B2 (en) Method and system for asymmetrical processing with managed data affinity
US9734056B2 (en) Cache structure and management method for use in implementing reconfigurable system configuration information storage
TWI574204B (zh) 對每一核心提供電壓及頻率控制之技術
US20200226080A1 (en) Solid state drive with external software execution to effect internal solid-state drive operations
CN105144082B (zh) 基于平台热以及功率预算约束,对于给定工作负荷的最佳逻辑处理器计数和类型选择
US20230169319A1 (en) Spatially sparse neural network accelerator for multi-dimension visual analytics
US20170132039A1 (en) Monitoring accesses of a thread to multiple memory controllers and selecting a thread processor for the thread based on the monitoring
KR20130010442A (ko) 가상 gpu
CN108885586B (zh) 用于以有保证的完成将数据取出到所指示的高速缓存层级的处理器、方法、系统和指令
CN104011621A (zh) 包括增强的基于温度的电压控制的用于能效和节能的方法、装置和系统
US20140143570A1 (en) Thread consolidation in processor cores
CN107017014B (zh) 用于低能量mcu的动态集装箱化系统存储器保护
KR20120082928A (ko) 내부 프로세서들을 구비한 메모리 및 메모리 액세스 제어 방법들
CN104011624A (zh) 包括设备中自主的基于硬件的深度掉电的用于能效和节能的方法、装置和系统
US20200301739A1 (en) Maximizing resource utilization of neural network computing system
CN111656339B (zh) 存储器装置及其控制方法
US5860101A (en) Scalable symmetric multiprocessor data-processing system with data allocation among private caches and segments of system memory
WO2021155669A1 (zh) 一种基于分布式权值存储加速神经网络计算的架构及方法
US8566532B2 (en) Management of multipurpose command queues in a multilevel cache hierarchy
US6038642A (en) Method and system for assigning cache memory utilization within a symmetric multiprocessor data-processing system
WO2020253383A1 (zh) 一种基于众核处理器的流式数据处理方法及计算设备
Min et al. NeuralHMC: An efficient HMC-based accelerator for deep neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20747568

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20747568

Country of ref document: EP

Kind code of ref document: A1