CN111340185A - Convolutional neural network acceleration method, system, terminal and storage medium

Publication number
CN111340185A
Authority
CN
China
Prior art keywords
risc
core
neural network
convolutional neural
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010094798.6A
Other languages
Chinese (zh)
Inventor
邹晓峰
李拓
刘同强
周玉龙
王朝辉
李仁刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-02-16
Filing date
2020-02-16
Publication date
2020-06-26
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010094798.6A
Publication of CN111340185A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Advance Control (AREA)

Abstract

The invention provides a convolutional neural network acceleration method, system, terminal and storage medium. The method comprises the following steps: generating a RISC-V processor soft core by using a source code generator; constructing a RISC-V single core by extending the RISC-V processor soft core with a DMA (direct memory access) module, a memory controller and a distributed memory module; constructing a many-core acceleration array of a preset specification from the RISC-V single cores; and connecting the many-core acceleration array to a convolutional neural network system, wherein the convolutional neural network system comprises a main processor and convolutional neural network hardware. The invention increases memory access bandwidth during computation and reduces memory access latency, thereby improving the computing performance of the convolutional neural network and accelerating its computation.

Description

Convolutional neural network acceleration method, system, terminal and storage medium
Technical Field
The invention relates to the technical field of convolutional neural networks, and in particular to a convolutional neural network acceleration method, system, terminal and storage medium.
Background
With the advent of the big data era, data volumes have grown exponentially alongside improvements in computer performance, and deep learning algorithms, represented by the convolutional neural network, are widely applied. However, because of the layered structure of the network and the nature of convolution, the large amount of computation and the large number of parameters have become the performance bottleneck of convolutional neural networks; in particular, storing the parameters and the latency of accessing them in memory limit computation speed.
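To make the scale of this bottleneck concrete, the following is a minimal sketch that counts the parameters and multiply-accumulate operations of a single convolution layer. The layer shape is an illustrative assumption and does not come from the patent; the `ConvLayer` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Shape of one convolution layer (illustrative values only)."""
    in_channels: int
    out_channels: int
    kernel: int       # side length of the square kernel
    out_height: int
    out_width: int

    def parameter_count(self) -> int:
        # Each output channel has kernel*kernel*in_channels weights plus one bias.
        return self.out_channels * (self.kernel * self.kernel * self.in_channels + 1)

    def mac_count(self) -> int:
        # One multiply-accumulate per weight per output position.
        return (self.out_height * self.out_width * self.out_channels
                * self.kernel * self.kernel * self.in_channels)


# Hypothetical mid-network layer: 64 -> 128 channels, 3x3 kernel, 56x56 output.
layer = ConvLayer(in_channels=64, out_channels=128, kernel=3, out_height=56, out_width=56)
print(layer.parameter_count())  # 73856
print(layer.mac_count())        # 231211008
```

Even this single assumed layer needs roughly 231 million multiply-accumulates, each of which reads a weight, which is why parameter access speed matters as much as arithmetic throughput.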
Disclosure of Invention
In view of the above-mentioned deficiencies of the prior art, the present invention provides a convolutional neural network acceleration method, system, terminal and storage medium, so as to solve the above-mentioned technical problems.
In a first aspect, the present invention provides a convolutional neural network acceleration method, including:
generating a soft core of the RISC-V processor by using a source code generator;
constructing a RISC-V single core by extending the RISC-V processor soft core with a DMA (direct memory access) module, a memory controller and a distributed memory module;
constructing a many-core acceleration array of a preset specification from the RISC-V single cores;
and connecting the many-core acceleration array to a convolutional neural network system, wherein the convolutional neural network system comprises a main processor and convolutional neural network hardware.
Further, the generating of the soft core of the RISC-V processor by the source code generator includes:
configuring the core generation parameters in the open-source RISC-V Rocket Chip generator;
and generating the soft-core RTL source code of a 32-bit RISC-V processor according to the parameter configuration.
Further, the method for constructing the RISC-V single core by setting the extended DMA, the memory controller and the distributed memory module of the soft core of the RISC-V processor comprises the following steps:
extending, through the AXI bus interface of the RISC-V processor soft core, a direct memory access module, a memory controller and a distributed memory module, wherein the direct memory access module is connected with the convolutional neural network hardware.
Further, the method for constructing a many-core acceleration array of a preset specification from RISC-V single cores includes:
setting the number of RISC-V single cores of the many-core acceleration array according to the calculation amount requirement of the convolutional neural network;
and constructing a set number of RISC-V single cores to form a many-core acceleration array.
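As a hedged illustration of how the number of single cores might be derived from the calculation amount, the sketch below picks the smallest square array that meets a required throughput. The square layout, the throughput figures and the function `array_size` are assumptions for illustration; the patent only states that the number of cores is set according to the calculation amount requirement.

```python
import math


def array_size(required_macs_per_s: int, macs_per_s_per_core: int) -> tuple[int, int]:
    """Return (side, core_count) of the smallest square array meeting the demand."""
    if required_macs_per_s <= 0 or macs_per_s_per_core <= 0:
        raise ValueError("throughput figures must be positive")
    # Integer ceiling division avoids floating-point rounding.
    needed = -(-required_macs_per_s // macs_per_s_per_core)
    side = math.isqrt(needed)
    if side * side < needed:
        side += 1
    return side, side * side


# Hypothetical figures: 6.4 GMAC/s required, 100 MMAC/s per core.
print(array_size(6_400_000_000, 100_000_000))  # (8, 64)
```

With these assumed figures the result matches the 8 × 8 array used in the embodiment; a slightly larger requirement would round up to the next square size.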
Further, the method further comprises:
generating a 64-bit RISC-V dual-core processor by utilizing an open source RISC-V tool chain;
adding a direct memory access module and a memory device to the RISC-V dual-core processor;
configuring a dual-core RISC-V system by utilizing the open source firmware and the Linux system in the RISC-V ecosystem;
and setting a RoCC conversion interface in the dual-core RISC-V system.
Further, the setting of the RoCC conversion interface in the dual-core RISC-V system includes:
generating a RoCC conversion interface by utilizing an open source RISC-V tool chain;
and respectively connecting the many-core acceleration array and the convolutional neural network hardware by using the RoCC conversion interface.
In a second aspect, the present invention provides a convolutional neural network acceleration system, including:
the system comprises a main processor, convolutional neural network hardware and a many-core acceleration array, wherein the main processor is in communication connection with the convolutional neural network; the many-core acceleration array is respectively interconnected with the main processor and the convolutional neural network hardware;
the many-core acceleration array comprises a plurality of RISC-V single cores, and each RISC-V single core comprises a 32-bit RISC-V processor, a direct memory access module, a memory controller and a distributed memory module.
Further, the many-core acceleration array is interconnected with the main processor through a RoCC conversion interface; the many-core acceleration array is interconnected with the convolutional neural network through the direct memory access module.
In a third aspect, a terminal is provided, including:
a processor, a memory, wherein,
the memory is used for storing a computer program which,
the processor is used for calling and running the computer program from the memory so as to make the terminal execute the method described above.
In a fourth aspect, a computer storage medium is provided having stored therein instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.
The beneficial effects of the invention are as follows.
In the convolutional neural network acceleration method, system, terminal and storage medium provided by the invention, a many-core acceleration array based on a RISC-V many-core architecture is constructed and connected to the convolutional neural network system. The parameters of the convolution calculation are accessed concurrently through parallel memory access, which provides high-speed parameter access for the convolution calculation, increases the memory access bandwidth available to it, and removes the memory access bandwidth bottleneck faced by existing neural networks. The invention thus increases memory access bandwidth during computation and reduces memory access latency, thereby improving the computing performance of the convolutional neural network and accelerating its computation.
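The bandwidth argument can be sketched with an idealized model in which every core serves parameters from its own memory at the same rate and there is no contention. This model, the function `parameter_fetch_time_us` and all numbers are assumptions for illustration; the patent reports no measurements.

```python
def parameter_fetch_time_us(param_bytes: int, cores: int,
                            bytes_per_us_per_core: float) -> float:
    """Idealized time to stream all parameters when `cores` memories are read in parallel."""
    if cores <= 0 or bytes_per_us_per_core <= 0:
        raise ValueError("cores and bandwidth must be positive")
    # Aggregate bandwidth is assumed to scale linearly with the number of cores.
    return param_bytes / (cores * bytes_per_us_per_core)


# Hypothetical: 64,000 bytes of parameters, 100 bytes/us per memory port.
single = parameter_fetch_time_us(64_000, cores=1, bytes_per_us_per_core=100)
array = parameter_fetch_time_us(64_000, cores=64, bytes_per_us_per_core=100)
print(single, array)  # 640.0 10.0
```

Real systems would fall short of this linear scaling because of interconnect and buffer limits, so the figure is an upper bound under the stated assumptions.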
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. Those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention.
Fig. 2 is a schematic architecture diagram of a system of one embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The following explains key terms appearing in the present invention.
RISC-V is an open instruction set architecture (ISA) of the latest generation. It is a reduced instruction set, is released under a BSD license, and is lightweight and low-power. Building on the open-source software and hardware ecosystem of the RISC-V instruction set, users can rapidly design and implement RISC-V processors; the ecosystem comprises the ISA specifications, complete software stacks for embedded and general-purpose computing, various RISC-V processors, and system-level hardware infrastructure. RISC-V is modular: different application requirements can be met by combining different instruction modules, and the instruction set can be extended, so that users can define custom instructions and implement the corresponding functions as needed. These characteristics make RISC-V well suited to lightweight, many-core application scenarios, and in particular to the design and implementation of many-core accelerators.
DMA (Direct Memory Access) allows hardware devices of different speeds to communicate without placing a heavy interrupt load on the CPU: peripheral devices access memory directly through a DMA controller.
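The idea of a DMA controller moving data on the CPU's behalf can be sketched as a toy software model: the CPU only prepares descriptors, and the controller performs the copies. The classes `DmaDescriptor` and `ToyDma` are hypothetical and do not model any real DMA hardware or the patent's DMA module.

```python
from dataclasses import dataclass


@dataclass
class DmaDescriptor:
    src: int     # source offset in memory
    dst: int     # destination offset in memory
    length: int  # number of bytes to move


class ToyDma:
    """Software stand-in for a DMA controller working through a descriptor list."""

    def __init__(self, memory: bytearray):
        self.memory = memory
        self.completed = 0  # stands in for a completion-interrupt counter

    def run(self, descriptors):
        for d in descriptors:
            end = max(d.src, d.dst) + d.length
            if d.length < 0 or min(d.src, d.dst) < 0 or end > len(self.memory):
                raise ValueError("descriptor outside memory")
            # The copy happens without any per-byte work by the "CPU" code.
            self.memory[d.dst:d.dst + d.length] = self.memory[d.src:d.src + d.length]
            self.completed += 1


memory = bytearray(range(16)) + bytearray(16)  # 16 data bytes, 16 empty bytes
dma = ToyDma(memory)
dma.run([DmaDescriptor(src=0, dst=16, length=8),
         DmaDescriptor(src=8, dst=24, length=8)])
print(bytes(memory[16:32]) == bytes(range(16)))  # True
```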
FIG. 1 is a schematic flow diagram of a method of one embodiment of the invention. The execution subject in fig. 1 may be a convolutional neural network acceleration system.
As shown in fig. 1, the method 100 includes:
step 110, generating a soft core of the RISC-V processor by using a source code generator;
step 120, constructing a RISC-V single core by extending the RISC-V processor soft core with a DMA module, a memory controller and a distributed memory module;
step 130, constructing a many-core acceleration array of a preset specification from the RISC-V single cores;
step 140, connecting the many-core acceleration array to a convolutional neural network system, wherein the convolutional neural network system comprises a main processor and convolutional neural network hardware.
To facilitate understanding of the invention, the convolutional neural network acceleration method is further described below in terms of its principle and the acceleration process used in the embodiment.
Specifically, the convolutional neural network acceleration method includes:
S1: generating a RISC-V processor soft core by using a source code generator.
Generating the RISC-V processor soft core: the open-source RISC-V Rocket Chip generator (a processor source code generator based on the RISC-V reduced instruction set, developed at the University of California, Berkeley) is used; the core generation parameters are configured, and the soft-core RTL source code of a 32-bit RISC-V processor is generated.
S2: constructing a RISC-V single core by extending the RISC-V processor soft core with a DMA module, a memory controller and a distributed memory module.
For the processor source code generated in step S1, a DMA module, a memory controller and a distributed memory module are extended through the generated AXI bus interface to construct a minimal 32-bit RISC-V single-core processing system.
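The composition of one single core described in step S2 can be sketched as a small data model: a 32-bit core with a local (distributed) memory that is filled through its DMA path. The class `Rv32SingleCore`, its method names and the 4096-byte memory size are hypothetical; the patent does not specify memory sizes or a software interface.

```python
from dataclasses import dataclass, field


@dataclass
class Rv32SingleCore:
    """Toy model of one single core: processor width, bus type and local memory."""
    core_id: int
    xlen: int = 32      # 32-bit RISC-V processor, as in the embodiment
    bus: str = "AXI"    # bus through which DMA and memory are attached
    # Local distributed memory; the 4096-byte size is an assumption.
    local_memory: bytearray = field(default_factory=lambda: bytearray(4096))

    def dma_write(self, offset: int, data: bytes) -> None:
        """Model the DMA module depositing parameters into local memory."""
        if offset < 0 or offset + len(data) > len(self.local_memory):
            raise ValueError("write outside local memory")
        self.local_memory[offset:offset + len(data)] = data

    def load(self, offset: int, length: int) -> bytes:
        """Model the memory controller serving a read from local memory."""
        if offset < 0 or length < 0 or offset + length > len(self.local_memory):
            raise ValueError("read outside local memory")
        return bytes(self.local_memory[offset:offset + length])


core = Rv32SingleCore(core_id=0)
core.dma_write(16, b"\x01\x02\x03\x04")
print(core.load(16, 4))  # b'\x01\x02\x03\x04'
```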
S3: constructing a many-core acceleration array of a preset specification from the RISC-V single cores.
The specification of the many-core acceleration array is set according to the calculation amount of the convolutional neural network. This embodiment uses an 8 × 8 many-core acceleration array; that is, 64 32-bit RISC-V single-core processing systems are created to form the array.
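One way such an array could hold convolution parameters is to interleave them across the 64 local memories so that consecutive chunks can be fetched in parallel. The round-robin scheme, the 4-byte chunk size and the function `shard_parameters` below are assumptions for illustration; the patent does not describe how parameters are distributed.

```python
def shard_parameters(params: bytes, rows: int = 8, cols: int = 8,
                     chunk: int = 4) -> dict[tuple[int, int], bytes]:
    """Distribute `params` round-robin, `chunk` bytes at a time, over a rows x cols grid."""
    shards = {(r, c): bytearray() for r in range(rows) for c in range(cols)}
    coords = list(shards)  # row-major order of core coordinates
    for i in range(0, len(params), chunk):
        # Chunk k goes to core k mod (rows * cols).
        target = coords[(i // chunk) % len(coords)]
        shards[target] += params[i:i + chunk]
    return {coord: bytes(data) for coord, data in shards.items()}


# 256 bytes over 64 cores: exactly one 4-byte chunk per core.
shards = shard_parameters(bytes(range(256)))
print(len(shards))       # 64
print(shards[(0, 1)])    # b'\x04\x05\x06\x07'
```

With this layout, a run of 64 consecutive chunks touches every core once, so all 64 memories can serve the request concurrently.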
S4: connecting the many-core acceleration array to a convolutional neural network system, wherein the convolutional neural network system comprises a main processor and convolutional neural network hardware.
Constructing the RISC-V main processor system: a 64-bit RISC-V dual-core processor is generated by using the open-source RISC-V tool chain, a DDR controller and a memory device are added, and a dual-core RISC-V system is built by using the open-source firmware and Linux system of the RISC-V ecosystem. The main processor system may also use a processor of another existing architecture, but a RoCC conversion interface must still be designed to interconnect it with the acceleration array. Constructing the RoCC interface module: the RoCC interface module is generated by the open-source RISC-V tool chain and is interconnected with the neural network and the RV_32 computation acceleration array.
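For readers unfamiliar with RoCC, the sketch below encodes a custom instruction in the R-type layout commonly documented for Rocket Chip's RoCC interface (funct7, rs2, rs1, the xd/xs1/xs2 flags, rd, and a custom opcode). The patent does not define any instructions, so the meaning attached to `funct7` and the register choices here are hypothetical; `encode_rocc` is an illustrative helper, not part of any tool chain.

```python
CUSTOM0 = 0b0001011  # RISC-V "custom-0" major opcode


def encode_rocc(funct7: int, rd: int, rs1: int, rs2: int,
                xd: bool = True, xs1: bool = True, xs2: bool = True,
                opcode: int = CUSTOM0) -> int:
    """Pack the fields of a RoCC-style custom instruction into a 32-bit word."""
    for name, value, bits in (("funct7", funct7, 7), ("rd", rd, 5),
                              ("rs1", rs1, 5), ("rs2", rs2, 5),
                              ("opcode", opcode, 7)):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (int(xd) << 14) | (int(xs1) << 13) | (int(xs2) << 12)
            | (rd << 7) | opcode)


# Hypothetical command 0 with operands in x11 and x12 and the result in x10.
word = encode_rocc(funct7=0, rd=10, rs1=11, rs2=12)
print(hex(word))  # 0xc5f50b
```

The xd, xs1 and xs2 flags tell the core whether the accelerator writes a destination register and reads each source register, which is how the main processor hands operands to the attached hardware.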
Constructing the neural network processing module: the convolutional neural network may use a common multilayer convolutional neural network architecture comprising a data input layer, a convolution layer, an activation layer, a pooling layer, a fully connected layer and the like. In the invention, all storage and memory access interfaces of the convolution calculation are extracted and interconnected with the many-core acceleration array through the DMA interface and a Buffer (data cache module).
The designed or generated modules are then integrated, the system firmware is loaded, the system is started, and an application program is loaded for testing and debugging.
As shown in fig. 2, the present embodiment provides a convolutional neural network acceleration system, including:
the system comprises a main processor, convolutional neural network hardware and a many-core acceleration array, wherein the main processor is in communication connection with the convolutional neural network; the many-core acceleration array is respectively interconnected with the main processor and the convolutional neural network hardware;
the many-core acceleration array comprises a plurality of RISC-V single cores, and each RISC-V single core comprises a 32-bit RISC-V processor, a direct memory access module, a memory controller and a distributed memory module.
Specifically, the main processor includes: 64-bit RISC-V dual-core processor, DDR controller, memory device, open source firmware and Linux system. The system also comprises two identical RoCC conversion interfaces which are respectively interconnected with the convolutional neural network hardware and the many-core acceleration array.
The many-core acceleration array comprises an 8 × 8 RV_32 array, i.e., 64 RV_32 computation units. Each RV_32 computation unit comprises a 32-bit RISC-V processor, a DMA module extended through the AXI bus interface, a memory controller and a distributed memory module.
The convolutional neural network hardware includes a data input layer, a convolution layer, an activation layer, a pooling layer, a fully connected layer and the like. The implementation of the convolutional neural network depends on the type of network the user needs to accelerate; the exemplary system of the invention uses a common multilayer convolutional neural network.
In this embodiment, common multilayer convolutional neural network hardware is used; all of its storage and memory access interfaces are extracted and interconnected with the many-core acceleration array through the DMA interface and a Buffer (data cache module).
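The role of the Buffer between the DMA interface and the acceleration array can be sketched as a bounded first-in, first-out queue that refuses new data when full. The class `DataBuffer`, its depth and its push/pop interface are assumptions for illustration; the patent does not describe the buffer's internal design.

```python
from collections import deque


class DataBuffer:
    """Bounded FIFO standing in for the data cache module between DMA and the array."""

    def __init__(self, depth: int):
        if depth <= 0:
            raise ValueError("depth must be positive")
        self.depth = depth
        self._queue = deque()

    def push(self, word: int) -> bool:
        """Producer side (DMA): returns False when the buffer is full (back-pressure)."""
        if len(self._queue) >= self.depth:
            return False
        self._queue.append(word)
        return True

    def pop(self):
        """Consumer side (array): returns the oldest word, or None when empty."""
        return self._queue.popleft() if self._queue else None


buf = DataBuffer(depth=2)
accepted = [buf.push(1), buf.push(2), buf.push(3)]  # third push is refused
first = buf.pop()
print(accepted, first)  # [True, True, False] 1
```

Such a buffer lets the producer and consumer run at different rates, which is the decoupling the DMA path needs when the array and the convolution hardware do not consume data in lockstep.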
Fig. 3 is a schematic structural diagram of a terminal system 300 according to an embodiment of the present invention, where the terminal system 300 may be used to execute the convolutional neural network acceleration method according to the embodiment of the present invention.
The terminal system 300 may include a processor 310, a memory 320 and a communication unit 330. These components communicate via one or more buses. Those skilled in the art will appreciate that the structure shown in the figure does not limit the invention: it may be a bus architecture or a star architecture, and it may include more or fewer components than shown, combine certain components, or arrange the components differently.
The memory 320 may be used for storing instructions executed by the processor 310, and the memory 320 may be implemented by any type of volatile or non-volatile storage device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The executable instructions in memory 320, when executed by processor 310, enable terminal 300 to perform some or all of the steps in the method embodiments described above.
The processor 310 is the control center of the terminal. It connects the parts of the terminal through various interfaces and lines, and performs the functions of the terminal and/or processes data by running or executing software programs and/or modules stored in the memory 320 and calling data stored in the memory. The processor may consist of integrated circuits (ICs), for example a single packaged IC or several packaged ICs with the same or different functions connected together. For example, the processor 310 may include only a Central Processing Unit (CPU). In the embodiment of the present invention, the CPU may have a single computing core or multiple computing cores.
The communication unit 330 establishes a communication channel through which the terminal communicates with other terminals, receiving user data sent by other terminals or sending user data to them.
The present invention also provides a computer storage medium, wherein the computer storage medium may store a program which, when executed, may perform some or all of the steps in the embodiments provided by the present invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a Random Access Memory (RAM).
Although the present invention has been described in detail with reference to the drawings and the preferred embodiments, the present invention is not limited thereto. Those skilled in the art can make various equivalent modifications or substitutions to the embodiments of the present invention without departing from its spirit and scope, and any change or substitution that a person skilled in the art can easily conceive of within the technical scope disclosed by the present invention falls within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A convolutional neural network acceleration method, comprising:
generating a soft core of the RISC-V processor by using a source code generator;
constructing a RISC-V single core by extending the RISC-V processor soft core with a DMA (direct memory access) module, a memory controller and a distributed memory module;
constructing a many-core acceleration array of a preset specification from the RISC-V single cores;
and connecting the many-core acceleration array to a convolutional neural network system, wherein the convolutional neural network system comprises a main processor and convolutional neural network hardware.
2. The method of claim 1, wherein generating a RISC-V processor soft core using a source code generator comprises:
configuring the core generation parameters in the open-source RISC-V Rocket Chip generator;
and generating the soft-core RTL source code of a 32-bit RISC-V processor according to the parameter configuration.
3. The method of claim 1, wherein said constructing a RISC-V single core by configuring extended DMA, memory controller and distributed memory module of said RISC-V processor soft core comprises:
extending, through the AXI bus interface of the RISC-V processor soft core, a direct memory access module, a memory controller and a distributed memory module, wherein the direct memory access module is connected with the convolutional neural network hardware.
4. The method of claim 1, wherein constructing a pre-specified many-core accelerator array using RISC-V single cores comprises:
setting the number of RISC-V single cores of the many-core acceleration array according to the calculation amount requirement of the convolutional neural network;
and constructing a set number of RISC-V single cores to form a many-core acceleration array.
5. The method of claim 1, further comprising:
generating a 64-bit RISC-V dual-core processor by utilizing an open source RISC-V tool chain;
adding a direct memory access module and a memory device to the RISC-V dual-core processor;
configuring a dual-core RISC-V system by utilizing the open source firmware and the Linux system in the RISC-V ecosystem;
and setting a RoCC conversion interface in the dual-core RISC-V system.
6. The method of claim 5, wherein the setting of the RoCC conversion interface in the dual-core RISC-V system comprises:
generating a RoCC conversion interface by utilizing an open source RISC-V tool chain;
and respectively connecting the many-core acceleration array and the convolutional neural network hardware by using the RoCC conversion interface.
7. A convolutional neural network acceleration system, comprising:
the system comprises a main processor, convolutional neural network hardware and a many-core acceleration array, wherein the main processor is in communication connection with the convolutional neural network; the many-core acceleration array is respectively interconnected with the main processor and the convolutional neural network hardware;
the many-core acceleration array comprises a plurality of RISC-V single cores, and each RISC-V single core comprises a 32-bit RISC-V processor, a direct memory access module, a memory controller and a distributed memory module.
8. The system of claim 7, wherein the many-core acceleration array is interconnected with a main processor through a RoCC translation interface; the many-core acceleration array is interconnected with the convolutional neural network through the direct memory access module.
9. A terminal, comprising:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method of any one of claims 1-6.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010094798.6A CN111340185A (en) 2020-02-16 2020-02-16 Convolutional neural network acceleration method, system, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN111340185A true CN111340185A (en) 2020-06-26

Family

ID=71186291

Family Applications (1)

Application Number Status Publication Priority Date Filing Date Title
CN202010094798.6A Pending CN111340185A (en) 2020-02-16 2020-02-16 Convolutional neural network acceleration method, system, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111340185A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112306663A (en) * 2020-11-12 2021-02-02 山东云海国创云计算装备产业创新中心有限公司 Parallel computing accelerator and embedded system
CN112988238A (en) * 2021-05-06 2021-06-18 成都启英泰伦科技有限公司 Extensible operation device and method based on extensible instruction set CPU kernel
CN113160062A (en) * 2021-05-25 2021-07-23 烟台艾睿光电科技有限公司 Infrared image target detection method, device, equipment and storage medium
WO2023092620A1 (en) * 2021-11-29 2023-06-01 山东领能电子科技有限公司 Risc-v-based three-dimensional interconnection many-core processor architecture and operating method therefor
US11714649B2 (en) 2021-11-29 2023-08-01 Shandong Lingneng Electronic Technology Co., Ltd. RISC-V-based 3D interconnected multi-core processor architecture and working method thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110443214A (en) * 2019-08-12 2019-11-12 山东浪潮人工智能研究院有限公司 A kind of recognition of face accelerating circuit system and accelerated method based on RISC-V
CN110490311A (en) * 2019-07-08 2019-11-22 华南理工大学 Convolutional neural networks accelerator and its control method based on RISC-V framework

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
中国电子商情: "《兆易创新大胆启用双赛道策略,率先实现RISC-V通用MCU商用落地》", 《HTTPS://WWW.FX361.COM/PAGE/2019/0910/9764675.SHTML》 *
杨维科: "《基于RISC-V开源处理器的卷积神经网络加速器设计方法研究》", 《CNKI硕士论文数据库》 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200626
