CN111027688A - Neural network calculator generation method and device based on FPGA - Google Patents


Info

Publication number
CN111027688A
Authority
CN
China
Prior art keywords
network, network layer, group, neural network, determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911002447.1A
Other languages
Chinese (zh)
Inventor
罗国杰
戴拓
章嘉玺
张文泰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.): 2019-10-21
Filing date: 2019-10-21
Publication date: 2020-04-17
Application filed by Advanced Institute of Information Technology AIIT of Peking University and Hangzhou Weiming Information Technology Co Ltd
Priority to CN201911002447.1A
Publication of CN111027688A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/06: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/10: Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • G06N 3/105: Shells for specifying net layout

Abstract

The invention discloses an FPGA (field programmable gate array)-based neural network calculator generation method and device. The method comprises the following steps: determining the dependency relationship of each network layer in the neural network; grouping the network layers such that the layers in each group share the same dependency relationships, and determining the array block size required by each group; and deploying an FPGA according to the dependency relationships of the network layers in each group and the array block size required by each group, to obtain the calculator of the neural network. By analyzing the dependency relationships of the neural network, network layers that do not depend on one another (i.e., whose dependency relationships are identical) are placed in one group, so that the array block allocated to the group can compute those layers in parallel, effectively shortening the critical path and improving computational efficiency. In addition, the positions of the array blocks required by the groups are arranged on the FPGA according to the dependency relationships, which reduces data exchange between arrays and further improves computational efficiency.

Description

Neural network calculator generation method and device based on FPGA
Technical Field
The invention relates to the technical field of systolic array applications, and in particular to an FPGA-based neural network calculator generation method and device.
Background
Neural networks are computational structures commonly used in deep learning applications, and they are typically implemented on FPGAs using systolic-array computing architectures. Designing a systolic array architecture on an FPGA that performs large-scale neural network computation quickly and accurately is a challenging problem, and two factors need to be considered: computation and communication.
In the prior art, computing architectures designed for neural networks generally suffer from the problem that several of the arrays must compute serially, which lengthens the critical path and lowers the computational efficiency of the architecture, thereby reducing the utilization of the systolic array architecture.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an FPGA (field programmable gate array)-based neural network calculator generation method and device; this aim is achieved by the following technical solution.
A first aspect of the invention provides an FPGA-based neural network calculator generation method, the method comprising the following steps:
determining the dependency relationship of each network layer in the neural network;
grouping the network layers, wherein the network layers in each group have the same dependency relationship, and determining the size of an array block required by each group;
and deploying the FPGA according to the dependency relationship of the network layer in each group and the size of the array block required by each group to obtain the calculator of the neural network.
A second aspect of the present invention provides an FPGA-based neural network calculator generating apparatus, the apparatus comprising:
the first determining module is used for determining the dependency relationship of each network layer in the neural network;
the second determining module is used for grouping the network layers such that the layers in each group share the same dependency relationships, and for determining the array block size required by each group;
and the generating module is used for deploying the FPGA according to the dependency relationship of the network layer in each group and the size of the array block required by each group to obtain the calculator of the neural network.
In the embodiment of the invention, the dependency relationship of each network layer in the neural network is determined; the network layers are grouped such that the layers in each group share the same dependency relationships; the array block size required by each group is determined; and the FPGA is deployed according to the dependency relationships of the network layers in each group and the array block size required by each group, to obtain the calculator of the neural network.
Based on the above description, by analyzing the dependency relationships of the neural network, network layers that do not depend on one another (i.e., whose dependency relationships are identical) are placed in one group, so that the array block allocated to the group can compute those layers in parallel, effectively shortening the critical path and improving computational efficiency. In addition, the positions of the array blocks required by the groups are arranged on the FPGA according to the dependency relationships, which reduces data exchange between arrays and further improves computational efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of a neural network architecture in accordance with an exemplary embodiment of the present invention;
FIG. 2 is a schematic diagram of a prior-art computational architecture on an FPGA for the neural network of FIG. 1;
FIG. 3A is a flow chart of an embodiment of a method for generating an FPGA-based neural network calculator in accordance with an exemplary embodiment of the present invention;
FIG. 3B is a schematic diagram of the computational architecture generated by the present invention on an FPGA for the neural network of FIG. 1;
FIG. 4 is a diagram illustrating a hardware configuration of an electronic device in accordance with an exemplary embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating an embodiment of an FPGA-based neural network calculator generating apparatus according to an exemplary embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
In the prior art, a systolic-array-based computing architecture is usually designed for a neural network using a tool such as Vivado, but in the resulting architecture the arrays allocated to the network layers frequently must compute serially, so the critical path is long, the computational efficiency is low, and the utilization of the systolic array architecture is reduced.
As shown in fig. 1, the neural network includes 5 network layers. Fig. 2 shows a computing architecture designed for this neural network with an existing design tool: on the FPGA, array a is allocated to network layer 1 and network layer 2, array b is allocated to network layer 4 and network layer 5, and array c is allocated to network layer 3 alone. Since the output of network layer 1 serves as the input of network layers 2, 3, and 4, and the input of network layer 5 comprises the outputs of network layers 2, 3, and 4, network layer 1 and network layer 2 must compute serially in array a, and network layer 4 and network layer 5 must compute serially in array b, which results in a long critical path. In addition, array a needs to transfer data to both array b and array c, so there is a large amount of data exchange between different arrays. It can be inferred that this computing architecture is computationally inefficient.
To solve this technical problem, the invention provides an FPGA (field programmable gate array)-based neural network calculator generation scheme. The dependency relationship of each network layer in the neural network is determined; the network layers are grouped such that the layers in each group share the same dependency relationships; the array block size required by each group is determined; a binary file is generated according to the dependency relationships of the network layers in each group and the array block size required by each group; and the binary file is deployed into the FPGA to obtain the neural network calculator.
Based on the above description, by analyzing the dependency relationships of the neural network, network layers that do not depend on one another (i.e., whose dependency relationships are identical) are placed in one group, so that the array block allocated to the group can compute those layers in parallel, effectively shortening the critical path and improving computational efficiency. In addition, the positions of the array blocks required by the groups are arranged on the FPGA according to the dependency relationships, which reduces data exchange between arrays and further improves computational efficiency.
The specific embodiment of the invention will be described in detail below for the FPGA-based neural network calculator generation scheme.
Fig. 3A is a flowchart of an embodiment of a method for generating an FPGA-based neural network calculator according to an exemplary embodiment of the present invention; the method can be applied to an electronic device (e.g., a terminal, a server, etc.). As shown in fig. 3A, the method for generating the FPGA-based neural network calculator includes the following steps:
step 301: and determining the dependency relationship of each network layer in the neural network.
In one embodiment, the dependency relationship of each network layer may be determined according to the data transfer relationship between each network layer in the neural network.
In an exemplary scenario, for the neural network shown in fig. 1 described above, the data transfer relationships between the network layers are: the output of network layer 1 serves as the input of network layers 2, 3, and 4, and the input of network layer 5 comprises the outputs of network layers 2, 3, and 4. The dependency relationships can therefore be determined as follows: network layer 1 outputs to network layers 2, 3, and 4; network layer 2 takes its input from network layer 1 and outputs to network layer 5; network layer 3 takes its input from network layer 1 and outputs to network layer 5; network layer 4 takes its input from network layer 1 and outputs to network layer 5; and network layer 5 takes its inputs from network layers 2, 3, and 4.
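For illustration only, the dependency determination of step 301 can be sketched in software as follows; the edge-list input format and the helper name layer_dependencies are assumptions of this sketch, not part of the patented method.

```python
from collections import defaultdict

def layer_dependencies(edges):
    """edges: iterable of (src, dst) pairs meaning 'the output of src feeds dst'."""
    inputs = defaultdict(set)   # layer -> layers whose outputs it consumes
    outputs = defaultdict(set)  # layer -> layers that consume its output
    layers = set()
    for src, dst in edges:
        outputs[src].add(dst)
        inputs[dst].add(src)
        layers.update((src, dst))
    # A layer's dependency relationship is the pair (input set, output set).
    return {l: (frozenset(inputs[l]), frozenset(outputs[l])) for l in layers}

# The network of fig. 1: layer 1 feeds layers 2, 3 and 4; layers 2, 3 and 4 feed layer 5.
EDGES = [(1, 2), (1, 3), (1, 4), (2, 5), (3, 5), (4, 5)]
deps = layer_dependencies(EDGES)
# deps[2] == deps[3] == deps[4] -> (frozenset({1}), frozenset({5})), i.e. identical
```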
Step 302: and grouping the network layers, wherein the network layers in each group have the same dependency relationship, and determining the size of the array block required by each group.
In the invention, mutually independent network layers (namely, network layers with identical dependency relationships) are placed in one group, and network layers with different dependency relationships are placed in separate groups, so that the array block allocated to a group in the FPGA can compute that group's network layers in parallel, effectively shortening the critical path.
Based on the exemplary scenario of step 301, network layer 2, network layer 3, and network layer 4 have identical dependency relationships and are independent of one another, so they may be taken as group 1; network layer 1 and network layer 5 each have dependency relationships different from all other layers, so network layer 1 may be taken as group 2 and network layer 5 as group 3. The network layers of the neural network shown in fig. 1 are thus divided into three groups.
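Continuing the hypothetical sketch above, the grouping rule of step 302 then reduces to bucketing layers by their (input set, output set) pair:

```python
from collections import defaultdict

def group_layers(deps):
    """deps: mapping layer -> (input set, output set), as built in the previous sketch."""
    groups = defaultdict(list)
    for layer, relationship in deps.items():
        groups[relationship].append(layer)  # identical relationship -> same group
    return list(groups.values())

print(group_layers(deps))  # three groups for fig. 1: [2, 3, 4], [1], [5] (order may vary)
```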
In an embodiment, since the computation of each network layer consumes a certain amount of computing resources, the array block size required by each group needs to be determined, to ensure that the array block allocated in the FPGA satisfies the computing resources required by the group's network layers.
Based on this, to determine the array block size required by each group, the amount of computing resources required by the network layers in the group may be determined for each group, and the required array block size may then be determined according to that amount.
In addition, the operation of a network layer involves a large number of multiply and add operations, so the amount of computing resources required by a network layer may be taken as the sum of the number of multiplications and the number of additions it requires.
In a systolic array, the minimum processing unit is the PE (processing element), and each PE is a multiply-accumulator (MAC, multiply-accumulate unit); that is, each PE represents one multiplication or one addition. The array block size required by each group therefore refers to the number of PEs the array block needs to contain, and the amount of computing resources an array block can provide likewise refers to the number of PEs it contains.
Because the PEs share peripherals and a front end, they can compute in parallel. When a group contains multiple mutually independent network layers, those layers can be computed in parallel within the array block allocated to the group, effectively shortening the critical path.
In one example, to determine the amount of computing resources required by the network layers in a group, the network parameters contained in each network layer of the group may be obtained from the neural network, and the required amount of computing resources may be determined from those parameters.
Network layers with different functions contain different network parameters and therefore require different computing resources, so the amount of computing resources required by each network layer needs to be determined separately.
Illustratively, the network parameters of a convolutional layer include the convolution kernel size; those of a pooling layer include the pooling window size and stride; those of a fully-connected layer include the number of output channels; and so on.
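Purely as an illustrative sketch of how the required amount of computing resources might be estimated from such network parameters, the layer descriptor fields and the counting formulas below are assumptions of this example, not figures taken from the patent:

```python
def macs_required(layer):
    """Rough count of multiplications plus additions for one layer descriptor."""
    kind = layer["type"]
    if kind == "conv":
        # one multiply per kernel tap per output position, one accumulate per multiply
        muls = (layer["out_h"] * layer["out_w"] * layer["out_c"]
                * layer["k"] * layer["k"] * layer["in_c"])
        adds = muls
    elif kind == "fc":
        muls = layer["in_features"] * layer["out_channels"]
        adds = muls
    elif kind == "pool":
        # pooling needs per-window additions (or comparisons) but no multiplies
        muls = 0
        adds = (layer["out_h"] * layer["out_w"] * layer["out_c"]
                * (layer["k"] * layer["k"] - 1))
    else:
        raise ValueError(f"unknown layer type: {kind}")
    return muls + adds  # the required amount of computing resources

def group_pe_count(group):
    """Array block size for a group: the number of PEs (one PE = one MAC)."""
    return sum(macs_required(layer) for layer in group)

print(group_pe_count([{"type": "fc", "in_features": 128, "out_channels": 10}]))  # 2560
```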
Step 303: and deploying the FPGA according to the dependency relationship of the network layer in each group and the size of the array block required by each group to obtain the calculator of the neural network.
In an embodiment, when deploying the FPGA, a binary file may be generated according to the dependency relationships of the network layers in each group and the array block size required by each group, and the binary file may then be loaded into the FPGA to complete the deployment.
Illustratively, continuing the exemplary scenario of steps 301 and 302 above:
The network layers in group 1 all take their input from network layer 1 and output to network layer 5; the network layer in group 2 outputs to network layers 2, 3, and 4; and the network layer in group 3 takes its inputs from network layers 2, 3, and 4. Therefore, when the FPGA is deployed, the array block allocated to group 2 has a connection relationship with the array block allocated to group 1, and the array block of group 1 has a connection relationship with the array block allocated to group 3; accordingly, the array block of group 2 is placed adjacent to the array block of group 1, and the array block of group 1 is placed adjacent to the array block of group 3.
As shown in fig. 3B, in the structure of the calculator for the neural network of fig. 1, the array block of group 2 is connected to the array block of group 1, and the array block of group 1 is in turn connected to the array block of group 3.
Comparing fig. 3B with the prior-art computing architecture shown in fig. 2, the architecture obtained by the present invention has fewer inter-array connections and less inter-array data exchange, and therefore higher computational efficiency.
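The placement step can likewise be sketched: order the groups by their group-level dependencies and assign connected groups adjacent array-block positions. The chain layout and the name place_groups are assumptions of this sketch; an actual deployment would generate and load a binary file into the FPGA, as described above.

```python
from graphlib import TopologicalSorter  # Python 3.9+

def place_groups(group_deps):
    """group_deps: mapping group -> set of groups whose outputs it consumes."""
    order = TopologicalSorter(group_deps).static_order()
    # consecutive slots on the fabric -> connected groups sit in adjacent blocks
    return {group: slot for slot, group in enumerate(order)}

# Fig. 3B: group 2 ({layer 1}) feeds group 1 ({layers 2, 3, 4}), which feeds group 3 ({layer 5}).
print(place_groups({"group1": {"group2"}, "group3": {"group1"}}))
# {'group2': 0, 'group1': 1, 'group3': 2}
```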
It should be noted that, during deployment, the network parameters contained in the network layers of each group need to be loaded into the array block allocated to that group according to the operation rules.
In this embodiment, the dependency relationship of each network layer in the neural network is determined; the network layers are grouped; the array block size required by each group is determined; and the FPGA is deployed according to the dependency relationships of the network layers in each group and the array block size required by each group, to obtain the calculator of the neural network.
Based on the above description, by analyzing the dependency relationships of the neural network, network layers that do not depend on one another (i.e., whose dependency relationships are identical) are placed in one group, so that the array block allocated to the group can compute those layers in parallel, effectively shortening the critical path and improving computational efficiency. In addition, the positions of the array blocks required by the groups are arranged on the FPGA according to the dependency relationships, which reduces data exchange between arrays and further improves computational efficiency.
Fig. 4 is a hardware block diagram of an electronic device according to an exemplary embodiment of the present invention, the electronic device including: a communication interface 401, a processor 402, a machine-readable storage medium 403, and a bus 404; wherein the communication interface 401, the processor 402 and the machine-readable storage medium 403 communicate with each other via a bus 404. The processor 402 can execute the FPGA-based neural network calculator generation method described above by reading and executing machine-executable instructions in the machine-readable storage medium 403 corresponding to the control logic of the FPGA-based neural network calculator generation method, and the details of the method are described in the above embodiments and will not be described again here.
The machine-readable storage medium 403 referred to in this disclosure may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: volatile memory, non-volatile memory, or a similar storage medium. In particular, the machine-readable storage medium 403 may be RAM (Random Access Memory), flash memory, a storage drive (e.g., a hard disk drive), any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.
Corresponding to the embodiment of the neural network calculator generating method based on the FPGA, the invention also provides an embodiment of a neural network calculator generating device based on the FPGA.
Fig. 5 is a schematic diagram illustrating an embodiment of an FPGA-based neural network calculator generating apparatus according to an exemplary embodiment of the present invention; the apparatus can be applied to an electronic device. As shown in fig. 5, the FPGA-based neural network calculator generating apparatus includes:
a first determining module 510, configured to determine a dependency relationship of each network layer in the neural network;
a second determining module 520, configured to group the network layers such that the layers in each group share the same dependency relationships, and to determine the array block size required by each group;
a generating module 530, configured to deploy the FPGA according to the dependency relationships of the network layers in each group and the array block size required by each group, to obtain the calculator of the neural network.
In an optional implementation manner, the first determining module 510 is specifically configured to determine the dependency relationship of each network layer according to a data transfer relationship between each network layer in the neural network.
In an optional implementation, the second determining module 520 is specifically configured to, in determining the array block size required by each group, determine for each group the amount of computing resources required by the network layers in the group, and determine the required array block size according to the amount of computing resources required by the group.
In an optional implementation, the second determining module 520 is specifically configured to, in determining the amount of computing resources required by the network layers in a group, obtain from the neural network the network parameters contained in each network layer in the group, and determine the required amount of computing resources according to the network parameters contained in each network layer.
In an optional implementation, the array block size required by each group refers to the number of processing elements (PEs) that the array block needs to contain.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This invention is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. An FPGA-based neural network calculator generation method, the method comprising:
determining the dependency relationship of each network layer in the neural network;
grouping the network layers, wherein the network layers in each group have the same dependency relationship, and determining the size of an array block required by each group;
and deploying the FPGA according to the dependency relationship of the network layer in each group and the size of the array block required by each group to obtain the calculator of the neural network.
2. The method of claim 1, wherein determining the dependencies of the network layers in the neural network comprises:
and determining the dependency relationship of each network layer according to the data transfer relationship among the network layers in the neural network.
3. The method of claim 1, wherein determining the array block size required by each group comprises:
for each group, determining the amount of computing resources required by the network layers in the group;
and determining the required array block size according to the amount of computing resources required by the group.
4. The method of claim 3, wherein determining the amount of computing resources required by the network layers in the group comprises:
acquiring, from the neural network, the network parameters contained in each network layer in the group;
and determining the required amount of computing resources according to the network parameters contained in each network layer.
5. The method according to any one of claims 1-4, wherein the array block size required by each group refers to the number of processing elements (PEs) that an array block needs to contain.
6. An FPGA-based neural network calculator generating apparatus, the apparatus comprising:
the first determining module is used for determining the dependency relationship of each network layer in the neural network;
the second determining module is used for grouping the network layers such that the layers in each group share the same dependency relationships, and for determining the array block size required by each group;
and the generating module is used for deploying the FPGA according to the dependency relationship of the network layer in each group and the size of the array block required by each group to obtain the calculator of the neural network.
7. The apparatus of claim 6, wherein the first determining module is specifically configured to determine the dependency relationship of each network layer according to a data transfer relationship between each network layer in the neural network.
8. The apparatus according to claim 6, wherein the second determining module is specifically configured to, in determining the array block size required by each group, determine for each group the amount of computing resources required by the network layers in the group, and determine the required array block size according to the amount of computing resources required by the group.
9. The apparatus according to claim 8, wherein the second determining module is specifically configured to, in determining the amount of computing resources required by the network layers in the group, acquire from the neural network the network parameters contained in each network layer in the group, and determine the required amount of computing resources according to the network parameters contained in each network layer.
10. The apparatus according to any one of claims 6-9, wherein the array block size required by each group refers to the number of processing elements (PEs) that an array block needs to contain.
CN201911002447.1A 2019-10-21 2019-10-21 Neural network calculator generation method and device based on FPGA Pending CN111027688A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911002447.1A CN111027688A (en) 2019-10-21 2019-10-21 Neural network calculator generation method and device based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911002447.1A CN111027688A (en) 2019-10-21 2019-10-21 Neural network calculator generation method and device based on FPGA

Publications (1)

Publication Number Publication Date
CN111027688A true CN111027688A (en) 2020-04-17

Family

ID=70201295

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911002447.1A Pending CN111027688A (en) 2019-10-21 2019-10-21 Neural network calculator generation method and device based on FPGA

Country Status (1)

Country Link
CN (1) CN111027688A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190114548A1 (en) * 2017-10-17 2019-04-18 Xilinx, Inc. Static block scheduling in massively parallel software defined hardware systems
US20190236438A1 (en) * 2018-01-30 2019-08-01 Google Llc Adjusting neural network resource usage
US20190303762A1 (en) * 2018-03-30 2019-10-03 Xilinx, Inc. Methods of optimization of computational graphs of neural networks
CN109697500A (en) * 2018-12-29 2019-04-30 北京中科寒武纪科技有限公司 Data processing method, device, electronic equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111736904A (en) * 2020-08-03 2020-10-02 北京灵汐科技有限公司 Multitask parallel processing method and device, computer equipment and storage medium
US11392426B2 (en) 2020-08-03 2022-07-19 Lynxi Technologies Co., Ltd. Multitask parallel processing method and apparatus, computer device and storage medium
CN115454905A (en) * 2022-08-22 2022-12-09 杭州未名信科科技有限公司 PCIE interface card for chip FPGA prototype verification stage
CN115454905B (en) * 2022-08-22 2024-02-20 杭州未名信科科技有限公司 PCIE interface card for chip FPGA prototype verification stage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 101, building 1, block C, Qianjiang Century Park, ningwei street, Xiaoshan District, Hangzhou City, Zhejiang Province

Applicant after: Hangzhou Weiming Information Technology Co.,Ltd.

Applicant after: Institute of Information Technology, Zhejiang Peking University

Address before: Room 288-1, 857 Xinbei Road, Ningwei Town, Xiaoshan District, Hangzhou City, Zhejiang Province

Applicant before: Institute of Information Technology, Zhejiang Peking University

Applicant before: Hangzhou Weiming Information Technology Co.,Ltd.