CN114492781A - Hardware accelerator, data processing method, system, equipment and medium - Google Patents

Hardware accelerator, data processing method, system, equipment and medium

Info

Publication number
CN114492781A
CN114492781A (application number CN202210340279.2A)
Authority
CN
China
Prior art keywords
neural network
instruction
data
network operation
operation instruction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210340279.2A
Other languages
Chinese (zh)
Inventor
曹其春
董刚
胡克坤
杨宏斌
尹文枫
王斌强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202210340279.2A priority Critical patent/CN114492781A/en
Publication of CN114492781A publication Critical patent/CN114492781A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Advance Control (AREA)

Abstract

The application discloses a hardware accelerator and a data processing method, system, equipment and medium. The method comprises: obtaining a neural network operation instruction; splitting the neural network operation instruction into a convolution instruction and other instructions; acquiring feature data and filter data corresponding to the neural network operation instruction, and partitioning the feature data and the filter data to obtain block data; and performing parallel operation on the block data based on the convolution instruction and the other instructions to obtain a target operation result. In the application, after the hardware accelerator acquires the neural network operation instruction, it splits the instruction into a convolution instruction and other instructions, partitions the feature data and filter data corresponding to the instruction to obtain block data, and finally operates on the block data in parallel based on the convolution instruction and the other instructions, so that the target operation result can be obtained quickly and with high efficiency.

Description

Hardware accelerator, data processing method, system, equipment and medium
Technical Field
The present application relates to the field of neural network technologies, and in particular, to a hardware accelerator, and a data processing method, system, device, and medium.
Background
With the development of artificial intelligence in fields such as agriculture, finance, security, health care and manufacturing, there is an urgent demand for algorithms that compute faster, with higher precision and lower power consumption. The CNN (convolutional neural network), one of the most important representatives in the field of artificial intelligence algorithms, has made many breakthroughs in image analysis and processing and has been widely applied to various image-related applications.
However, due to the special computation pattern of the CNN, general-purpose processors are not efficient at implementing CNNs and cannot meet the performance requirements. Therefore, various hardware accelerators designed based on FPGA (Field-Programmable Gate Array), GPU (graphics processing unit) and even ASIC (Application Specific Integrated Circuit) have recently been proposed to improve the performance of CNN designs. If the hardware accelerator architecture is not carefully designed, its computational throughput will not match the memory bandwidth provided by the FPGA platform, which means that performance will degrade due to under-utilization of logic resources or memory bandwidth.
In summary, how to improve the operation efficiency of the hardware accelerator on the neural network is a problem to be solved urgently by those skilled in the art.
Disclosure of Invention
The application aims to provide a data processing method that can, to a certain extent, solve the technical problem of improving the operation efficiency of a hardware accelerator on a neural network. The application also provides a hardware accelerator, a data processing system, a device and a computer readable storage medium.
In order to achieve the above purpose, the present application provides the following technical solutions:
a data processing method is applied to a hardware accelerator and comprises the following steps:
acquiring a neural network operation instruction;
splitting the neural network operation instruction into a convolution instruction and other instructions;
acquiring feature data and filter data corresponding to the neural network operation instruction, and blocking the feature data and the filter data to obtain block data;
and performing parallel operation on the block data based on the convolution instruction and the other instructions to obtain a target operation result.
Preferably, the splitting the neural network operation instruction into a convolution instruction and other instructions includes:
and splitting the neural network operation instruction into the convolution instruction and the other instructions according to the channel correlation.
Preferably, the other instructions include a pooling instruction, an activation instruction, a splicing instruction, and a splitting instruction.
Preferably, the obtaining of the neural network operation instruction includes:
acquiring the neural network operation instruction;
the neural network operation instruction comprises a current node number, a parent node type, a child node number, a child node type, a batch size, a weight kernel size, a padding number in the height direction, a padding number in the width direction, a stride, an input width, an input height, an input channel number, an output channel number, an input feature map address, a weight address, a quantization parameter address, an output address and the size of the calculation block.
Preferably, the obtaining of the neural network operation instruction includes:
acquiring a neural network computational graph in a json file format;
and reading and parsing the neural network computation graph based on python to obtain the neural network operation instruction in dict format.
A data processing system for use with a hardware accelerator, comprising:
the first acquisition module is used for acquiring a neural network operation instruction;
the first splitting module is used for splitting the neural network operation instruction into a convolution instruction and other instructions;
the second acquisition module is used for acquiring feature data and filter data corresponding to the neural network operation instruction, and blocking the feature data and the filter data to obtain block data;
and the first operation module is used for operating the block data in parallel based on the convolution instruction and the other instructions to obtain a target operation result.
A data processing apparatus comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data processing method as described above when executing the computer program.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the data processing method according to any one of the preceding claims.
A hardware accelerator, comprising:
the memory is used for acquiring and storing the neural network operation instruction, feature data and filter data;
the splitter is used for splitting the neural network operation instruction into a convolution instruction and other instructions; partitioning the feature data and the filter data to obtain block data;
the convolution arithmetic unit is used for carrying out parallel arithmetic on the block data based on the convolution instruction to obtain a target arithmetic result;
and the other arithmetic units are used for parallelly operating the block data based on the other instructions to obtain a target operation result.
Preferably, the convolution arithmetic unit is formed based on a DSP array core; the other operators are constructed based on tensor ALUs.
Preferably, the method further comprises the following steps:
and the buffer is used for buffering data.
The data processing method provided by the application is applied to a hardware accelerator and comprises: obtaining a neural network operation instruction; splitting the neural network operation instruction into a convolution instruction and other instructions; acquiring feature data and filter data corresponding to the neural network operation instruction, and partitioning the feature data and the filter data to obtain block data; and performing parallel operation on the block data based on the convolution instruction and the other instructions to obtain a target operation result. In the application, after the hardware accelerator acquires the neural network operation instruction, it splits the instruction into a convolution instruction and other instructions, partitions the feature data and filter data corresponding to the instruction to obtain block data, and finally operates on the block data in parallel based on the convolution instruction and the other instructions, so that the target operation result can be obtained quickly and with high efficiency. The hardware accelerator, the data processing system, the data processing device and the computer readable storage medium provided by the application solve the corresponding technical problems.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only embodiments of the present application; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a neural network computational graph;
FIG. 3 is a schematic diagram of type structure;
fig. 4 is a schematic structural diagram of a data processing system according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a hardware accelerator according to an embodiment of the present application;
FIG. 6 is a diagram illustrating data transmission of a hardware accelerator according to an embodiment of the present application;
FIG. 7 is a schematic diagram of data processing of a convolution operator;
fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
fig. 9 is another schematic structural diagram of a data processing apparatus according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Referring to fig. 1, fig. 1 is a flowchart of a data processing method according to an embodiment of the present disclosure.
The data processing method provided by the embodiment of the application is applied to a hardware accelerator and can comprise the following steps:
step S101: and acquiring a neural network operation instruction.
In practical applications, the hardware accelerator may obtain the neural network operation instruction first, and the type and content of the neural network operation instruction may be determined according to actual needs, which is not specifically limited herein.
In a specific application scenario, in order to facilitate processing of the neural network operation instruction, the acquired neural network operation instruction may comprise a current node number, a parent node type, a child node number, a child node type, a batch size, a weight kernel size, a padding number in the height direction, a padding number in the width direction, a stride, an input width, an input height, an input channel number, an output channel number, an input feature map address, a weight address, a quantization parameter address, an output address and the size of the calculation block.
In a specific application scenario, in the process of obtaining the neural network operation instruction, the neural network operation instruction may be obtained based on a neural network computation graph. Specifically, the neural network computation graph in json (JavaScript Object Notation) file format may be acquired, and the neural network computation graph may be read and parsed based on python to obtain the neural network operation instruction in dict (dictionary) format. Python was designed by Guido van Rossum at the Dutch national research institute for mathematics and computer science (CWI) in the early 1990s as a successor to a language named ABC.
It should be noted that the generation manner of the neural network computation graph may be determined according to actual needs. For example, the neural network computation graph may be generated based on ONNX (Open Neural Network Exchange): for a standard IR (intermediate representation) such as ONNX, the parameter information of each operation is parsed, and some operations are transformed and fused. For example, the shape information of the inputs and outputs of an operation is converted into corresponding parameters such as input length, width, channel and kernel size in the HW Graph IR; operations such as Batch Normalization (BN), Scale and add_bias (bias addition) are fused into the convolution operation; input and output addresses, parent and child node numbers, block sizes and the like are calculated; and the model file is thereby converted into a unified neural network computation graph supported by the hardware. Furthermore, ONNX is an open format for representing deep neural network models, introduced by Microsoft and Facebook in 2017; it is currently supported by major frameworks such as Caffe2, PyTorch and Apache MXNet, and other frameworks such as TensorFlow also have open-source scripts providing conversion.
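The fusion of Batch Normalization and Scale into the preceding convolution mentioned above follows the standard per-channel folding rule. The following is a minimal numpy sketch of that folding; the function and variable names are illustrative and not taken from the patent:

```python
import numpy as np

def fold_bn_into_conv(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold a per-channel BatchNorm (and Scale) into conv weights/bias.

    weight: (out_ch, in_ch, kh, kw) convolution kernel
    bias:   (out_ch,) convolution bias (zeros if the conv has none)
    gamma, beta, mean, var: (out_ch,) BN affine parameters and statistics
    """
    scale = gamma / np.sqrt(var + eps)                  # per output channel
    folded_weight = weight * scale[:, None, None, None]
    folded_bias = (bias - mean) * scale + beta
    return folded_weight, folded_bias

# Example: a 64-channel 7x7 conv followed by BN collapses into a single conv.
w = np.random.randn(64, 3, 7, 7).astype(np.float32)
b = np.zeros(64, dtype=np.float32)
gamma, beta = np.ones(64, np.float32), np.zeros(64, np.float32)
mean, var = np.zeros(64, np.float32), np.ones(64, np.float32)
folded_w, folded_b = fold_bn_into_conv(w, b, gamma, beta, mean, var)
```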
For the sake of understanding, assuming that the neural network computational graph is shown in fig. 2, the resulting neural network operation instruction may be: {"runid": 3, "parents": [1,2], "parents_type": [0x000,0x000], "children": [4,5], "children_type": [0x000,0x000], "batch_size": 1, "type": 0x030, "kernel_size": 7, "h_pad": 3, "v_pad": 3, "stride": 2, "input_width": 224, "input_height": 224, "input_channel": 3, "output_channel": 64, "input_addr": [0x20000,0x30000], "filter_addr": 0x130000, "quant_addr": 0x230000, "output_addr": [0x40000], "block_size": 64}. Here node 3 in the instruction represents an eltwise, node 1 represents the first conv2d input to the eltwise, node 2 represents the second conv2d input to the eltwise, node 4 represents the first conv2d that receives the eltwise output, and node 5 represents the second conv2d that receives the eltwise output.
Where runid denotes the current node number; parents denotes the parent node numbers; parents_type denotes the parent node types (type codes); children denotes the child node numbers; children_type denotes the child node types (type codes); batch_size denotes the batch size; kernel_size denotes the weight kernel size; h_pad denotes the padding number in the height direction; v_pad denotes the padding number in the width direction; stride denotes the stride; input_width denotes the input width; input_height denotes the input height; input_channel denotes the number of input channels; output_channel denotes the number of output channels; input_addr denotes the input feature map address, which is a list such as [0x20000,0x30000] when a residual is input; filter_addr denotes the weight address; quant_addr denotes the quantization parameter address; output_addr denotes the output address; and block_size denotes the size of the computation block, i.e. the number of parallel channels.
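As an illustrative sketch (not the patent's actual implementation), the json computation graph could be read with python and each node converted into a dict-format instruction carrying the fields listed above; the per-node key layout assumed here (a top-level "nodes" list with per-node keys such as "id") is a simplifying assumption:

```python
import json

def load_graph(path):
    """Read a neural network computation graph stored as a json file."""
    with open(path, "r") as f:
        return json.load(f)

def node_to_instruction(node):
    """Convert one graph node into a dict-format instruction.

    Only the resulting instruction fields follow the example instruction
    given in the text; the source-node key names are assumptions."""
    return {
        "runid": node["id"],
        "parents": node.get("parents", []),
        "parents_type": node.get("parents_type", []),
        "children": node.get("children", []),
        "children_type": node.get("children_type", []),
        "batch_size": node.get("batch_size", 1),
        "type": node["type"],
        "kernel_size": node.get("kernel_size", 0),
        "h_pad": node.get("h_pad", 0),
        "v_pad": node.get("v_pad", 0),
        "stride": node.get("stride", 1),
        "input_width": node["input_width"],
        "input_height": node["input_height"],
        "input_channel": node["input_channel"],
        "output_channel": node["output_channel"],
        "input_addr": node["input_addr"],
        "filter_addr": node.get("filter_addr"),
        "quant_addr": node.get("quant_addr"),
        "output_addr": node["output_addr"],
        "block_size": node.get("block_size", 64),
    }

# Hypothetical usage:
# graph = load_graph("network_graph.json")
# instructions = [node_to_instruction(n) for n in graph["nodes"]]
```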
It should be noted that the structure of the type field in the present application can be flexibly determined according to actual needs. For example, the structure of type can be as shown in fig. 3: type can be composed of a five-bit binary number, where the first four bits represent the general type of the neural network instruction and the last bit represents the specific type within that general type; the details can be as shown in Table 1.
TABLE 1 type instruction set Specification
It should be noted that the form of the neural network operation instruction may be determined according to actual needs; for example, it may be a binary instruction, which may be described as shown in Table 2.
TABLE 2 binary instruction type Specification
Wherein InvalidWait represents an invalid wait; InputFeatureAddress represents the input feature address; InputChannel represents the input channels; and OutputChannel represents the output channels.
Step S102: and splitting the neural network operation instruction into a convolution instruction and other instructions.
In practical application, after acquiring the neural network operation instruction, the hardware accelerator can split the neural network operation instruction into a convolution instruction and other instructions, and then process the convolution instruction and other instructions in parallel.
In a specific application scenario, because the convolution instruction is related to the channel and other instructions are not related to the channel, in the process of splitting the neural network operation instruction into the convolution instruction and other instructions, the neural network operation instruction can be split into the convolution instruction and other instructions according to the channel correlation.
In a specific application scenario, the other instructions may include instructions other than convolution in the neural network operation process, such as a pooling instruction, an activation instruction, a splicing instruction, a splitting instruction, and the like, and the application is not specifically limited herein.
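A minimal sketch of the splitting step is shown below; the dict-format instructions follow the example given earlier, while the set of type codes that denote convolution is a placeholder, since the actual encoding is defined in Table 1 (reproduced only as an image in the source):

```python
# Placeholder type codes: the real 5-bit encoding is given in Table 1 of the
# patent (available only as an image), so this set is purely illustrative.
CONV_TYPE_CODES = {0x010, 0x011}

def split_instructions(instructions):
    """Route dict-format instructions by channel correlation.

    Convolution instructions (channel-correlated) go to the convolution
    operator; the other instructions (pooling, activation, splicing,
    splitting, ...) go to the other operators."""
    conv_queue, other_queue = [], []
    for inst in instructions:
        if inst["type"] in CONV_TYPE_CODES:
            conv_queue.append(inst)
        else:
            other_queue.append(inst)
    return conv_queue, other_queue
```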
Step S103: acquiring feature data and filter data corresponding to the neural network operation instruction, and partitioning the feature data and the filter data to obtain block data.
In practical application, the neural network operation instruction cannot be processed without corresponding data, so that after the neural network operation instruction is split into a convolution instruction and other instructions, the hardware accelerator needs to acquire feature data and filter data corresponding to the neural network operation instruction, and block the feature data and the filter data to obtain block data. The specific blocking manner may be determined according to actual needs, and the present application is not specifically limited herein.
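One plausible blocking scheme, consistent with block_size being the number of parallel channels, is to split the feature and filter data along the input-channel dimension. The following numpy sketch illustrates this under that assumption (the patent does not fix a specific blocking manner):

```python
import numpy as np

def block_by_channel(feature, filters, block_size):
    """Split feature data (C, H, W) and filter data (K, C, kh, kw) into
    blocks of at most block_size input channels.

    block_size mirrors the "number of parallel channels" field of the
    instruction; channel-wise blocking is just one plausible choice."""
    channels = feature.shape[0]
    blocks = []
    for start in range(0, channels, block_size):
        stop = min(start + block_size, channels)
        blocks.append((feature[start:stop], filters[:, start:stop]))
    return blocks

# Example: 64 filters of size 7x7 over a 3-channel 224x224 input.
feature = np.random.randn(3, 224, 224).astype(np.float32)
filters = np.random.randn(64, 3, 7, 7).astype(np.float32)
block_data = block_by_channel(feature, filters, block_size=2)  # 2 blocks
```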
Step S104: and performing parallel operation on the block data based on the convolution instruction and other instructions to obtain a target operation result.
In practical application, after the hardware accelerator acquires feature data and filter data corresponding to a neural network operation instruction, and blocks the feature data and the filter data to obtain block data, the hardware accelerator can operate the block data in parallel based on a convolution instruction and other instructions to obtain a target operation result. It should be noted that, the block data may be operated in parallel based on a plurality of convolution instructions, may also be operated in parallel based on a convolution instruction and other instructions, may also be operated in parallel based on a plurality of other instructions, and the like, which is not limited in this application.
The data processing method provided by the application is applied to a hardware accelerator and comprises: obtaining a neural network operation instruction; splitting the neural network operation instruction into a convolution instruction and other instructions; acquiring feature data and filter data corresponding to the neural network operation instruction, and partitioning the feature data and the filter data to obtain block data; and performing parallel operation on the block data based on the convolution instruction and the other instructions to obtain a target operation result. In the application, after the hardware accelerator acquires the neural network operation instruction, it splits the instruction into a convolution instruction and other instructions, partitions the feature data and filter data corresponding to the instruction to obtain block data, and finally operates on the block data in parallel based on the convolution instruction and the other instructions, so that the target operation result can be obtained quickly and with high efficiency.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a data processing system according to an embodiment of the present disclosure.
The data processing system provided by the embodiment of the application is applied to a hardware accelerator, and may include:
a first obtaining module 101, configured to obtain a neural network operation instruction;
the first splitting module 102 is configured to split the neural network operation instruction into a convolution instruction and another instruction;
the second obtaining module 103 is configured to obtain feature data and filter data corresponding to the neural network operation instruction, and block the feature data and the filter data to obtain block data;
and the first operation module 104 is configured to perform parallel operation on the block data based on the convolution instruction and other instructions to obtain a target operation result.
The data processing system provided in the embodiment of the present application is applied to a hardware accelerator, and the first splitting module may include:
and the first splitting unit is used for splitting the neural network operation instruction into a convolution instruction and other instructions according to the channel correlation.
The data processing system provided by the embodiment of the application is applied to a hardware accelerator, and other instructions comprise a pooling instruction, an activating instruction, a splicing instruction and a splitting instruction.
The data processing system provided in the embodiment of the present application is applied to a hardware accelerator, and the first obtaining module may include:
the first acquisition unit is used for acquiring a neural network operation instruction;
the neural network operation instruction comprises a current node number, a parent node type, a child node number, a child node type, a batch size, a weight kernel size, a padding number in the height direction, a padding number in the width direction, a stride, an input width, an input height, an input channel number, an output channel number, an input feature map address, a weight address, a quantization parameter address, an output address and the size of the calculation block.
The data processing system provided in the embodiment of the present application is applied to a hardware accelerator, and the first obtaining module may include:
the second acquisition unit is used for acquiring the neural network calculation graph in the json file format;
and the first analysis unit is used for reading and parsing the neural network computation graph based on python to obtain the neural network operation instruction in dict format.
Referring to fig. 5 and fig. 6, fig. 5 is a schematic structural diagram of a hardware accelerator according to an embodiment of the present disclosure, and fig. 6 is a schematic data transmission diagram of the hardware accelerator according to the embodiment of the present disclosure.
The hardware accelerator provided in the embodiment of the present application may include:
the memory 11 is used for acquiring and storing a neural network operation instruction, feature data and filter data;
the splitter 12 is used for splitting the neural network operation instruction into a convolution instruction and other instructions; partitioning the feature data and the filter data to obtain block data;
the convolution arithmetic unit 13 is used for carrying out parallel operation on the block data based on a convolution instruction to obtain a target operation result;
and the other arithmetic unit 14 is used for carrying out parallel operation on the block data based on other instructions to obtain a target operation result.
In the hardware accelerator provided by the embodiment of the application, the convolution arithmetic unit can be formed based on a DSP array core; other operators may be constructed based on tensor ALU (arithmetic and logic unit).
In a specific application scenario, the splitter, the convolution operator and the other operators may communicate with each other via FIFO (First In First Out) queues and single-writer/single-reader SRAM (Static Random-Access Memory) memory blocks, so as to implement task-level pipeline parallelism. In addition, as shown in fig. 6, the convolution calculation can be divided into a plurality of blocks that participate in the calculation: after the calculation of Block1 is completed, the other modules can process the Block1 data block. As shown in fig. 7, once CONV1 has finished the Block1 data block, the other modules can process the Block1 data; as soon as the CONV1 operation is finished, CONV2 can process the Block1 data block, so the processing time of the other modules is hidden.
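The task-level pipeline can be illustrated with a small Python simulation using FIFO queues: the splitter feeds blocks to the convolution operator, and the other operators start on each block as soon as its convolution finishes, so their processing time is hidden. This is only a behavioural sketch of the scheduling idea, not the hardware implementation:

```python
import queue
import threading

def splitter(blocks, conv_q):
    """Producer: pushes block data into the FIFO feeding the conv operator."""
    for blk in blocks:
        conv_q.put(blk)
    conv_q.put(None)                      # end-of-stream marker

def conv_operator(conv_q, other_q):
    """Consumes blocks, performs the convolution part, forwards the result."""
    while (blk := conv_q.get()) is not None:
        result = f"conv({blk})"           # stand-in for the DSP-array compute
        other_q.put(result)
    other_q.put(None)

def other_operator(other_q, results):
    """Processes each block as soon as its convolution is done, so its
    latency is hidden behind the convolution of the next block."""
    while (res := other_q.get()) is not None:
        results.append(f"other({res})")   # stand-in for pooling/activation etc.

results = []
conv_q, other_q = queue.Queue(maxsize=2), queue.Queue(maxsize=2)
threads = [
    threading.Thread(target=splitter, args=(["Block1", "Block2", "Block3"], conv_q)),
    threading.Thread(target=conv_operator, args=(conv_q, other_q)),
    threading.Thread(target=other_operator, args=(other_q, results)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # ['other(conv(Block1))', 'other(conv(Block2))', 'other(conv(Block3))']
```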
The hardware accelerator provided in the embodiment of the present application may further include: and the buffer is used for buffering data.
It should be noted that the type of the hardware accelerator provided in this application may be determined according to actual needs; for example, the hardware accelerator may be an FPGA (Field Programmable Gate Array). In this case, the hardware accelerator may perform data interaction with an external CPU (central processing unit) through a Runtime. For example, the Runtime may use C++ to read the device file of the FPGA, and pybind11 may be used to expose the C++ interaction functions to Python, so that Python can call the interface functions for CPU-FPGA interaction, implement different data preprocessing operations for different networks, write data into the FPGA, wait for return information, read the final result, and calculate the performance indexes of the network. In addition, the pressure of designing a hardware accelerator can be relieved by means of a hardware design template. For example, the hardware design template provides modularization for the user: the hardware data type, the memory architecture, the core dimension of the DSP (Digital Signal Processor) array, the hardware operators and the pipeline stages can be selectively modified; exposing multiple hardware design variants to a compiler stack facilitates compiler development; the core dimension of the DSP array can be modified to influence the utilization of hardware resources, and modifying the shapes of the input, weight and accumulator tensors of a DSP array core unit directly influences the number of multipliers to be instantiated and the width required by the SRAM ports. In addition, each data type can be customized to a different integer precision: the weight and input types may be 8 bits or less, and the accumulation type may be 32 bits or less; integer precision control allows a user to extend the arithmetic density on a chip when resources are limited.
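For illustration only, the Python side of such a Runtime could look like the sketch below. The module name fpga_runtime and its functions (write_instructions, write_data, wait_for_done, read_result) are assumptions introduced here for the example; the patent only states that the Runtime reads the FPGA device file in C++ and exposes the interaction functions to Python via pybind11:

```python
# Hypothetical Python-side use of a pybind11-wrapped C++ Runtime.  The module
# `fpga_runtime` and all of its functions are assumptions for illustration.
import numpy as np
import fpga_runtime  # assumed pybind11 extension module

def run_network(image, instructions):
    # Network-specific preprocessing done on the CPU side in Python.
    data = (image / 255.0).astype(np.float32)

    fpga_runtime.write_instructions(instructions)   # assumed API
    fpga_runtime.write_data(data.tobytes())         # assumed API
    fpga_runtime.wait_for_done()                    # wait for return information
    raw = fpga_runtime.read_result()                # read the final result

    return np.frombuffer(raw, dtype=np.float32)
```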
The application also provides a data processing device and a computer readable storage medium, which have the corresponding effects of the data processing method provided by the embodiment of the application. Referring to fig. 8, fig. 8 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present disclosure.
The data processing device provided by the embodiment of the application comprises a memory 201 and a processor 202, wherein a computer program is stored in the memory 201, and the processor 202 realizes the following steps when executing the computer program:
acquiring a neural network operation instruction;
splitting a neural network operation instruction into a convolution instruction and other instructions;
acquiring feature data and filter data corresponding to the neural network operation instruction, and partitioning the feature data and the filter data to obtain block data;
and performing parallel operation on the block data based on the convolution instruction and other instructions to obtain a target operation result.
The data processing device provided by the embodiment of the application comprises a memory 201 and a processor 202, wherein a computer program is stored in the memory 201, and the processor 202 realizes the following steps when executing the computer program: and splitting the neural network operation instruction into a convolution instruction and other instructions according to the channel correlation.
The data processing device provided by the embodiment of the application comprises a memory 201 and a processor 202, wherein a computer program is stored in the memory 201, and the processor 202 realizes the following steps when executing the computer program: other instructions include pooling instructions, activation instructions, splicing instructions, splitting instructions.
The data processing device provided by the embodiment of the application comprises a memory 201 and a processor 202, wherein a computer program is stored in the memory 201, and the processor 202 realizes the following steps when executing the computer program: acquiring a neural network operation instruction; the neural network operation instruction comprises a current node number, a parent node type, a child node number, a child node type, a batch size, a weight kernel size, a padding number in the height direction, a padding number in the width direction, a stride, an input width, an input height, an input channel number, an output channel number, an input feature map address, a weight address, a quantization parameter address, an output address and the size of the calculation block.
The data processing device provided by the embodiment of the application comprises a memory 201 and a processor 202, wherein a computer program is stored in the memory 201, and the processor 202 realizes the following steps when executing the computer program: acquiring a neural network computational graph in a json file format; and reading and parsing the neural network computation graph based on python to obtain the neural network operation instruction in dict format.
Referring to fig. 9, another data processing apparatus provided in the embodiment of the present application may further include: an input port 203 connected to the processor 202 and used for transmitting externally input commands to the processor 202; a display unit 204 connected to the processor 202 and used for displaying the processing results of the processor 202 to the outside; and a communication module 205 connected to the processor 202 and used for realizing communication between the data processing device and the outside. The display unit 204 may be a display panel, a laser scanning display, or the like; the communication methods adopted by the communication module 205 include, but are not limited to, Mobile High-Definition Link (MHL), Universal Serial Bus (USB), High Definition Multimedia Interface (HDMI), and wireless connections: wireless fidelity (WiFi), Bluetooth, Bluetooth Low Energy, and IEEE 802.11s-based communication technology.
A computer-readable storage medium is provided in an embodiment of the present application, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the following steps:
acquiring a neural network operation instruction;
splitting a neural network operation instruction into a convolution instruction and other instructions;
acquiring feature data and filter data corresponding to the neural network operation instruction, and partitioning the feature data and the filter data to obtain block data;
and performing parallel operation on the block data based on the convolution instruction and other instructions to obtain a target operation result.
A computer-readable storage medium is provided in an embodiment of the present application, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the following steps: and splitting the neural network operation instruction into a convolution instruction and other instructions according to the channel correlation.
A computer-readable storage medium provided in an embodiment of the present application stores a computer program, and when executed by a processor, the computer program implements the following steps: other instructions include pooling instructions, activation instructions, splicing instructions, splitting instructions.
A computer-readable storage medium is provided in an embodiment of the present application, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the following steps: acquiring a neural network operation instruction; the neural network operation instruction comprises a current node number, a parent node type, a child node number, a child node type, a batch size, a weight kernel size, a padding number in the height direction, a padding number in the width direction, a stride, an input width, an input height, an input channel number, an output channel number, an input feature map address, a weight address, a quantization parameter address, an output address and the size of the calculation block.
A computer-readable storage medium is provided in an embodiment of the present application, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the following steps: acquiring a neural network computational graph in a json file format; and reading and parsing the neural network computation graph based on python to obtain the neural network operation instruction in dict format.
The computer-readable storage media to which this application relates include Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage media known in the art.
For a description of a hardware accelerator, a data processing system, a device, and a related part in a computer readable storage medium provided in the embodiments of the present application, refer to a detailed description of a corresponding part in a data processing method provided in the embodiments of the present application, and are not described herein again. In addition, parts of the above technical solutions provided in the embodiments of the present application, which are consistent with the implementation principles of corresponding technical solutions in the prior art, are not described in detail so as to avoid redundant description.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A data processing method is applied to a hardware accelerator and comprises the following steps:
acquiring a neural network operation instruction;
splitting the neural network operation instruction into a convolution instruction and other instructions;
acquiring feature data and filter data corresponding to the neural network operation instruction, and blocking the feature data and the filter data to obtain block data;
and performing parallel operation on the block data based on the convolution instruction and the other instructions to obtain a target operation result.
2. The method of claim 1, wherein splitting the neural network operation instruction into a convolution instruction and other instructions comprises:
and splitting the neural network operation instruction into the convolution instruction and the other instructions according to the channel correlation.
3. The method of claim 2, wherein the other instructions comprise a pooling instruction, an activation instruction, a stitching instruction, and a splitting instruction.
4. The method of claim 1, wherein said obtaining a neural network operation instruction comprises:
acquiring the neural network operation instruction;
the neural network operation instruction comprises a current node number, a parent node type, a child node number, a child node type, a batch size, a weight kernel size, a padding number in the height direction, a padding number in the width direction, a stride, an input width, an input height, an input channel number, an output channel number, an input feature map address, a weight address, a quantization parameter address, an output address and the size of the calculation block.
5. The method of claim 1, wherein the obtaining the neural network operation instruction comprises:
acquiring a neural network computational graph in a json file format;
and reading and parsing the neural network computation graph based on python to obtain the neural network operation instruction in dict format.
6. A data processing system, for application to a hardware accelerator, comprising:
the first acquisition module is used for acquiring a neural network operation instruction;
the first splitting module is used for splitting the neural network operation instruction into a convolution instruction and other instructions;
the second acquisition module is used for acquiring feature data and filter data corresponding to the neural network operation instruction, and blocking the feature data and the filter data to obtain block data;
and the first operation module is used for operating the block data in parallel based on the convolution instruction and the other instructions to obtain a target operation result.
7. A data processing apparatus, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data processing method according to any one of claims 1 to 5 when executing the computer program.
8. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the data processing method according to any one of claims 1 to 5.
9. A hardware accelerator, comprising:
the memory is used for acquiring and storing the neural network operation instruction, feature data and filter data;
the splitter is used for splitting the neural network operation instruction into a convolution instruction and other instructions; partitioning the feature data and the filter data to obtain block data;
the convolution arithmetic unit is used for carrying out parallel arithmetic on the block data based on the convolution instruction to obtain a target arithmetic result;
and the other arithmetic units are used for parallelly operating the block data based on the other instructions to obtain a target operation result.
10. The hardware accelerator of claim 9 wherein the convolution operator is constructed based on a DSP array kernel; the other operators are constructed based on tensor ALUs.
11. The hardware accelerator of claim 9, further comprising:
and the buffer is used for buffering data.
CN202210340279.2A 2022-04-02 2022-04-02 Hardware accelerator, data processing method, system, equipment and medium Pending CN114492781A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210340279.2A CN114492781A (en) 2022-04-02 2022-04-02 Hardware accelerator, data processing method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210340279.2A CN114492781A (en) 2022-04-02 2022-04-02 Hardware accelerator, data processing method, system, equipment and medium

Publications (1)

Publication Number Publication Date
CN114492781A true CN114492781A (en) 2022-05-13

Family

ID=81488985

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210340279.2A Pending CN114492781A (en) 2022-04-02 2022-04-02 Hardware accelerator, data processing method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN114492781A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982530A (en) * 2023-03-13 2023-04-18 苏州浪潮智能科技有限公司 Accelerator operation control method, system, storage medium, device and equipment
CN116167425A (en) * 2023-04-26 2023-05-26 浪潮电子信息产业股份有限公司 Neural network acceleration method, device, equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator
CN107329734A (en) * 2016-04-29 2017-11-07 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing convolutional neural networks forward operation
CN109934339A (en) * 2019-03-06 2019-06-25 东南大学 A kind of general convolutional neural networks accelerator based on a dimension systolic array
CN111488983A (en) * 2020-03-24 2020-08-04 哈尔滨工业大学 Lightweight CNN model calculation accelerator based on FPGA
CN112905954A (en) * 2020-12-28 2021-06-04 北京计算机技术及应用研究所 CNN model convolution operation accelerated calculation method using FPGA BRAM

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107329734A (en) * 2016-04-29 2017-11-07 北京中科寒武纪科技有限公司 A kind of apparatus and method for performing convolutional neural networks forward operation
CN107239824A (en) * 2016-12-05 2017-10-10 北京深鉴智能科技有限公司 Apparatus and method for realizing sparse convolution neutral net accelerator
CN109934339A (en) * 2019-03-06 2019-06-25 东南大学 A kind of general convolutional neural networks accelerator based on a dimension systolic array
CN111488983A (en) * 2020-03-24 2020-08-04 哈尔滨工业大学 Lightweight CNN model calculation accelerator based on FPGA
CN112905954A (en) * 2020-12-28 2021-06-04 北京计算机技术及应用研究所 CNN model convolution operation accelerated calculation method using FPGA BRAM

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
DENG L等: "Model compression and hardware acceleration for neural networks: A comprehensive survey", 《PROCEEDINGS OF THE IEEE》 *
MITTAL S: "A survey of FPGA-based accelerators for convolutional neural networks", 《NEURAL COMPUTING AND APPLICATIONS》 *
YIN J Y等: "A CNN accelerator on embedded FPGA using dynamic reconfigurable coprocessor等", 《PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE, INFORMATION PROCESSING AND CLOUD COMPUTING》 *
尹文枫等: "卷积神经网络压缩与加速技术研究进展", 《计算机系统应用》 *
徐欣等: "一种高度并行的卷积神经网络加速器设计方法", 《哈尔滨工业大学学报》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982530A (en) * 2023-03-13 2023-04-18 苏州浪潮智能科技有限公司 Accelerator operation control method, system, storage medium, device and equipment
CN116167425A (en) * 2023-04-26 2023-05-26 浪潮电子信息产业股份有限公司 Neural network acceleration method, device, equipment and medium
CN116167425B (en) * 2023-04-26 2023-08-04 浪潮电子信息产业股份有限公司 Neural network acceleration method, device, equipment and medium

Similar Documents

Publication Publication Date Title
US10338925B2 (en) Tensor register files
US20220012575A1 (en) Methods and apparatus for localized processing within multicore neural networks
US10372456B2 (en) Tensor processor instruction set architecture
CN114492781A (en) Hardware accelerator, data processing method, system, equipment and medium
US20230026006A1 (en) Convolution computation engine, artificial intelligence chip, and data processing method
US20210241095A1 (en) Deep learning processing apparatus and method, device and storage medium
CN107944545B (en) Computing method and computing device applied to neural network
US11816574B2 (en) Structured pruning for machine learning model
Daghero et al. Energy-efficient deep learning inference on edge devices
US20200226458A1 (en) Optimizing artificial neural network computations based on automatic determination of a batch size
Zhou et al. Addressing sparsity in deep neural networks
Lin et al. Accelerating large sparse neural network inference using GPU task graph parallelism
CN117032807A (en) AI acceleration processor architecture based on RISC-V instruction set
Trujillo et al. GSGP-CUDA—a CUDA framework for geometric semantic genetic programming
US20240037179A1 (en) Data processing method and apparatus
CN109902821B (en) Data processing method and device and related components
US20200192797A1 (en) Caching data in artificial neural network computations
JP2020021208A (en) Neural network processor, neural network processing method, and program
Mohaidat et al. A Survey on Neural Network Hardware Accelerators
CN114972775A (en) Feature processing method, feature processing device, feature processing product, feature processing medium, and feature processing apparatus
US20220108156A1 (en) Hardware architecture for processing data in sparse neural network
CN114365151A (en) Neural network model transformation method, device, server and storage medium
US20230325464A1 (en) Hpc framework for accelerating sparse cholesky factorization on fpgas
US20220051095A1 (en) Machine Learning Computer
US20230259703A1 (en) Electronic device and method for controlling the electronic device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220513

RJ01 Rejection of invention patent application after publication