CN112200305A - Neural network acceleration coprocessor, processing system and processing method

Info

Publication number: CN112200305A
Application number: CN202011069950.1A
Authority: CN (China)
Prior art keywords: instruction, data, coprocessor, input, convolution
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 张树华, 仝杰, 张鋆, 赵传奇, 王辰, 张明皓
Current Assignee: State Grid Corp of China SGCC; China Electric Power Research Institute Co Ltd CEPRI; Electric Power Research Institute of State Grid Jiangsu Electric Power Co Ltd
Original Assignee: State Grid Corp of China SGCC; China Electric Power Research Institute Co Ltd CEPRI
Application filed by: State Grid Corp of China SGCC; China Electric Power Research Institute Co Ltd CEPRI
Priority/filing date: 2020-09-30
Publication date: 2021-01-08

Classifications

    • G06N 3/045 Combinations of networks (G06N 3/00 Computing arrangements based on biological models; G06N 3/02 Neural networks; G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06F 9/28 Enhancement of operational speed, e.g. by using several microcontrol devices operating in parallel (G06F 9/22 Microcontrol or microprogram arrangements)
    • G06F 9/3877 Concurrent instruction execution using a slave processor, e.g. coprocessor (G06F 9/30 Arrangements for executing machine instructions; G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead)


Abstract

A neural network acceleration coprocessor, processing system and processing method. The system includes the coprocessor together with a main processor and a memory. The main processor is used for sending extended instructions; the memory is used for storing data; the coprocessor is used for receiving an extended instruction sent by the main processor, reading input data from the memory according to the received instruction, performing the neural network calculation on the input data to obtain output data, and storing the output data into the memory. Because the coprocessor handles the time-consuming operations in the convolutional neural network while the main processor merely controls it through extended instructions, CPU utilization is reduced and the efficiency of the convolution operation is improved by more than 20 times compared with a pure software implementation.

Description

Neural network acceleration coprocessor, processing system and processing method
Technical Field
The invention relates to the field of artificial intelligence and chip design, in particular to a neural network acceleration coprocessor, a processing system and a processing method.
Background
Convolutional layers are the core computation modules in convolutional neural networks, and the convolutional layers usually account for more than 90% of the computation of the whole network. Fig. 1 shows the process of computing an output feature map by convolution: each input feature map corresponds to a convolution kernel, the dotted boxes of different colors in the input map correspond to different outputs, and each output value is obtained by summing, over all input maps, the products of the kernel weights with the input values at the same positions. Each output is the result of processing local input information and reflects local feature information; the same input feature map is processed with the same convolution kernel, which is the weight sharing mechanism of convolutional networks. A convolutional layer extracts local features of the input feature map and processes the whole input feature map by sliding the convolution kernel across it.
The convolutional layer is calculated as formula 1:

$$\mathrm{out}(f_0, x, y) = \sum_{c}\sum_{i=0}^{K-1}\sum_{j=0}^{K-1} W_{f_0,c}(i, j)\cdot \mathrm{in}(c,\ S\cdot x + i,\ S\cdot y + j) + b(f_0) \qquad (1)$$

where out(f_0, x, y) denotes the value at position (x, y) of the f_0-th output feature map, c runs over the input feature maps, W is the convolution kernel weight matrix, in denotes the input feature map, b denotes the bias of the convolutional layer, K is the convolution kernel size, and S is the sliding step size of the convolution kernel.
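To make formula 1 concrete, the following is a minimal C sketch of the convolutional layer computation. The array layouts, the dimension ordering and the absence of padding are assumptions made for illustration; the patent itself does not prescribe an implementation.

```c
/* Minimal sketch of formula 1 (naive convolution, no padding).
 * Layouts assumed: in[C][H][W] and w[F][C][K][K], flattened row-major.
 * Output size: OH = (H - K) / S + 1, OW = (W - K) / S + 1. */
void conv_layer(const float *in, const float *w, const float *b, float *out,
                int C, int H, int W, int F, int K, int S)
{
    int OH = (H - K) / S + 1;
    int OW = (W - K) / S + 1;
    for (int f0 = 0; f0 < F; f0++)                 /* output feature map f0 */
        for (int y = 0; y < OH; y++)
            for (int x = 0; x < OW; x++) {
                float acc = b[f0];                 /* bias term b(f0) */
                for (int c = 0; c < C; c++)        /* sum over input maps */
                    for (int i = 0; i < K; i++)
                        for (int j = 0; j < K; j++)
                            acc += w[((f0 * C + c) * K + i) * K + j]
                                 * in[(c * H + (S * y + i)) * W + (S * x + j)];
                out[(f0 * OH + y) * OW + x] = acc; /* out(f0, x, y) */
            }
}
```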
As can be seen from formula 1, the convolution involves a large number of multiply-add operations and the computation amount is very large; implementing it in a pure software manner is therefore very inefficient. A neural network acceleration algorithm and an external hardware coprocessor, cooperating through AI instructions, need to be developed to accelerate the algorithm.
Disclosure of Invention
The embodiment of the invention provides a neural network acceleration coprocessor, a processing system and a processing method, which are used to solve the problem that processing convolutional layer calculations in a pure software manner is currently very inefficient.
The embodiment of one aspect of the invention provides a neural network acceleration coprocessor, which comprises a control module, an address generation module, a multiply-accumulate module and an output saturation module;
the address generation module is used for matching storage addresses for input data and corresponding output data;
the multiply-accumulate module is used for carrying out neural network convolution operation;
the output saturation module is used for limiting the range of output data and outputting an operation result;
the control module is used for receiving an extended instruction sent by the main processor; according to the extended instruction it controls the address generation module to match the addresses of the input data and the corresponding output data, reads data from the memory according to the matched addresses, controls the multiply-accumulate module to perform the convolution calculation on the read data, controls the output saturation module to output the calculation result, and stores the output result into the memory at the matched output data address.
Preferably, the extended instruction comprises a configuration instruction for initializing convolution parameters and an operation instruction for executing convolution operation;
the configuration instruction is a single-cycle instruction and is used for configuring addresses and parameters of input and output data and parameters of a convolution kernel;
the operation instruction is a variable multi-cycle instruction, and the number of execution cycles of the instruction is determined by the convolution operation parameters set by the preceding configuration instructions.
Preferably, in any one of the above embodiments, the configuration instruction includes the following first to sixth instructions;
the first instruction is used for setting the number of input and output tensor channels;
the second instruction is used for setting the sizes of input and output tensors;
the third instruction is used for setting the size and the step size of the convolution kernel;
the fourth instruction is used for setting the padding size and the filter weight data start address;
the fifth instruction is used for setting the start addresses of the input and output data;
the sixth instruction is used for setting the bias and the bias data start address.
In any of the foregoing embodiments, it is preferable that the control module further decodes the extended instruction, reads an operand according to the decoded instruction, and writes the read operand into a register; the operand is used for transmitting the storage address of the input data in the memory, and the input data is read from the memory according to the storage address pointed to by the operand.
The invention also provides a neural network accelerated processing system, which includes the coprocessor described above, a main processor and a memory;
the memory is used for storing data;
the main processor is used for sending extended instructions;
the coprocessor is used for receiving the extended instruction sent by the main processor, reading input data from the memory according to the received instruction, performing the neural network calculation on the input data to obtain output data, and storing the output data into the memory.
The invention also provides a neural network acceleration processing method, which is applied to the processing system and comprises the following steps:
the main processor sends an extended instruction to the coprocessor;
the coprocessor receives the extended instruction sent by the main processor, reads input data from the memory according to the received instruction, performs the neural network operation on the read input data to obtain output data, and writes the output data into the memory;
and the main processor reads the output data stored by the coprocessor from the memory to complete the neural network algorithm processing.
In any of the above embodiments, preferably, the main processor sends extended instructions to the coprocessor and controls the coprocessor to configure the initialization convolution parameters and to execute the convolution operation;
the initialization convolution parameter configuration comprises single-cycle configuration of the addresses and parameters of the input and output data and the parameters of the convolution kernel; the execution period of the convolution operation is determined by the convolution operation parameters set by the preceding configuration instructions.
In any of the above embodiments, preferably, when configuring the initialization convolution parameters, the method includes the following operations:
setting the number of input and output tensor channels; setting the sizes of the input and output tensors; setting the convolution kernel size and step size; setting the padding size and the filter weight data start address; setting the start addresses of the input and output data; and setting the bias and the bias data start address.
Preferably, in any one of the above embodiments, the method further includes writing each extended instruction according to the following encoding format:
a. the first bit interval of the instruction is the Opcode coding segment; b. three bits are set to control whether the source registers need to be read and whether the destination register needs to be written.
In any of the above embodiments, preferably, when the received extended instruction is used to read input data from the memory, the coprocessor decodes the extended instruction, reads an operand according to the decoded instruction, and writes the read operand into a register; the operand is used for transmitting the storage address of the input data in the memory, and the input data is read from the memory according to the storage address pointed to by the operand.
Advantageous effects
1. According to the neural network accelerated processing method, coprocessor and processing system, a coprocessor is provided to handle the time-consuming operations in the convolutional neural network, and the main processor controls the coprocessor through extended instructions to perform the neural network calculation on the input data. This reduces CPU utilization and improves the efficiency of the convolution operation by more than 20 times compared with pure software;
2. When the coprocessor is configured through the extended instructions, the method simplifies the extension of the coprocessor instruction set: seven extended instructions are defined to initialize the parameters and execute the convolution operation. The scheme is algorithmically simple, improves the robustness and stability of the system, and meets the flexible and changeable computing requirements of the algorithm;
3. When the coprocessor decodes an extended instruction, the operand can be read out directly and sent to a register, and the operand is used for address transfer; compared with directly reading and writing a data address, the read-write speed is higher and the buffer structure is better utilized;
4. According to the extended instruction encoding format, the first bit interval (the low seven bits) of an instruction serves as the Opcode coding segment; with the instruction groups and the extra coding space of this segment, more coprocessor instructions can be encoded, which greatly improves the extensibility of the coprocessor.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate embodiments of the invention and, together with the description, serve to explain the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic diagram of the input and output features of the convolutional neural network model described in the background of the present application;
FIG. 2 is a block diagram of a neural network acceleration coprocessor according to an embodiment of the present application;
FIG. 3 is a block diagram of a neural network processing system according to an embodiment of the present application;
FIG. 4 is a flowchart of a neural network acceleration processing method provided in an embodiment of the present application;
FIG. 5 is a flowchart of the execution of the convolution calculation coprocessor according to an embodiment of the present application;
FIG. 6 is a schematic diagram of another accelerated data computing system of the present application.
Detailed Description
The present invention will be described in detail below with reference to the embodiments and the attached drawings. It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The following detailed description is exemplary in nature and is intended to provide further details of the invention. Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of exemplary embodiments according to the invention.
As shown in fig. 2, an embodiment of an aspect of the present invention provides a neural network acceleration coprocessor, which includes a control module 301, an address generation module 302, a multiply-accumulate module 303, and an output saturation module 304;
the address generation module 302 is configured to match storage addresses for input data and corresponding output data;
the multiply-accumulate module 303 is used for performing neural network convolution operation;
the output saturation module 304 is configured to limit a range of output data and output an operation result;
the control module 301 is configured to receive an extended instruction sent by the main processor; according to the extended instruction it controls the address generation module to match the addresses of the input data and the corresponding output data, reads data from the memory according to the matched addresses, controls the multiply-accumulate module to perform the convolution calculation on the read data, controls the output saturation module to output the calculation result, and stores the output result into the memory at the matched output data address.
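To illustrate how these modules could cooperate, the following is a minimal behavioral sketch in C of the per-element work they perform. The module boundaries follow the description above, while the row-major tensor layout, the 8-bit saturation range and all names are assumptions made for illustration.

```c
#include <stdint.h>

/* Address generation module: match a storage address for an element of a
 * [channels][height][width] tensor in memory (row-major layout assumed). */
static uint32_t gen_addr(uint32_t base, int c, int row, int col, int h, int w)
{
    return base + (uint32_t)((c * h + row) * w + col);
}

/* Multiply-accumulate module: one MAC step of the convolution. */
static int32_t mac(int32_t acc, int32_t pixel, int32_t weight)
{
    return acc + pixel * weight;
}

/* Output saturation module: limit the range of the output data before it is
 * written back. A signed 8-bit range is assumed; the patent does not fix one. */
static int8_t saturate(int32_t v)
{
    if (v > INT8_MAX) return INT8_MAX;
    if (v < INT8_MIN) return INT8_MIN;
    return (int8_t)v;
}
```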
The extended instructions comprise configuration instructions for initializing the convolution parameters and an operation instruction for executing the convolution operation. The configuration instructions are single-cycle instructions used to configure the addresses and parameters of the input and output data and the parameters of the convolution kernel;
the operation instruction is a variable multi-cycle instruction, and its number of execution cycles is determined by the convolution operation parameters set by the preceding configuration instructions.
The configuration instructions include the following first to sixth instructions;
the first instruction is used for setting the number of input and output tensor channels;
the second instruction is used for setting the sizes of input and output tensors;
the third instruction is used for setting the size and the step size of the convolution kernel;
the fourth instruction is used for setting the padding size and the filter weight data start address;
the fifth instruction is used for setting the start addresses of the input and output data;
the sixth instruction is used for setting the bias and the bias data start address.
As shown in fig. 5, seven extended instructions are defined in this embodiment for the implementation of the convolution coprocessor. The six configuration instructions, which initialize the convolution parameters, are INIT_CH for setting the number of input/output tensor channels, INIT_IM for setting the input/output tensor sizes, INIT_FS for setting the filter kernel size and step size, INIT_PW for setting the padding size and the filter weight data start address, INIT_ADDR for setting the input/output data start addresses, and INIT_BIAS for setting the bias and the bias data start address. The parameter initialization instructions are single-cycle instructions: when the corresponding instruction is received, its operand is read out and sent to a register for the subsequent calculation.
Specifically, the control module decodes the extended instruction, reads an operand according to the decoded instruction, and writes the read operand into a register; the operand is used for transmitting the storage address of the input data in the memory, and the input data is read from the memory according to the storage address pointed to by the operand.
The operation instruction is the LOOP instruction, which performs the convolution operation. It is a variable multi-cycle instruction whose number of execution cycles is determined by the parameters of the convolution operation.
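As an illustration of this operand-forwarding mechanism, the sketch below models the decode step in C. The field names and the decoded-instruction layout are assumptions; the patent only states that the operand carries the storage address of the input data.

```c
#include <stdint.h>

/* Hypothetical decoded form of an extended instruction. */
typedef struct {
    uint8_t  funct7;    /* selects one of the seven extended instructions */
    uint8_t  xs1, xs2;  /* whether each source register must be read      */
    uint32_t rs1, rs2;  /* operand values read from the register file     */
} coproc_insn_t;

/* The operand itself carries a memory address, so the input data can be
 * fetched directly through it rather than via extra address arithmetic. */
static const int8_t *fetch_input(const coproc_insn_t *insn,
                                 const int8_t *memory)
{
    uint32_t addr = insn->rs1;  /* operand = storage address of input data */
    return &memory[addr];       /* read input data from that address       */
}
```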
The specific definition of each instruction is shown in table 1.

Instruction | Type | Function
INIT_CH | single-cycle configuration | set the number of input/output tensor channels
INIT_IM | single-cycle configuration | set the input/output tensor sizes
INIT_FS | single-cycle configuration | set the filter kernel size and step size
INIT_PW | single-cycle configuration | set the padding size and the filter weight data start address
INIT_ADDR | single-cycle configuration | set the input/output data start addresses
INIT_BIAS | single-cycle configuration | set the bias and the bias data start address
LOOP | variable multi-cycle operation | execute the convolution operation

Table 1. List of the extended instructions of the convolution coprocessor
In the above table, Opcode refers to the opcode coding segment, which uses the custom-0, custom-1, custom-2 and custom-3 instruction groups. The xs1, xs2 and xd bits control whether the two source registers need to be read and whether the destination register needs to be written, respectively. The funct7 interval can serve as extra coding space for more instructions, so one custom instruction group can encode 128 instructions using the funct7 interval.
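To make the encoding discussion concrete, here is a hedged sketch of how such custom instructions could be issued from C on a RISC-V host using the GNU assembler's .insn directive. The funct7 values, the choice of the custom-0 group, the funct3 bit assignment and the wrapper names are illustrative assumptions; the patent does not disclose the concrete encodings.

```c
#include <stdint.h>

/* Hypothetical funct7 codes for the seven extended instructions;
 * the actual values are not disclosed in the patent. */
#define F7_INIT_CH   0x00
#define F7_INIT_IM   0x01
#define F7_INIT_FS   0x02
#define F7_INIT_PW   0x03
#define F7_INIT_ADDR 0x04
#define F7_INIT_BIAS 0x05
#define F7_LOOP      0x06

/* Issue an R-type instruction in the custom-0 group (opcode 0x0B).
 * funct3 carries the xd/xs1/xs2 bits: 0b011 is assumed to mean "read rs1
 * and rs2, no destination register write-back". */
#define COPROC_OP(f7, a, b)                                  \
    asm volatile(".insn r 0x0B, 0x3, %2, x0, %0, %1"         \
                 :: "r"(a), "r"(b), "i"(f7))

static inline void init_ch(uint32_t in_ch, uint32_t out_ch)
{
    COPROC_OP(F7_INIT_CH, in_ch, out_ch);  /* channel counts */
}

static inline void conv_loop(void)
{
    COPROC_OP(F7_LOOP, 0, 0);  /* start the multi-cycle convolution */
}
```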
In this embodiment, by arranging the coprocessor, the time-consuming operations in the convolutional neural network are processed by the coprocessor after it receives the extended instructions, and the neural network convolution calculation is performed on the input data; the convolution, pooling and activation operations of the convolutional neural network can be flexibly combined, so the scheme is suitable for various lightweight convolutional neural networks.
As shown in fig. 3, the present invention further provides a neural network accelerated processing system, which includes the coprocessor 3 described above, a main processor 1 and a memory 2. The memory 2 is used for storing data; the main processor 1 is used for sending extended instructions; and the coprocessor 3 is used for receiving the extended instruction sent by the main processor, reading input data from the memory according to the received instruction, performing the neural network calculation on the input data to obtain output data, and storing the output data into the memory.
As shown in fig. 4, the present invention further provides a neural network acceleration processing method, which is applied to the processing system, and includes the following steps:
s1, the main processor sends an expansion instruction to the coprocessor;
s2, the coprocessor receives the expansion instruction sent by the main processor, reads input data from the memory according to the received expansion instruction, performs neural network operation on the read input data to obtain output data, and writes the output data into the memory;
and S3, the main processor reads the output data stored by the coprocessor from the memory to complete the neural network algorithm processing.
Wherein, in S1 or S2, the extended instructions include configuration instructions for initializing the convolution parameters and an operation instruction for performing the convolution operation. The configuration instructions are single-cycle instructions used to configure the addresses and parameters of the input and output data and the parameters of the convolution kernel; the operation instruction is a variable multi-cycle instruction whose number of execution cycles is determined by the convolution operation parameters set by the preceding configuration instructions.
When the initialization convolution parameters are configured, the method comprises the following operations: setting the number of input and output tensor channels; setting the sizes of the input and output tensors; setting the convolution kernel size and step size; setting the padding size and the filter weight data start address; setting the start addresses of the input and output data; and setting the bias and the bias data start address.
As shown in table 1 above, seven extended instructions are defined in this embodiment for the implementation of the convolution coprocessor. The six configuration instructions, which initialize the convolution parameters, are INIT_CH for setting the number of input/output tensor channels, INIT_IM for setting the input/output tensor sizes, INIT_FS for setting the filter kernel size and step size, INIT_PW for setting the padding size and the filter weight data start address, INIT_ADDR for setting the input/output data start addresses, and INIT_BIAS for setting the bias and the bias data start address. The parameter initialization instructions are single-cycle instructions: when the corresponding instruction is received, its operand is read out and sent to a register for the subsequent calculation. The operation instruction is the LOOP instruction, which performs the convolution operation; it is a variable multi-cycle instruction whose number of execution cycles is determined by the parameters of the convolution operation.
As shown in the above table, the encoding format of the extended instructions is as follows: 1. the first bit interval of the instruction is the Opcode coding segment; 2. three bits are set to control whether the source registers need to be read and whether the destination register needs to be written. Specifically, Opcode uses the custom-0, custom-1, custom-2 and custom-3 instruction groups; the low seven bits (the first bit interval) form the Opcode coding segment, and the xs1, xs2 and xd bits control whether the two source registers are read and whether the destination register is written, respectively. The funct7 interval can serve as extra coding space for more instructions, so one custom instruction group can encode 128 instructions using the funct7 interval.
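Putting steps S1 and S2 together, a host-side sequence for one convolutional layer might look as follows, reusing the COPROC_OP macro and funct7 codes sketched earlier; every name, operand layout and parameter value here is an assumption, since the patent defines only the purpose of each instruction.

```c
#include <stdint.h>

/* Hypothetical host-side flow for one convolutional layer, using the
 * COPROC_OP macro and F7_* codes defined in the earlier sketch. */
void run_conv_layer(uint32_t in_addr, uint32_t out_addr,
                    uint32_t wgt_addr, uint32_t bias_addr)
{
    COPROC_OP(F7_INIT_CH,   3, 16);              /* 3 input, 16 output channels */
    COPROC_OP(F7_INIT_IM,   32, 32);             /* 32x32 input feature maps    */
    COPROC_OP(F7_INIT_FS,   3, 1);               /* 3x3 kernel, step size 1     */
    COPROC_OP(F7_INIT_PW,   1, wgt_addr);        /* padding 1, weight base addr */
    COPROC_OP(F7_INIT_ADDR, in_addr, out_addr);  /* input/output base addresses */
    COPROC_OP(F7_INIT_BIAS, 0, bias_addr);       /* bias and bias data address  */
    COPROC_OP(F7_LOOP,      0, 0);               /* multi-cycle convolution run */
    /* S3: the main processor then reads the output data from memory. */
}
```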
When the method is applied to the above system for testing, the embodiment shown in fig. 6 is obtained. The CIFAR-10 data set was collected by Geoffrey Hinton's students Alex Krizhevsky and Ilya Sutskever for general object recognition. The data set contains 10 classes of images: airplane, car, bird, cat, deer, dog, frog, horse, boat and truck, with 60000 pictures in total and 6000 pictures per class. Each picture in the data set has a pixel size of 32 x 32 and 3 (RGB) channels.
The processing results are shown in table 2.

Processing stage | Coprocessor-based (cycles) | Pure software (cycles) | Speed-up ratio
Convolution | 4675254 | 94595789 | 20.23

Table 2. List of the processing results
In this embodiment, comparing the convolution calculation stage, the current pure software processing method requires 94595789 cycles, whereas the coprocessor-based processing method provided by the embodiment of the present invention requires only 4675254 cycles, a speed-up ratio of 20.23.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be appreciated by those skilled in the art that the invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The embodiments disclosed above are therefore to be considered in all respects as illustrative and not restrictive. All changes which come within the scope of or equivalence to the invention are intended to be embraced therein.

Claims (10)

1. A neural network acceleration coprocessor is characterized by comprising a control module, an address generation module, a multiply-accumulate module and an output saturation module;
the control module is used for receiving an extended instruction sent by the main processor, controlling the address generation module to match the addresses of input data and corresponding output data according to the extended instruction, reading data from the memory according to the matched addresses, controlling the multiply-accumulate module to perform convolution calculation on the read data, controlling the output saturation module to output the calculation result, and storing the output result into the memory according to the matched output data address;
the address generation module is used for matching storage addresses for input data and corresponding output data;
the multiply-accumulate module is used for carrying out neural network convolution operation;
the output saturation module is used for limiting the range of output data and outputting an operation result.
2. The coprocessor of claim 1, wherein the extended instructions include configuration instructions for initializing convolution parameters and an operation instruction for performing the convolution operation;
the configuration instruction is a single-cycle instruction and is used for configuring addresses and parameters of input and output data and parameters of a convolution kernel;
the operation instruction is a variable multi-cycle instruction, and the number of execution cycles of the instruction is determined by the convolution operation parameters set by the preceding configuration instructions.
3. The coprocessor of claim 2, wherein the configuration instructions include first to sixth instructions;
the first instruction is used for setting the number of input and output tensor channels;
the second instruction is used for setting the sizes of input and output tensors;
the third instruction is used for setting the size and the step size of the convolution kernel;
the fourth instruction is used for setting the padding size and the filter weight data start address;
the fifth instruction is used for setting the start addresses of the input and output data;
the sixth instruction is used for setting the bias and the bias data start address.
4. The coprocessor of claim 1, wherein the control module further decodes the extended instruction, reads an operand according to the decoded instruction, and writes the read operand into a register; the operand is used for transmitting the storage address of the input data in the memory; the input data is read from the memory according to the storage address pointed to by the operand.
5. A neural network accelerated processing system, comprising the coprocessor of any one of claims 1-4, further comprising a host processor and a memory;
the memory is used for storing data;
the main processor is used for sending extended instructions;
the coprocessor is used for receiving the extended instruction sent by the main processor, reading input data from the memory according to the received instruction, performing neural network calculation on the input data to obtain output data, and storing the output data into the memory.
6. A neural network acceleration processing method, applied to the processing system of claim 5, comprising the steps of:
the main processor sends an extended instruction to the coprocessor;
the coprocessor receives the extended instruction sent by the main processor, reads input data from the memory according to the received instruction, performs neural network operation on the read input data to obtain output data, and writes the output data into the memory;
and the main processor reads the output data stored by the coprocessor from the memory to complete the neural network algorithm processing.
7. The accelerated processing method according to claim 6, characterized in that: the main processor sends extended instructions to the coprocessor and controls the coprocessor to configure the initialization convolution parameters and to execute the convolution operation;
the initialization convolution parameter configuration comprises single-cycle configuration of the addresses and parameters of the input and output data and the parameters of the convolution kernel; the execution period of the convolution operation is determined by the convolution operation parameters set by the preceding configuration instructions.
8. An accelerated processing method according to claim 7, characterized in that: when the initialization convolution parameters are configured, the method comprises the following operations:
setting the number of input and output tensor channels;
setting the sizes of the input and output tensors;
setting the convolution kernel size and step size;
setting the padding size and the filter weight data start address;
setting the start addresses of the input and output data;
setting the bias and the bias data start address.
9. An accelerated processing method according to claim 8, further comprising writing each extended instruction in an encoding format as follows:
a. a first bit interval of the instruction is an Opcode coding segment;
b. three bits are set to control whether the source registers need to be read and whether the destination register needs to be written.
10. The accelerated processing method according to claim 6, characterized in that, when reading the input data from the memory according to the received extended instruction, the method further comprises:
the coprocessor decodes the extended instruction, reads an operand according to the decoded instruction, and writes the read operand into a register; the operand is used for transmitting the storage address of the input data in the memory; the input data is read from the memory according to the storage address pointed to by the operand.
CN202011069950.1A 2020-09-30 2020-09-30 Neural network acceleration coprocessor, processing system and processing method Pending CN112200305A (en)




Legal Events

Date | Code | Description
| PB01 | Publication
| SE01 | Entry into force of request for substantive examination
2021-04-09 | TA01 | Transfer of patent application right. Address after: 100192 Beijing city Haidian District Qinghe small Camp Road No. 15. Applicants after: CHINA ELECTRIC POWER RESEARCH INSTITUTE Co.,Ltd.; STATE GRID CORPORATION OF CHINA; STATE GRID JIANGSU ELECTRIC POWER Research Institute. Applicants before: CHINA ELECTRIC POWER RESEARCH INSTITUTE Co.,Ltd.; STATE GRID CORPORATION OF CHINA.