CN112015472B - Sparse convolutional neural network acceleration method and system based on data flow architecture - Google Patents
- Publication number: CN112015472B (application CN202010685107.XA)
- Authority: CN (China)
- Prior art keywords: instruction, data flow, instructions, value, neural network
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F9/30145 — Instruction analysis, e.g. decoding, instruction word fields
- G06F9/30141 — Implementation provisions of register files, e.g. ports
- G06F9/321 — Program or instruction counter, e.g. incrementing
- G06N3/045 — Combinations of networks
- G06N3/082 — Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
- G06N3/10 — Interfaces, programming languages or software development kits, e.g. for simulating neural networks
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a method for detecting invalid instructions and skipping their execution in a data flow architecture, suitable for accelerating a sparse convolutional neural network under such an architecture. It targets the convolutional and fully connected layers of a sparse neural network. Instruction tag information is generated from the data characteristics for the instructions compiled by the compiler; an instruction detection unit then checks each instruction against its tag information and skips the execution of invalid instructions, thereby accelerating the sparse convolutional neural network.
Description
Technical Field
The invention relates to the technical field of computer architecture, and in particular to a sparse convolutional neural network acceleration method and system based on a data flow architecture.
Background
Neural networks deliver advanced performance in image detection, speech recognition and natural language processing. As applications grow more complex, neural network models grow with them, which poses many challenges to traditional hardware; to relieve the pressure on hardware resources, sparse networks offer clear advantages in computation, storage and power consumption. Many algorithms and accelerators for sparse networks have appeared, such as the Sparse BLAS library for CPUs and the cuSPARSE library for GPUs. These accelerate the execution of sparse networks to some extent, and dedicated accelerators show advanced results in performance, power consumption and related metrics.
The data flow architecture is widely used in big data processing, scientific computing and similar domains, and the decoupling of its algorithms from its structure gives it good generality and flexibility. The natural parallelism of the data flow architecture matches the parallelism of neural network algorithms well.
As applications grow more complex, neural network models also become 'large' and 'deep', which poses many challenges to traditional hardware; to relieve the pressure on hardware resources, sparse networks offer clear advantages in computation, storage and power consumption. However, CPUs, GPUs and accelerators designed for dense networks cannot accelerate sparse networks well, while dedicated accelerators for sparse networks, owing to the tight coupling of algorithm and structure, lack architectural flexibility and generality and cannot keep pace with algorithmic innovation.
Because the data flow architecture decouples algorithm from structure, it enjoys good generality and flexibility. Under a data flow architecture, the neural network algorithm is mapped onto an architecture composed of a computing array (PE array) in the form of a data flow graph; the graph comprises several nodes, each node contains several instructions, and the directed edges of the graph represent the dependencies between the nodes.
In a sparse convolutional neural network the dominant operation is the multiply-add. Pruning sets some weights in the network to 0, and because 0 multiplied by any number is 0, the data flow graph produced by the compiler contains invalid instructions related to these 0 values; when the sparse convolutional neural network executes, those invalid instructions and their data are still loaded and executed. This not only occupies and wastes the accelerator's hardware resources, but also increases its power consumption, prolongs the execution time of the computing array (PE) and degrades performance.
Disclosure of Invention
Aiming at the resource waste and performance loss caused by loading and executing invalid instructions and data of a sparse network in a data flow architecture, the invention analyzes the data and instruction characteristics of sparse networks and provides a method and a device for detecting invalid instructions, so that their execution is skipped and the sparse network is accelerated.
Convolutional neural network computation is dominated by multiply-add operations, and for a sparse convolutional neural network the pruning operation resets some weights to 0, so 0-value-related instructions appear in the data flow graph. Pruning compares each weight of the convolutional neural network against a set threshold: weights above the threshold keep their original values, and weights below it are set to 0. Its purpose is to exploit the redundancy of the weight data to turn a dense convolutional neural network into a sparse network by zeroing some weights, thereby compressing the network. This operation occurs in the data preprocessing phase, i.e. before the convolutional neural network is executed.
Taking convolution as an example, as shown in FIG. 1, to obtain one value of Ofmap the convolution requires Load instructions (Inst1-Inst18) that bring I0-I8 of Ifmap and W0-W8 of Filter from memory into the storage of the PE, followed by Madd (multiply-add) instructions (Inst19-Inst27). Among these instructions, because the values of W2, W3, W5 and W7 are all 0, Inst12, Inst13, Inst15 and Inst17 are invalid Load instructions, and the corresponding Madd instructions Inst21, Inst22, Inst24 and Inst26 are likewise invalid: their loading and execution have no effect on the final result, yet they occupy hardware resources and degrade performance. To remove the loading and computation of 0-value data, the instructions related to 0 values must be eliminated. The invention therefore provides a method and a device for detecting invalid instructions in a data flow and skipping their execution, so as to save computing resources and improve the performance of the sparse network.
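The bookkeeping for this example can be sketched in a few lines of illustrative Python (a hypothetical software model, not the patent's hardware; the non-zero weight values are made up, only the zeros at W2, W3, W5, W7 come from the example):

```python
# Illustrative sketch: zero weights make the corresponding Load and Madd
# instructions invalid. W2, W3, W5, W7 are pruned to 0 as in FIG. 1.
weights = [3, 1, 0, 0, 2, 0, 1, 0, 4]          # W0..W8 after pruning (values assumed)

# Tag information: 1 = valid instruction, 0 = invalid instruction.
filter_flag = [0 if w == 0 else 1 for w in weights]

# Load instructions Inst10..Inst18 carry the Filter weights, and the
# Madd instructions Inst19..Inst27 reuse the same pattern, so each
# zero weight invalidates one Load and one Madd.
invalid_loads = [10 + i for i, f in enumerate(filter_flag) if f == 0]
invalid_madds = [19 + i for i, f in enumerate(filter_flag) if f == 0]

print(invalid_loads)   # Load instructions to skip
print(invalid_madds)   # Madd instructions to skip
```

With these weights the sketch reproduces the instruction numbers named in the text: the invalid Loads are Inst12, Inst13, Inst15, Inst17 and the invalid Madds are Inst21, Inst22, Inst24, Inst26.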
Aiming at the defects of the prior art, the invention provides a sparse convolutional neural network acceleration method based on a data flow architecture, comprising the following steps:
step 1, compiling the operations of the convolutional and fully connected layers of the sparse convolutional neural network into a data flow graph through a compiler, and generating instruction tag information for each instruction in the data flow graph according to the data characteristics of the data flow graph;
and step 2, checking each instruction against its tag information, retaining the valid instructions in the data flow graph, and recording the distance between consecutive valid instructions, so that when the data flow graph is executed the sparse convolutional neural network skips the execution of invalid instructions directly according to this distance, until the processing result of the data flow graph is obtained.
According to the sparse convolutional neural network acceleration method based on the data flow architecture, the data flow graph comprises a plurality of nodes, the nodes comprise a plurality of instructions, and the directed edges of the data flow graph represent the dependency relationship of the nodes.
According to the sparse convolutional neural network acceleration method based on the data flow architecture, the instruction tag information uses 1 and 0 to denote valid and invalid instructions, respectively.
The sparse convolutional neural network acceleration method based on the data flow architecture, wherein step 2 specifically detects the instructions through an invalid instruction detection device, the invalid instruction detection device comprising:
an instruction tag information module for caching the instruction tag information;
a PC counter register for recording the interval between two valid instructions so that the execution of invalid instructions can be skipped directly;
a reference PC register for storing the PC value of the first valid instruction as the reference PC value; when a subsequent instruction is detected to be valid, the reference PC value is added to the interval value held by the PC counter to obtain the PC value of the next valid instruction for the execution unit;
an instruction cache module for storing the valid instructions to be executed by the execution unit.
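A minimal software model of these components might look as follows (an illustrative sketch with assumed names and interfaces; the patent describes hardware registers, not Python objects):

```python
class InvalidInstructionDetector:
    """Illustrative model of the invalid instruction detection device."""

    def __init__(self, flags, instructions):
        self.flag_buffer = list(flags)          # instruction tag information cache
        self.instr_buffer = list(instructions)  # instructions for the execution unit
        self.pc_base = None                     # reference PC register

    def run(self):
        """Return only the valid instructions, skipping invalid ones by distance."""
        executed = []
        pc = 0
        while pc < len(self.flag_buffer):
            counter = 0                         # PC counter: interval to next valid tag
            while (pc + counter < len(self.flag_buffer)
                   and self.flag_buffer[pc + counter] == 0):
                counter += 1
            if pc + counter >= len(self.flag_buffer):
                break                           # all remaining instructions invalid
            pc = pc + counter                   # PC + interval -> next valid PC
            if self.pc_base is None:
                self.pc_base = pc               # remember the first valid instruction
            executed.append(self.instr_buffer[pc])
            pc += 1
        return executed
```

For example, with tags `[1, 1, 0, 0, 1]` the model executes the first, second and fifth instructions and records the first valid PC as the reference.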
The sparse convolutional neural network acceleration method based on the data flow architecture, wherein the process of generating the instruction tag information in step 1 specifically comprises: marking the instructions associated with 0-value weights in the convolution layer and the full connection layer as invalid instructions, and marking the instructions associated with non-0 values as valid instructions.
The invention also provides a sparse convolutional neural network acceleration system based on the data flow architecture, which comprises:
the method comprises the steps that a module 1 compiles operations of a convolution layer and a full connection layer in a sparse convolution neural network into a data flow graph through a compiler, and instruction marking information is generated for each instruction in the data flow graph according to data characteristics of the data flow graph;
and 2, detecting the instruction through the instruction marking information, reserving effective instructions in the data flow diagram, counting the distance between the two effective instructions, and directly skipping the execution of the ineffective instructions by the sparse convolutional neural network according to the distance when the data flow diagram is executed until a processing result of the data flow diagram is obtained.
The sparse convolutional neural network acceleration system based on the data flow architecture comprises a plurality of nodes, wherein the nodes comprise a plurality of instructions, and directed edges of the data flow graph represent the dependency relationship of the nodes.
The sparse convolutional neural network acceleration system based on the data flow architecture, wherein the instruction marking information indicates the validity and invalidity of an instruction by using 1 and 0 respectively.
The sparse convolutional neural network acceleration system based on the data flow architecture, wherein the module 2 specifically detects the instructions through an invalid instruction detection device, the invalid instruction detection device comprising:
an instruction tag information module for caching the instruction tag information;
a PC counter register for recording the interval between two valid instructions so that the execution of invalid instructions can be skipped directly;
a reference PC register for storing the PC value of the first valid instruction as the reference PC value; when a subsequent instruction is detected to be valid, the reference PC value is added to the interval value held by the PC counter to obtain the PC value of the next valid instruction for the execution unit;
an instruction cache module for storing the valid instructions to be executed by the execution unit.
The sparse convolutional neural network acceleration system based on the data flow architecture, wherein the process of generating the instruction tag information in the module 1 specifically comprises: marking the instructions associated with 0-value weights in the convolution layer and the full connection layer as invalid instructions, and marking the instructions associated with non-0 values as valid instructions.
The advantages of the invention are as follows:
the device reduces the loading and execution of invalid instructions and data in the sparse network, and improves the execution performance of the sparse network.
For sparse convolutional neural network applications, covering both the convolutional and the fully connected layers, the invention designs an invalid instruction detection device in hardware to accelerate the sparse network. Instruction tag information is generated from the data characteristics of the instructions produced by the compiler; the invalid instruction detection device then checks the instructions against this tag information, computes the distance between consecutive valid instructions, and skips the execution of invalid instructions directly, avoiding the loading and computation of 0-value data and thereby accelerating the sparse convolutional neural network.
Drawings
FIG. 1 is a schematic diagram of a convolution operation instruction execution flow;
FIG. 2 is a diagram of an invalid instruction detecting device;
FIG. 3 is a flow chart of invalid instruction detection;
FIG. 4 is a diagram illustrating the generation of instruction tag information;
fig. 5 is a diagram of instruction screening results.
Detailed Description
The invention comprises the following key points:
key point 1, generation of instruction tag information;
key point 2, the invalid instruction detection unit detects invalid instructions and skips their execution;
in order to make the above features and effects of the present invention more clearly understood, the following specific examples are given with reference to the accompanying drawings.
The invention comprises the following steps:
i: generation of instruction tag information
After the compiler compiles the operations of the convolutional and fully connected layers into a data flow graph, instruction tag information is generated according to the data characteristics to mark whether each instruction is valid; the tag information uses 1 and 0 to denote valid and invalid instructions, respectively. The operations of the convolutional and fully connected layers of a convolutional neural network are multiply-add operations, which the compiler compiles into a data flow graph. In these layers the data characteristic of a weight is whether it is 0 or non-0, and the tag of the corresponding instruction follows that characteristic: the tag of an instruction associated with a 0 value is 0, and the tag of an instruction associated with a non-0 value is 1.
II: invalid instruction detection device
As shown in fig. 2, the invalid instruction detection device comprises an instruction tag information buffer (Flag buffer), a PC counter register (PC counter), a reference PC register (PC Base) and an instruction buffer module (Instruction buffer). The PC is the program counter, which holds the address of the next instruction, and the PC counter records the distance between the valid instruction pointed to by the PC and the next valid instruction. The purpose of each module is as follows:
Instruction tag information buffer: caches the tag information generated for each instruction from its data characteristics.
PC counter register: records the interval between two valid instructions so that the execution of invalid instructions can be skipped directly.
Reference PC register: stores the PC value of the first valid instruction; when a subsequent instruction is detected to be valid, the reference PC value is added to the value of the PC counter to obtain the PC value of the next valid instruction directly for the execution unit.
Instruction cache module: stores the instructions to be executed by the execution unit.
Fig. 3 is a flowchart of the instruction detection unit's screening process. It obtains the interval between two valid instructions by computing the distance between two 1s in the instruction tag information (Flag), and adds that interval to the current PC to obtain the PC value of the next valid instruction, so that the PC jumps directly to the position of the valid instruction, skipping the invalid ones. In the flowchart, the parameter i records the interval between the instruction being examined and the instruction pointed to by the current PC, and flag_id is the index of the tag information corresponding to that instruction. The instruction detection unit examines the tag bits one by one until it finds a tag equal to 1 or all tags have been examined, at which point the flow ends.
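The inner loop of this flowchart can be sketched as follows (a software analogue for illustration; `i` and `flag_id` follow the names used in FIG. 3, everything else is assumed):

```python
def next_valid_pc(flag, pc):
    """Return the PC of the next valid instruction after `pc`, or None.

    Mirrors the FIG. 3 loop: i counts the interval from the current PC,
    flag_id indexes the tag information bit being examined.
    """
    i = 1                       # interval between current PC and candidate
    flag_id = pc + 1            # index of the tag being examined
    while flag_id < len(flag):
        if flag[flag_id] == 1:  # found a valid instruction
            return pc + i       # current PC + interval
        i += 1
        flag_id += 1
    return None                 # all tags examined, no valid instruction left
```

With the tag pattern `[1, 1, 0, 0, 1, 0, 1]`, a PC at position 1 jumps to position 4, and a PC at position 4 jumps to position 6, skipping the 0-tagged instructions in between.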
Based on this design, the advantages of the invention are: the execution of invalid instructions is reduced, which lowers the number of instructions executed and eliminates the accessing and execution of 0-value data, saving computing resources and improving the performance of the sparse network.
Specifically, the invention generates the instruction tag information at the compilation stage and adds an invalid instruction detection device to each PE to skip the execution of invalid instructions. The execution process is described in further detail below in combination with the execution of a convolution.
(1) Generation of instruction tag information
Fig. 4 shows the convolution of a 5×5 Ifmap with a 3×3 Filter to produce an Ofmap result. In this convolution, the instruction tag information of the Ifmap values participating in the operation is all 1. The Filter values participating in the operation are 1, 1, 0, 0, 2, 0, 1, 0, 4, so the tag information (filter_flag) of the corresponding Load instructions is 1, 1, 0, 0, 1, 0, 1, 0, 1. For the multiply-add operations, the tag of each Madd instruction combines the tags of the Ifmap and the Filter at the corresponding position; since the Ifmap tags are all 1, the Madd tags are the same as the Filter tags, namely 1, 1, 0, 0, 1, 0, 1, 0, 1.
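Under the reading that the Madd tag combines the Ifmap and Filter tags per position (a logical AND, which reduces to the Filter tag whenever all Ifmap tags are 1), the FIG. 4 example can be checked with this illustrative sketch:

```python
filter_vals = [1, 1, 0, 0, 2, 0, 1, 0, 4]   # Filter values from FIG. 4
ifmap_flag  = [1] * 9                        # Ifmap tags are all 1

# Load tag: 1 for a non-0 weight, 0 for a pruned weight.
filter_flag = [1 if v != 0 else 0 for v in filter_vals]

# Madd tag per position: valid only if both operand tags are valid.
madd_flag = [a & b for a, b in zip(ifmap_flag, filter_flag)]

print(filter_flag)
print(madd_flag)   # equals filter_flag because every Ifmap tag is 1
```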
(2) Detection of invalid instructions
Fig. 5 shows the valid instructions finally executed after the instructions of fig. 4 are screened by the instruction detection unit; Flag is the tag information of the corresponding instruction and corresponds one-to-one with the instructions in fig. 4.
Step one: initially, the Flag value of Inst1 is 1, so the PC points to Inst1 and executes it;
Step two: after execution, the PC increments by 1 to point to Inst2; since the Flag of Inst2 is also 1, the PC stays at Inst2 and executes it;
Step three: this continues until Inst11 finishes execution;
Step four: the PC increments by 1 to point to Inst12; because the Flag of Inst12 is 0, both i and flag_id in the instruction detection unit increment by 1 to examine the validity of Inst13;
Step five: the Flag of Inst13 is still 0, so i and flag_id increment by 1 again;
Step six: the Flag of Inst14 is 1, so the PC is updated to PC+2, detection ends, and the PC jumps to Inst14 for execution;
Step seven: the instruction detection unit continues in this way until all instructions in the graph have been examined;
Step eight: the valid instructions finally executed by the PE are Inst1-Inst11, Inst14, Inst16, Inst18-Inst20, Inst23, Inst25 and Inst27.
The following is a system example corresponding to the above method example, and this embodiment mode may be implemented in cooperation with the above embodiment mode. The related technical details mentioned in the above embodiments are still valid in this embodiment, and in order to reduce repetition, they are not repeated here. Accordingly, the related technical details mentioned in the present embodiment can also be applied to the above-described embodiments.
The invention also provides a sparse convolutional neural network acceleration system based on the data flow architecture, which comprises:
the method comprises the steps that a module 1 compiles operations of a convolution layer and a full connection layer in a sparse convolution neural network into a data flow graph through a compiler, and instruction marking information is generated for each instruction in the data flow graph according to data characteristics of the data flow graph;
and 2, detecting the instruction through the instruction marking information, reserving effective instructions in the data flow diagram, counting the distance between the two effective instructions, and directly skipping the execution of the ineffective instructions by the sparse convolutional neural network according to the distance when the data flow diagram is executed until a processing result of the data flow diagram is obtained.
The sparse convolutional neural network acceleration system based on the data flow architecture comprises a plurality of nodes, wherein the nodes comprise a plurality of instructions, and directed edges of the data flow graph represent the dependency relationship of the nodes.
The sparse convolutional neural network acceleration system based on the data flow architecture, wherein the instruction marking information indicates the validity and invalidity of an instruction by using 1 and 0 respectively.
The sparse convolutional neural network acceleration system based on the data flow architecture, wherein the module 2 specifically detects the instructions through an invalid instruction detection device, the invalid instruction detection device comprising:
an instruction tag information module for caching the instruction tag information;
a PC counter register for recording the interval between two valid instructions so that the execution of invalid instructions can be skipped directly;
a reference PC register for storing the PC value of the first valid instruction as the reference PC value; when a subsequent instruction is detected to be valid, the reference PC value is added to the interval value held by the PC counter to obtain the PC value of the next valid instruction for the execution unit;
an instruction cache module for storing the valid instructions to be executed by the execution unit.
The sparse convolutional neural network acceleration system based on the data flow architecture, wherein the process of generating the instruction tag information in the module 1 specifically comprises: marking the instructions associated with 0-value weights in the convolution layer and the full connection layer as invalid instructions, and marking the instructions associated with non-0 values as valid instructions.
Claims (4)
1. A sparse convolutional neural network acceleration method based on a data flow architecture, characterized by comprising the following steps:
step 1, compiling the operation of a convolution layer and a full connection layer in a sparse convolution neural network into a data flow graph through a compiler, and generating instruction marking information for each instruction in the data flow graph according to the data characteristics of the data flow graph;
step 2, checking each instruction against its tag information, retaining the valid instructions in the data flow graph, and recording the distance between consecutive valid instructions, so that when the data flow graph is executed the sparse convolutional neural network skips the execution of invalid instructions directly according to this distance, until a processing result of the data flow graph is obtained;
the instruction tag information uses 1 and 0 to denote valid and invalid instructions, respectively; the process of generating the instruction tag information in step 1 specifically comprises: marking the instructions associated with 0-value weights in the convolution layer and the full connection layer as invalid instructions, and marking the instructions associated with non-0 values as valid instructions;
the step 2 specifically detects the instruction by an invalid instruction detecting device, which includes:
the instruction marking information module is used for caching instruction marking information;
a PC counter register for recording the interval between two valid instructions to directly skip the execution of invalid instructions;
the reference PC register is used for storing the PC value of the first effective instruction as a reference PC value, and when the fact that a subsequent instruction is effective is detected, the reference PC value is added with the interval value stored by the PC counter to obtain the PC value of the next effective instruction for execution by the execution unit;
an instruction cache module: for storing valid instructions to be executed by the execution unit.
2. The method for accelerating sparse convolutional neural network based on data flow architecture of claim 1, wherein the data flow graph comprises a plurality of nodes, the nodes comprise a plurality of instructions, and directed edges of the data flow graph represent dependency relationships of the nodes.
3. A sparse convolutional neural network acceleration system based on a data flow architecture, comprising:
a module 1, for compiling the operations of the convolutional layer and the fully connected layer in a sparse convolutional neural network into a data flow graph through a compiler, and generating instruction marking information for each instruction in the data flow graph according to the data characteristics of the data flow graph;
a module 2, for checking each instruction against the instruction marking information, retaining the valid instructions in the data flow graph and counting the distance between every two valid instructions, so that when the data flow graph is executed the execution of invalid instructions is skipped directly according to that distance, until the processing result of the data flow graph is obtained;
wherein the instruction marking information uses 1 and 0 to indicate that an instruction is valid or invalid, respectively; generating the instruction marking information in the module 1 specifically comprises: marking instructions whose weight value is 0 in the convolutional layer and the fully connected layer as invalid instructions, and marking instructions whose weight value is non-zero as valid instructions;
wherein the module 2 detects the instructions by means of an invalid instruction detection device, which comprises:
an instruction marking information module, for caching the instruction marking information;
a PC counter register, for recording the interval between two valid instructions so that the execution of invalid instructions is skipped directly;
a reference PC register, for storing the PC value of the first valid instruction as the reference PC value; when a subsequent instruction is detected to be valid, the interval value stored in the PC counter is added to the reference PC value to obtain the PC value of the next valid instruction for the execution unit; and
an instruction cache module, for storing the valid instructions to be executed by the execution unit.
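The four modules of the claimed invalid instruction detection device can be sketched as a single class (the attribute names mirror the claim language, but the code itself is an illustrative assumption, not the patented implementation): the reference PC plus the interval counted in the PC counter register yields each next valid PC, and only those instructions enter the instruction cache for the execution unit.

```python
class InvalidInstructionDetector:
    def __init__(self, instructions, marks):
        self.mark_buffer = marks   # instruction marking information module
        self.pc_counter = 0        # PC counter register: interval between valid PCs
        self.base_pc = None        # reference PC register
        self.inst_cache = []       # instruction cache module feeding the execution unit
        self._scan(instructions)

    def _scan(self, instructions):
        for pc, mark in enumerate(self.mark_buffer):
            if mark == 1:
                if self.base_pc is None:
                    self.base_pc = pc  # PC of the first valid instruction
                else:
                    # reference PC + counted interval = PC of the next valid instruction
                    self.base_pc = self.base_pc + self.pc_counter
                self.inst_cache.append(instructions[self.base_pc])
                self.pc_counter = 0    # restart the interval count
            self.pc_counter += 1       # keep counting across invalid slots

marks = [1, 0, 0, 1, 0, 1]
insts = ["i0", "i1", "i2", "i3", "i4", "i5"]
det = InvalidInstructionDetector(insts, marks)
print(det.inst_cache)  # ['i0', 'i3', 'i5']
```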
4. The sparse convolutional neural network acceleration system based on a data flow architecture according to claim 3, wherein the data flow graph comprises a plurality of nodes, each node comprises a plurality of instructions, and the directed edges of the data flow graph represent the dependency relationships among the nodes.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010685107.XA CN112015472B (en) | 2020-07-16 | 2020-07-16 | Sparse convolutional neural network acceleration method and system based on data flow architecture |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112015472A CN112015472A (en) | 2020-12-01 |
CN112015472B true CN112015472B (en) | 2023-12-12 |
Family
ID=73499705
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010685107.XA Active CN112015472B (en) | 2020-07-16 | 2020-07-16 | Sparse convolutional neural network acceleration method and system based on data flow architecture |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112015472B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109472350A (en) * | 2018-10-30 | 2019-03-15 | 南京大学 | A kind of neural network acceleration system based on block circulation sparse matrix |
CN110991631A (en) * | 2019-11-28 | 2020-04-10 | 福州大学 | Neural network acceleration system based on FPGA |
CN111062472A (en) * | 2019-12-11 | 2020-04-24 | 浙江大学 | Sparse neural network accelerator based on structured pruning and acceleration method thereof |
CN111368988A (en) * | 2020-02-28 | 2020-07-03 | 北京航空航天大学 | Deep learning training hardware accelerator utilizing sparsity |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10489480B2 (en) * | 2017-01-22 | 2019-11-26 | Gsi Technology Inc. | Sparse matrix multiplication in associative memory device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3304363B1 (en) | System for reversible circuit compilation with space constraint, method and program | |
CN105022670A (en) | Heterogeneous distributed task processing system and processing method in cloud computing platform | |
US8789031B2 (en) | Software constructed strands for execution on a multi-core architecture | |
US20120030652A1 (en) | Mechanism for Describing Values of Optimized Away Parameters in a Compiler-Generated Debug Output | |
KR20150052350A (en) | Combined branch target and predicate prediction | |
US20210350230A1 (en) | Data dividing method and processor for convolution operation | |
Chu et al. | Precise cache timing analysis via symbolic execution | |
CN114416045A (en) | Method and device for automatically generating operator | |
US20160147516A1 (en) | Execution of complex recursive algorithms | |
CN112015473B (en) | Sparse convolutional neural network acceleration method and system based on data flow architecture | |
Wen et al. | A swap dominated tensor re-generation strategy for training deep learning models | |
CN107870862B (en) | Construction method, traversal testing method and computing device of new control prediction model | |
CN112015472B (en) | Sparse convolutional neural network acceleration method and system based on data flow architecture | |
Le et al. | Involving cpus into multi-gpu deep learning | |
CN112183744A (en) | Neural network pruning method and device | |
US8381195B2 (en) | Implementing parallel loops with serial semantics | |
CN114791865B (en) | Configuration item self-consistency detection method, system and medium based on relation diagram | |
WO2024000464A1 (en) | Blocking policy generation method and apparatus for tensor computation | |
CN112215349B (en) | Sparse convolutional neural network acceleration method and device based on data flow architecture | |
CN115130672A (en) | Method and device for calculating convolution neural network by software and hardware collaborative optimization | |
Nehmeh et al. | Integer word-length optimization for fixed-point systems | |
Kim et al. | System level power reduction for yolo2 sub-modules for object detection of future autonomous vehicles | |
D’Alberto et al. | Static analysis of parameterized loop nests for energy efficient use of data caches | |
CN118278468B (en) | Deep neural network reasoning method and device based on database management system | |
CN116301903B (en) | Compiler, AI network compiling method, processing method and executing system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||