CN112015472A - Sparse convolution neural network acceleration method and system based on data flow architecture - Google Patents

Sparse convolution neural network acceleration method and system based on data flow architecture

Info

Publication number
CN112015472A
CN112015472A (application CN202010685107.XA; granted as CN112015472B)
Authority
CN
China
Prior art keywords
instruction
data flow
neural network
value
sparse
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010685107.XA
Other languages
Chinese (zh)
Other versions
CN112015472B (en)
Inventor
吴欣欣
范志华
轩伟
李文明
叶笑春
范东睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS filed Critical Institute of Computing Technology of CAS
Priority to CN202010685107.XA priority Critical patent/CN112015472B/en
Publication of CN112015472A publication Critical patent/CN112015472A/en
Application granted granted Critical
Publication of CN112015472B publication Critical patent/CN112015472B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30145 - Instruction analysis, e.g. decoding, instruction word fields
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30098 - Register arrangements
    • G06F 9/30141 - Implementation provisions of register files, e.g. ports
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 - Arrangements for program control, e.g. control units
    • G06F 9/06 - Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 - Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/32 - Address formation of the next instruction, e.g. by incrementing the instruction counter
    • G06F 9/321 - Program or instruction counter, e.g. incrementing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G06N 3/082 - Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/10 - Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides a method for detecting invalid instructions and skipping their execution in a dataflow architecture, suitable for accelerating sparse convolutional neural networks under that architecture. The sparse neural networks concerned comprise convolutional layers and fully connected layers. An instruction detection unit screens instructions according to instruction marking information and skips the execution of invalid instructions, thereby accelerating the sparse convolutional neural network.

Description

Sparse convolution neural network acceleration method and system based on data flow architecture
Technical Field
The invention relates to the technical field of computer system structures, in particular to a sparse convolution neural network acceleration method and system based on a data flow architecture.
Background
Neural networks deliver leading performance in image detection, speech recognition, and natural language processing. As applications grow more complex, so do the network models, which poses many challenges for traditional hardware; to relieve the pressure on hardware resources, sparse networks offer clear advantages in computation, storage, and power consumption. Many algorithms and accelerators for sparse networks have appeared, such as the Sparse BLAS library for CPUs and the cuSPARSE library for GPUs, which accelerate the execution of sparse networks to some extent; dedicated accelerators, in turn, show leading results in performance and power consumption.
The dataflow architecture is widely applied in big-data processing, scientific computing, and similar domains; because it decouples algorithm from structure, it offers good generality and flexibility. The natural parallelism of the dataflow architecture matches the parallel nature of neural network algorithms well.
As applications grow more complex, neural network models also become "large" and "deep", which poses many challenges for traditional hardware; to relieve the pressure on hardware resources, sparse networks offer clear advantages in computation, storage, and power consumption. However, CPUs, GPUs, and accelerators designed for dense networks cannot accelerate sparse networks effectively, while dedicated accelerators for sparse networks, whose algorithms are strongly coupled to their structure, lack architectural flexibility and generality and thus hinder algorithmic innovation.
Because the dataflow architecture decouples algorithm from structure, it offers good generality and flexibility. Under this architecture, the neural network algorithm is mapped onto a computing array (PE array) in the form of a dataflow graph; the graph comprises several nodes, each node contains several instructions, and the directed edges of the graph represent the dependencies between nodes.
In a sparse convolutional neural network, the main operations are multiply-accumulate operations. Pruning sets some weights in the network to 0, and since 0 multiplied by any number is 0, the dataflow graph produced by the compiler contains invalid instructions related to 0 values; when the sparse convolutional neural network executes, these invalid instructions and their data are loaded and executed. Executing invalid instructions and data not only occupies the accelerator's hardware resources, wasting them, but also increases the accelerator's power consumption and the execution time of the compute array (PE), degrading performance.
Disclosure of Invention
Aiming at the resource waste and performance degradation caused by loading and executing the invalid instructions and data of a sparse network under a dataflow architecture, the invention discloses a method and a device that detect invalid instructions in the sparse network by analyzing the characteristics of its data and instructions, skip their execution, and thereby accelerate the sparse network.
The operations of a convolutional neural network are mainly multiply-accumulate operations; for a sparse convolutional neural network, pruning sets some weights to 0, so instructions related to 0 values exist in the dataflow graph. Pruning compares the weights of the convolutional neural network against a set threshold: weights above the threshold retain their original values, and weights below the threshold are set to 0. Based on the redundancy of the weight data, the purpose of pruning is to set some weights to 0, turning the dense convolutional neural network into a sparse network and thereby compressing it. This operation occurs in the data pre-processing stage, i.e., before the convolutional neural network is executed.
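As an illustrative sketch only (not part of the claimed apparatus), the thresholding step of pruning can be expressed in a few lines; the function name, the threshold value, and the choice of keeping weights exactly at the threshold are assumptions for illustration:

```python
def prune(weights, threshold):
    """Keep weights whose magnitude is at or above the threshold;
    set all others to 0, turning a dense weight set into a sparse one."""
    return [w if abs(w) >= threshold else 0 for w in weights]

# Example: with an illustrative threshold of 0.5, small weights are zeroed.
print(prune([0.9, 0.05, -0.7, 0.1], 0.5))  # [0.9, 0, -0.7, 0]
```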
Taking convolution as an example, as shown in FIG. 1, to obtain one value of Ofmap, I0-I8 of Ifmap and W0-W8 of Filter require Load instructions (Inst1-Inst18) to load the data from memory into the PE's storage units, followed by Madd (multiply-add) instructions (Inst19-Inst27). Among these instructions, because W2, W3, W5, and W7 are all 0, Inst12, Inst13, Inst15, and Inst17 are invalid Load instructions, and the corresponding Madd instructions Inst21, Inst22, Inst24, and Inst26 are likewise invalid: their loading and execution have no influence on the final result, yet their execution occupies hardware resources and degrades performance. To remove the loading and computation of 0-value data, the instructions associated with 0 values need to be eliminated. The invention therefore provides a method and a device for detecting invalid instructions in a dataflow architecture and skipping their execution, saving computing resources and improving the performance of the sparse network.
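The instruction numbering of the FIG. 1 example can be checked with a short sketch; the index arithmetic below assumes, as in the figure, that Inst10-Inst18 load W0-W8 and Inst19-Inst27 are the corresponding Madd instructions:

```python
# Filter weights of the example: W2, W3, W5 and W7 are 0 after pruning.
filter_w = [1, 1, 0, 0, 2, 0, 1, 0, 4]

# Load instructions Inst10..Inst18 fetch W0..W8; Madd Inst19..Inst27 use them.
invalid_loads = [10 + i for i, w in enumerate(filter_w) if w == 0]
invalid_madds = [19 + i for i, w in enumerate(filter_w) if w == 0]

print(invalid_loads)  # [12, 13, 15, 17]
print(invalid_madds)  # [21, 22, 24, 26]
```

These match the invalid Load and Madd instructions named in the text.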
Aiming at the defects of the prior art, the invention provides a sparse convolution neural network acceleration method based on a data flow architecture, which comprises the following steps:
step 1, compiling the operation of a convolution layer and a full connection layer in a sparse convolution neural network into a data flow graph through a compiler, and generating instruction mark information for each instruction in the data flow graph according to the data characteristics of the data flow graph;
and 2, detecting the instruction through the instruction marking information, reserving the effective instruction in the data flow diagram, counting the distance between the two effective instructions, and directly skipping the execution of the invalid instruction by the sparse convolutional neural network according to the distance when the data flow diagram is executed until the processing result of the data flow diagram is obtained.
The sparse convolution neural network acceleration method based on the data flow architecture comprises the steps that the data flow graph comprises a plurality of nodes, the nodes comprise a plurality of instructions, and the directed edges of the data flow graph represent the dependency relationship of the nodes.
The sparse convolutional neural network acceleration method based on the data flow architecture is characterized in that the instruction marking information respectively represents the validity and the invalidity of an instruction by using 1 and 0.
The sparse convolutional neural network acceleration method based on the data stream architecture, wherein the step 2 specifically detects the instruction through an invalid instruction detection device, and the invalid instruction detection device comprises:
the instruction marking information module is used for caching instruction marking information;
the PC counter register is used for recording the interval between two effective instructions so as to directly skip the execution of the ineffective instruction;
the reference PC register is used for storing the PC value of the first effective instruction as a reference PC value, and when the fact that a certain subsequent instruction is effective is detected, the reference PC value is added with the interval value stored by the PC counter to obtain the PC value of the next effective instruction for the execution unit to execute;
the instruction cache module: for storing valid instructions to be executed by the execution unit.
The sparse convolutional neural network acceleration method based on the dataflow architecture, wherein the process of generating the instruction marking information in step 1 specifically comprises: marking the instructions related to weights of 0 in the convolutional layers and fully connected layers as invalid instructions, and marking the instructions related to non-0 values as valid instructions.
The invention also provides a sparse convolution neural network acceleration system based on the data flow architecture, which comprises the following steps:
the method comprises the following steps that a module 1 compiles the operation of a convolution layer and a full connection layer in a sparse convolution neural network into a data flow graph through a compiler, and generates instruction mark information for each instruction in the data flow graph according to the data characteristics of the data flow graph;
and the module 2 detects the instruction through the instruction marking information, reserves the effective instruction in the data flow diagram, counts the distance between the two effective instructions, and directly skips the execution of the invalid instruction according to the distance when executing the data flow diagram until obtaining the processing result of the data flow diagram.
The sparse convolution neural network acceleration system based on the data flow architecture comprises a data flow graph and a data flow graph, wherein the data flow graph comprises a plurality of nodes, the nodes comprise a plurality of instructions, and the directed edges of the data flow graph represent the dependency relationship of the nodes.
The sparse convolutional neural network acceleration system based on the data flow architecture is characterized in that the instruction marking information respectively represents the validity and the invalidity of an instruction by using 1 and 0.
The sparse convolution neural network acceleration system based on the data flow architecture is characterized in that the module 2 specifically detects an instruction through an invalid instruction detection device, and the invalid instruction detection device comprises:
the instruction marking information module is used for caching instruction marking information;
the PC counter register is used for recording the interval between two effective instructions so as to directly skip the execution of the ineffective instruction;
the reference PC register is used for storing the PC value of the first effective instruction as a reference PC value, and when the fact that a certain subsequent instruction is effective is detected, the reference PC value is added with the interval value stored by the PC counter to obtain the PC value of the next effective instruction for the execution unit to execute;
the instruction cache module: for storing valid instructions to be executed by the execution unit.
The sparse convolutional neural network acceleration system based on the dataflow architecture, wherein the process of generating the instruction marking information in module 1 specifically comprises: marking the instructions related to weights of 0 in the convolutional layers and fully connected layers as invalid instructions, and marking the instructions related to non-0 values as valid instructions.
According to the scheme, the invention has the advantages that:
the device reduces the loading and execution of invalid instructions and data in the sparse network, and improves the execution performance of the sparse network.
For sparse convolutional neural network applications, which comprise convolutional layers and fully connected layers, a set of invalid-instruction detection devices is designed in hardware to accelerate the sparse network. Instruction marking information is generated for the compiler-produced instructions according to the data characteristics; the invalid-instruction detection device then screens the instructions according to this marking information, counts the distance between two valid instructions, and directly skips the execution of invalid instructions, avoiding the loading and computation of 0-value data and thereby accelerating the sparse convolutional neural network.
Drawings
FIG. 1 is a schematic diagram illustrating a flow chart of a convolution operation instruction;
FIG. 2 is a diagram of an invalid instruction detection apparatus;
FIG. 3 is a flow diagram of invalid instruction detection;
FIG. 4 illustrates generation of instruction tag information;
FIG. 5 is a diagram of instruction screening results.
Detailed Description
The invention comprises the following key points:
the key point 1 generates marking information of the instruction;
the key point 2 is an invalid instruction detection unit which detects invalid instructions and skips execution;
in order to make the aforementioned features and effects of the present invention more comprehensible, embodiments accompanied with figures are described in detail below.
The invention comprises the following steps:
i: generation of instruction marking information
After the compiler compiles the operations of the convolutional layers and fully connected layers into a dataflow graph, instruction marking information is generated according to the data characteristics to mark whether each instruction is valid; the marking information uses 1 and 0 to denote valid and invalid instructions, respectively. The operations of the convolutional and fully connected layers of a convolutional neural network are multiply-accumulate operations, which the compiler compiles into the dataflow graph. In these layers some weights are 0 and some are not; based on this data characteristic, each corresponding instruction is marked as valid or invalid: the marking information of an instruction related to a 0 value is 0, and that of an instruction related to a non-0 value is 1.
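A minimal sketch of this marking rule (the function name is assumed for illustration):

```python
def make_flags(weights):
    # 1 marks the instruction of a non-zero weight as valid, 0 as invalid.
    return [1 if w != 0 else 0 for w in weights]

# The Filter values of the later example, 1,1,0,0,2,0,1,0,4, yield:
print(make_flags([1, 1, 0, 0, 2, 0, 1, 0, 4]))  # [1, 1, 0, 0, 1, 0, 1, 0, 1]
```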
II: invalid command detection device
As shown in FIG. 2, the invalid-instruction detection apparatus comprises an instruction Flag information buffer module (Flag buffer), a PC counter register (PC counter), a reference PC register (PC Base), and an instruction buffer module (instruction buffer). Here the PC is the program counter, which stores the address of the next instruction, and the PC counter records the distance between the valid instruction pointed to by the PC and the next valid instruction. The purpose of and reason for each module's design are:
and the instruction marking information caching module is used for generating marking information of each instruction according to the data characteristics of the instruction.
PC counter register: for recording the interval between two valid instructions to directly skip execution of an invalid instruction.
Reference PC register: and the PC value is used for storing the PC value of the first effective instruction, and when the following certain instruction is detected to be effective, the PC value of the next effective instruction is directly obtained by adding the reference PC value and the value of the PC counter for the execution of the execution unit.
The instruction cache module: for storing instructions to be executed by the execution unit.
FIG. 3 is a flowchart of the process by which the instruction detection unit screens instructions. It obtains the interval between two valid instructions by computing the distance between two 1s in the instruction marking information (Flag), and adds this interval to the current PC to obtain the PC value of the next valid instruction, so that the PC jumps directly to the position of that valid instruction, skipping the invalid ones. In the flowchart, the parameter i records the interval between the instruction being examined and the instruction pointed to by the current PC, and Flag_id is the index of the marking information corresponding to that instruction. The instruction detection unit examines the marking information instruction by instruction until it either detects an instruction whose marking information is 1 or finishes examining all marking information, at which point the flow ends.
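A software sketch of this screening loop, assuming flags indexed from 0 and a PC expressed as an index into the same list (the names i and flag_id follow the flowchart; the function name is an assumption):

```python
def skip_to_next_valid(flags, pc):
    """From the instruction after `pc`, advance i (the interval) and flag_id
    together until a flag of 1 is found; the new PC is then pc + i.
    Returns None when every remaining flag is 0."""
    i, flag_id = 1, pc + 1
    while flag_id < len(flags):
        if flags[flag_id] == 1:
            return pc + i
        i += 1
        flag_id += 1
    return None

print(skip_to_next_valid([1, 1, 0, 0, 1], 1))  # 4
```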
Based on the design, the invention has the advantages that: the execution of invalid instructions is reduced, thereby reducing the number of times instructions are executed, as well as the access and execution of 0 value data. The computing resources are saved, and the performance of the sparse network is improved.
Specifically, the generation of instruction marking information is implemented in the compilation stage, and an invalid-instruction detection device is added in each PE to skip the execution of invalid instructions; this is described in further detail below in conjunction with the execution of a convolution.
(1) Generation of instruction marking information
FIG. 4 shows the convolution of a 5 × 5 Ifmap with a 3 × 3 Filter to produce one Ofmap result. In this convolution, the instruction flag information of the Ifmap values participating in the operation is all 1. The Filter values participating in the operation are 1, 1, 0, 0, 2, 0, 1, 0, 4, so the flag information (Filter_flag) of the corresponding Load instructions is 1, 1, 0, 0, 1, 0, 1, 0, 1. For the multiply-add operations, the instruction flag information is the AND of the flag information at the corresponding positions of Ifmap and Filter; since the Ifmap flags are all 1, the flag information of the multiply-add instructions is the same as that of the Filter, namely 1, 1, 0, 0, 1, 0, 1, 0, 1.
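The AND combination described above, sketched under the same assumed flag encoding:

```python
ifmap_flag  = [1] * 9                      # every Ifmap value participates
filter_flag = [1, 1, 0, 0, 1, 0, 1, 0, 1]  # from Filter values 1,1,0,0,2,0,1,0,4
madd_flag   = [a & b for a, b in zip(ifmap_flag, filter_flag)]
print(madd_flag)  # [1, 1, 0, 0, 1, 0, 1, 0, 1] -- identical to filter_flag
```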
(2) Detection of invalid instructions
FIG. 5 shows the valid instructions finally executed after the instructions of FIG. 4 have been screened by the instruction detection unit; Flag is the marking information of the corresponding instruction and corresponds one-to-one to the instructions in FIG. 4.
Step 1: initially, the Flag of Inst1 is 1, so the PC points to Inst1 and the instruction is executed;
Step 2: after execution, the PC is automatically incremented by 1 to point to Inst2; since the Flag of Inst2 is also 1, the PC stays on Inst2 and it is executed;
Step 3: execution proceeds in this way until Inst11 has finished;
Step 4: the PC is automatically incremented by 1 to point to Inst12; because the Flag of Inst12 is 0, both i and Flag_id in the instruction detection unit are incremented by 1 to check the validity of Inst13;
Step 5: the Flag of Inst13 is still 0, so i and Flag_id are incremented by 1 again;
Step 6: the Flag of Inst14 is 1, so the PC is updated to PC + 2, the detection ends, and the PC jumps to Inst14, which is executed;
Step 7: the instruction detection unit continues in this way until all instructions in the graph have been examined;
Step 8: the valid instructions finally executed by the PE are Inst1-Inst11, Inst14, Inst16, Inst18-Inst20, Inst23, Inst25, and Inst27.
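The eight steps above reduce, in effect, to keeping exactly the instructions whose flag is 1; a sketch reproducing the final list (the instruction numbering of FIG. 4 is assumed):

```python
filter_flag = [1, 1, 0, 0, 1, 0, 1, 0, 1]
# Inst1-Inst9 load the Ifmap (all valid); Inst10-Inst18 load the Filter;
# Inst19-Inst27 are the Madd instructions, sharing the Filter flags.
flags = [1] * 9 + filter_flag + filter_flag
valid = [i + 1 for i, f in enumerate(flags) if f == 1]
print(valid)
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 14, 16, 18, 19, 20, 23, 25, 27]
```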
The following are system examples corresponding to the above method examples, and this embodiment can be implemented in cooperation with the above embodiments. The related technical details mentioned in the above embodiments are still valid in this embodiment, and are not described herein again in order to reduce repetition. Accordingly, the related-art details mentioned in the present embodiment can also be applied to the above-described embodiments.
The invention also provides a sparse convolution neural network acceleration system based on the data flow architecture, which comprises the following steps:
the method comprises the following steps that a module 1 compiles the operation of a convolution layer and a full connection layer in a sparse convolution neural network into a data flow graph through a compiler, and generates instruction mark information for each instruction in the data flow graph according to the data characteristics of the data flow graph;
and the module 2 detects the instruction through the instruction marking information, reserves the effective instruction in the data flow diagram, counts the distance between the two effective instructions, and directly skips the execution of the invalid instruction according to the distance when executing the data flow diagram until obtaining the processing result of the data flow diagram.
The sparse convolution neural network acceleration system based on the data flow architecture comprises a data flow graph and a data flow graph, wherein the data flow graph comprises a plurality of nodes, the nodes comprise a plurality of instructions, and the directed edges of the data flow graph represent the dependency relationship of the nodes.
The sparse convolutional neural network acceleration system based on the data flow architecture is characterized in that the instruction marking information respectively represents the validity and the invalidity of an instruction by using 1 and 0.
The sparse convolution neural network acceleration system based on the data flow architecture is characterized in that the module 2 specifically detects an instruction through an invalid instruction detection device, and the invalid instruction detection device comprises:
the instruction marking information module is used for caching instruction marking information;
the PC counter register is used for recording the interval between two effective instructions so as to directly skip the execution of the ineffective instruction;
the reference PC register is used for storing the PC value of the first effective instruction as a reference PC value, and when the fact that a certain subsequent instruction is effective is detected, the reference PC value is added with the interval value stored by the PC counter to obtain the PC value of the next effective instruction for the execution unit to execute;
the instruction cache module: for storing valid instructions to be executed by the execution unit.
The sparse convolutional neural network acceleration system based on the data stream architecture, wherein the process of generating the instruction marking information in the module 1 specifically includes: and marking the instruction related to the weight value of 0 in the convolutional layer and the full-link layer as an invalid instruction, and marking the instruction related to the non-0 value as a valid instruction.

Claims (10)

1. A sparse convolution neural network acceleration method based on a data flow architecture is characterized by comprising the following steps:
step 1, compiling the operation of a convolution layer and a full connection layer in a sparse convolution neural network into a data flow graph through a compiler, and generating instruction mark information for each instruction in the data flow graph according to the data characteristics of the data flow graph;
and 2, detecting the instruction through the instruction marking information, reserving the effective instruction in the data flow diagram, counting the distance between the two effective instructions, and directly skipping the execution of the invalid instruction by the sparse convolutional neural network according to the distance when the data flow diagram is executed until the processing result of the data flow diagram is obtained.
2. The sparse convolutional neural network acceleration method based on a dataflow architecture of claim 1, wherein the dataflow graph includes a plurality of nodes, the nodes include a plurality of instructions, and the directed edges of the dataflow graph represent dependency relationships of the nodes.
3. The data-flow-architecture-based sparse convolutional neural network acceleration method of claim 1, wherein the instruction tag information uses 1 and 0 to represent validity and invalidity of the instruction, respectively.
4. The method as claimed in claim 1, wherein in step 2 the instructions are detected by a null-instruction detection apparatus, and the null-instruction detection apparatus comprises:
an instruction tag information module, configured to buffer the instruction tag information;
a PC counter register, configured to record the interval between two valid instructions, so that the execution of invalid instructions can be skipped directly;
a reference PC register, configured to store the PC value of the first valid instruction as the reference PC value; when a subsequent instruction is detected to be valid, the reference PC value is added to the interval value stored in the PC counter to obtain the PC value of the next valid instruction for the execution unit to execute;
an instruction cache module, configured to store the valid instructions to be executed by the execution unit.
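As a rough software model of the apparatus in this claim (the claim describes hardware registers; the class and field names below are ours, chosen for illustration), the interplay of the reference PC register, PC counter, and instruction cache can be simulated as a single scan over the tag bits:

```python
# Software model (assumption, for illustration only) of the null-instruction
# detection apparatus: tag bits are scanned once; the reference PC register
# holds the PC of the last valid instruction, the PC counter counts the
# invalid slots in between, and the instruction cache collects the PCs that
# the execution unit should actually run.

class NullInstructionDetector:
    def __init__(self, tag_bits):
        self.tag_bits = tag_bits        # instruction tag information buffer
        self.reference_pc = None        # reference PC register
        self.pc_counter = 0             # PC counter register (interval)
        self.instruction_cache = []     # valid PCs for the execution unit

    def scan(self):
        for pc, tag in enumerate(self.tag_bits):
            if tag == 0:
                self.pc_counter += 1    # invalid instruction: just count it
                continue
            if self.reference_pc is None:
                self.reference_pc = pc  # first valid instruction seen
            else:
                # reference PC + counted interval = PC of next valid instruction
                self.reference_pc += self.pc_counter + 1
            self.pc_counter = 0
            self.instruction_cache.append(self.reference_pc)
        return self.instruction_cache

NullInstructionDetector([1, 0, 0, 1, 0, 1]).scan()  # -> [0, 3, 5]
```

The execution unit then fetches only the PCs in the cache, so the runs of invalid instructions never occupy execution slots.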
5. The method as claimed in claim 1, wherein generating the instruction tag information in step 1 specifically includes: marking the instructions associated with weight values of 0 in the convolutional layers and fully connected layers as invalid instructions, and marking the instructions associated with non-zero weight values as valid instructions.
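Why this tagging is safe can be checked with a toy dot product (a stand-in we chose for one convolution output; not an example from the patent): terms with zero weights contribute nothing to the sum, so skipping their instructions leaves the result unchanged.

```python
# Illustrative check that zero-weight terms can be skipped without changing
# the result (a 1-D dot product stands in for one convolution output).

def dense_dot(weights, activations):
    """Full computation, including the zero-weight multiplies."""
    return sum(w * a for w, a in zip(weights, activations))

def sparse_dot(weights, activations):
    """Only the multiplies that a valid instruction would perform."""
    return sum(w * a for w, a in zip(weights, activations) if w != 0)

w = [0.5, 0.0, 0.0, -1.2, 0.0, 0.3]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
assert dense_dot(w, x) == sparse_dot(w, x)
```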
6. A sparse convolutional neural network acceleration system based on a dataflow architecture, comprising:
a module 1, configured to compile the operations of the convolutional layers and fully connected layers in a sparse convolutional neural network into a dataflow graph through a compiler, and to generate instruction tag information for each instruction in the dataflow graph according to the data characteristics of the dataflow graph;
a module 2, configured to detect each instruction through its instruction tag information, retain the valid instructions in the dataflow graph, and count the distance between two consecutive valid instructions; when the dataflow graph is executed, the execution of invalid instructions is skipped directly according to that distance, until the processing result of the dataflow graph is obtained.
7. The sparse convolutional neural network acceleration system based on a dataflow architecture of claim 6, wherein the dataflow graph includes a plurality of nodes, each node includes a plurality of instructions, and the directed edges of the dataflow graph represent the dependency relationships between the nodes.
8. The dataflow-architecture-based sparse convolutional neural network acceleration system of claim 6, wherein the instruction tag information uses 1 and 0 to mark an instruction as valid or invalid, respectively.
9. The sparse convolutional neural network acceleration system based on a dataflow architecture of claim 6, wherein the module 2 specifically detects the instructions through a null-instruction detection apparatus, and the null-instruction detection apparatus comprises:
an instruction tag information module, configured to buffer the instruction tag information;
a PC counter register, configured to record the interval between two valid instructions, so that the execution of invalid instructions can be skipped directly;
a reference PC register, configured to store the PC value of the first valid instruction as the reference PC value; when a subsequent instruction is detected to be valid, the reference PC value is added to the interval value stored in the PC counter to obtain the PC value of the next valid instruction for the execution unit to execute;
an instruction cache module, configured to store the valid instructions to be executed by the execution unit.
10. The sparse convolutional neural network acceleration system based on a dataflow architecture as claimed in claim 6, wherein generating the instruction tag information in the module 1 specifically includes: marking the instructions associated with weight values of 0 in the convolutional layers and fully connected layers as invalid instructions, and marking the instructions associated with non-zero weight values as valid instructions.
CN202010685107.XA 2020-07-16 2020-07-16 Sparse convolutional neural network acceleration method and system based on data flow architecture Active CN112015472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010685107.XA CN112015472B (en) 2020-07-16 2020-07-16 Sparse convolutional neural network acceleration method and system based on data flow architecture

Publications (2)

Publication Number Publication Date
CN112015472A true CN112015472A (en) 2020-12-01
CN112015472B CN112015472B (en) 2023-12-12

Family

ID=73499705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010685107.XA Active CN112015472B (en) 2020-07-16 2020-07-16 Sparse convolutional neural network acceleration method and system based on data flow architecture

Country Status (1)

Country Link
CN (1) CN112015472B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180210862A1 (en) * 2017-01-22 2018-07-26 Gsi Technology Inc. Sparse matrix multiplication in associative memory device
CN109472350A (en) * 2018-10-30 2019-03-15 南京大学 A kind of neural network acceleration system based on block circulation sparse matrix
CN110991631A (en) * 2019-11-28 2020-04-10 福州大学 Neural network acceleration system based on FPGA
CN111062472A (en) * 2019-12-11 2020-04-24 浙江大学 Sparse neural network accelerator based on structured pruning and acceleration method thereof
CN111368988A (en) * 2020-02-28 2020-07-03 北京航空航天大学 Deep learning training hardware accelerator utilizing sparsity

Also Published As

Publication number Publication date
CN112015472B (en) 2023-12-12

Similar Documents

Publication Publication Date Title
US8302078B2 (en) Lazy evaluation of geometric definitions of objects within procedural programming environments
US20120030652A1 (en) Mechanism for Describing Values of Optimized Away Parameters in a Compiler-Generated Debug Output
US7539851B2 (en) Using register readiness to facilitate value prediction
CN101611380A (en) Speculative throughput calculates
WO2020073641A1 (en) Data structure-oriented data prefetching method and device for graphics processing unit
CN112015473B (en) Sparse convolutional neural network acceleration method and system based on data flow architecture
US10318261B2 (en) Execution of complex recursive algorithms
CN115809063B (en) Storage process compiling method, system, electronic equipment and storage medium
US20050071834A1 (en) Generating executable code based on code performance data
CN115936248A (en) Attention network-based power load prediction method, device and system
US10248814B2 (en) Memory integrity monitoring
CN112015472B (en) Sparse convolutional neural network acceleration method and system based on data flow architecture
CN108021563B (en) Method and device for detecting data dependence between instructions
CN111124694B (en) Deadlock detection and solution method for reachability graph based on petri network
CN112183744A (en) Neural network pruning method and device
US8381195B2 (en) Implementing parallel loops with serial semantics
Liu et al. A theoretical framework for value prediction in parallel systems
CN112215349B (en) Sparse convolutional neural network acceleration method and device based on data flow architecture
CN115130672A (en) Method and device for calculating convolution neural network by software and hardware collaborative optimization
CN110969259B (en) Processing core with data-dependent adaptive rounding
Li et al. Efficient microarchitectural vulnerabilities prediction using boosted regression trees and patient rule inductions
CN117561502A (en) Method and device for determining failure reason
CN111190644A (en) Embedded Flash on-chip read instruction hardware acceleration method and device
D’Alberto et al. Static analysis of parameterized loop nests for energy efficient use of data caches
CN113705800A (en) Processing unit, related device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant