CN112015472B - Sparse convolutional neural network acceleration method and system based on data flow architecture - Google Patents


Info

Publication number
CN112015472B
CN112015472B (granted publication of application CN202010685107.XA; earlier publication CN112015472A)
Authority
CN
China
Prior art keywords
instruction
data flow
instructions
value
neural network
Prior art date
Legal status: Active (an assumption, not a legal conclusion; Google has not performed a legal analysis)
Application number
CN202010685107.XA
Other languages
Chinese (zh)
Other versions
CN112015472A (en)
Inventor
吴欣欣
范志华
轩伟
李文明
叶笑春
范东睿
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN202010685107.XA
Publication of CN112015472A
Application granted
Publication of CN112015472B

Classifications

    • G06F9/30145 Instruction analysis, e.g. decoding, instruction word fields
    • G06F9/30141 Implementation provisions of register files, e.g. ports
    • G06F9/321 Program or instruction counter, e.g. incrementing
    • G06N3/045 Combinations of networks
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06N3/10 Interfaces, programming languages or software development kits, e.g. for simulating neural networks
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Devices For Executing Special Programs (AREA)

Abstract

The invention provides a method for detecting invalid instructions and skipping their execution in a data flow architecture, suitable for accelerating sparse convolutional neural networks under that architecture. The invention covers the convolution layers and fully connected layers of a sparse neural network. Instruction tag information is generated, according to the data characteristics, for the instructions produced by the compiler; an instruction detection unit then checks each instruction against its tag information and skips the execution of invalid instructions, thereby accelerating the sparse convolutional neural network.

Description

Sparse convolutional neural network acceleration method and system based on data flow architecture
Technical Field
The invention relates to the field of computer architecture, and in particular to a sparse convolutional neural network acceleration method and system based on a data flow architecture.
Background
Neural networks deliver state-of-the-art performance in image detection, speech recognition and natural language processing. As applications grow more complex, so do neural network models, which poses many challenges to traditional hardware; to relieve the pressure on hardware resources, sparse networks offer clear advantages in computation, storage and power consumption. Many algorithms and accelerators for sparse networks have appeared, such as the sparse-blas library for CPUs and the custars library for GPUs, which accelerate the execution of sparse networks to some extent; dedicated accelerators show strong results in performance, power consumption and related metrics.
The data flow architecture has wide application in the aspects of big data processing, scientific calculation and the like, and the decoupled algorithm and structure thereof enable the data flow architecture to have good universality and flexibility. The natural parallelism of the data flow architecture is well matched with the parallelism characteristic of the neural network algorithm.
As applications grow more complex, neural network models also become 'large' and 'deep', which poses many challenges to traditional hardware; to relieve the pressure on hardware resources, sparse networks offer clear advantages in computation, storage and power consumption. However, CPUs, GPUs and dedicated accelerators designed for dense networks cannot accelerate sparse networks, while dedicated sparse-network accelerators, whose algorithms and structures are strongly coupled, lack architectural flexibility and generality and leave no room for algorithmic innovation.
Because algorithm and structure are decoupled in a data flow architecture, it offers good generality and flexibility. Under such an architecture, a neural network algorithm is mapped, in the form of a data flow graph, onto an architecture consisting of a computing array (PE array). The data flow graph comprises a number of nodes, each containing a number of instructions, and the directed edges of the graph represent the dependency relationships between the nodes.
In a sparse convolutional neural network the main operation is multiply-add. Pruning sets some weight values in the network to 0, and since 0 multiplied by any number is 0, the data flow graph produced by the compiler contains invalid instructions associated with these 0 values; when the sparse convolutional neural network is executed, these invalid instructions and their data are loaded and executed. This not only occupies the accelerator's hardware resources and wastes them, but also increases the accelerator's power consumption, lengthens the execution time of the computing array (PE), and degrades performance.
Disclosure of Invention
Aiming at the resource waste and performance degradation caused by loading and executing the invalid instructions and data of a sparse network in a data flow architecture, the invention analyzes the data and instruction characteristics of the sparse network and provides a method and a device that detect invalid instructions and skip their execution, thereby accelerating the sparse network.
Convolutional neural network computation is mainly multiply-add; for a sparse convolutional neural network, pruning resets some weights to 0, so the data flow graph contains instructions associated with 0 values. Pruning compares each weight in the convolutional neural network against a set threshold: weights above the threshold keep their original values, while weights below it are set to 0. Its purpose is to exploit the redundancy of the weight data to turn a dense convolutional neural network into a sparse one by zeroing some weights, thereby compressing the network. Pruning takes place in the data preprocessing phase of the convolutional network, i.e. before the convolutional neural network is executed.
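A minimal sketch of this threshold pruning step, in Python; the 3×3 filter values and the threshold of 0.5 are illustrative assumptions, not taken from the patent:

```python
def prune(weights, threshold):
    """Magnitude pruning: weights whose absolute value falls below the
    threshold are set to 0; the rest keep their original values."""
    return [[w if abs(w) >= threshold else 0.0 for w in row]
            for row in weights]

# A 3x3 filter before and after pruning with an assumed threshold of 0.5.
filt = [[0.9, 0.7, 0.1],
        [0.2, 1.3, 0.4],
        [0.8, 0.3, 2.1]]
sparse_filt = prune(filt, 0.5)
```

After this preprocessing step the filter contains zeros, which is exactly what makes some of the compiled Load and Madd instructions invalid.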
Taking convolution as an example, as shown in FIG. 1, to obtain one value of Ofmap the values I0-I8 of Ifmap and W0-W8 of the Filter must first be loaded from memory into the PE's storage by Load instructions (Inst1-Inst18); Madd (multiply-add) instructions (Inst19-Inst27) are then required. Among these instructions, since the values of W2, W3, W5 and W7 are all 0, Inst12, Inst13, Inst15 and Inst17 are invalid Load instructions, and the corresponding Madd instructions Inst21, Inst22, Inst24 and Inst26 are likewise invalid. Loading and executing these instructions has no effect on the final result, so they can be regarded as invalid, yet their execution occupies hardware resources and degrades performance. To remove the loading and computation of 0-value data, the instructions associated with 0 values must be eliminated. The invention therefore provides a method and a device that detect invalid instructions in the data flow and skip their execution, saving computing resources and improving sparse-network performance.
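Under the instruction numbering of this example (Inst1-Inst9 load I0-I8, Inst10-Inst18 load W0-W8, Inst19-Inst27 are the Madd instructions; this layout is inferred from the text, and the non-zero weight values are placeholders), the ids of the invalid instructions can be derived as follows:

```python
def invalid_instructions(weights):
    """Return the ids of the Load and Madd instructions that become
    invalid because their weight W_k is 0 (assumed numbering:
    Load of W_k is Inst(10+k), Madd using W_k is Inst(19+k))."""
    invalid = []
    for k, w in enumerate(weights):
        if w == 0:
            invalid.append(10 + k)  # Load of W_k
            invalid.append(19 + k)  # Madd using W_k
    return sorted(invalid)

# W2, W3, W5 and W7 are 0 after pruning; other values are placeholders.
W = [1, 1, 0, 0, 2, 0, 1, 0, 4]
ids = invalid_instructions(W)
```

With these weights the sketch reproduces the invalid Loads Inst12, Inst13, Inst15, Inst17 and the invalid Madds Inst21, Inst22, Inst24, Inst26 named in the text.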
Aiming at the defects of the prior art, the invention provides a sparse convolutional neural network acceleration method based on a data stream architecture, which comprises the following steps:
step 1, compiling the operation of a convolution layer and a full connection layer in a sparse convolution neural network into a data flow graph through a compiler, and generating instruction marking information for each instruction in the data flow graph according to the data characteristics of the data flow graph;
and step 2, checking each instruction against the instruction marking information, keeping the valid instructions in the data flow graph, and recording the distance between two consecutive valid instructions; when the data flow graph is executed, the sparse convolutional neural network skips the execution of invalid instructions directly according to this distance, until the processing result of the data flow graph is obtained.
According to the sparse convolutional neural network acceleration method based on the data flow architecture, the data flow graph comprises a plurality of nodes, the nodes comprise a plurality of instructions, and the directed edges of the data flow graph represent the dependency relationship of the nodes.
According to the sparse convolutional neural network acceleration method based on the data flow architecture, the instruction marking information uses 1 and 0 to indicate that an instruction is valid or invalid, respectively.
The sparse convolutional neural network acceleration method based on the data stream architecture, wherein the step 2 specifically detects the instruction through an invalid instruction detection device, and the invalid instruction detection device comprises:
the instruction marking information module is used for caching instruction marking information;
the PC counter register is used for recording the interval between two valid instructions so as to directly skip the execution of invalid instructions;
the reference PC register is used for storing the PC value of the first valid instruction as the reference PC value; when a subsequent instruction is detected to be valid, the reference PC value is added to the interval stored in the PC counter to obtain the PC value of the next valid instruction for the execution unit;
the instruction cache module is used for storing the valid instructions to be executed by the execution unit.
The sparse convolutional neural network acceleration method based on the data flow architecture, wherein the generation of the instruction marking information specifically comprises: marking instructions associated with 0-valued weights in the convolution layer and the fully connected layer as invalid instructions, and marking instructions associated with non-0 values as valid instructions.
The invention also provides a sparse convolutional neural network acceleration system based on the data flow architecture, comprising:
module 1, which compiles the operations of the convolution layer and the fully connected layer in a sparse convolutional neural network into a data flow graph through a compiler, and generates instruction marking information for each instruction in the data flow graph according to the data characteristics of the graph;
module 2, which checks each instruction against its marking information, keeps the valid instructions in the data flow graph, and records the distance between two consecutive valid instructions; when the data flow graph is executed, the sparse convolutional neural network skips the execution of invalid instructions directly according to this distance, until the processing result of the data flow graph is obtained.
In this system, the data flow graph comprises a number of nodes, each containing a number of instructions, and the directed edges of the graph represent the dependency relationships between the nodes.
In this system, the instruction marking information uses 1 and 0 to indicate that an instruction is valid or invalid, respectively.
In this system, module 2 checks instructions by means of an invalid instruction detection device, which comprises:
an instruction marking information module, which caches the instruction marking information;
a PC counter register, which records the interval between two valid instructions so as to directly skip the execution of invalid instructions;
a reference PC register, which stores the PC value of the first valid instruction as the reference PC value; when a subsequent instruction is detected to be valid, the reference PC value is added to the interval stored in the PC counter to obtain the PC value of the next valid instruction for the execution unit;
an instruction cache module, which stores the valid instructions to be executed by the execution unit.
In this system, the generation of the instruction marking information specifically comprises: marking instructions associated with 0-valued weights in the convolution layer and the fully connected layer as invalid instructions, and marking instructions associated with non-0 values as valid instructions.
The advantages of the invention are as follows:
the device reduces the loading and execution of invalid instructions and data in the sparse network, and improves the execution performance of the sparse network.
The invention designs a set of invalid instruction detection device by using a hardware mode aiming at sparse convolutional neural network application, comprising a convolutional layer and a full-connection layer, so as to accelerate the sparse network. The instruction mark information is generated according to the data characteristics of the instruction generated by the compiler, then the instruction is detected according to the instruction mark information by the invalid instruction detection device, the distance between two valid instructions is calculated in a statistical mode, and the execution of the invalid instruction is skipped directly, so that loading and calculation of 0-value data are avoided, and acceleration of the sparse convolutional neural network is achieved.
Drawings
FIG. 1 is a schematic diagram of a convolution operation instruction execution flow;
FIG. 2 is a diagram of an invalid instruction detecting device;
FIG. 3 is a flow chart of invalid instruction detection;
FIG. 4 is a diagram illustrating the generation of instruction tag information;
fig. 5 is a diagram of instruction screening results.
Detailed Description
The invention comprises the following key points:
key point 1, generating mark information of an instruction;
the key point 2, the invalid instruction detection unit detects the invalid instruction and skips execution;
in order to make the above features and effects of the present invention more clearly understood, the following specific examples are given with reference to the accompanying drawings.
The invention comprises the following steps:
i: generation of instruction tag information
After the compiler compiles the operations of the convolution layer and the fully connected layer into a data flow graph, instruction tag information is generated according to the data characteristics to mark whether each instruction is valid; the tag information uses 1 and 0 to indicate that an instruction is valid or invalid, respectively. The operations of the convolution layer and the fully connected layer of a convolutional neural network are multiply-add operations, which the compiler compiles into a data flow graph. In these layers the weight data are either 0 or non-0, and the tag of each instruction follows from this data characteristic: the tag of an instruction associated with a 0 value is 0, and the tag of an instruction associated with a non-0 value is 1.
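A hedged sketch of this tag generation step, using as input the Filter weight values that appear later in the worked example (1, 1, 0, 0, 2, 0, 1, 0, 4):

```python
def make_flags(weights):
    """Instruction tag information: 1 marks a valid instruction
    (non-zero weight), 0 marks an invalid one (zero weight)."""
    return [1 if w != 0 else 0 for w in weights]

# Filter weights from the example of fig. 4.
flags = make_flags([1, 1, 0, 0, 2, 0, 1, 0, 4])
```

The resulting list 1, 1, 0, 0, 1, 0, 1, 0, 1 matches the Filter_flag sequence described in the example.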
II: invalid instruction detection device
As shown in fig. 2, the invalid instruction detection device comprises an instruction tag information buffer (Flag buffer), a PC counter register (PC counter), a reference PC register (PC Base) and an instruction buffer module (Instruction buffer). The PC is the program counter, which stores the address of the next instruction; the PC counter records the distance between the valid instruction currently pointed to by the PC and the next valid instruction. The purpose of each module is as follows:
the instruction mark information buffer module is used for marking information of each instruction generated by the data characteristics of the instruction.
PC counter register: records the interval between two valid instructions so that the execution of invalid instructions can be skipped directly.
Reference PC register: stores the PC value of the first valid instruction; when a subsequent instruction is detected to be valid, adding the reference PC value to the PC counter value directly yields the PC value of the next valid instruction for the execution unit.
Instruction cache module: stores the instructions to be executed by the execution unit.
Fig. 3 is a flow chart of the instruction detection unit's screening process. The unit obtains the interval between two valid instructions by computing the distance between two 1s in the instruction tag information Flag, and adds this interval to the current PC to obtain the PC value of the next valid instruction, so that the PC jumps directly to the position of the next valid instruction, skipping the invalid ones. In the flow chart, the parameter i records the interval between the instruction being examined and the instruction pointed to by the current PC, and flag_id is the index of the tag information corresponding to that instruction; the instruction detection unit examines the tag information entry by entry until an entry equal to 1 is found or all tag information has been examined, at which point the flow ends.
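The screening flow above can be sketched as follows; `i` and `flag_id` mirror the quantities in the flow chart, and the 0-based indexing is an assumption of the sketch:

```python
def next_valid_pc(flags, pc):
    """Scan the tag information starting just after the instruction at
    index `pc` (0-based); return the index of the next valid
    instruction, or None if no further tag equals 1."""
    i = 1            # interval from the instruction at the current PC
    flag_id = pc + 1  # index of the tag being examined
    while flag_id < len(flags):
        if flags[flag_id] == 1:
            return pc + i  # PC jumps directly over the invalid run
        i += 1
        flag_id += 1
    return None  # all remaining tags examined, none valid

flags = [1, 1, 0, 0, 1, 0, 1, 0, 1]
nxt = next_valid_pc(flags, 1)  # from index 1, indices 2 and 3 are skipped
```

From index 1 the scan skips the two 0 tags and lands on index 4, the next valid instruction.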
Based on the design, the invention has the advantages that: the execution of invalid instructions is reduced, thereby reducing the number of times instructions are executed, and the accessing and executing of 0-value data. And the computing resources are saved, and the performance of the sparse network is improved.
Specifically, the invention generates the instruction tag information at the compile stage and adds an invalid instruction detection device to each PE to skip the execution of invalid instructions. The execution process is described in further detail below, using the execution of a convolution as an example.
(1) Generation of instruction tag information
Fig. 4 shows the convolution of a 5×5 Ifmap with a 3×3 Filter to produce an Ofmap result. In this convolution, the instruction tag information of the Ifmap values participating in the operation is all 1. The Filter values participating in the operation are 1, 1, 0, 0, 2, 0, 1, 0, 4, so the tag information of the corresponding Load instructions (Filter_flag) is 1, 1, 0, 0, 1, 0, 1, 0, 1. For the multiply-add operations, the tag of each multiply-add instruction is derived from the tags of the Ifmap and the Filter at the corresponding position; since the Ifmap tags are all 1, the multiply-add tags are the same as the Filter tags, i.e. 1, 1, 0, 0, 1, 0, 1, 0, 1.
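The derivation above can be sketched as follows. Interpreting the combination of the two operand tags as a logical AND is an assumption of this sketch (a Madd is taken to be valid only if both operand loads are valid); with all-1 Ifmap tags it yields the same result as the text:

```python
def madd_flags(ifmap_flags, filter_flags):
    """Tag of each multiply-add instruction, combining the tags of its
    two operands position by position (assumed to be a logical AND)."""
    return [a & b for a, b in zip(ifmap_flags, filter_flags)]

ifmap_flags = [1] * 9                       # all Ifmap values participate
filter_flags = [1, 1, 0, 0, 1, 0, 1, 0, 1]  # from weights 1,1,0,0,2,0,1,0,4
flags = madd_flags(ifmap_flags, filter_flags)
```

Because the Ifmap tags are all 1, the multiply-add tags come out identical to the Filter tags, as stated in the example.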
(2) Detection of invalid instructions
Fig. 5 shows the valid instructions that are finally executed after the instructions of fig. 4 have been screened by the instruction detection unit; Flag is the tag information of the corresponding instruction and corresponds one-to-one with the instructions in fig. 4.
Step one: initially, the Flag value of Inst1 is 1, so the PC points to Inst1 and the Inst1 instruction is executed;
Step two: after execution the PC is automatically incremented by 1 to point to Inst2; since the Flag of Inst2 is also 1, the PC stays on Inst2 and executes it;
Step three: this continues until the execution of Inst11 finishes;
Step four: the PC is automatically incremented by 1 to point to Inst12; since the Flag of Inst12 is 0, both i and flag_id in the instruction detection unit are incremented by 1 to examine the validity of Inst13;
Step five: the Flag of Inst13 is still 0, so i and flag_id are incremented by 1 again;
Step six: the Flag of Inst14 is 1, so the PC is updated to PC+2, detection ends, and the PC jumps to Inst14 and executes it;
Step seven: the instruction detection unit continues in this way until all instructions in the graph have been examined;
Step eight: the valid instructions finally executed by the PE are Inst1-Inst11, Inst14, Inst16, Inst18-Inst20, Inst23, Inst25 and Inst27.
The following is a system embodiment corresponding to the above method embodiment, and the two may be implemented in cooperation with each other. Technical details mentioned in the above embodiment remain valid in this embodiment and, to reduce repetition, are not repeated here; likewise, technical details mentioned in this embodiment also apply to the above embodiment.
The invention also provides a sparse convolutional neural network acceleration system based on the data flow architecture, comprising:
module 1, which compiles the operations of the convolution layer and the fully connected layer in a sparse convolutional neural network into a data flow graph through a compiler, and generates instruction marking information for each instruction in the data flow graph according to the data characteristics of the graph;
module 2, which checks each instruction against its marking information, keeps the valid instructions in the data flow graph, and records the distance between two consecutive valid instructions; when the data flow graph is executed, the sparse convolutional neural network skips the execution of invalid instructions directly according to this distance, until the processing result of the data flow graph is obtained.
In this system, the data flow graph comprises a number of nodes, each containing a number of instructions, and the directed edges of the graph represent the dependency relationships between the nodes.
In this system, the instruction marking information uses 1 and 0 to indicate that an instruction is valid or invalid, respectively.
In this system, module 2 checks instructions by means of an invalid instruction detection device, which comprises:
an instruction marking information module, which caches the instruction marking information;
a PC counter register, which records the interval between two valid instructions so as to directly skip the execution of invalid instructions;
a reference PC register, which stores the PC value of the first valid instruction as the reference PC value; when a subsequent instruction is detected to be valid, the reference PC value is added to the interval stored in the PC counter to obtain the PC value of the next valid instruction for the execution unit;
an instruction cache module, which stores the valid instructions to be executed by the execution unit.
In this system, the generation of the instruction marking information specifically comprises: marking instructions associated with 0-valued weights in the convolution layer and the fully connected layer as invalid instructions, and marking instructions associated with non-0 values as valid instructions.

Claims (4)

1. The sparse convolutional neural network acceleration method based on the data stream architecture is characterized by comprising the following steps of:
step 1, compiling the operation of a convolution layer and a full connection layer in a sparse convolution neural network into a data flow graph through a compiler, and generating instruction marking information for each instruction in the data flow graph according to the data characteristics of the data flow graph;
step 2, detecting the instruction through the instruction marking information, reserving effective instructions in the data flow diagram, counting the distance between the two effective instructions, and directly skipping the execution of invalid instructions by the sparse convolutional neural network according to the distance when the data flow diagram is executed until a processing result of the data flow diagram is obtained;
the instruction tag information indicates the validity and invalidity of an instruction using 1 and 0, respectively; the process of generating instruction mark information in the step 1 specifically includes: marking the instruction with the weight of 0 value in the convolution layer and the full connection layer as an invalid instruction, and marking the instruction with the non-0 value as a valid instruction;
the step 2 specifically detects the instruction by an invalid instruction detecting device, which includes:
the instruction marking information module is used for caching instruction marking information;
a PC counter register for recording the interval between two valid instructions to directly skip the execution of invalid instructions;
the reference PC register is used for storing the PC value of the first effective instruction as a reference PC value, and when the fact that a subsequent instruction is effective is detected, the reference PC value is added with the interval value stored by the PC counter to obtain the PC value of the next effective instruction for execution by the execution unit;
an instruction cache module: for storing valid instructions to be executed by the execution unit.
2. The method for accelerating sparse convolutional neural network based on data flow architecture of claim 1, wherein the data flow graph comprises a plurality of nodes, the nodes comprise a plurality of instructions, and directed edges of the data flow graph represent dependency relationships of the nodes.
3. A sparse convolutional neural network acceleration system based on a data flow architecture, comprising:
a module 1, which compiles the operations of the convolutional layer and the fully connected layer in the sparse convolutional neural network into a data flow graph through a compiler, and generates instruction marking information for each instruction in the data flow graph according to the data characteristics of the data flow graph;
a module 2, which checks each instruction against the instruction marking information, retains the valid instructions in the data flow graph, counts the distance between two adjacent valid instructions, and directly skips the execution of invalid instructions according to the distance when the data flow graph is executed, until the processing result of the data flow graph is obtained;
the instruction marking information indicates the validity or invalidity of an instruction with 1 and 0, respectively; generating the instruction marking information in the module 1 specifically comprises: marking the instructions whose weight value is 0 in the convolutional layer and the fully connected layer as invalid instructions, and marking the instructions whose weight value is non-zero as valid instructions;
the module 2 specifically checks the instructions by an invalid-instruction detection device, which comprises:
an instruction marking information module, for caching the instruction marking information;
a PC counter register, for recording the interval between two valid instructions so that the execution of invalid instructions can be skipped directly;
a reference PC register, for storing the PC value of the first valid instruction as the reference PC value; when a subsequent instruction is detected to be valid, the reference PC value is added to the interval value stored in the PC counter to obtain the PC value of the next valid instruction for the execution unit;
an instruction cache module, for storing the valid instructions to be executed by the execution unit.
4. The sparse convolutional neural network acceleration system of claim 3, wherein the data flow graph comprises a plurality of nodes, each node comprises a plurality of instructions, and the directed edges of the data flow graph represent the dependency relationships between the nodes.
CN202010685107.XA 2020-07-16 2020-07-16 Sparse convolutional neural network acceleration method and system based on data flow architecture Active CN112015472B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010685107.XA CN112015472B (en) 2020-07-16 2020-07-16 Sparse convolutional neural network acceleration method and system based on data flow architecture

Publications (2)

Publication Number Publication Date
CN112015472A CN112015472A (en) 2020-12-01
CN112015472B true CN112015472B (en) 2023-12-12

Family

ID=73499705

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010685107.XA Active CN112015472B (en) 2020-07-16 2020-07-16 Sparse convolutional neural network acceleration method and system based on data flow architecture

Country Status (1)

Country Link
CN (1) CN112015472B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472350A (en) * 2018-10-30 2019-03-15 Nanjing University Neural network acceleration system based on block-circulant sparse matrices
CN110991631A (en) * 2019-11-28 2020-04-10 Fuzhou University Neural network acceleration system based on FPGA
CN111062472A (en) * 2019-12-11 2020-04-24 Zhejiang University Sparse neural network accelerator based on structured pruning and acceleration method thereof
CN111368988A (en) * 2020-02-28 2020-07-03 Beihang University Deep learning training hardware accelerator utilizing sparsity

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018134740A2 (en) * 2017-01-22 2018-07-26 Gsi Technology Inc. Sparse matrix multiplication in associative memory device

Also Published As

Publication number Publication date
CN112015472A (en) 2020-12-01

Similar Documents

Publication Publication Date Title
CN105022670A (en) Heterogeneous distributed task processing system and processing method in cloud computing platform
US8789031B2 (en) Software constructed strands for execution on a multi-core architecture
US20120030652A1 (en) Mechanism for Describing Values of Optimized Away Parameters in a Compiler-Generated Debug Output
KR20150052350A (en) Combined branch target and predicate prediction
US20170192787A1 (en) Loop code processor optimizations
US20210350230A1 (en) Data dividing method and processor for convolution operation
Chu et al. Precise cache timing analysis via symbolic execution
US20160147516A1 (en) Execution of complex recursive algorithms
CN112015473B (en) Sparse convolutional neural network acceleration method and system based on data flow architecture
CN114416045A (en) Method and device for automatically generating operator
US20170192793A1 (en) Efficient instruction processing for sparse data
CN112015472B (en) Sparse convolutional neural network acceleration method and system based on data flow architecture
Le et al. Involving cpus into multi-gpu deep learning
CN112183744A (en) Neural network pruning method and device
Wen et al. A swap dominated tensor re-generation strategy for training deep learning models
CN112215349B (en) Sparse convolutional neural network acceleration method and device based on data flow architecture
CN115130672A (en) Method and device for calculating convolution neural network by software and hardware collaborative optimization
Nehmeh et al. Integer word-length optimization for fixed-point systems
Prokesch et al. Towards automated generation of time-predictable code
D’Alberto et al. Static analysis of parameterized loop nests for energy efficient use of data caches
Kim et al. System level power reduction for yolo2 sub-modules for object detection of future autonomous vehicles
CN116301903B (en) Compiler, AI network compiling method, processing method and executing system
Huang et al. DTW-based subsequence similarity search on AMD heterogeneous computing platform
CN115470598B (en) Multithreading-based three-dimensional rolled piece model block data rapid inheritance method and system
KR100829167B1 (en) Method of reducing data dependence for software pipelining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant