CN110502278A - Neural network coprocessor based on RiscV extended instruction and coprocessing method thereof - Google Patents

Neural network coprocessor based on RiscV extended instruction and coprocessing method thereof

Info

Publication number
CN110502278A
Authority
CN
China
Prior art keywords
instruction
unit
extended
extended instruction
control unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910671987.2A
Other languages
Chinese (zh)
Other versions
CN110502278B (en)
Inventor
廖裕民
张义航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou Rockchip Electronics Co Ltd
Original Assignee
Fuzhou Rockchip Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou Rockchip Electronics Co Ltd
Priority to CN201910671987.2A
Publication of CN110502278A
Application granted
Publication of CN110502278B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30003 Arrangements for executing specific machine instructions
    • G06F 9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F 9/3001 Arithmetic instructions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/30181 Instruction operation extension or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F 9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F 9/3877 Concurrent instruction execution, e.g. pipeline or look ahead using a slave processor, e.g. coprocessor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Neurology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The present invention provides a neural network coprocessor based on RiscV extended instructions and a coprocessing method thereof. The coprocessor includes an extended instruction operation unit connected to a RiscV CPU. On receiving an extended instruction request from the RiscV CPU, the extended instruction operation unit classifies each extended instruction into one of multiple instruction operation levels according to its dependency relationship, configures the operation units required by each extended instruction of each level, the connection relationships among those operation units and their degree of parallelism, completes the instruction operation according to the configuration information, and outputs the result to the RiscV CPU. By giving the coprocessor programmable and configurable capability, the present invention supports the realization of new functions and adaptation to any new algorithm.

Description

Neural network coprocessor based on RiscV extended instruction and coprocessing method thereof
Technical Field
The invention relates to a coprocessor and a coprocessing method thereof.
Background
A coprocessor is a chip that relieves the system microprocessor of specific processing tasks. For example, a math coprocessor may handle digital processing, and a graphics coprocessor may handle video rendering; the Intel Pentium microprocessor, for instance, includes a built-in math coprocessor.
The coprocessor may be attached to the ARM processor. A coprocessor extends core processing functionality by extending the instruction set or providing configuration registers. One or more coprocessors may be connected to the ARM core through a coprocessor interface. The ARM microprocessor may support up to 16 coprocessors for various coprocessing operations; each coprocessor executes only the coprocessing instructions addressed to it, ignoring instructions intended for the ARM processor and for other coprocessors during program execution. ARM coprocessor instructions are mainly used by the ARM processor to initialize data processing operations on a coprocessor, to transfer data between ARM processor registers and coprocessor registers, and to transfer data between coprocessor registers and memory.
However, a conventional coprocessor is a fixed circuit once it leaves the factory: it is neither programmable nor configurable afterwards and can only perform and accelerate a specific algorithm. With the rapid development of high-speed computing, new algorithms emerge constantly, and a coprocessor limited to a specific algorithm clearly cannot keep pace with this development.
Therefore, the invention provides a programmable, configurable coprocessor that realizes these capabilities through RiscV CPU extended instructions, overcoming the fixed-circuit limitation of the prior art.
RiscV (RISC-V, pronounced "risk-five") is an instruction set architecture created in 2010 by Krste Asanovic, Andrew Waterman and Yunsup Lee of the EECS department at the University of California, Berkeley. "RISC" stands for reduced instruction set computer, and "V" marks it as the fifth generation of RISC instruction sets designed at Berkeley, starting from RISC I.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a RiscV extended instruction-based neural network coprocessor and a coprocessing method thereof, in which the coprocessor can realize new functions in a programmable, configurable manner and thus adapt to any new algorithm.
The coprocessor of the invention is realized as follows: a neural network coprocessor based on a RiscV extended instruction comprises an extended instruction arithmetic unit, wherein the extended instruction arithmetic unit is connected to a RiscV CPU; when receiving an extended instruction request from the RiscV CPU, the extended instruction operation unit classifies each extended instruction into one of a plurality of instruction operation levels according to its dependency relationship, configures the operation units required by each extended instruction of each operation level, the connection relationships among the operation units and the parallelism of the operation units, completes the instruction operation according to the configuration information, and outputs the result to the RiscV CPU.
Furthermore, the extended instruction arithmetic unit comprises an instruction decoding distribution unit, an instruction grade mapping storage unit, a result output selection unit, an arithmetic basic unit array consisting of a plurality of basic arithmetic circuits, a plurality of grades of interconnection control units and a plurality of grades of interconnection configuration units;
the instruction level mapping storage unit is respectively connected with the instruction decoding distribution unit and the result output selection unit;
the input and the output of each basic operational circuit are respectively connected to the output and the input of any other basic operational circuit through the interconnection control unit of each grade;
the interconnection control unit of each grade is connected with the instruction decoding distribution unit, the result output selection unit and the interconnection configuration unit of a corresponding grade.
Further, the arithmetic basic unit array includes:
an adder group composed of a plurality of adders;
a multiplier group composed of a plurality of multipliers;
an OR operation group composed of a plurality of OR operators;
an inverse operation group composed of a plurality of inverse operators;
the LUT lookup operation group consists of a plurality of LUT table storage units and a plurality of LUT lookup configuration units, and each LUT table storage unit is correspondingly connected with one LUT lookup configuration unit;
the interconnection control unit includes:
a first-level interconnection control unit respectively connected to each of the adders, each of the multipliers, each of the OR operators, each of the inverse operators, and each of the LUT table storage units;
a second-level interconnection control unit respectively connected to each of the adders, each of the multipliers, each of the OR operators, each of the inverse operators, and each of the LUT table storage units;
a third-level interconnection control unit respectively connected to each of the adders, each of the multipliers, each of the OR operators, each of the inverse operators, and each of the LUT table storage units;
the interconnection configuration unit includes:
the first-level interconnection configuration unit is connected with the first-level interconnection control unit;
the second-level interconnection configuration unit is connected with the second-level interconnection control unit;
and the third-level interconnection configuration unit is connected with the third-level interconnection control unit.
Furthermore, the coprocessor also comprises a multi-beat instruction delay mapping storage unit and a completion feedback control unit; the multi-beat instruction delay mapping storage unit is connected to the RiscV CPU through the completion feedback control unit;
the multi-beat instruction delay mapping storage unit stores multi-beat delay information corresponding to all the extended instructions, after receiving an extended instruction request from the RiscV CPU, the multi-beat instruction delay mapping storage unit inquires the delay beat number corresponding to the extended instruction and sends the delay beat number to the completion feedback control unit, then the feedback control unit sets the completion state feedback signal to be valid according to the delay clock beat number of the current extended instruction sent by the multi-beat instruction delay mapping storage unit, and after the delay beat after receiving the instruction request reaches the value, the RiscV CPU is informed to sample and receive the operation result.
Furthermore, the coprocessor also comprises an extended instruction pipeline state storage unit and an extended instruction pipeline control unit; the extended instruction pipeline state storage unit is connected to the RiscV CPU through the extended instruction pipeline control unit;
the extended instruction pipeline state storage unit stores information whether all extended instructions need the CPU to stop the pipeline, inquires whether the current extended instructions need the CPU to stop the pipeline after receiving an extended instruction request from the RiccV CPU, and sends the information to the extended instruction pipeline control unit; and the extended instruction pipeline control unit transmits the received information to the RiscV CPU and informs the RiscV CPU whether to stop the current pipeline operation until the extended instruction finishes the operation.
The method of the invention is realized as follows: a neural network coprocessing method based on a RiscV extended instruction, in which the configurable coprocessor is provided with an extended instruction arithmetic unit; the extended instruction arithmetic unit receives an extended instruction request sent by the RiscV CPU, classifies each extended instruction into one of a plurality of instruction operation levels according to its dependency relationship, configures the operation units required by each extended instruction of each level, the connection relationships among the operation units and their degree of parallelism, completes the instruction operation according to the configuration information, and outputs the result to the RiscV CPU.
Furthermore, the extended instruction arithmetic unit comprises an instruction decoding distribution unit, an instruction grade mapping storage unit, a result output selection unit, an arithmetic basic unit array consisting of a plurality of basic arithmetic circuits, a plurality of grades of interconnection control units and a plurality of grades of interconnection configuration units;
the instruction level mapping storage unit is respectively connected with the instruction decoding distribution unit and the result output selection unit; the input and the output of each basic operational circuit are respectively connected to the output and the input of any other basic operational circuit through the interconnection control unit of each grade; the interconnection control unit of each grade is connected with the instruction decoding distribution unit, the result output selection unit and the interconnection configuration unit of a corresponding grade.
The instruction decoding distribution unit reads, from the instruction level mapping storage unit, the grade to which the current extended instruction corresponds and distributes the current extended instruction to the interconnection control unit of the corresponding grade for operation;
after receiving the extended instruction, the interconnection control unit of that grade queries the interconnection configuration unit of the corresponding grade for the operation units required by the current extended instruction, the connection relationships among the operation units, and their degree of parallelism; it then configures the corresponding interconnection form according to the configuration information and controls the data flow;
the interconnection control unit of that grade then performs the instruction operation through the basic operation circuits and, when the operation is finished, sends the result to the result output selection unit.
Furthermore, the configurable coprocessor further comprises a multi-beat instruction delay mapping storage unit and a completion feedback control unit;
the multi-beat instruction delay mapping storage unit stores multi-beat delay information corresponding to all the extended instructions, inquires the delay beat number corresponding to the extended instruction after receiving the extended instruction request from the RiscV CPU, and sends the delay beat number to the completion feedback control unit;
and the completion feedback control unit, according to the number of delay clock beats of the current extended instruction sent by the multi-beat instruction delay mapping storage unit, sets the completion-state feedback signal valid once that many beats have elapsed since the instruction request was received, and notifies the RiscV CPU to sample and receive the operation result.
Further, the configurable coprocessor further comprises an extended instruction pipeline state storage unit and an extended instruction pipeline control unit;
the extended instruction pipeline state storage unit stores information whether all extended instructions need the CPU to stop the pipeline, inquires whether the current extended instructions need the CPU to stop the pipeline after receiving an extended instruction request from the RiccV CPU, and sends the information to the extended instruction pipeline control unit;
and the extended instruction pipeline control unit transmits the received information to the RiscV CPU and informs the RiscV CPU whether to stop the current pipeline operation until the extended instruction finishes the operation.
The invention has the following advantages:
1. The extended instruction arithmetic unit in the coprocessor is configurable: the circuit can realize new functions through programming and configuration, so it can adapt to any new neural network structure or algorithm, which overcomes the fixed-circuit defect of the prior art;
2. By extending the instruction set, the coprocessor couples well with the emerging RiscV CPU instruction set, so the circuit can be connected seamlessly to a RiscV CPU to form a neural network coprocessor based on RiscV extended instructions;
3. Through the extended instruction arithmetic unit, the coprocessor realizes an instruction system with multiple levels of complexity, so that the programmed operation structure of each level can be called by higher-level instructions and the circuit can work more efficiently.
Drawings
The invention will be further described with reference to the following embodiments and the accompanying drawings.
FIG. 1 is a block diagram of the overall circuit structure of a coprocessor according to an embodiment of the present invention.
FIG. 2 is a block diagram of an extended instruction arithmetic unit in the coprocessor of the present invention.
FIG. 3 is a block diagram of the overall circuit structure of the coprocessor according to a second embodiment of the present invention.
FIG. 4 is a block diagram of the overall circuit structure of the coprocessor according to a third embodiment of the present invention.
Detailed Description
Embodiment 1
Fig. 1 shows an embodiment of the coprocessor of the present invention, which is a RiscV extended instruction-based neural network coprocessor, including an extended instruction arithmetic unit connected to a RiscV CPU.
The RiscV CPU is the main CPU responsible for program execution. When it runs a native instruction (i.e., a non-extended instruction), the instruction operation is completed inside the RiscV CPU. When the instruction to be executed is an extended instruction, whose meaning has been defined by the user (for example, the user may define an extended instruction as a convolution operation instruction, which is not a native instruction but an extended one), the CPU sends an extended instruction request to the extended instruction operation unit.
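For illustration only, the following Python sketch models this dispatch behaviour. The custom opcode value and the helper names are assumptions made for the example and do not form part of the invention; the point is simply that native instructions complete inside the RiscV CPU while extended instructions are forwarded as requests to the extended instruction operation unit.

CUSTOM_OPCODE = 0x0B  # hypothetical "custom" opcode slot used by the extended instructions

def is_extended(instruction: int) -> bool:
    # The low 7 bits of a 32-bit RISC-V instruction word hold the opcode.
    return (instruction & 0x7F) == CUSTOM_OPCODE

def execute_native(instruction: int) -> int:
    # Placeholder for the CPU's own execution pipeline (outside this sketch).
    return 0

def cpu_execute(instruction: int, coprocessor) -> int:
    if not is_extended(instruction):
        # Native (non-extended) instruction: completed inside the RiscV CPU itself.
        return execute_native(instruction)
    # Extended instruction (for example a user-defined convolution instruction):
    # the CPU sends an extended instruction request to the extended instruction
    # operation unit and later samples the returned result.
    return coprocessor.handle_request(instruction)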
When receiving an extended instruction request from the RiscV CPU, the extended instruction operation unit classifies each extended instruction into one of a plurality of instruction operation levels according to its dependency relationship, configures the operation units required by each extended instruction of each operation level, the connection relationships among the operation units and the parallelism of the operation units, completes the instruction operation according to the configuration information, and outputs the result to the RiscV CPU.
As shown in fig. 2, the extended instruction arithmetic unit includes an instruction decoding distribution unit, an instruction level mapping storage unit, a result output selection unit, an arithmetic basic unit array composed of a plurality of basic arithmetic circuits, a plurality of levels of interconnection control units, and a plurality of levels of interconnection configuration units;
the instruction level mapping storage unit is respectively connected with the instruction decoding distribution unit and the result output selection unit;
the input and the output of each basic operational circuit are respectively connected to the output and the input of any other basic operational circuit through the interconnection control unit of each grade;
the interconnection control unit of each grade is connected with the instruction decoding distribution unit, the result output selection unit and the interconnection configuration unit of a corresponding grade.
Wherein,
the instruction level mapping storage unit is responsible for storing the instruction level corresponding to each extended instruction, so that the instruction decoding distribution unit can distribute the current instruction to the interconnection control unit of the corresponding level for operation;
the instruction decoding distribution unit is responsible, after receiving an extended instruction, for querying the instruction level mapping storage unit for the level corresponding to the instruction and then distributing the instruction to the interconnection control unit of the corresponding level for operation;
the arithmetic basic unit array provides the operation units that carry out the calculation of the extended instructions;
the interconnection configuration unit of each level stores, for every extended instruction of that level, the operation units needed, the connection relationships among them, and their degree of parallelism;
the interconnection control unit, after receiving an instruction, queries its corresponding interconnection configuration unit for the operation units the instruction needs, the connection relationships among them, and their degree of parallelism; it then configures itself into the corresponding interconnection form according to this configuration information, controls its data flow, starts the instruction operation, and sends the result to the result output selection unit when the operation is finished;
the result output selection unit gates the output result of the interconnection control unit of the corresponding level according to the instruction level information from the instruction level mapping storage unit.
Specifically, as shown in fig. 2, in this embodiment the arithmetic basic unit array includes:
the adder group consists of a plurality of adders, and the number of the adders is unlimited;
the multiplier group is composed of a plurality of multipliers, and the number of the multipliers is unlimited;
an OR operation group composed of a plurality of OR operators, the number of which is unlimited;
the inverse operation group is composed of a plurality of inverse operators, and the number is unlimited;
the LUT lookup operation group consists of a plurality of LUT table storage units and a plurality of LUT lookup configuration units, the number of the LUT table storage units is not limited, and each LUT table storage unit is correspondingly connected with one LUT lookup configuration unit;
the interconnection control unit includes:
a first-level interconnection control unit respectively connected to each of the adders, each of the multipliers, each of the OR operators, each of the inverse operators, and each of the LUT table storage units;
a second-level interconnection control unit respectively connected to each of the adders, each of the multipliers, each of the OR operators, each of the inverse operators, and each of the LUT table storage units;
a third-level interconnection control unit respectively connected to each of the adders, each of the multipliers, each of the OR operators, each of the inverse operators, and each of the LUT table storage units;
the interconnection configuration unit includes:
the first-level interconnection configuration unit is connected with the first-level interconnection control unit;
the second-level interconnection configuration unit is connected with the second-level interconnection control unit;
and the third-level interconnection configuration unit is connected with the third-level interconnection control unit.
In this way, each extended instruction can be assigned to one of three instruction levels according to its dependency relationship to complete its calculation. From level one to level three, instruction complexity increases and higher levels call lower ones. Low-level instructions can adapt to many novel algorithms and therefore have strong adaptability, but they must be executed many times, which increases data throughput and instruction count and lowers operating efficiency. High-level instructions are highly complex, are composed of low-level instructions, and can complete a high-complexity calculation in one pass, but their adaptability is lower because they only support some mature high-complexity operations. The multi-level instruction system therefore combines adaptability with execution efficiency.
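The following Python sketch is an illustrative behavioural model (not a circuit description) of the flow just described: the instruction level mapping storage gives the level of an extended instruction, the interconnection configuration of that level names the operation units, their connection order and parallelism, and the configured chain of basic operation circuits produces the result that is gated out by the result output selection. The table contents, the instruction names and the configuration format are assumptions chosen only for this example.

# Instruction level mapping storage unit: extended instruction -> level (1 to 3).
INSTRUCTION_LEVEL_MAP = {"MULADD": 1, "CONV": 2, "CONV_ACT": 3}

# Interconnection configuration units: for each level, the operation units used,
# their connection order and the parallelism (all values are assumptions).
INTERCONNECT_CONFIG = {
    1: {"chain": ["multiplier_group", "adder_group"], "parallelism": 8},
    2: {"chain": ["multiplier_group", "adder_group", "accumulator"], "parallelism": 8},
    3: {"chain": ["multiplier_group", "adder_group", "accumulator", "lut_lookup"], "parallelism": 8},
}

def run_extended_instruction(opcode, operands, basic_units):
    # Instruction decoding distribution unit: look up the level of the current
    # extended instruction and dispatch it to that level's interconnection control unit.
    level = INSTRUCTION_LEVEL_MAP[opcode]
    config = INTERCONNECT_CONFIG[level]
    # Interconnection control unit: connect the basic operation circuits in the
    # configured order and stream the data through them.
    data = operands
    for unit_name in config["chain"]:
        data = basic_units[unit_name](data)
    # Result output selection unit: gate out the result of the active level.
    return data

Here basic_units would map each name to a callable standing in for the corresponding basic operation circuit; in hardware the same lookup and wiring are performed by the interconnection configuration and control units of each level.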
Examples are:
the first-stage instruction is a multiply-add operation, a multiply-add operation group and an add operation group are needed, the connection relation is that an input value is firstly given to the multiplier group, then the output of the multiplier group is connected to the input of the adder group, the parallelism degree is N, and the method can be used in any new algorithm used for the multiply-add operation.
A level-two instruction is, for example, a convolution operation. It directly calls the structure of the level-one multiply-add instruction and, on top of that instruction, accumulates the multiply-add results through an adder, thereby realizing the convolution.
A level-three instruction is, for example, a convolution-plus-activation operation. It directly calls the structure of the level-two convolution instruction and, on top of that instruction, passes the output through the LUT lookup unit to complete the activation operation before outputting the result, thereby realizing convolution with activation.
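A minimal Python sketch of this three-level composition is given below; the 256-entry sigmoid lookup table standing in for the LUT unit, and the concrete input values, are assumptions made only for illustration.

import math

# Level one: multiply-add (multiplier group feeding the adder group), parallelism N.
def multiply_add(inputs, weights):
    return sum(x * w for x, w in zip(inputs, weights))

# Level two: convolution reuses the level-one structure and accumulates the
# multiply-add result of every window.
def convolution(windows, kernel):
    return [multiply_add(window, kernel) for window in windows]

# Level three: convolution followed by activation through a LUT lookup.
# A 256-entry sigmoid table over [-8, 8) is an assumption for this example.
LUT = [1.0 / (1.0 + math.exp(-(-8.0 + 16.0 * i / 256.0))) for i in range(256)]

def lut_activate(value):
    index = min(255, max(0, int((value + 8.0) * 256.0 / 16.0)))
    return LUT[index]

def convolution_activation(windows, kernel):
    return [lut_activate(v) for v in convolution(windows, kernel)]

# Three 3-element windows convolved with a 3-tap kernel, then activated.
print(convolution_activation([[1, 2, 3], [2, 3, 4], [3, 4, 5]], [0.5, -1.0, 0.25]))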
Embodiment 2
As shown in fig. 3, compared with Embodiment 1, the coprocessor of this embodiment further includes a multi-beat instruction delay mapping storage unit and a completion feedback control unit; the multi-beat instruction delay mapping storage unit is connected to the RiscV CPU through the completion feedback control unit. When the extended instruction arithmetic unit needs several clock beats to finish an operation, the completion-state feedback signal is set valid once the delay has reached the corresponding number of beats, and the RiscV CPU is notified to sample and receive the operation result.
The multi-beat instruction delay mapping storage unit stores the multi-beat delay information corresponding to every extended instruction. After receiving an extended instruction request from the RiscV CPU, it looks up the number of delay beats corresponding to that extended instruction and sends this number to the completion feedback control unit. The completion feedback control unit then, according to the number of delay clock beats of the current extended instruction sent by the multi-beat instruction delay mapping storage unit, sets the completion-state feedback signal valid once that many beats have elapsed since the instruction request was received, and notifies the RiscV CPU to sample and receive the operation result.
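The following Python sketch illustrates this multi-beat handshake; the per-instruction delay values and instruction names are assumptions, since the real delays depend on the configured operation units.

# Multi-beat instruction delay mapping storage unit (values are assumptions).
DELAY_BEATS = {"MULADD": 1, "CONV": 4, "CONV_ACT": 6}

class CompletionFeedbackControl:
    """Completion feedback control unit: asserts the completion-state feedback
    signal once the delay of the current extended instruction has elapsed."""
    def __init__(self):
        self.remaining = 0
        self.done = False  # completion-state feedback signal seen by the RiscV CPU

    def start(self, opcode):
        # On an extended instruction request, load the delay for that instruction.
        self.remaining = DELAY_BEATS[opcode]
        self.done = (self.remaining == 0)

    def clock_beat(self):
        # Count down one clock beat; assert 'done' when the delay has elapsed,
        # telling the RiscV CPU to sample and receive the operation result.
        if self.remaining > 0:
            self.remaining -= 1
            self.done = (self.remaining == 0)

# The CPU polls the feedback signal each beat before sampling the result.
fb = CompletionFeedbackControl()
fb.start("CONV")
beats = 0
while not fb.done:
    fb.clock_beat()
    beats += 1
print("result sampled after", beats, "beats")  # -> 4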
Embodiment 3
As shown in fig. 4, compared with the coprocessor of Embodiment 1 or Embodiment 2, the coprocessor of this embodiment further includes an extended instruction pipeline state storage unit and an extended instruction pipeline control unit; the extended instruction pipeline state storage unit is connected to the RiscV CPU through the extended instruction pipeline control unit. The RiscV CPU is notified to stall its pipeline when it must wait for the instruction to complete the operation.
The extended instruction pipeline state storage unit stores, for every extended instruction, whether it requires the CPU to stall the pipeline. After receiving an extended instruction request from the RiscV CPU, it looks up whether the current extended instruction requires a pipeline stall and sends this information to the extended instruction pipeline control unit. The extended instruction pipeline control unit forwards the received information to the RiscV CPU, telling it whether to stall the current pipeline until the extended instruction completes its operation.
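As an illustration, the Python sketch below models this stall query; which extended instructions actually require a stall, and the instruction names used, are assumptions made only for the example.

# Extended instruction pipeline state storage unit (stall decisions are assumed:
# single-beat instructions are taken not to require a stall).
PIPELINE_STALL_MAP = {"MULADD": False, "CONV": True, "CONV_ACT": True}

class DemoCpu:
    """Stand-in for the RiscV CPU side of the interface."""
    def __init__(self):
        self.stalled = False

    def stall_pipeline(self):
        self.stalled = True

def pipeline_control(opcode, cpu):
    # Extended instruction pipeline control unit: forward the stall decision read
    # from the pipeline state storage unit to the RiscV CPU.
    must_stall = PIPELINE_STALL_MAP[opcode]
    if must_stall:
        cpu.stall_pipeline()  # stall until the extended instruction completes
    return must_stall

cpu = DemoCpu()
pipeline_control("CONV", cpu)
print("pipeline stalled:", cpu.stalled)  # -> True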
The overall work flow of the invention is as follows:
1. When the instruction to be executed by the RiscV CPU is an extended instruction, an extended instruction request is sent to the extended instruction operation unit, and at the same time the pipeline control information and the completion-state feedback information are queried and read. The pipeline control information indicates whether the extended instruction requires the RiscV CPU to stall all current pipelines; during execution, the completion-state feedback information indicates, at the current clock beat, whether the extended instruction has finished its operation, and the RiscV CPU can then sample and receive the instruction result through the instruction result return port.
2. The extended instruction arithmetic unit completes the instruction operation after receiving the extended instruction request and sends the operation result back to the RiscV CPU. When the extended instruction arithmetic unit needs several clock beats to complete the operation, the multi-beat instruction delay mapping storage unit, after receiving the extended instruction request, looks up the number of delay beats corresponding to the instruction and sends it to the completion feedback control unit; once the delay has reached that number of beats, the completion feedback control unit sets the completion-state feedback signal valid and notifies the RiscV CPU to sample and receive the operation result.
3. Meanwhile, after the extended instruction request is received, the extended instruction pipeline state storage unit queries whether the corresponding instruction requires the RiscV CPU to stall its pipeline and wait for the instruction to complete the operation, and sends the query result to the extended instruction pipeline control unit; the extended instruction pipeline control unit then sends this information to the RiscV CPU, telling it whether to stall the current pipeline until the extended instruction operation unit finishes the operation.
4. After the extended instruction operation unit finishes the operation, the RiscV CPU samples and receives the operation result according to the completion-state feedback signal and, once the instruction has been executed, proceeds to the next instruction.
Although specific embodiments of the invention have been described above, it will be understood by those skilled in the art that the specific embodiments described are illustrative only and are not limiting upon the scope of the invention, and that equivalent modifications and variations can be made by those skilled in the art without departing from the spirit of the invention, which is to be limited only by the appended claims.

Claims (9)

1. A neural network coprocessor based on a RiscV extended instruction, characterized by comprising an extended instruction arithmetic unit, wherein the extended instruction arithmetic unit is connected to a RiscV CPU;
and when receiving an extended instruction request from the RiscV CPU, the extended instruction operation unit classifies each extended instruction into one of a plurality of instruction operation levels according to its dependency relationship, configures the operation units required by each extended instruction of each operation level, the connection relationships among the operation units and the parallelism of the operation units, completes the instruction operation according to the configuration information, and outputs the result to the RiscV CPU.
2. The RiscV instruction-based neural network coprocessor of claim 1, wherein: the extended instruction arithmetic unit comprises an instruction decoding distribution unit, an instruction grade mapping storage unit, a result output selection unit, an arithmetic basic unit array consisting of a plurality of basic arithmetic circuits, a plurality of grades of interconnection control units and a plurality of grades of interconnection configuration units;
the instruction level mapping storage unit is respectively connected with the instruction decoding distribution unit and the result output selection unit;
the input and the output of each basic operational circuit are respectively connected to the output and the input of any other basic operational circuit through the interconnection control unit of each grade;
the interconnection control unit of each grade is connected with the instruction decoding distribution unit, the result output selection unit and the interconnection configuration unit of a corresponding grade.
3. The RiscV extended instruction-based neural network coprocessor of claim 2, wherein:
the arithmetic basic unit array includes:
an adder group composed of a plurality of adders;
a multiplier group composed of a plurality of multipliers;
an OR operation group composed of a plurality of OR operators;
an inverse operation group composed of a plurality of inverse operators;
the LUT lookup operation group consists of a plurality of LUT table storage units and a plurality of LUT lookup configuration units, and each LUT table storage unit is correspondingly connected with one LUT lookup configuration unit;
the interconnection control unit includes:
a first-level interconnection control unit respectively connected to each of the adders, each of the multipliers, each of the OR operators, each of the inverse operators, and each of the LUT table storage units;
a second-level interconnection control unit respectively connected to each of the adders, each of the multipliers, each of the OR operators, each of the inverse operators, and each of the LUT table storage units;
a third-level interconnection control unit respectively connected to each of the adders, each of the multipliers, each of the OR operators, each of the inverse operators, and each of the LUT table storage units;
the interconnection configuration unit includes:
the first-level interconnection configuration unit is connected with the first-level interconnection control unit;
the second-level interconnection configuration unit is connected with the second-level interconnection control unit;
and the third-level interconnection configuration unit is connected with the third-level interconnection control unit.
4. The RiscV extended instruction-based neural network coprocessor of any of claims 1-3, wherein: the coprocessor further comprises a multi-beat instruction delay mapping storage unit and a completion feedback control unit; the multi-beat instruction delay mapping storage unit is connected to the RiscV CPU through the completion feedback control unit;
the multi-beat instruction delay mapping storage unit stores multi-beat delay information corresponding to all the extended instructions, after receiving an extended instruction request from the RiscV CPU, the multi-beat instruction delay mapping storage unit inquires the delay beat number corresponding to the extended instruction and sends the delay beat number to the completion feedback control unit, then the feedback control unit sets the completion state feedback signal to be valid according to the delay clock beat number of the current extended instruction sent by the multi-beat instruction delay mapping storage unit, and after the delay beat after receiving the instruction request reaches the value, the RiscV CPU is informed to sample and receive the operation result.
5. The RiscV extended instruction-based neural network coprocessor of claim 4, wherein: the coprocessor further comprises an extended instruction pipeline state storage unit and an extended instruction pipeline control unit; the extended instruction pipeline state storage unit is connected to the RiscV CPU through the extended instruction pipeline control unit;
the extended instruction pipeline state storage unit stores, for every extended instruction, whether it requires the CPU to stall the pipeline; after receiving an extended instruction request from the RiscV CPU, it looks up whether the current extended instruction requires the RiscV CPU to stall the pipeline and sends this information to the extended instruction pipeline control unit; and the extended instruction pipeline control unit forwards the received information to the RiscV CPU, telling it whether to stall the current pipeline until the extended instruction completes its operation.
6. A neural network coprocessing method based on a RiscV extended instruction, characterized in that: the configurable coprocessor is provided with an extended instruction arithmetic unit; the extended instruction arithmetic unit receives an extended instruction request sent by the RiscV CPU, classifies each extended instruction into one of a plurality of instruction operation levels according to its dependency relationship, configures the operation units required by each extended instruction of each operation level, the connection relationships among the operation units and the parallelism of each operation unit, completes the instruction operation according to the configuration information, and outputs the result to the RiscV CPU.
7. The RiscV extended instruction-based neural network coprocessing method of claim 6, wherein: the extended instruction arithmetic unit comprises an instruction decoding distribution unit, an instruction level mapping storage unit, a result output selection unit, an arithmetic basic unit array consisting of a plurality of basic arithmetic circuits, a plurality of grades of interconnection control units and a plurality of grades of interconnection configuration units;
the instruction level mapping storage unit is respectively connected with the instruction decoding distribution unit and the result output selection unit; the input and the output of each basic operational circuit are respectively connected to the output and the input of any other basic operational circuit through the interconnection control unit of each grade; the interconnection control unit of each grade is connected with the instruction decoding distribution unit, the result output selection unit and the interconnection configuration unit of a corresponding grade.
the instruction decoding distribution unit reads, from the instruction level mapping storage unit, the grade to which the current extended instruction corresponds and distributes the current extended instruction to the interconnection control unit of the corresponding grade for operation;
after receiving the extended instruction, the interconnection control unit of that grade queries the interconnection configuration unit of the corresponding grade for the operation units required by the current extended instruction, the connection relationships among the operation units, and their degree of parallelism; it then configures the corresponding interconnection form according to the configuration information and controls the data flow;
the interconnection control unit of that grade then performs the instruction operation through the basic operation circuits and, when the operation is finished, sends the result to the result output selection unit.
8. The method for neural network coprocessing based on the RiscV extended instruction according to claim 6 or 7, wherein: the configurable coprocessor further comprises a multi-beat instruction delay mapping storage unit and a completion feedback control unit;
the multi-beat instruction delay mapping storage unit stores multi-beat delay information corresponding to all the extended instructions, inquires the delay beat number corresponding to the extended instruction after receiving the extended instruction request from the RiscV CPU, and sends the delay beat number to the completion feedback control unit;
and the completion feedback control unit, according to the number of delay clock beats of the current extended instruction sent by the multi-beat instruction delay mapping storage unit, sets the completion-state feedback signal valid once that many beats have elapsed since the instruction request was received, and notifies the RiscV CPU to sample and receive the operation result.
9. The method for neural network coprocessing based on the RiscV extended instruction, according to claim 8, is characterized in that: the configurable coprocessor further comprises an extended instruction pipeline state storage unit and an extended instruction pipeline control unit;
the extended instruction pipeline state storage unit stores, for every extended instruction, whether it requires the CPU to stall the pipeline; after receiving an extended instruction request from the RiscV CPU, it looks up whether the current extended instruction requires the RiscV CPU to stall the pipeline and sends this information to the extended instruction pipeline control unit;
and the extended instruction pipeline control unit forwards the received information to the RiscV CPU, telling it whether to stall the current pipeline until the extended instruction completes its operation.
CN201910671987.2A 2019-07-24 2019-07-24 Neural network coprocessor based on RiscV extended instruction and coprocessing method thereof Active CN110502278B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910671987.2A CN110502278B (en) 2019-07-24 2019-07-24 Neural network coprocessor based on RiscV extended instruction and coprocessing method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910671987.2A CN110502278B (en) 2019-07-24 2019-07-24 Neural network coprocessor based on RiscV extended instruction and coprocessing method thereof

Publications (2)

Publication Number Publication Date
CN110502278A true CN110502278A (en) 2019-11-26
CN110502278B (en) 2021-07-16

Family

ID=68586692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910671987.2A Active CN110502278B (en) 2019-07-24 2019-07-24 Neural network coprocessor based on RiscV extended instruction and coprocessing method thereof

Country Status (1)

Country Link
CN (1) CN110502278B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113193950A (en) * 2021-07-01 2021-07-30 广东省新一代通信与网络创新研究院 Data encryption method, data decryption method and storage medium
JP2021111313A (en) * 2019-12-31 2021-08-02 バイドゥ オンライン ネットワーク テクノロジー (ベイジン) カンパニー リミテッド Information processing method and apparatus
EP3971712A1 (en) * 2020-09-22 2022-03-23 Beijing Baidu Netcom Science And Technology Co., Ltd. Voice processing system and method, electronic device and readable storage medium
WO2023284130A1 (en) * 2021-07-15 2023-01-19 深圳供电局有限公司 Chip and control method for convolution calculation, and electronic device

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1140857A (en) * 1995-04-28 1997-01-22 松下电器产业株式会社 Information processing device equipped with a coprocessor which efficiently uses register data in main processor
US7082419B1 (en) * 1999-02-01 2006-07-25 Axeon Limited Neural processing element for use in a neural network
CN102750127A (en) * 2012-06-12 2012-10-24 清华大学 Coprocessor
CN104391674A (en) * 2014-10-22 2015-03-04 积成电子股份有限公司 Sampling value linear interpolation calculation device based on FPGA (Field Programmable Gate Array) and calculation method
CN105630735A (en) * 2015-12-25 2016-06-01 南京大学 Coprocessor based on reconfigurable computational array
CN105930132A (en) * 2016-03-14 2016-09-07 上海剑桥科技股份有限公司 Coprocessor
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A kind of programmable convolutional neural networks Crypto Coprocessor IP Core
CN106990940A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 A kind of vector calculation device
CN108734274A (en) * 2017-04-24 2018-11-02 英特尔公司 Calculation optimization mechanism for deep neural network
CN108845828A (en) * 2018-05-29 2018-11-20 深圳市国微电子有限公司 A kind of coprocessor, matrix operation accelerated method and system
US20180349764A1 (en) * 2017-06-06 2018-12-06 The Regents Of The University Of Michigan Sparse Video Inference Processor For Action Classification And Motion Tracking
CN109144573A (en) * 2018-08-16 2019-01-04 胡振波 Two-level pipeline framework based on RISC-V instruction set
CN109542512A (en) * 2018-11-06 2019-03-29 腾讯科技(深圳)有限公司 A kind of data processing method, device and storage medium
CN109857460A (en) * 2019-02-20 2019-06-07 南京华捷艾米软件科技有限公司 Matrix convolution calculation method, interface, coprocessor and system based on RISC-V framework

Patent Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1140857A (en) * 1995-04-28 1997-01-22 松下电器产业株式会社 Information processing device equipped with a coprocessor which efficiently uses register data in main processor
US7082419B1 (en) * 1999-02-01 2006-07-25 Axeon Limited Neural processing element for use in a neural network
CN102750127A (en) * 2012-06-12 2012-10-24 清华大学 Coprocessor
CN104391674A (en) * 2014-10-22 2015-03-04 积成电子股份有限公司 Sampling value linear interpolation calculation device based on FPGA (Field Programmable Gate Array) and calculation method
CN105630735A (en) * 2015-12-25 2016-06-01 南京大学 Coprocessor based on reconfigurable computational array
CN106990940A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 A kind of vector calculation device
CN105930132A (en) * 2016-03-14 2016-09-07 上海剑桥科技股份有限公司 Coprocessor
CN106940815A (en) * 2017-02-13 2017-07-11 西安交通大学 A kind of programmable convolutional neural networks Crypto Coprocessor IP Core
CN108734274A (en) * 2017-04-24 2018-11-02 英特尔公司 Calculation optimization mechanism for deep neural network
US20180349764A1 (en) * 2017-06-06 2018-12-06 The Regents Of The University Of Michigan Sparse Video Inference Processor For Action Classification And Motion Tracking
CN108845828A (en) * 2018-05-29 2018-11-20 深圳市国微电子有限公司 A kind of coprocessor, matrix operation accelerated method and system
CN109144573A (en) * 2018-08-16 2019-01-04 胡振波 Two-level pipeline framework based on RISC-V instruction set
CN109542512A (en) * 2018-11-06 2019-03-29 腾讯科技(深圳)有限公司 A kind of data processing method, device and storage medium
CN109857460A (en) * 2019-02-20 2019-06-07 南京华捷艾米软件科技有限公司 Matrix convolution calculation method, interface, coprocessor and system based on RISC-V framework

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
阎强 (Yan Qiang): "Design and Implementation of a Convolutional Neural Network Processor", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2021111313A (en) * 2019-12-31 2021-08-02 バイドゥ オンライン ネットワーク テクノロジー (ベイジン) カンパニー リミテッド Information processing method and apparatus
JP6998991B2 (en) 2019-12-31 2022-01-18 バイドゥ オンライン ネットワーク テクノロジー(ペキン) カンパニー リミテッド Information processing methods and equipment
EP3971712A1 (en) * 2020-09-22 2022-03-23 Beijing Baidu Netcom Science And Technology Co., Ltd. Voice processing system and method, electronic device and readable storage medium
JP2022051669A (en) * 2020-09-22 2022-04-01 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Voice processing system, voice processing method, electronic device, and readable storage medium
JP7210830B2 (en) 2020-09-22 2023-01-24 ベイジン バイドゥ ネットコム サイエンス テクノロジー カンパニー リミテッド Speech processing system, speech processing method, electronic device and readable storage medium
CN113193950A (en) * 2021-07-01 2021-07-30 广东省新一代通信与网络创新研究院 Data encryption method, data decryption method and storage medium
CN113193950B (en) * 2021-07-01 2021-12-10 广东省新一代通信与网络创新研究院 Data encryption method, data decryption method and storage medium
WO2023284130A1 (en) * 2021-07-15 2023-01-19 深圳供电局有限公司 Chip and control method for convolution calculation, and electronic device

Also Published As

Publication number Publication date
CN110502278B (en) 2021-07-16

Similar Documents

Publication Publication Date Title
CN110502278B (en) Neural network coprocessor based on RiscV extended instruction and coprocessing method thereof
Yu et al. Light-OPU: An FPGA-based overlay processor for lightweight convolutional neural networks
CN110689138B (en) Operation method, device and related product
KR100236527B1 (en) Single instruction multiple data processing using multiple banks of vector registers
Van Praet et al. Instruction set definition and instruction selection for ASIPs
US5815715A (en) Method for designing a product having hardware and software components and product therefor
JP2005531848A (en) Reconfigurable streaming vector processor
KR100491593B1 (en) Data processor
US20060026578A1 (en) Programmable processor architecture hirarchical compilation
CN110073329A (en) Memory access equipment calculates equipment and the equipment applied to convolutional neural networks operation
AU2014203218B2 (en) Memory configuration for inter-processor communication in an MPSoC
JP5154119B2 (en) Processor
JP2001142922A (en) Design method for semiconductor integrated circuit device
JPH07234792A (en) Compile processor
CN114510339B (en) Computing task scheduling method and device, electronic equipment and readable storage medium
KR20130114688A (en) Architecture optimizer
JP2001202397A (en) Architecture design supporting system for system-on-chip and architecture generating method
CN111047036B (en) Neural network processor, chip and electronic equipment
US6675289B1 (en) System and method for executing hybridized code on a dynamically configurable hardware environment
Marconi Online scheduling and placement of hardware tasks with multiple variants on dynamically reconfigurable field-programmable gate arrays
US7536534B2 (en) Processor capable of being switched among a plurality of operating modes, and method of designing said processor
CN111091181B (en) Convolution processing unit, neural network processor, electronic device and convolution operation method
Gealow et al. System design for pixel-parallel image processing
De Beeck et al. Crisp: A template for reconfigurable instruction set processors
Wittenburg et al. HiPAR-DSP: A parallel VLIW RISC processor for real time image processing applications

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 350000 building 18, 89 software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant after: Ruixin Microelectronics Co., Ltd

Address before: 350000 building 18, 89 software Avenue, Gulou District, Fuzhou City, Fujian Province

Applicant before: Fuzhou Rockchips Electronics Co.,Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant