CN113642722A - Chip for convolution calculation, control method thereof and electronic device - Google Patents


Info

Publication number
CN113642722A
Authority
CN
China
Prior art keywords
convolution
instruction
parameter data
calculation
memory
Legal status
Pending
Application number
CN202110800143.0A
Other languages
Chinese (zh)
Inventor
吕启深
向真
李艳
薛荣
阳浩
邱方驰
余鹏
余英
Current Assignee
Shenzhen Power Supply Bureau Co Ltd
Original Assignee
Shenzhen Power Supply Bureau Co Ltd
Application filed by Shenzhen Power Supply Bureau Co Ltd
Priority to CN202110800143.0A
Priority to PCT/CN2021/121621 (published as WO2023284130A1)
Publication of CN113642722A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Advance Control (AREA)

Abstract

The application relates to a chip for convolution calculation, comprising a memory, a processor and a convolution calculation module. The memory stores convolution parameter data and convolution calculation results; the processor is connected to the memory, is based on the RISC-V open-source instruction set architecture, and receives a user's self-customization instruction and generates a control instruction based on it; and the convolution calculation module is connected to the processor and the memory, receives the control instruction and the convolution parameter data, performs calculation based on them, and outputs a convolution calculation result. Because the chip adopts the structurally simple RISC-V architecture, many redundant instructions are omitted, the core design is simple, and power consumption is reduced. Meanwhile, convolution acceleration is performed by the convolution calculation module rather than by software running on the processor core, which greatly increases the speed of convolution calculation.

Description

Chip for convolution calculation, control method thereof and electronic device
Technical Field
The present disclosure relates to the field of pulse modulation, and in particular, to a chip for convolution calculation, a control method thereof, and an electronic device.
Background
With the popularity of deep neural networks, the number and variety of neural network accelerator products have increased dramatically, and application-specific accelerator chips for neural network computing have become a key component of many consumer, communication, medical and industrial products, using various hardware approaches to implement specific kinds of computation. Convolution is the most common of these computations; it accounts for a large share of neural network workloads and consumes much of the computation time and power. Moreover, convolution parameters and types are complex, and a single computing structure cannot efficiently accelerate every kind of convolution, so a conventional convolution accelerator needs some configurability: it must be reconfigurable in the field to meet computational requirements that change over time. Such accelerator chips usually embed a processor to configure chip functions, execute code and perform similar operations.
However, the mainstream processor architectures today are x86 and ARM. Although these technologies are mature, they retain many legacy instructions for compatibility, which results in a large number of instructions and severe instruction redundancy, so new processors designed on these architectures inevitably have large area and power consumption.
Furthermore, the commercial x86 and ARM architectures carry high patent and licensing costs, impose a high learning cost on later designers, and force many compromises in processor design.
Disclosure of Invention
Therefore, it is necessary to provide a chip for convolution calculation, a control method thereof and an electronic device based on the RISC-V open-source instruction set architecture, which allow users to define custom operations, effectively increase convolution calculation speed, optimize power consumption and reduce cost.
One aspect of the present application provides a chip for convolution calculation, including a memory, a processor, and a convolution calculation module, where the memory is used to store convolution parameter data and convolution calculation results; the processor is connected with the memory, is based on a RISC-V open source instruction set architecture, and is used for receiving a self-customization instruction of a user and generating a control instruction based on the self-customization instruction; and the convolution calculation module is connected with the processor and the memory and is used for receiving the control instruction and the convolution parameter data, calculating based on the control instruction and the convolution parameter data and outputting a convolution calculation result.
In the chip for convolution calculation in the above embodiment, the memory stores convolution parameter data and convolution calculation results, so the processor and the convolution calculation module connected to it can fetch data from the memory at any time, and complex convolution calculations can be carried out based on instructions issued by the user to the processor. Because the chip adopts the structurally simple RISC-V architecture, many redundant instructions are omitted, the core design is simple, and power consumption is reduced. Meanwhile, convolution acceleration is performed by the dedicated convolution calculation module rather than by software running on the processor core, which greatly increases the speed of convolution calculation.
In one embodiment, the processor comprises a basic instruction submodule and an extended instruction submodule, wherein the basic instruction submodule is used for realizing a standard instruction set defined by a RISC-V standard; the extended instruction submodule is used for realizing a user-defined self-customization instruction set.
In one embodiment, the convolution calculation module comprises a register set and a matrix module. The register set is connected to the extended instruction submodule and handles information exchange between the extended instruction submodule and the matrix module; the matrix module is connected to the extended instruction submodule through the register set and is also connected to the memory, and it receives the control instruction and the convolution parameter data, performs convolution calculation based on them, and outputs a convolution calculation result.
In one embodiment, the register set includes a command register and a response register. The command register is connected to both the extended instruction submodule and the matrix module, and is configured to receive the control instruction and generate a control signal based on it; the response register is connected to both the extended instruction submodule and the matrix module, and is configured to obtain the convolution calculation result and generate a response signal based on it.
In one embodiment, the control signals include operation control signals and precision control signals, and the matrix module includes a predetermined number of computing units, each comprising a control register, a data register and a multiplier-adder. The control register is connected to the command register and is configured to send the received operation control signals and precision control signals to the data register; the data register is connected to the control register and the memory and is configured to send the received operation control signal, precision control signal and convolution parameter data to the multiplier-adder; and the multiplier-adder is connected to the data register and is configured to receive the operation control signal, the precision control signal and the convolution parameter data, operate on the convolution parameter data according to the operation control signal and the precision control signal, and output a calculation result.
In one embodiment, the convolution parameter data includes first convolution parameter data and second convolution parameter data, and the data register includes a first input data chain and a second input data chain, where the first input data chain is connected to the memory and is used for storing and/or outputting the first convolution parameter data; and the second input data chain is connected with the memory and used for storing and/or outputting second convolution parameter data.
In one embodiment, the control signal further comprises a displacement control signal, and the processor is further configured to: acquire the self-customization instruction and the convolution parameter data; generate the displacement control signal based on the self-customization instruction and the convolution parameter data; and control the displacement of the second convolution parameter data in the second input data chain based on the displacement control signal.
In one embodiment, the matrix module further includes an adder configured to obtain the calculation results of all calculation units, sum them, and output the convolution calculation result.
Another aspect of the present application provides an electronic device including any one of the chips for convolution calculation described in the embodiments of the present application.
The electronic device in the above embodiment adopts the chip for convolution calculation of the embodiments of the present application. The memory stores convolution parameter data and convolution calculation results, so the processor and the convolution calculation module connected to it can fetch data from the memory at any time, and complex convolution calculations can be carried out based on instructions issued by the user to the processor. As a new open-source instruction set, RISC-V carries no high patent licensing costs and supports designer-defined instructions. A chip based on the RISC-V architecture allows the designer to optimize the electronic device specifically for its application scenario, which not only reduces the energy consumption and cost of the device but also widens its range of applications and improves its applicability.
Yet another aspect of the present application provides a control method for convolution calculation, including the steps of:
acquiring a user self-customization instruction;
generating, by a processor based on the RISC-V open-source instruction set architecture, a control instruction according to the user self-customization instruction;
and acquiring, by a convolution calculation module, convolution parameter data and the control instruction, performing calculation based on the control instruction and the convolution parameter data, and outputting a convolution calculation result.
The control method for convolution calculation in this application takes advantage of the fact that a RISC-V core lets users define their own instructions: convolution can be implemented with only a small subset of instructions, compared with the hundreds of instructions in a typical x86 or ARM processor, and a processor based on a RISC-V core is smaller in scale than processors of other architectures, so its power consumption is lower.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and those skilled in the art can obtain drawings of other embodiments from them without creative effort.
FIG. 1 is a schematic diagram of a chip structure for convolution calculation according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a chip structure for convolution calculation according to another embodiment of the present application;
FIG. 3 is a schematic diagram of a chip structure for convolution calculation according to another embodiment of the present application;
FIG. 4 is a schematic diagram of a chip structure for convolution calculation according to still another embodiment of the present application;
FIG. 5 is a schematic diagram of a convolution calculation process controlled by control signals according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a data register according to an embodiment of the present application;
FIG. 7 is a flowchart of a control method for convolution calculation according to an embodiment of the present application;
FIG. 8 is a flowchart of a control method for convolution calculation according to another embodiment of the present application.
Detailed Description
To facilitate an understanding of the present application, the present application will now be described more fully with reference to the accompanying drawings. Preferred embodiments of the present application are illustrated in the accompanying drawings. This application may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
Where the terms "comprising," "having," and "including" are used herein, another element may be added unless an explicit limitation such as "only" or "consisting of" is used. Unless mentioned to the contrary, terms in the singular may include the plural and are not to be construed as being one in number.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present application.
In this application, unless otherwise expressly stated or limited, the terms "connected" and "connecting" are used broadly and encompass, for example, direct connection, indirect connection via an intermediary, communication between two elements, or interaction between two elements. The specific meaning of the above terms in the present application can be understood by those of ordinary skill in the art as appropriate.
In the processor field, the mainstream architectures are currently x86 and ARM. After decades of development, the architecture manuals for modern x86 and ARM run to hundreds or even thousands of pages and exist in numerous versions. One main reason is that these architectures evolved alongside the continuous development of modern processor technology: as commercial architectures, they must retain many outdated definitions to preserve backward compatibility, and as new architectural features are defined, the existing parts become increasingly cumbersome over time. RISC-V, by contrast, is an instruction set that, unlike most instruction sets, is free to use for any purpose, allowing anyone to design, manufacture and sell RISC-V chips and software. It balances the volume and the speed of data transfer, is an excellent architecture for the heterogeneous IoT era, and the ecosystem built around it is steadily maturing.
RISC-V also offers the following advantages: it is open source, so CPU design costs are low and any hardware innovation can create significant economic value through collaboration; it is simple, with only 40 base instructions, which meets the strict code-size requirements of embedded and IoT applications; and it is flexible, reserving a large amount of encoding space, including four opcodes for user-defined instructions, for extending the instruction set. While RISC-V is not the first open-source instruction set, it has attracted great interest because its design makes it suitable for modern computing devices, and its designers took performance and power efficiency into account for these applications.
The instruction set is also backed by a large body of supporting software, which addresses a common weakness of new instruction sets.
In an embodiment of the present application, as shown in fig. 1, a chip 100 for convolution calculation is provided, which includes a memory 10, a processor 20 and a convolution calculation module 30. The memory 10 stores convolution parameter data and convolution calculation results; the processor 20 is connected to the memory 10, is based on the RISC-V open-source instruction set architecture, and receives a user's self-customization instruction and generates a control instruction based on it; the convolution calculation module 30 is connected to both the processor 20 and the memory 10, and is configured to receive the control instruction and the convolution parameter data, perform calculation based on them, and output a convolution calculation result.
In the chip for convolution calculation in the above embodiment, the memory stores convolution parameter data and convolution calculation results, so the processor and the convolution calculation module connected to it can fetch data from the memory at any time, and complex convolution calculations can be carried out based on instructions issued by the user to the processor. Because the chip adopts the structurally simple RISC-V architecture, many redundant instructions are omitted, the core design is simple, and power consumption is reduced. Meanwhile, convolution acceleration is performed by the dedicated convolution calculation module rather than by software running on the processor core, which greatly increases the speed of convolution calculation.
In one embodiment, as shown in fig. 2, the processor 20 includes a basic instruction sub-module 21 and an extended instruction sub-module 22. The basic instruction sub-module 21 implements a standard instruction set defined by the RISC-V standard, and the extended instruction sub-module 22 implements a user-defined, self-customized instruction set.
Specifically, the most significant difference between the RISC-V architecture and other mature commercial architectures is that it is modular. The processor in this application conforms to the RISC-V ISA standard and includes a basic instruction submodule 21 that implements a standard instruction set defined by the RISC-V standard. The standard base instruction sets are RV32I, RV32E, RV64I and RV128I: RV32I is the 32-bit integer instruction set; RV32E is a subset of RV32I intended for small embedded scenarios; RV64I is the 64-bit integer instruction set and is compatible with RV32I; and RV128I is the 128-bit integer instruction set and is compatible with RV64I and RV32I. The processor 20 also includes an extended instruction submodule 22 that implements a user-defined, self-customized instruction set. The RISC-V architecture is not only compact; its parts can also be combined in a modular fashion, so that one unified architecture can serve a wide variety of applications, which the x86 and ARM architectures cannot offer. Because the RISC-V ISA is open source, chip architectures can be tailored to different application scenarios, customized instructions make application acceleration more efficient, and heterogeneous multi-core designs further help optimize power consumption.
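To make the split between the two submodules concrete, the sketch below shows how a user-defined instruction might be packed into the standard 32-bit R-type format under the custom-0 major opcode, one of the opcodes RISC-V reserves for non-standard extensions. The instruction itself, the register choices and the meaning given to the fields (rs1 and rs2 as addresses of the two kinds of convolution parameter data) are hypothetical; the text does not specify how its extended instructions are encoded.

```python
# Hypothetical encoding of a user-defined convolution instruction in the
# RISC-V R-type format, using the custom-0 major opcode.
CUSTOM_0 = 0b0001011  # major opcode reserved for custom extensions (0x0B)

FIELDS = (  # (name, bit width, shift) for the 32-bit R-type layout
    ("opcode", 7, 0),
    ("rd", 5, 7),
    ("funct3", 3, 12),
    ("rs1", 5, 15),
    ("rs2", 5, 20),
    ("funct7", 7, 25),
)


def encode_r_type(**fields: int) -> int:
    """Pack the named fields into one 32-bit instruction word."""
    word = 0
    for name, width, shift in FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
    return word


def decode_r_type(word: int) -> dict:
    """Split a 32-bit instruction word back into its R-type fields."""
    return {name: (word >> shift) & ((1 << width) - 1) for name, width, shift in FIELDS}


# Hypothetical "start convolution" instruction: rs1/rs2 hold the addresses of the
# first and second convolution parameter data, rd receives a status word.
conv_start = encode_r_type(opcode=CUSTOM_0, rd=10, funct3=0, rs1=11, rs2=12, funct7=0)
```

Decoding is the mirror image; in such a design the extended instruction submodule would match on the opcode field and pass the remaining fields on to the convolution calculation module.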
In one embodiment, as shown in fig. 3, the convolution calculation module 30 includes a register set 31 and a matrix module 32. The register set 31 is connected to the extended instruction submodule 22 and handles information exchange between the extended instruction submodule 22 and the matrix module 32; the matrix module 32 is connected to the extended instruction submodule 22 through the register set 31 and is also connected to the memory 10, and it receives the control instruction and the convolution parameter data, performs convolution calculation based on them, and outputs a convolution calculation result.
Specifically, the register set receives and stores the control instruction from the extended instruction submodule and forwards it to the matrix module, completing the signal exchange by which the extended instruction submodule controls the operation of the matrix module. In addition, after receiving the control instruction, the matrix module generates a response signal and sends it back to the extended instruction submodule through the register set, completing the signal exchange by which the matrix module reports back to the extended instruction submodule.
In one embodiment, as shown in fig. 4, the register set includes a command register and a response register. The command register is connected to both the extended instruction submodule and the matrix module, and is configured to receive the control instruction and generate a control signal based on it; the response register is connected to both the extended instruction submodule and the matrix module, and is configured to obtain the convolution calculation result and generate a response signal based on it.
Specifically, the command register receives and stores the control instruction from the extended instruction submodule and forwards it to the matrix module, so that the extended instruction submodule can control the operation of the matrix module; the response register receives and stores the calculation result of the matrix module and forwards it to the extended instruction submodule, completing the feedback from the matrix module to the extended instruction submodule.
In one embodiment, referring again to fig. 4, the control signals include operation control signals and precision control signals, and the matrix module includes a predetermined number of computing units, each comprising a control register, a data register and a multiplier-adder. The control register is connected to the command register and is configured to send the received operation control signals and precision control signals to the data register; the data register is connected to the control register and the memory and is configured to send the received operation control signal, precision control signal and convolution parameter data to the multiplier-adder; and the multiplier-adder is connected to the data register and is configured to receive the operation control signal, the precision control signal and the convolution parameter data, operate on the convolution parameter data according to the operation control signal and the precision control signal, and output a calculation result.
Specifically, the convolution calculation module includes multiple groups of command registers and response registers whose formats and meanings can be freely customized. They correspond to the custom extension instructions of the extended instruction submodule in the processor; that is, the processor has dedicated instructions, and a single instruction can access and control the convolution calculation module. Both the processor and the convolution calculation module can issue read and write requests to the memory: the two modules share the memory, and the memory controller arbitrates the order of reads and writes, which greatly improves control efficiency.
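The command/response exchange can be modeled behaviorally in a few lines. This is only a sketch: the `Command` fields, the polling loop on the processor side and the single multiply-accumulate performed by `matrix_module_step` are illustrative assumptions, since the text defines neither the register formats nor whether completion is signalled by polling or by an interrupt.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Command:
    """Hypothetical contents of the command register."""
    src_a: int   # memory offset of the first convolution parameter data
    src_b: int   # memory offset of the second convolution parameter data
    length: int  # number of element pairs to multiply-accumulate


@dataclass
class RegisterSet:
    command: Optional[Command] = None  # written by the extended instruction submodule
    response: Optional[int] = None     # written by the matrix module


def matrix_module_step(regs: RegisterSet, memory: List[int]) -> None:
    """Matrix-module side: consume a pending command and post the result."""
    cmd = regs.command
    if cmd is None:
        return  # no pending command
    a = memory[cmd.src_a:cmd.src_a + cmd.length]
    b = memory[cmd.src_b:cmd.src_b + cmd.length]
    regs.response = sum(x * y for x, y in zip(a, b))
    regs.command = None  # mark the command as consumed


def issue_and_wait(regs: RegisterSet, memory: List[int], cmd: Command) -> int:
    """Processor side: write the command register, then poll the response register."""
    regs.command = cmd
    regs.response = None
    while regs.response is None:
        matrix_module_step(regs, memory)  # in hardware the module runs concurrently
    return regs.response
```

Both sides read the same `memory` list, mirroring the shared memory described in the text; how a real memory controller orders those accesses is not modeled here.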
Further, the control signal includes an operation control signal and a precision control signal. For example, as shown in fig. 5, the operation control signal is a 2-bit OP code and the precision control signal is a 1-bit OP code. The calculation unit performs the following operations according to the operation control signal: 00 → $Z = X \times Y + 0 = X \times Y$ (multiplication); 10 → $Z = 1 \times Y + X = X + Y$ (addition); 11 → $Z = (-1) \times Y + X = X - Y$ (subtraction). According to the precision control signal, the calculation unit operates at 16-bit precision for 0 and at 8-bit precision for 1. In other words, simple encodings in the control signal are enough to control the operation of the convolution calculation module.
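The OP-code table can be checked with a small behavioral model of one multiplier-adder, written as $Z = A \times Y + C$ with $A$ and $C$ selected by the operation code. The model assumes that operands are interpreted as two's-complement integers truncated to the selected precision and that the result is kept in a wider accumulator; the text states neither detail, and code 01 is rejected because the text does not define it.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer (assumption)."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def compute_unit(op: int, precision: int, x: int, y: int) -> int:
    """Hypothetical model of one multiplier-adder computing Z = A * Y + C.

    op: 2-bit operation code (00 multiply, 10 add, 11 subtract).
    precision: 1-bit code (0 -> 16-bit operands, 1 -> 8-bit operands).
    """
    bits = 16 if precision == 0 else 8
    x, y = to_signed(x, bits), to_signed(y, bits)
    if op == 0b00:
        a, c = x, 0    # Z = X * Y + 0
    elif op == 0b10:
        a, c = 1, x    # Z = 1 * Y + X
    elif op == 0b11:
        a, c = -1, x   # Z = (-1) * Y + X
    else:
        raise ValueError(f"OP code {op:02b} is not defined in the text")
    return a * y + c   # full-width result (wider accumulator assumed)
```

Routing all three operations through the same multiply-add path is what lets one hardware unit serve multiplication, addition and subtraction with only the operand selection changing.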
In one embodiment, as shown in fig. 6, the convolution parameter data includes a first convolution parameter data and a second convolution parameter data, and the data register includes a first input data chain and a second input data chain, where the first input data chain is connected to the memory and is used for storing and/or outputting the first convolution parameter data; and the second input data chain is connected with the memory and used for storing and/or outputting second convolution parameter data.
Specifically, the calculation unit can read the convolution parameter data in the memory through the first input data chain and the second input data chain connected to the memory.
As an example, for one-dimensional convolution, the convolution formula of order 64 is as follows:
$$y(n) = \sum_{k=0}^{63} a_k \, x(n-k)$$
With the chip for convolution calculation in the embodiments of the present application, it is only necessary to load the coefficients $a_k$ into the memory as the first convolution parameter data and the sampling points $x(n-k)$ into the memory as the second convolution parameter data. The first input data chain and the second input data chain then read the first and the second convolution parameter data, respectively, and deliver them to the calculation units to complete the convolution calculation.
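The following sketch models this one-dimensional case: the first input data chain holds the 64 coefficients $a_k$, the second chain shifts in one sample per step, each of the 64 computing units forms one product, and a final adder sums them. It assumes the second chain starts filled with zeros and ignores precision limits; the function and constant names are hypothetical.

```python
from collections import deque
from typing import Iterable, List

TAPS = 64  # one computing unit per coefficient a_k (assumed 64 units)


def fir_with_data_chains(coeffs: List[int], samples: Iterable[int]) -> List[int]:
    """Compute y(n) = sum_{k=0}^{63} a_k * x(n-k) using two input data chains."""
    if len(coeffs) != TAPS:
        raise ValueError(f"expected {TAPS} coefficients")
    first_chain = list(coeffs)                     # a_0 .. a_63, held fixed
    second_chain = deque([0] * TAPS, maxlen=TAPS)  # x(n), x(n-1), ..., x(n-63); zeros initially
    outputs = []
    for x in samples:
        second_chain.appendleft(x)                 # shift: newest sample enters position k = 0
        products = [a * s for a, s in zip(first_chain, second_chain)]  # one product per unit
        outputs.append(sum(products))              # final adder
    return outputs
```

Because the deque has a fixed maximum length, `appendleft` discards the oldest sample at the far end, which mimics a hardware shift chain of fixed depth.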
In one embodiment, the control signal further comprises a displacement control signal, and the processor is further configured to: acquire the self-customization instruction and the convolution parameter data; generate the displacement control signal based on the self-customization instruction and the convolution parameter data; and control the displacement of the second convolution parameter data in the second input data chain based on the displacement control signal.
As an example, two-dimensional convolution is used for spatial-domain image processing. An image is smoothed or sharpened by convolving its pixel matrix B (n × n) with a specific mask matrix W (m × m). For example, when m = 3, a new pixel value can be calculated by the following equation:
$$B'(i,j) = \sum_{u=-1}^{1} \sum_{v=-1}^{1} W(u,v) \, B(i+u,\, j+v)$$
With the chip for convolution calculation in the embodiments of the present application, as in the one-dimensional case, the first input data chain stores the coefficients of the mask W, and a window of pixels can be extracted from a single data stream. Image pixels are fed in row by row, from the top row of the convolution window to the bottom row, until two complete rows plus the first three pixels of the third row of the convolution window are stored in the second input data chain. At that point a 3 × 3 convolution can be performed: the data in the first input data chain are multiplied by the corresponding elements in the second input data chain and summed. From then on, each new pixel inserted into the second input data chain shifts the convolution window by one position. Specifically, the displacement of the second convolution parameter data in the second input data chain is controlled by the displacement control signal. Because the second input data chain holds at most 64 pixels, and a strip w pixels wide requires 2w + 3 pixels to be buffered, the image must be cut vertically into strips 30 pixels wide (2 × 30 + 3 = 63 ≤ 64, whereas 2 × 31 + 3 = 65 > 64). Both the processing at the cut points and the repositioning of the convolution window are controlled by the displacement control signal.
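The strip-width limit and the window extraction described above can be reproduced with a short line-buffer model. It assumes a 64-entry second input data chain, pixels streamed in raster order within a strip of width w, and no padding at the borders; like the description, it multiplies each mask coefficient by the corresponding window pixel without flipping the mask. All names are hypothetical.

```python
from collections import deque
from typing import List

CHAIN_LENGTH = 64  # capacity of the second input data chain, from the text


def max_strip_width(chain_length: int = CHAIN_LENGTH, m: int = 3) -> int:
    """Widest strip for which (m - 1) full rows plus m pixels fit in the chain."""
    return (chain_length - m) // (m - 1)


def conv3x3_line_buffer(image: List[List[int]], mask: List[List[int]]) -> List[List[int]]:
    """Stream pixels row by row through a (2w + 3)-long shift chain and emit
    one 3x3 window sum per valid window position (no padding)."""
    h, w = len(image), len(image[0])
    if 2 * w + 3 > CHAIN_LENGTH:
        raise ValueError("strip too wide for the second input data chain")
    chain = deque(maxlen=2 * w + 3)  # oldest buffered pixel sits at index 0
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for r in range(h):
        for c in range(w):
            chain.append(image[r][c])  # one new pixel shifts the window by one position
            if r >= 2 and c >= 2:      # chain now holds a complete 3x3 window
                total = 0
                for i in range(3):
                    for j in range(3):
                        # window pixel (i, j) lies i rows and j columns after the chain head
                        total += mask[i][j] * chain[i * w + j]
                out[r - 2][c - 2] = total
    return out
```

For example, `max_strip_width()` evaluates (64 - 3) // 2 = 30, which matches the 30-pixel strip width given in the text.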
In one embodiment, the matrix module further includes an adder configured to obtain the calculation result of each calculation unit, sum these results, and output the convolution calculation result.
Specifically, with continued reference to fig. 6, the matrix module further includes an adder, and the adder sums the outputs of the 64 calculation units, and the final output is the convolution calculation result.
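The matrix module can be pictured as 64 multipliers feeding one adder. The Python sketch below is illustrative only: the balanced pairwise adder tree and the hypothetical `spread_mask` layout (placing a 3 × 3 mask at offsets i·w + j of a 64-entry coefficient chain, so that it lines up with the pixel window when the window's top-left pixel sits at position 0 of the second input data chain) are assumptions, not details given in the source.

```python
N_UNITS = 64  # number of calculation units in the matrix module, per the text


def spread_mask(mask, width, n_units=N_UNITS):
    """Hypothetical layout: mask[i][j] goes to chain offset i*width + j, zeros elsewhere."""
    k = len(mask)
    if (k - 1) * width + k > n_units:
        raise ValueError("mask window does not fit in the coefficient chain")
    chain = [0] * n_units
    for i, row in enumerate(mask):
        for j, coeff in enumerate(row):
            chain[i * width + j] = coeff
    return chain


def matrix_module(coeffs, data):
    """Each unit multiplies one coefficient by one data word; the adder sums all products."""
    if len(coeffs) != N_UNITS or len(data) != N_UNITS:
        raise ValueError(f"expected {N_UNITS} coefficients and {N_UNITS} data words")
    level = [c * d for c, d in zip(coeffs, data)]  # outputs of the 64 calculation units
    # Assumed adder structure: a balanced pairwise tree, log2(64) = 6 levels
    while len(level) > 1:
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]
```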
Another aspect of the present application provides an electronic device including any one of the chips for convolution calculation described in the embodiments of the present application.
The electronic device in the above embodiment adopts the chip for convolution calculation of the embodiments of the present application, in which the memory stores the convolution parameter data and the convolution calculation results, so that the processor and the convolution calculation module connected to the memory can access data in the memory at any time, and complex convolution calculations can be carried out based on instructions issued by a user to the processor. As a new open-source instruction set, RISC-V carries no high patent licensing costs and supports designer-defined custom instructions. A chip based on the RISC-V architecture therefore allows the designer to optimize the electronic device for its specific application scenario, which reduces the energy consumption and cost of the device and broadens its range of application.
Yet another aspect of the present application provides a control method for convolution calculation, as shown in fig. 7, including the steps of:
Step S202: acquire a user self-customized instruction;
Step S204: a processor based on a RISC-V open-source instruction set architecture generates a control instruction based on the user self-customized instruction;
Step S206: a convolution calculation module acquires the convolution parameter data and the control instruction, performs the calculation based on the control instruction and the convolution parameter data, and outputs a convolution calculation result.
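Steps S202 to S206 rely on the processor telling standard RISC-V instructions apart from user-defined ones. The Python sketch below shows one possible encoding, assuming the custom instructions are placed in the custom-0 major opcode (0b0001011) that the RISC-V base opcode map reserves for extensions and use the standard R-type field layout. The mnemonics `conv.cfg`, `conv.ldcoef`, `conv.lddata`, `conv.run` and their funct3 values are hypothetical; the source does not define an encoding.

```python
CUSTOM_0 = 0b0001011  # major opcode reserved for custom extensions in the RISC-V base opcode map

# Hypothetical funct3 assignments for a convolution extension (not defined in the source)
CONV_OPS = {"conv.cfg": 0, "conv.ldcoef": 1, "conv.lddata": 2, "conv.run": 3}
CONV_NAMES = {code: name for name, code in CONV_OPS.items()}


def encode_conv(op, rd, rs1, rs2, funct7=0):
    """Pack a custom instruction in the R-type layout: funct7|rs2|rs1|funct3|rd|opcode."""
    for field in (rd, rs1, rs2):
        if not 0 <= field < 32:
            raise ValueError("register index out of range")
    if not 0 <= funct7 < 128:
        raise ValueError("funct7 out of range")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (CONV_OPS[op] << 12) | (rd << 7) | CUSTOM_0


def dispatch(word):
    """Route a 32-bit word: custom-0 goes to the extended instruction submodule."""
    if (word & 0x7F) != CUSTOM_0:
        return ("base", None)  # handled by the standard (basic) instruction submodule
    funct3 = (word >> 12) & 0x7
    if funct3 not in CONV_NAMES:
        raise ValueError(f"unassigned funct3 {funct3}")
    control = {
        "op": CONV_NAMES[funct3],
        "rd": (word >> 7) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "rs2": (word >> 20) & 0x1F,
    }
    return ("extended", control)  # decoded fields become the control instruction
```

For example, `encode_conv("conv.run", 10, 11, 12)` yields `0x00C5B50B`, while the standard `nop` (`0x00000013`) is routed to the base submodule.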
The control method for convolution calculation in the present application takes advantage of the fact that users can define their own instructions for a RISC-V core. Convolution calculation can therefore be implemented with only a small subset of instructions, far fewer than the hundreds of instructions of a typical x86 or ARM processor, and a processor based on a RISC-V core is also smaller in scale than processors of other architectures, so its power consumption is lower.
Specifically, the processor based on the RISC-V open-source instruction set architecture generates control instructions based on the user self-customized instruction and drives the convolution calculation module to perform the following tasks: setting the precision of the convolution calculation by writing 0 or 1 to a configuration register of the convolution calculation module; selecting the product-calculation algorithm by writing 00, 01 or 11 to that configuration register; enabling or disabling loading of the shift register through a control instruction; passing the start address and range of the memory region to be read by the calculation array; loading data from the memory into the calculation units; and having the convolution calculation module read the calculation instruction, perform the calculation and store the result in the memory.
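The text names the register codes (0/1 for precision, 00/01/11 for the product algorithm) but not their bit positions or the meaning of each code. The Python sketch below packs and unpacks such a configuration word under an entirely assumed layout; `ConvConfig`, the field offsets, the 8-bit shift count and treating 10 as reserved are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical bit layout of the convolution module's configuration register
PRECISION_BIT = 0      # bit 0: precision select, 0 or 1
OP_LSB = 1             # bits [2:1]: product-algorithm select
SHIFT_EN_BIT = 3       # bit 3: enable loading of the shift register
SHIFT_CNT_LSB = 4      # bits [11:4]: number of shift steps
VALID_OPS = (0b00, 0b01, 0b11)  # codes named in the text; 0b10 treated as reserved here


@dataclass(frozen=True)
class ConvConfig:
    precision: int
    op: int
    shift_enable: bool
    shift_count: int = 0

    def pack(self) -> int:
        """Build the register word, rejecting values the layout cannot hold."""
        if self.precision not in (0, 1):
            raise ValueError("precision must be 0 or 1")
        if self.op not in VALID_OPS:
            raise ValueError("op must be one of 00, 01, 11")
        if not 0 <= self.shift_count < 256:
            raise ValueError("shift_count must fit in 8 bits")
        return (self.precision << PRECISION_BIT
                | self.op << OP_LSB
                | int(self.shift_enable) << SHIFT_EN_BIT
                | self.shift_count << SHIFT_CNT_LSB)

    @classmethod
    def unpack(cls, word: int) -> "ConvConfig":
        """Decode a register word back into its fields."""
        return cls(precision=(word >> PRECISION_BIT) & 1,
                   op=(word >> OP_LSB) & 0b11,
                   shift_enable=bool((word >> SHIFT_EN_BIT) & 1),
                   shift_count=(word >> SHIFT_CNT_LSB) & 0xFF)
```

With this layout, `ConvConfig(1, 0b11, True, 5).pack()` gives 95 (1 + 6 + 8 + 80).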
As an example, as shown in fig. 8, an instruction first configures the precision and operation type of the convolution calculation module (PE) together with the shift control (whether to shift, and how many times). Next, the coefficients of the convolution calculation are read in and loaded into register group a. Then the image data to be calculated are read in and loaded into register group b. Calculation then begins; when there is more data, new data are continuously shifted into register group b to complete the convolution. Once a group of data has been consumed, the accumulator outputs a result, completing a one-dimensional convolution. To calculate a multidimensional convolution, the method loops back to the step of reading in the coefficients and loading register group a. After N iterations, the N-dimensional convolution is complete; the results of all iterations are accumulated and the convolution result is finally output.
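The loop of fig. 8 can be sketched in Python under one plausible reading: each pass is a one-dimensional convolution with the coefficients held in register group a and the data shifted through register group b, and an N-row mask is handled as N passes whose outputs are accumulated. The function names `conv1d_pass` and `multi_pass_conv` are hypothetical, and the windowed sum of products is computed without flipping the kernel, matching the element-by-element multiplication described in the text.

```python
from collections import deque


def conv1d_pass(coeffs, stream):
    """One pass: group a holds coefficients, group b is a shift register of equal length."""
    a = list(coeffs)                     # register group a
    b = deque(maxlen=len(a))             # register group b (oldest value drops out on shift)
    out = []
    for x in stream:
        b.append(x)                      # shift new data into group b
        if len(b) == len(a):             # group b full: one multiply-accumulate result
            out.append(sum(ai * bi for ai, bi in zip(a, b)))
    return out


def multi_pass_conv(mask_rows, data_rows):
    """Loop back once per mask row, reloading group a, and accumulate the pass results."""
    if len(mask_rows) != len(data_rows):
        raise ValueError("need one data row per mask row")
    total = None
    for coeffs, stream in zip(mask_rows, data_rows):
        partial = conv1d_pass(coeffs, stream)
        total = partial if total is None else [t + p for t, p in zip(total, partial)]
    return total
```

For instance, `multi_pass_conv([[1, 0], [0, 1]], [[1, 2, 3], [4, 5, 6]])` returns `[6, 8]`, the same as a direct 2 × 2 window sum over the two rows.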
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments above. Any reference to memory, storage, database or other media used in the embodiments provided herein can include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM), among others.
It should be noted that the above-mentioned embodiments are only for illustrative purposes and are not meant to limit the present invention.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of the present application and are described in relatively specific detail, but they shall not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims (10)

1. A chip for convolution calculations, comprising:
a memory, used for storing convolution parameter data and convolution calculation results;
a processor, connected to the memory and based on a RISC-V open-source instruction set architecture, used for receiving a self-customized instruction from a user and generating a control instruction based on the self-customized instruction;
and a convolution calculation module, connected to the processor and the memory, used for receiving the control instruction and the convolution parameter data, performing a calculation based on the control instruction and the convolution parameter data, and outputting a convolution calculation result.
2. The chip of claim 1, wherein the processor comprises:
a basic instruction submodule, used for implementing the standard instruction set defined by the RISC-V standard;
and an extended instruction submodule, used for implementing a user-defined self-customized instruction set.
3. The chip of claim 2, wherein the convolution computation module comprises:
a register group, connected to the extended instruction submodule and used for exchanging information between the extended instruction submodule and the computing module;
and a matrix module, connected to the extended instruction submodule through the register group and connected to the memory, used for receiving the control instruction and the convolution parameter data, performing convolution calculation based on the control instruction and the convolution parameter data, and outputting a convolution calculation result.
4. The chip of claim 3, wherein the register set comprises:
a command register, connected to the extended instruction submodule and the matrix module, used for receiving the control instruction and generating a control signal based on the control instruction;
and a response register, connected to the extended instruction submodule and the calculation module, used for acquiring the convolution calculation result and generating a response signal based on the convolution calculation result.
5. The chip of claim 4, wherein the control signals comprise operation control signals and precision control signals, the matrix module comprises a predetermined number of computing units, and any of the computing units comprises:
a control register, connected to the command register group, used for sending the received operation control signal and precision control signal to a data register;
the data register, connected to the control register and the memory, used for sending the received operation control signal, precision control signal and convolution parameter data to a multiplier-adder;
and the multiplier-adder, connected to the data register, used for receiving the operation control signal, the precision control signal and the convolution parameter data, performing a calculation on the convolution parameter data based on the operation control signal and the precision control signal, and outputting a calculation result.
6. The chip of claim 5, wherein the convolution parameter data comprises a first convolution parameter data and a second convolution parameter data, and the data register comprises:
a first input data chain, connected to the memory, used for storing and/or outputting the first convolution parameter data;
and a second input data chain, connected to the memory, used for storing and/or outputting the second convolution parameter data.
7. The chip of claim 6, in which the control signals further comprise displacement control signals, the processor further configured to:
acquire the self-customized instruction and the convolution parameter data;
generate the displacement control signal based on the self-customized instruction and the convolution parameter data;
and control a displacement of the second convolution parameter data in the second input data chain based on the displacement control signal.
8. The chip of claim 7, wherein the matrix module further includes an adder, and the adder is configured to obtain the calculation result of each calculation unit, perform a summation operation on the calculation results of the calculation units, and output a convolution calculation result.
9. An electronic device, comprising:
the chip of any one of claims 1-8.
10. A control method for convolution calculations, the method comprising:
acquiring a user self-customization instruction;
a processor based on a RISC-V open source instruction set architecture generates a control instruction according to the user self-customized instruction;
and acquiring, by a convolution calculation module, convolution parameter data and the control instruction, performing a calculation based on the control instruction and the convolution parameter data, and outputting a convolution calculation result.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110800143.0A CN113642722A (en) 2021-07-15 2021-07-15 Chip for convolution calculation, control method thereof and electronic device
PCT/CN2021/121621 WO2023284130A1 (en) 2021-07-15 2021-09-29 Chip and control method for convolution calculation, and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110800143.0A CN113642722A (en) 2021-07-15 2021-07-15 Chip for convolution calculation, control method thereof and electronic device

Publications (1)

Publication Number Publication Date
CN113642722A true CN113642722A (en) 2021-11-12

Family

ID=78417464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110800143.0A Pending CN113642722A (en) 2021-07-15 2021-07-15 Chip for convolution calculation, control method thereof and electronic device

Country Status (2)

Country Link
CN (1) CN113642722A (en)
WO (1) WO2023284130A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951395B (en) * 2017-02-13 2018-08-17 上海客鹭信息技术有限公司 Parallel convolution operations method and device towards compression convolutional neural networks
CN110490311A (en) * 2019-07-08 2019-11-22 华南理工大学 Convolutional neural networks accelerator and its control method based on RISC-V framework
CN110377874B (en) * 2019-07-23 2023-05-02 江苏鼎速网络科技有限公司 Convolution operation method and system
CN110502278B (en) * 2019-07-24 2021-07-16 瑞芯微电子股份有限公司 Neural network coprocessor based on RiccV extended instruction and coprocessing method thereof
CN112633505B (en) * 2020-12-24 2022-05-27 苏州浪潮智能科技有限公司 RISC-V based artificial intelligence reasoning method and system
CN112860320A (en) * 2021-02-09 2021-05-28 山东英信计算机技术有限公司 Method, system, device and medium for data processing based on RISC-V instruction set

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117749736A (en) * 2024-02-19 2024-03-22 深圳市纽创信安科技开发有限公司 Chip and ciphertext calculation method
CN117749736B (en) * 2024-02-19 2024-05-17 深圳市纽创信安科技开发有限公司 Chip and ciphertext calculation method

Also Published As

Publication number Publication date
WO2023284130A1 (en) 2023-01-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination