CN112667288A - Data operation circuit, data processing device, chip, card board and electronic equipment - Google Patents

Data operation circuit, data processing device, chip, card board and electronic equipment

Info

Publication number
CN112667288A
Authority
CN
China
Prior art keywords
data
instruction
arithmetic
register
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910979193.2A
Other languages
Chinese (zh)
Inventor
Inventor not disclosed
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simm Computing Technology Co ltd
Original Assignee
Beijing Simm Computing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simm Computing Technology Co ltd
Priority to CN201910979193.2A
Priority to PCT/CN2020/103467 (WO2021073188A1)
Publication of CN112667288A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses a data operation circuit, a data processing device, a chip, a card board and electronic equipment. The data operation circuit includes one or more arithmetic units, each of which comprises an input variable register, a first constant register, a comparator, a switch, a second constant register, a multiplier and an output variable register. Because each arithmetic unit can implement the ReLU function, the Leaky ReLU function or the PReLU function, the data operation circuit provided by the embodiment of the invention has the advantages of high integration level, small area, high calculation efficiency and low power consumption.

Description

Data operation circuit, data processing device, chip, card board and electronic equipment
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data operation circuit, a data processing apparatus, a chip, a card board, and an electronic device.
Background
With the development of science and technology, the human society is rapidly entering the intelligent era. The important characteristics of the intelligent era are that people obtain more and more data, the quantity of the obtained data is larger and larger, and the requirement on the speed of processing the data is higher and higher.
Chips are the cornerstone of data processing, which fundamentally determines the ability of people to process data. From the application field, the chip mainly has two routes: one is a general chip route, such as a Central Processing Unit (CPU), which provides great flexibility but is less computationally efficient in Processing domain-specific algorithms; the other is a special chip route, such as a Tensor Processing Unit (TPU), which can exert higher effective computing power in some specific fields, but has poorer or even no Processing capability in the more versatile and general fields.
Because the data of the intelligent era is diverse and huge in quantity, chips are required to be extremely flexible, so that they can handle rapidly evolving algorithms from different fields, and to have extremely high processing capacity, so that they can quickly process extremely large and sharply increasing volumes of data.
Disclosure of Invention
(I) Objects of the invention
The invention aims to provide a data operation circuit, a data processing device, a chip, a card board and an electronic device. The data operation circuit can realize the operation function of the ReLU function, the Leaky ReLU function or the PReLU function, and has the advantages of high integration level, small area, high calculation efficiency and low power consumption.
(II) technical scheme
To solve the above problem, a first aspect of the present invention provides a data operation circuit, including: one or more arithmetic units; the arithmetic unit includes: the input variable register, the first constant register, the comparator, the switch, the second constant register, the multiplier and the output variable register; the input variable register and the second constant register are respectively connected to two input ends of the multiplier; the output end of the multiplier and the input variable register are respectively connected to the two input ends of the switch; the input variable register and the first constant register are respectively connected to two input ends of the comparator; the output end of the comparator is connected to the control end of the switch; the output end of the switch is connected to the output variable register.
In the data arithmetic circuit according to the embodiment of the present invention, the arithmetic unit is provided in the data arithmetic circuit, and the arithmetic unit can realize the arithmetic function of the ReLU function, the Leaky ReLU function, or the PReLU function. The data operation circuit has the advantages of high integration level, small area, high calculation efficiency and low power consumption.
Furthermore, the number of the operation units is multiple, and the operation units are arranged in an array.
In a second aspect of the present invention, a data processing apparatus is provided, which includes a storage module, a decoder, an instruction fetching module, a control module, and the data arithmetic circuit provided in the first aspect; the storage module is used for storing instructions and data; the control module is used for instructing the instruction fetching module to fetch the instruction from the storage module; the decoder is used for decoding the instruction and sending the decoded instruction to the data operation circuit; the data operation circuit is used for performing data operation based on the decoded instruction, taking the data out of the storage module and storing the operation result of the data into the storage module.
Further, the data operation circuit is configured to perform data operation based on the decoded instruction, take out the data from the storage module, and store an operation result of the data in the storage module, and specifically includes: each arithmetic unit in the data arithmetic circuit performs data arithmetic based on the decoded instruction, takes out the data from the storage module, and stores an arithmetic result of the data to the storage module.
Furthermore, the instruction is encoded by adopting a standard RISC-V instruction encoding mode.
Further, the format of the instruction is a standard RISC-V instruction format.
Further, the instruction is a matrix operation instruction, and the matrix operation instruction comprises an instruction name, a destination address register, a first source address register and a second source address register; the first source address register is used for storing source data, and the second source address register is used for storing a coefficient operated on the source data.
The second source address register may be a matrix address register, a vector address register, or a constant address register.
A third aspect of the invention provides a chip comprising one or more of the data processing apparatus provided in the second aspect.
A fourth aspect of the invention provides a card including one or more chips provided by the third aspect.
A fifth aspect of the invention provides an electronic device comprising one or more cards as provided in the fourth aspect.
(III) advantageous effects
The technical scheme of the invention has the following beneficial technical effects:
In the data arithmetic circuit according to the embodiment of the present invention, the arithmetic unit is provided in the data arithmetic circuit, and the arithmetic unit can realize the arithmetic function of the ReLU function, the Leaky ReLU function, or the PReLU function. The data operation circuit has the advantages of high integration level, small area, high calculation efficiency and low power consumption.
Drawings
FIG. 1 is a schematic diagram of a ReLU function;
FIG. 2 is a schematic diagram of a Leaky ReLU function or a PReLU function;
FIG. 3 is a schematic diagram of a data operation circuit according to an embodiment of the present invention;
FIG. 4 is a schematic diagram showing the configuration of an arithmetic unit in the data arithmetic circuit according to the embodiment of the present invention;
FIG. 5 is a schematic diagram of a data operation circuit according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
In the description of the present invention, it should be noted that the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
FIG. 1 is a schematic diagram of a ReLU function; FIG. 2 is a schematic diagram of the Leaky ReLU function or the PReLU function.
As shown in FIG. 1 and FIG. 2, the rectified linear unit (ReLU), also called the linear rectification function, is an activation function commonly used in artificial neural networks, and generally refers to the nonlinear functions represented by the ramp function and its variants. In neural networks, ReLU and its variants such as Leaky ReLU and PReLU are commonly used nonlinear activation functions.
In the ReLU function shown in FIG. 1, yi = xi when xi is greater than or equal to 0, and yi = 0 when xi is less than 0, so the ReLU function performs one of two calculations depending on the value of xi.
In the Leaky ReLU function or the PReLU function shown in FIG. 2, yi = xi when xi is greater than or equal to 0, and yi = ai × xi when xi is less than 0, so the Leaky ReLU function or the PReLU function likewise performs one of two calculations depending on the value of xi.
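For reference, the two piecewise definitions described above can be written compactly as follows. The symbols x_i, y_i and a_i follow the text. In common usage a_i is a small fixed constant for Leaky ReLU and a learned parameter for PReLU; that distinction is general background and is not stated in this document.

```latex
\[
\text{ReLU:}\quad
y_i =
\begin{cases}
x_i, & x_i \ge 0 \\
0,   & x_i < 0
\end{cases}
\qquad
\text{Leaky ReLU / PReLU:}\quad
y_i =
\begin{cases}
x_i,     & x_i \ge 0 \\
a_i x_i, & x_i < 0
\end{cases}
\]
```

Setting a_i = 0 in the second definition recovers the first, which is why a single circuit with a configurable coefficient can cover all three functions.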
Given the characteristics of the above functions, one way to compute them is to combine multiple separate single-function unit circuits with step-by-step processing by a computer program. The separate unit circuits here are, for example, multipliers and comparators.
As an example, the following is a computer software program that implements the ReLU function, the Leaky ReLU function, or the PReLU function under the above-described scheme:
Figure BDA0002234625650000051
It should be noted that "If (Xi >= 0) Ci = 1; Else Ci = 0;" in the above computer code means that if Xi ≥ 0, then 1 is assigned to Ci; otherwise, 0 is assigned to Ci. In this part of the computer program, the first separate unit circuit is used to implement the comparator function.
"If (Ci == 1) Yi = Xi;" in the above computer code means that if Ci equals 1, Xi is assigned to Yi. In this part of the computer program, the second separate unit circuit is combined with the first separate unit circuit and again performs the comparator function.
"Else Yi = ai * Xi" in the above computer code means that if Ci ≠ 1, then ai * Xi is assigned to Yi. In this part, a third separate unit circuit is combined with the second separate unit circuit to implement the multiplier function.
It can be seen that in the above solution, the computer software program must combine three separate unit circuits and call them three times in order to implement the ReLU function. If the function must be applied to each element of a matrix, the function operation on the matrix can only be realized by controlling repeated calls to the three separate unit circuits in sequence. This results in a complex operation sequence, long calculation time, more separate unit circuits, more calls, high power consumption, a large calculation device area and high cost.
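The step-by-step software route described above can be sketched in Python as follows. This is an illustrative reconstruction from the code fragments quoted in the text, not the listing from the original figure; the function name `software_activation` and the list-based interface are assumptions made for the example.

```python
def software_activation(xs, coeffs):
    """Apply y = x (x >= 0) or y = a * x (x < 0) element by element.

    Mirrors the three quoted steps: a comparison that sets a flag, a second
    comparison on that flag, and a multiplication on the negative branch.
    The name and signature are hypothetical.
    """
    ys = []
    for x, a in zip(xs, coeffs):
        # Step 1: comparator sets the flag Ci.
        c = 1 if x >= 0 else 0
        # Step 2: a second comparison, this time on the flag.
        if c == 1:
            y = x
        else:
            # Step 3: multiplier computes ai * Xi.
            y = a * x
        ys.append(y)
    return ys


if __name__ == "__main__":
    # Coefficient 0 gives ReLU; a non-zero coefficient gives Leaky ReLU / PReLU.
    print(software_activation([3, -2, 0], [0, 0, 0]))
    print(software_activation([3, -2, 0], [0.5, 0.5, 0.5]))
```

Each element passes through the steps sequentially, which is the overhead the dedicated arithmetic unit is meant to remove.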
The data arithmetic circuit provided by the embodiment of the present invention will be described in detail below. In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
Fig. 3 is a schematic structural diagram of a data operation circuit according to an embodiment of the present invention.
As shown in fig. 3, the data operation circuit includes an arithmetic operation unit, a logical operation unit, an access/fetch data unit, and a floating point operation unit. The arithmetic operation unit realizes arithmetic operation through a multiplier of the data operation circuit. The logic operation unit may be implemented by a comparator of the data operation circuit. The floating-point arithmetic unit may be implemented by a multiplier. The data arithmetic circuit is provided with one or more data arithmetic units PU, each of which can realize comparison and multiplication functions, and further realize the functions of a ReLU function, a Leaky ReLU function or a PReLU function.
Fig. 4 is a schematic diagram of an arithmetic unit in the data arithmetic circuit according to the embodiment of the present invention.
As shown in FIG. 4, the data operation circuit includes: one or more arithmetic units PU; the arithmetic unit PU includes: an input variable register Rin1, a first constant register C0, a comparator CMP, a switch S0, a second constant register C1, a multiplier, and an output variable register Rout.
The input variable register Rin1 and the second constant register C1 are respectively connected to two input ends of the multiplier; the output end of the multiplier and the input variable register Rin1 are respectively connected to two input ends of the switch S0; the input variable register Rin1 and the first constant register C0 are respectively connected to two input terminals of the comparator CMP; the output terminal of the comparator CMP is connected to the control terminal of the switch S0; the output terminal of the switch S0 is connected to the output variable register Rout.
The value of the input variable register Rin1 corresponds to the variable xi in the ReLU function, and the value of the output variable register Rout corresponds to yi in the ReLU function. The first constant register C0 stores a first preset constant, which is the comparison threshold; this constant is 0 in the ReLU function, the Leaky ReLU function, or the PReLU function. The second constant register C1 stores a second preset constant, which is the multiplication coefficient ai in the ReLU function, the Leaky ReLU function, or the PReLU function.
It will be appreciated that in the ReLU function, the multiplication coefficient ai is equal to 0, and in the Leaky ReLU function or the PReLU function, the multiplication coefficient ai is equal to a particular coefficient value.
The operation process of the operation unit is as follows:
the comparison result of the value xi at Rin1 and the comparison threshold C0 is used as a control signal of the switch S0 to control the port No. 1 or the port No. 0 of the switch S0 to be turned on.
Specifically, a first input of the comparator receives the value xi at Rin1 and a second input receives the comparison threshold at the first constant register C0.
If the data at the first input of the comparator is greater than or equal to the data at the second input, the generated control signal connects port No. 1 of the switch S0, in which case Rout outputs the value xi of the input variable register Rin1 obtained through the switch S0. This is equivalent to implementing yi = xi in the ReLU function, the Leaky ReLU function or the PReLU function.
If the data at the first input of the comparator is smaller than the data at the second input, the generated control signal connects port No. 0 of the switch S0. In this case, the switch S0 receives from the multiplier the product of xi at the input variable register Rin1 and the second preset constant ai stored in the second constant register C1, and Rout outputs that product. The output corresponds to yi = ai × xi in the Leaky ReLU function or the PReLU function; if ai is set to 0 in advance, the ReLU function is realized.
In the data arithmetic circuit according to the embodiment of the present invention, the arithmetic unit is provided in the data arithmetic circuit, and the arithmetic unit can realize the arithmetic function of the ReLU function, the Leaky ReLU function, or the PReLU function. Compared with calling a plurality of separate unit circuits with different functions many times, the data operation circuit provided by the embodiment of the invention has the advantages of high integration level, small area, high calculation efficiency and low power consumption.
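The datapath of a single arithmetic unit can be modelled behaviourally as below. This is a minimal software sketch under the assumption that the comparator, multiplier and switch behave as described above; the class name `ProcessingUnit` and its field names are hypothetical, and the model says nothing about timing or bit widths.

```python
from dataclasses import dataclass


@dataclass
class ProcessingUnit:
    """Behavioural model of one arithmetic unit PU (hypothetical naming)."""

    c0: float = 0.0  # first constant register C0: comparison threshold
    c1: float = 0.0  # second constant register C1: multiplication coefficient ai

    def compute(self, rin1):
        """Return the value that would appear in Rout for the input in Rin1."""
        product = rin1 * self.c1          # multiplier: Rin1 x C1
        select_port_1 = rin1 >= self.c0   # comparator CMP drives the switch control end
        # Switch S0: port No. 1 passes Rin1 through, port No. 0 passes the product.
        return rin1 if select_port_1 else product


if __name__ == "__main__":
    relu_unit = ProcessingUnit(c0=0.0, c1=0.0)   # ai = 0 gives ReLU
    prelu_unit = ProcessingUnit(c0=0.0, c1=2.0)  # ai = 2, as in Example 2 of the text
    print(relu_unit.compute(7), relu_unit.compute(-4))
    print(prelu_unit.compute(7), prelu_unit.compute(-4))
```

Only the two constant registers change between the three functions, so switching between ReLU, Leaky ReLU and PReLU is a matter of configuration rather than a different circuit.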
Fig. 5 is a schematic structural diagram of a data operation circuit according to an embodiment of the present invention.
As shown in fig. 5, the data operation circuit includes a plurality of Processing Units (PUs) arranged in an array. The data arithmetic circuit can realize the ReLU function, the Leaky ReLU function or the PReLU function operation of the matrix.
In the example shown in FIG. 5, the data operation circuit is provided with M × N operation units, i.e., M rows and N columns of operation units, where M ≥ 1 and N ≥ 2.
The arithmetic unit PU is the basic unit for realizing the operation of the ReLU function, the Leaky ReLU function or the PReLU function, and the plurality of arithmetic units PU arranged in the array can apply the ReLU function, the Leaky ReLU function or the PReLU function to each element of a matrix.
The specific process for realizing the ReLU function operation of each element in the matrix is as follows:
The data arithmetic circuit reads the data xi of an input matrix from the storage module and distributes them to different arithmetic units for the function operation. Taking one input datum x1 as an example: x1 is input to the input variable register Rin1 of one arithmetic unit; the arithmetic unit compares x1 with the comparison threshold in the first constant register C0 to obtain a comparison result, connects port No. 1 or port No. 0 of the switch S0 according to that result, and outputs the value obtained from the switch S0 through its output variable register Rout. This realizes the operation of the ReLU function, the Leaky ReLU function or the PReLU function, that is, the output yi = xi or yi = ai × xi.
During the calculation, one or more rows of data can be fetched at a time and distributed to the corresponding arithmetic units. Reading can also overlap with computation: data continue to be read from the storage module and fed into the arithmetic unit array while calculation is in progress.
After the calculation is finished, the data operation circuit stores the operation result into the storage module.
The operation of the data arithmetic circuit provided in the above embodiment in implementing the ReLU function, the Leaky ReLU function, or the PReLU function is described below through several examples.
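The read, distribute, compute and write-back flow just described can be sketched in software as follows. The sketch assumes one row is fetched per step and that each element of a row goes to its own unit; the helper names `pu` and `apply_rowwise` are hypothetical, and the parallelism of the hardware array is only imitated by a list comprehension.

```python
def pu(x, c0=0, c1=0):
    """One arithmetic unit: pass x through if x >= c0, otherwise output c1 * x."""
    return x if x >= c0 else c1 * x


def apply_rowwise(storage_rows, c1=0):
    """Fetch the input matrix row by row and apply the function to every element.

    storage_rows stands in for the storage module (an assumption of this sketch).
    """
    results = []
    for row in storage_rows:                  # one fetch from the storage module
        # In hardware the units of a row would work in parallel.
        results.append([pu(x, c1=c1) for x in row])
    return results                            # written back after the calculation


if __name__ == "__main__":
    matrix = [[0, 7, -3], [-1, 2, -4]]        # hypothetical input values
    print(apply_rowwise(matrix, c1=0))        # ReLU
    print(apply_rowwise(matrix, c1=2))        # coefficient ai = 2
```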
Example 1
The first input matrix may be a 3x6 matrix, for example
Figure BDA0002234625650000081
The plurality of arithmetic units of the data arithmetic circuit is a 3 × 6 array. The preset multiplication coefficient ai is 0.
When the instruction is to implement the ReLU function for each element of the first input matrix, the arithmetic units operate as follows. The arithmetic unit PU in row 1, column 1 receives the row 1, column 1 element a11 of the first input matrix (value 0) and stores it in Rin1, compares a11 with the first preset constant (which can be set to 0) stored in the first constant register C0, and, according to the comparison result, connects port No. 1 of the switch S0, so that Rout outputs the value of Rin1, namely 0. The arithmetic unit PU in row 1, column 2 receives the row 1, column 2 element a12 of the first input matrix (value 7) and stores it in Rin1, compares a12 with the first preset constant (which can be set to 0) stored in the first constant register C0, and, according to the comparison result, connects port No. 1 of the switch S0, so that Rout outputs the value of Rin1, namely 7.
By analogy, the output matrix is obtained as follows:
Figure BDA0002234625650000091
Example 2
The first input matrix may be a 3x6 matrix, for example
Figure BDA0002234625650000092
The plurality of arithmetic units of the data arithmetic circuit is a 3 × 6 array. The preset multiplication coefficient ai is 2.
When the instruction is to implement the function of the PReLU for each element of the first input matrix, then the data operation unit implements the following operations:
The arithmetic unit PU in row 1, column 1 receives the row 1, column 1 element a11 of the first input matrix (value 0) and stores it in Rin1, compares a11 with the first preset constant (which can be set to 0) stored in the first constant register C0, and, according to the comparison result, connects port No. 1 of the switch S0, so that Rout outputs the value of Rin1, namely 0. The arithmetic unit PU in row 1, column 2 receives the row 1, column 2 element a12 of the first input matrix (value 7) and stores it in Rin1, compares a12 with the first preset constant (which can be set to 0) stored in the first constant register C0, and, according to the comparison result, connects port No. 1 of the switch S0, so that Rout outputs the value of Rin1, namely 7.
By analogy, the output matrix is obtained as follows:
Figure BDA0002234625650000093
Example 3
The first input matrix may be a 3x6 matrix, for example
Figure BDA0002234625650000094
If the arithmetic unit array in the data arithmetic circuit is a 3x3 array, the matrix is divided into 3x3 blocks that are calculated one after another.
When the instruction is to implement the ReLU function for each element of the first input matrix, the arithmetic units operate as follows. The arithmetic unit PU in row 1, column 1 receives the row 1, column 1 element a11 of the first input matrix (value 0) and stores it in Rin1, compares a11 with the first preset constant (which can be set to 0) stored in the first constant register C0, and, according to the comparison result, connects port No. 1 of the switch S0, so that Rout outputs the value of Rin1, namely 0. By analogy, the arithmetic unit in row 3, column 3 obtains the row 3, column 3 element of the output matrix.
Then, the arithmetic unit in row 1, column 1 receives the row 1, column 4 element a14 of the first input matrix and compares a14 with the first preset constant (which may be set to 0) stored in the first constant register C0. By analogy, the arithmetic unit in row 3, column 3 obtains the row 3, column 6 element of the output matrix. The results of the two passes are combined to obtain the final output matrix.
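The two-pass scheme of Example 3 amounts to tiling the input matrix over a smaller unit array. The sketch below illustrates this under stated assumptions: the function names are hypothetical, and the input matrix is made up for the demonstration (only a11 = 0 and a12 = 7 come from the text, since the original matrix figure is not reproduced here).

```python
def pu(x, c0=0, c1=0):
    """One arithmetic unit: pass x through if x >= c0, otherwise output c1 * x."""
    return x if x >= c0 else c1 * x


def apply_tiled(matrix, array_rows=3, array_cols=3, c1=0):
    """Process a matrix block by block on an array_rows x array_cols unit array."""
    rows, cols = len(matrix), len(matrix[0])
    out = [[None] * cols for _ in range(rows)]
    for row_start in range(0, rows, array_rows):
        for col_start in range(0, cols, array_cols):
            # One pass: the unit at (i - row_start, j - col_start) handles element (i, j).
            for i in range(row_start, min(row_start + array_rows, rows)):
                for j in range(col_start, min(col_start + array_cols, cols)):
                    out[i][j] = pu(matrix[i][j], c1=c1)
    return out


if __name__ == "__main__":
    # Hypothetical 3x6 input; a 3x3 array needs two passes (columns 1-3, then 4-6).
    a = [
        [0, 7, -1, 4, -5, 2],
        [3, -2, 6, -8, 1, 0],
        [-4, 5, -6, 9, -7, 8],
    ]
    for row in apply_tiled(a):
        print(row)
```

Because the function is applied element by element, the result is the same as with a full 3x6 array; tiling only changes how many passes are needed.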
It is to be understood that when the data arithmetic circuit contains a plurality of arithmetic units, it is suited to applying the ReLU function, the Leaky ReLU function, or the PReLU function to the elements of a matrix. The data operation circuit can of course be applied to neural network calculation.
It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description. It is neither necessary nor possible to enumerate all embodiments here, and obvious variations or modifications derived therefrom remain within the scope of the invention.
In the data arithmetic circuit provided by the embodiment of the invention, the arithmetic unit can realize the ReLU function, the Leaky ReLU function or the PReLU function for a single number or a matrix through one instruction. Compared with combining multiple single-function unit circuits with step-by-step processing by a computer program, this has the following advantages. First, multiple sets of control circuits need not be designed separately, which reduces the area of the device and its cost. Second, fewer sets of control circuits operate, which reduces power consumption. Third, the final result is obtained with one instruction rather than through multiple operations, which saves calculation time. Fourth, the arithmetic unit can implement the ReLU function, the Leaky ReLU function, or the PReLU function operation of a matrix merely by configuring the corresponding registers, so the combination is very flexible and the device is highly versatile. The data arithmetic circuit is compact in structure and efficient in operation.
Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
As shown in fig. 6, the data processing apparatus includes: a memory module 10, a decoder 20, an instruction fetching module 30, a control module 40 and one or more data operation circuits 50 provided in the above embodiments.
The memory module 10 is used for storing instructions and data.
A control module 40 for instructing the instruction fetching module 30 to fetch instructions from the storage module 10;
a decoder 20 for decoding the instruction and sending the decoded instruction to the data arithmetic circuit 50;
and a data arithmetic circuit 50 for performing data arithmetic based on the decoded instruction, taking out data from the memory block 10, and storing an arithmetic result of the data in the memory block 10, wherein the data arithmetic circuit 50 includes a plurality of arithmetic units.
The data operation circuit 50 can connect different ports of the switch according to different instructions, so as to realize the function of each instruction. The data arithmetic circuit 50 may also implement the ReLU function, the Leaky ReLU function, or the PReLU function operation of a matrix according to the corresponding instruction.
In one embodiment, the data operation circuit 50 is configured to perform data operation based on the decoded instruction, fetch data from a storage module, and store an operation result of the data in the storage module, and specifically includes:
each arithmetic unit in the data arithmetic circuit 50 performs data arithmetic based on the decoded instruction, takes out data from the memory module 10, and stores the arithmetic result of the data to the memory module 10.
In one embodiment, because the amount of data on which the ReLU function, the Leaky ReLU function, or the PReLU function must be performed during neural network calculation is large, the data are usually organized as matrices. To facilitate the operation of the data operation circuit, the present embodiment designs the instruction as a matrix operation instruction, which can be processed in parallel by a plurality of operation units PU in the data operation circuit, thereby improving the operation efficiency.
In this embodiment, the ReLU instruction is encoded by RISC-V rule, and uses the Custom-made part in the standard RISC-V main instruction code allocation table, namely the part "Custom-3 (1111011)". The format of the instructions also follows the RISC-V instruction format, with the different instruction formats as follows:
Figure BDA0002234625650000121
In one embodiment, the format of the instruction may be: the instruction name, the destination address register, the first source address register, and the second source address register. The destination address register is used for storing the address of the destination matrix, for example, the data obtained after the source matrix operation; the first source address register is used for storing source data, such as a source matrix; the second source address register is used to store the coefficients applied to the source data, such as the multiplication coefficients ai of the ReLU function, Leaky ReLU function, or PReLU function operations.
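As an illustration of how such an instruction could be packed into a 32-bit word, the sketch below uses the standard RISC-V R-type field layout with the custom-3 major opcode (1111011) mentioned above. Which of the RISC-V formats the instruction actually uses is not recoverable from this text, so the R-type choice is an assumption, and the `funct3` and `funct7` values are placeholders rather than values from the source.

```python
CUSTOM_3_OPCODE = 0b1111011  # custom-3 major opcode named in the text


def encode_r_type(rd, rs1, rs2, funct3=0, funct7=0, opcode=CUSTOM_3_OPCODE):
    """Pack fields into a 32-bit word using the standard RISC-V R-type layout.

    Bit layout: funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0].
    funct3 and funct7 default to 0 as hypothetical placeholders.
    """
    fields = (
        ("rd", rd, 5), ("rs1", rs1, 5), ("rs2", rs2, 5),
        ("funct3", funct3, 3), ("funct7", funct7, 7), ("opcode", opcode, 7),
    )
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (
        (funct7 << 25) | (rs2 << 20) | (rs1 << 15)
        | (funct3 << 12) | (rd << 7) | opcode
    )


if __name__ == "__main__":
    # Register numbers chosen arbitrarily for the demonstration.
    word = encode_r_type(rd=1, rs1=2, rs2=3)
    print(f"{word:#010x}")
```

The three 5-bit register fields map naturally onto the destination address register and the two source address registers listed in the instruction format.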
In one embodiment, the instruction is velkrelu.mm (rd), (rs1), (rs2), where velkrelu is an extension, mm (matrix-matrix) denotes an operation between a matrix and a matrix, velkrelu.mm is the instruction name, rd is the destination matrix address register, rs1 is the first source address register and stores the source matrix, and rs2 is the second source address register, specifically a matrix address register, which stores the coefficient matrix applied to the source data.
In one embodiment, the first source address register rs1 stores a source matrix, which is a 4 × 4 matrix a; the second source address register rs2 stores a 4 × 4 matrix v of multiplication coefficients; and the data operation units PU form a 4 × 4 array b.
The meaning of the matrix-matrix instruction is as follows. The data of the source matrix is fetched from the first source address register rs1, for example row by row, from elements a11 through a14 up to elements a41 through a44, and allocated in order to Rin1 of the PU units PU11, PU12, ..., PU44 in the data operation unit PU array b. The data of the multiplication coefficient matrix v is fetched from the second source address register rs2, for example row by row, from elements v11 through v14 up to elements v41 through v44, and allocated to the second constant register C1 of the PU units PU11, PU12, ..., PU44 in the data operation unit PU array b.
The comparator in each operation unit compares the data in Rin1 with the comparison threshold in the first constant register C0. If the data in Rin1 is greater than or equal to the comparison threshold in C0, the processing unit takes the data in Rin1 as the operation result. If the data in Rin1 is smaller than the comparison threshold in C0, the processing unit multiplies the data in Rin1 by the value of its second constant register C1 to obtain the operation result. In either case, the operation result is stored in the storage area designated by the destination matrix address register rd, so that the ReLU function, Leaky ReLU function, or PReLU function is applied to each single element and a result matrix is obtained.
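The compare-and-select behaviour of a single PU and its element-wise use by the matrix-matrix instruction can be sketched in software as follows. This is only an illustrative model, not the circuit of the embodiment: the function names pu_step and velkrelu_mm are hypothetical, the default threshold c0 = 0.0 is an assumption, and a 2 × 2 matrix is used instead of 4 × 4 to keep the example short.

```python
from typing import List

Matrix = List[List[float]]


def pu_step(rin1: float, c0: float, c1: float) -> float:
    """Model of one PU: the comparator output selects Rin1 or Rin1 * C1."""
    if rin1 >= c0:        # comparator: Rin1 >= threshold held in C0
        return rin1       # switch passes the input variable unchanged
    return rin1 * c1      # switch passes the multiplier output


def velkrelu_mm(a: Matrix, v: Matrix, c0: float = 0.0) -> Matrix:
    """Element-wise model: a[i][j] goes to Rin1 and v[i][j] to C1 of PU(i, j)."""
    return [[pu_step(x, c0, c) for x, c in zip(row_a, row_v)]
            for row_a, row_v in zip(a, v)]


# Hypothetical sample data; c0 = 0.0 is an assumed threshold.
a = [[1.0, -2.0], [-4.0, 8.0]]
v = [[0.5, 0.5], [0.25, 0.25]]
result = velkrelu_mm(a, v)
print(result)
```

Because every element has its own coefficient, this form corresponds most closely to a PReLU with a separate coefficient for each element.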
In one embodiment, the instruction is velkrelu.mv (rd), (rs1), (rs2), where velkrelu is an extension, mv (matrix-vector) denotes an operation between a matrix and a vector, velkrelu.mv is the instruction name, rd is the destination matrix address register, rs1 is the first source address register and stores the source matrix, and rs2 is the second source address register, specifically a vector address register, which stores the coefficient vector applied to the source data.
In one embodiment, the first source address register rs1 stores a source matrix, which is a 4 × 4 matrix a; the second source address register rs2 stores a vector v of four multiplication coefficients; and the data operation units PU form a 4 × 4 array b.
The meaning of the matrix-vector instruction is as follows. The data of the source matrix is fetched from the first source address register rs1, for example row by row, from elements a11 through a14 up to elements a41 through a44, and allocated in order to Rin1 of the PU units PU11, PU12, ..., PU44 in the data operation unit PU array b. The data of the multiplication coefficient vector v is fetched from the second source address register rs2, for example element by element from v1 through v4, and each element is allocated to the second constant register C1 of all PU units in the corresponding row of the array b: element v1 is allocated to PU11 through PU14 in the first row, and so on, until element v4 is allocated to PU41 through PU44 in the fourth row.
The comparator in each operation unit compares the data in Rin1 with the comparison threshold in the first constant register C0. If the data in Rin1 is greater than or equal to the comparison threshold in C0, the processing unit takes the data in Rin1 as the operation result. If the data in Rin1 is smaller than the comparison threshold in C0, the processing unit multiplies the data in Rin1 by the value of its second constant register C1 to obtain the operation result. In either case, the operation result is stored in the storage area designated by the destination matrix address register rd, so that the ReLU function, Leaky ReLU function, or PReLU function is applied to each single element and a result matrix is obtained.
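The row-wise distribution of the coefficient vector can be illustrated with a short software model. The function name velkrelu_mv is hypothetical, the default threshold c0 = 0.0 is an assumption, and a 2 × 2 matrix stands in for the 4 × 4 matrix of the embodiment; the sketch only mirrors the data flow described above.

```python
from typing import List


def velkrelu_mv(a: List[List[float]], v: List[float], c0: float = 0.0) -> List[List[float]]:
    """Row-broadcast model: every PU in row i receives v[i] in its C1 register."""
    if len(a) != len(v):
        raise ValueError("need exactly one coefficient per matrix row")
    out = []
    for row, coeff in zip(a, v):
        # coeff plays the role of C1 for all PUs of this row;
        # each element x plays the role of Rin1 and is compared with C0.
        out.append([x if x >= c0 else x * coeff for x in row])
    return out


# Hypothetical sample data; c0 = 0.0 is an assumed threshold.
a = [[-1.0, 2.0], [-4.0, -8.0]]
v = [0.5, 0.25]
result = velkrelu_mv(a, v)
print(result)
```

Sharing one coefficient across a row resembles a PReLU whose coefficient is shared per channel, if each row is taken to hold one channel; that interpretation is an assumption and is not stated in the description.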
In one embodiment, the instruction is velkrelu.mf (rd), (rs1), rs2, where velkrelu is an extension, mf (matrix-constant) denotes an operation between a matrix and a constant, velkrelu.mf is the instruction name, rd is the destination matrix address register, rs1 is the source matrix address register, and rs2 is the constant address register.
In one embodiment, the source matrix is a 4 × 4 matrix a stored in the source matrix address register rs1, the constant is d stored in the constant register rs2, and the data operation units PU form a 4 × 4 array b.
The meaning of the matrix-constant instruction is as follows. The data of the source matrix is fetched from the source matrix address register rs1, for example row by row, from elements a11 through a14 up to elements a41 through a44, and allocated in order to Rin1 of the PU units PU11, PU12, ..., PU44 in the data operation unit PU array b. The constant d is fetched from the constant register rs2 and allocated to the second constant register C1 of each of the PU units PU11, PU12, ..., PU44.
The comparator in each data operation unit compares the data in Rin1 with the comparison threshold in the first constant register C0. If the data in Rin1 is greater than or equal to the comparison threshold in C0, the processing unit takes the data in Rin1 as the operation result. If the data in Rin1 is smaller than the comparison threshold in C0, the processing unit multiplies the data in Rin1 by the value of its second constant register C1 to obtain the operation result. In either case, the operation result is stored in the storage area designated by the destination matrix address register rd, so that the ReLU function, Leaky ReLU function, or PReLU function is applied to each single element and a result matrix is obtained.
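A minimal software model of the matrix-constant form shows how one constant d, written to every C1 register, can select between the activation functions. The function name velkrelu_mf is hypothetical and the threshold c0 = 0.0 is an assumption; under that assumption, d = 0 gives the usual ReLU and a small positive d gives the usual Leaky ReLU.

```python
from typing import List


def velkrelu_mf(a: List[List[float]], d: float, c0: float = 0.0) -> List[List[float]]:
    """Constant-broadcast model: the same constant d is used as C1 in every PU."""
    return [[x if x >= c0 else x * d for x in row] for row in a]


# Hypothetical sample data; c0 = 0.0 is an assumed threshold.
a = [[-1.0, 2.0], [-4.0, 8.0]]

relu = velkrelu_mf(a, 0.0)     # d = 0: negative inputs become zero
leaky = velkrelu_mf(a, 0.25)   # d = 0.25: negative inputs are scaled down
print(relu)
print(leaky)
```

In floating point, a negative input multiplied by 0.0 yields -0.0, which compares equal to 0.0; a hardware implementation may treat the sign of zero differently.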
In the above embodiments of the present invention, the instruction adopts the standard RISC-V instruction encoding mode and the standard RISC-V instruction format, which improves the versatility of the data operation circuit of the present invention.
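As an illustration of what a standard RISC-V format could look like for a three-register instruction of this kind, the sketch below packs the fields of the RISC-V R-type format (funct7, rs2, rs1, funct3, rd, opcode). The description does not state which format, opcode, or function codes are used, so the choice of R-type, the custom-0 major opcode, and the funct3 and funct7 values are all assumptions made only for this example.

```python
def encode_r_type(opcode: int, rd: int, funct3: int, rs1: int, rs2: int, funct7: int) -> int:
    """Pack the six R-type fields into one 32-bit instruction word."""
    fields = (("opcode", opcode, 7), ("rd", rd, 5), ("funct3", funct3, 3),
              ("rs1", rs1, 5), ("rs2", rs2, 5), ("funct7", funct7, 7))
    for name, value, bits in fields:
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name}={value} does not fit in {bits} bits")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def decode_registers(word: int) -> tuple:
    """Recover the register indices (rd, rs1, rs2) from an R-type word."""
    return (word >> 7) & 0x1F, (word >> 15) & 0x1F, (word >> 20) & 0x1F


CUSTOM_0 = 0b0001011         # RISC-V custom-0 major opcode; its use here is assumed
FUNCT3_MM = 0b000            # hypothetical code for the .mm variant
FUNCT7_VELKRELU = 0b0000000  # hypothetical code for the velkrelu extension

# Hypothetical register choice: rd = x10, rs1 = x11, rs2 = x12.
word = encode_r_type(CUSTOM_0, rd=10, funct3=FUNCT3_MM, rs1=11, rs2=12,
                     funct7=FUNCT7_VELKRELU)
print(hex(word))
```

Reusing an existing format in this way means that a standard decoder can extract rd, rs1, and rs2 from the same bit positions as for any other R-type instruction.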
An embodiment of the present invention also provides a chip including one or more of the data processing apparatuses of the above embodiments.
An embodiment of the present invention also provides a card board including one or more chips of the above embodiments.
An embodiment of the present invention also provides an electronic device including one or more chips of the above embodiments. The electronic device may be a computer or a mobile phone or the like.
It is to be understood that the above-described embodiments of the present invention are merely illustrative of or explaining the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims (10)

1. A data arithmetic circuit, comprising: one or more arithmetic units;
the arithmetic unit includes: the input variable register, the first constant register, the comparator, the switch, the second constant register, the multiplier and the output variable register;
the input variable register and the second constant register are respectively connected to two input ends of the multiplier;
the output end of the multiplier and the input variable register are respectively connected to the two input ends of the switch;
the input variable register and the first constant register are respectively connected to two input ends of the comparator; the output end of the comparator is connected to the control end of the switch;
the output end of the switch is connected to the output variable register.
2. The data arithmetic circuit of claim 1, wherein there are a plurality of the arithmetic units, and the plurality of arithmetic units are arranged in an array.
3. A data processing apparatus comprising a storage module, a decoder, an instruction fetch module, a control module and a data arithmetic circuit as claimed in claim 1 or 2;
the storage module is used for storing instructions and data;
the control module is used for instructing the instruction fetching module to fetch the instruction from the storage module;
the decoder is used for decoding the instruction and sending the decoded instruction to the data operation circuit;
the data operation circuit is used for performing data operation based on the decoded instruction, taking the data out of the storage module and storing the operation result of the data into the storage module.
4. The data processing apparatus according to claim 3, wherein the data operation circuit is configured to perform a data operation based on the decoded instruction, fetch the data from the storage module, and store an operation result of the data in the storage module, and specifically includes:
each arithmetic unit in the data arithmetic circuit performs data arithmetic based on the decoded instruction, takes out the data from the storage module, and stores an arithmetic result of the data to the storage module.
5. The data processing apparatus according to claim 3 or 4, wherein the instruction is encoded using the standard RISC-V instruction encoding.
6. The data processing apparatus of claim 5, wherein the format of the instruction is a standard RISC-V instruction format.
7. The data processing apparatus according to any one of claims 3 to 6, wherein
the instruction is a matrix operation instruction which comprises an instruction name, a destination address register, a first source address register and a second source address register; the first source address register is used for storing source data, and the second source address register is used for storing a coefficient applied to the source data.
8. A chip comprising one or more data processing apparatuses according to any one of claims 3 to 7.
9. A card board comprising one or more chips according to claim 8.
10. An electronic device comprising one or more card boards according to claim 9.
CN201910979193.2A 2019-10-15 2019-10-15 Data operation circuit, data processing device, chip, card board and electronic equipment Pending CN112667288A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910979193.2A CN112667288A (en) 2019-10-15 2019-10-15 Data operation circuit, data processing device, chip, card board and electronic equipment
PCT/CN2020/103467 WO2021073188A1 (en) 2019-10-15 2020-07-22 Data operation circuit, data processing apparatus, chip, card board, and electronic device

Publications (1)

Publication Number Publication Date
CN112667288A true CN112667288A (en) 2021-04-16

Family

ID=75400071

Country Status (2)

Country Link
CN (1) CN112667288A (en)
WO (1) WO2021073188A1 (en)

Also Published As

Publication number Publication date
WO2021073188A1 (en) 2021-04-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination