CN112667288A

CN112667288A - Data operation circuit, data processing device, chip, card board and electronic equipment

Info

Publication number: CN112667288A
Application number: CN201910979193.2A
Authority: CN
Inventors: Not disclosed
Original assignee: Beijing Simm Computing Technology Co ltd
Current assignee: Beijing Simm Computing Technology Co ltd
Priority date: 2019-10-15
Filing date: 2019-10-15
Publication date: 2021-04-16
Also published as: WO2021073188A1

Abstract

The invention discloses a data operation circuit, a data processing device, a chip, a card board and electronic equipment. The data arithmetic circuit includes one or more arithmetic units, and each arithmetic unit includes an input variable register, a first constant register, a comparator, a switch, a second constant register, a multiplier and an output variable register. Because the data arithmetic circuit provided by the embodiments of the invention has arithmetic units that can compute the ReLU function, the Leaky ReLU function or the PReLU function, it offers high integration, small area, high computational efficiency and low power consumption.

Description

Data operation circuit, data processing device, chip, card board and electronic equipment

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data operation circuit, a data processing apparatus, a chip, a card board, and an electronic device.

Background

With the development of science and technology, human society is rapidly entering the intelligent era. A key characteristic of this era is that people obtain ever more data, in ever larger quantities, and the demands on data-processing speed keep rising.

Chips are the cornerstone of data processing and fundamentally determine people's ability to process data. In terms of application, chips follow two main routes. One is the general-purpose chip route, such as the Central Processing Unit (CPU), which offers great flexibility but is less computationally efficient when processing domain-specific algorithms. The other is the special-purpose chip route, such as the Tensor Processing Unit (TPU), which delivers higher effective computing power in certain specific fields but has poor or even no processing capability in more general fields.

Because the data of the intelligent era is diverse and enormous in quantity, chips are required to be extremely flexible, able to process algorithms from different fields that change rapidly, and to have extremely high processing capacity so that they can quickly process extremely large and sharply growing volumes of data.

Disclosure of Invention

(I) Objects of the invention

The invention aims to provide a data operation circuit, a data processing device, a chip, a card board and an electronic device. The data operation circuit can compute the ReLU function, the Leaky ReLU function or the PReLU function, and has the advantages of high integration, small area, high computational efficiency and low power consumption.

(II) Technical solution

To solve the above problem, a first aspect of the present invention provides a data operation circuit, including one or more arithmetic units. Each arithmetic unit includes an input variable register, a first constant register, a comparator, a switch, a second constant register, a multiplier and an output variable register. The input variable register and the second constant register are connected to the two inputs of the multiplier, respectively; the output of the multiplier and the input variable register are connected to the two inputs of the switch, respectively; the input variable register and the first constant register are connected to the two inputs of the comparator, respectively; the output of the comparator is connected to the control terminal of the switch; and the output of the switch is connected to the output variable register.

In the data arithmetic circuit according to the embodiment of the present invention, the arithmetic unit can compute the ReLU function, the Leaky ReLU function or the PReLU function. The data operation circuit has the advantages of high integration, small area, high computational efficiency and low power consumption.

Furthermore, there are multiple arithmetic units, and they are arranged in an array.

In a second aspect of the present invention, a data processing apparatus is provided, which includes a storage module, a decoder, an instruction fetching module, a control module, and the data arithmetic circuit provided in the first aspect; the storage module is used for storing instructions and data; the control module is used for instructing the instruction fetching module to fetch the instruction from the storage module; the decoder is used for decoding the instruction and sending the decoded instruction to the data operation circuit; the data operation circuit is used for performing data operation based on the decoded instruction, taking the data out of the storage module and storing the operation result of the data into the storage module.

Further, the data operation circuit is configured to perform data operation based on the decoded instruction, take out the data from the storage module, and store an operation result of the data in the storage module, and specifically includes: each arithmetic unit in the data arithmetic circuit performs data arithmetic based on the decoded instruction, takes out the data from the storage module, and stores an arithmetic result of the data to the storage module.

Furthermore, the instruction is encoded by adopting a standard RISC-V instruction encoding mode.

Further, the format of the instruction is a standard RISC-V instruction format.

Further, the instruction is a matrix operation instruction, and the matrix operation instruction comprises an instruction name, a destination address register, a first source address register and a second source address register; the first source address register is used for storing source data, and the second source address register is used for storing a coefficient operated on the source data.

The second source address register may be a matrix address register, a vector address register, or a constant address register.

A third aspect of the invention provides a chip comprising one or more of the data processing apparatus provided in the second aspect.

A fourth aspect of the invention provides a card including one or more chips provided by the third aspect.

A fifth aspect of the invention provides an electronic device comprising one or more cards as provided in the fourth aspect.

(III) Advantageous effects

The technical scheme of the invention has the following beneficial technical effects:

Drawings

FIG. 1 is a schematic diagram of a ReLU function;

FIG. 2 is a schematic diagram of a Leaky ReLU function or a PReLU function;

FIG. 3 is a schematic diagram of a data operation circuit according to an embodiment of the present invention;

FIG. 4 is a schematic diagram showing the configuration of an arithmetic unit in the data arithmetic circuit according to the embodiment of the present invention;

FIG. 5 is a schematic diagram of a data operation circuit according to an embodiment of the present invention;

FIG. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings in conjunction with the following detailed description. It should be understood that the description is intended to be exemplary only, and is not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.

In the description of the present invention, it should be noted that the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.

FIG. 1 is a schematic diagram of a ReLU function; FIG. 2 is a schematic diagram of the Leaky ReLU function or the PReLU function.

As shown in FIGS. 1 and 2, the linear rectification function, also called the rectified linear unit (ReLU), is an activation function commonly used in artificial neural networks and generally refers to the nonlinear ramp function and its variants. In neural networks, ReLU and its variants, such as Leaky ReLU and PReLU, are commonly used nonlinear activation functions.

In the ReLU function shown in FIG. 1, yi = xi when xi is greater than or equal to 0, and yi = 0 when xi is less than 0; the ReLU function therefore computes its output in one of two ways depending on the value of xi.

In the Leaky ReLU function or the PReLU function shown in FIG. 2, yi = xi when xi is greater than or equal to 0, and yi = ai × xi when xi is less than 0; the Leaky ReLU function or the PReLU function therefore also computes its output in one of two ways depending on the value of xi.
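Written as piecewise formulas, the two behaviours described above are as follows. ReLU is the special case ai = 0 of the second form, which is why one arithmetic unit with a configurable coefficient can cover all three functions.

```latex
y_i = \mathrm{ReLU}(x_i) =
\begin{cases}
x_i, & x_i \ge 0 \\
0,   & x_i < 0
\end{cases}
\qquad
y_i =
\begin{cases}
x_i,       & x_i \ge 0 \\
a_i\, x_i, & x_i < 0
\end{cases}
```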

Because of these characteristics, conventional implementations must combine multiple unit circuits, each with a separate function, under step-by-step control of a computer program to compute the above functions. Examples of such separate functional unit circuits are multipliers and comparators.

As an example, the following computer software program implements the ReLU function, the Leaky ReLU function or the PReLU function according to the above scheme:
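The program listing itself is not reproduced in this text. As a hedged reconstruction, the Python sketch below mirrors, element by element, the three statements quoted and explained in the following paragraphs. The function name relu_family_software is hypothetical, and in the conventional scheme each statement would be carried out by a separate unit circuit rather than by Python.

```python
from typing import List, Sequence


def relu_family_software(xs: Sequence[float], coeffs: Sequence[float]) -> List[float]:
    """Hypothetical reconstruction of the three-step program, one element at a time."""
    ys: List[float] = []
    for x_i, a_i in zip(xs, coeffs):
        # Statement 1 (comparator): If (X_i >= 0) C_i = 1; Else C_i = 0;
        c_i = 1 if x_i >= 0 else 0
        # Statement 2 (comparator again): If (C_i == 1) Y_i = X_i;
        if c_i == 1:
            y_i = x_i
        # Statement 3 (multiplier): Else Y_i = a_i * X_i;
        else:
            y_i = a_i * x_i
        ys.append(y_i)
    return ys
```

Setting every a_i to 0 gives ReLU; a nonzero a_i gives Leaky ReLU or PReLU. Each element still passes through three separate steps, which is the overhead discussed below.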

It should be noted that the statement "If (X_i >= 0) C_i = 1; Else C_i = 0;" in the above computer code means that if X_i ≥ 0, 1 is assigned to C_i; otherwise, 0 is assigned to C_i. This part of the computer program uses a first separate unit circuit to implement the comparator function.

The statement "If (C_i == 1) Y_i = X_i;" in the above computer code means that if C_i equals 1, X_i is assigned to Y_i. In this part of the computer program, a second separate unit circuit works in conjunction with the first separate unit circuit, again performing the comparator function.

The statement "Else Y_i = a_i * X_i;" in the above computer code means that if C_i ≠ 1, a_i * X_i is assigned to Y_i. In this part, a third separate unit circuit works in conjunction with the second separate unit circuit to implement the multiplier function.

It can be seen that in the above solution, the computer software program must invoke three separate unit circuits in three steps to implement the ReLU function. If the function must be applied to every element of a matrix, the matrix computation can only be carried out by sequentially controlling repeated calls to the three separate unit circuits. This results in a complex operation sequence, long computation time, many separate unit circuits and calls, high power consumption, a large device area and high cost.

The data arithmetic circuit provided by the embodiment of the present invention will be described in detail below. In addition, the technical features involved in the different embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.

Fig. 3 is a schematic structural diagram of a data operation circuit according to an embodiment of the present invention.

As shown in FIG. 3, the data operation circuit includes an arithmetic operation unit, a logical operation unit, a data access/fetch unit and a floating-point operation unit. The arithmetic operation unit performs arithmetic operations using the multiplier of the data operation circuit; the logical operation unit may be implemented by the comparator of the data operation circuit; and the floating-point operation unit may be implemented by a multiplier. The data arithmetic circuit is provided with one or more arithmetic units PU, each of which can perform comparison and multiplication and thereby compute the ReLU function, the Leaky ReLU function or the PReLU function.

Fig. 4 is a schematic diagram of an arithmetic unit in the data arithmetic circuit according to the embodiment of the present invention.

As shown in FIG. 4, the data operation circuit includes one or more arithmetic units PU; each arithmetic unit PU includes an input variable register Rin1, a first constant register C0, a comparator CMP, a switch S0, a second constant register C1, a multiplier and an output variable register Rout.

The input variable register Rin1 and the second constant register C1 are respectively connected to two input ends of the multiplier; the output end of the multiplier and the input variable register Rin1 are respectively connected to two input ends of the switch S0; the input variable register Rin1 and the first constant register C0 are respectively connected to two input terminals of the comparator CMP; the output terminal of the comparator CMP is connected to the control terminal of the switch S0; the output terminal of the switch S0 is connected to the output variable register Rout.

The value of the input variable register Rin1 corresponds to the variable xi of the ReLU function, and the value of the output variable register Rout corresponds to yi. The first constant register C0 stores a first preset constant, the comparison threshold, which is 0 for the ReLU function, the Leaky ReLU function and the PReLU function. The second constant register C1 stores a second preset constant, the multiplication coefficient ai of the ReLU function, the Leaky ReLU function or the PReLU function.

It will be appreciated that in the ReLU function, the multiplication coefficient ai is equal to 0, and in the Leaky ReLU function or the PReLU function, the multiplication coefficient ai is equal to a particular coefficient value.

The operation process of the operation unit is as follows:

The result of comparing the value xi in Rin1 with the comparison threshold in C0 serves as the control signal of the switch S0 and determines whether port 1 or port 0 of the switch S0 is connected.

Specifically, a first input of the comparator receives the value xi at Rin1 and a second input receives the comparison threshold at the first constant register C0.

If the data at the first input of the comparator is greater than or equal to the data at the second input, the generated control signal connects port 1 of the switch S0. In this case, Rout outputs the value xi of the input variable register Rin1, passed through the switch S0. This implements yi = xi in the ReLU function, the Leaky ReLU function or the PReLU function.

If the data at the first input of the comparator is less than the data at the second input, the generated control signal connects port 0 of the switch S0. In this case, the switch S0 receives from the multiplier the product of xi in the input variable register Rin1 and the second preset constant ai stored in the second constant register C1, and Rout outputs that product. The output corresponds to yi = ai × xi in the Leaky ReLU function or the PReLU function; if ai is preset to 0, the ReLU function is realized.
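The datapath just described can be modelled behaviourally as below. This is a sketch, not the patented circuit: the class name ArithmeticUnit and its step method are hypothetical, and the fields simply mirror the registers Rin1, C0, C1 and Rout. As in the circuit, the multiplier output is always formed, and the comparator result only decides which value the switch S0 forwards.

```python
from dataclasses import dataclass


@dataclass
class ArithmeticUnit:
    """Hypothetical behavioural model of one arithmetic unit PU (FIG. 4)."""
    c0: float = 0.0    # first constant register C0: comparison threshold
    c1: float = 0.0    # second constant register C1: multiplication coefficient a_i
    rin1: float = 0.0  # input variable register Rin1: x_i
    rout: float = 0.0  # output variable register Rout: y_i

    def step(self) -> float:
        product = self.rin1 * self.c1   # multiplier: Rin1 x C1, always formed
        port1 = self.rin1 >= self.c0    # comparator CMP drives the control terminal of S0
        # Switch S0: port 1 forwards Rin1 unchanged, port 0 forwards the product.
        self.rout = self.rin1 if port1 else product
        return self.rout


# Leaky ReLU / PReLU behaviour with a_i = 0.25 held in C1.
unit = ArithmeticUnit(c1=0.25)
unit.rin1 = -4.0
print(unit.step())  # -1.0
unit.rin1 = 7.0
print(unit.step())  # 7.0
```

Leaving c1 at its default of 0 makes the same unit behave as ReLU, so switching between the three functions only requires writing a different value into C1.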

In the data arithmetic circuit according to the embodiment of the present invention, the arithmetic unit can compute the ReLU function, the Leaky ReLU function or the PReLU function. Compared with repeatedly calling multiple separate unit circuits with different functions, the data operation circuit provided by the embodiment of the invention offers high integration, small area, high computational efficiency and low power consumption.

Fig. 5 is a schematic structural diagram of a data operation circuit according to an embodiment of the present invention.

As shown in FIG. 5, the data operation circuit includes a plurality of processing units (PUs) arranged in an array. The data arithmetic circuit can apply the ReLU function, the Leaky ReLU function or the PReLU function to a matrix.

In the example shown in FIG. 5, the data operation circuit is provided with M×N arithmetic units, i.e., M rows and N columns of arithmetic units, where M ≥ 1 and N ≥ 2.

The arithmetic unit PU is the basic unit for computing the ReLU function, the Leaky ReLU function or the PReLU function, and the plurality of arithmetic units PU arranged in the array can apply the ReLU function, the Leaky ReLU function or the PReLU function to each element of a matrix.

The specific process for realizing the ReLU function operation of each element in the matrix is as follows:

The data arithmetic circuit reads the data xi of the input matrix from the storage module and distributes them to different arithmetic units for the ReLU function operation. Taking one input datum x1 as an example: x1 is loaded into the input variable register Rin1 of one arithmetic unit; the arithmetic unit compares x1 with the comparison threshold in the first constant register C0, connects port 1 or port 0 of the switch S0 according to the comparison result, and outputs the value obtained from the switch S0 through its output register Rout. In this way the ReLU function, the Leaky ReLU function or the PReLU function is computed, producing yi = xi or yi = ai × xi.

During the calculation, one or more rows of data can be fetched at the same time and distributed to the corresponding arithmetic units; meanwhile, data can continue to be read from the storage module while computation is in progress, and the data read are fed into the array of arithmetic units.

After the calculation is finished, the data operation circuit stores the operation results in the storage module.
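A rough software analogue of this read, distribute, compute and store flow is sketched below, assuming the matrix is delivered row by row and a single coefficient a is used for every element. The names pu and stream_rows are hypothetical, and Python evaluates the PUs of a row one after another, whereas the array evaluates them in parallel.

```python
from typing import Iterable, Iterator, List

Row = List[float]


def pu(x: float, a: float, c0: float = 0.0) -> float:
    """One arithmetic unit: y = x if x >= C0, else a * x."""
    return x if x >= c0 else a * x


def stream_rows(storage: Iterable[Row], a: float) -> Iterator[Row]:
    """Hypothetical sketch: each row read from storage goes to one row of PUs.

    As a generator, it reads the next row only when the previous result has
    been consumed, so reading and computing interleave instead of the whole
    matrix being loaded first.
    """
    for row in storage:
        yield [pu(x, a) for x in row]


# Collecting the results stands in for writing them back to the storage module.
result = list(stream_rows([[0.0, 7.0, -2.0], [-4.0, 1.0, 5.0]], a=0.5))
```

Here result holds the Leaky ReLU outputs with a = 0.5; passing a = 0.0 would give ReLU.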

The following examples describe how the data arithmetic circuit provided in the above embodiment computes the ReLU function, the Leaky ReLU function or the PReLU function.

Example 1

The first input matrix may be a 3x6 matrix, for example

The plurality of arithmetic units of the data arithmetic circuit is a 3 × 6 array. The preset multiplication coefficient ai is 0.

When the instruction applies the ReLU function to each element of the first input matrix, the arithmetic units operate as follows. The arithmetic unit PU in row 1, column 1 receives the row 1, column 1 element a11 (value 0) of the first input matrix and stores it in Rin1, compares a11 with the first preset constant (which may be set to 0) stored in the first constant register C0, and, according to the comparison result, controls the switch S0 to connect port 1, so that Rout outputs the content of Rin1, namely 0. The arithmetic unit PU in row 1, column 2 receives the row 1, column 2 element a12 (value 7) of the first input matrix and stores it in Rin1, compares a12 with the first preset constant (which may be set to 0) stored in the first constant register C0, and, according to the comparison result, controls the switch S0 to connect port 1, so that Rout outputs the content of Rin1, namely 7.

By analogy, the output matrix is obtained as follows:

Example 2

The first input matrix may be a 3x6 matrix, for example

The plurality of arithmetic units of the data arithmetic circuit is a 3 × 6 array. The preset multiplication coefficient ai is 2.

When the instruction applies the PReLU function to each element of the first input matrix, the arithmetic units operate as follows:

The arithmetic unit PU in row 1, column 1 receives the row 1, column 1 element a11 (value 0) of the first input matrix and stores it in Rin1, compares a11 with the first preset constant (which may be set to 0) stored in the first constant register C0, and, according to the comparison result, controls the switch S0 to connect port 1, so that Rout outputs the content of Rin1, namely 0. The arithmetic unit PU in row 1, column 2 receives the row 1, column 2 element a12 (value 7) of the first input matrix and stores it in Rin1, compares a12 with the first preset constant (which may be set to 0) stored in the first constant register C0, and, according to the comparison result, controls the switch S0 to connect port 1, so that Rout outputs the content of Rin1, namely 7.

By analogy, the output matrix is obtained as follows:

Example 3

The first input matrix may be a 3x6 matrix, for example

If the arithmetic unit array in the data arithmetic circuit is a 3x3 array, the matrix is divided into 3x3 blocks (two blocks for a 3x6 matrix) for calculation.

When the instruction applies the ReLU function to each element of the first input matrix, the arithmetic units operate as follows. The arithmetic unit PU in row 1, column 1 receives the row 1, column 1 element a11 (value 0) of the first input matrix and stores it in Rin1, compares a11 with the first preset constant (which may be set to 0) stored in the first constant register C0, and, according to the comparison result, controls the switch S0 to connect port 1, so that Rout outputs the content of Rin1, namely 0. By analogy, the arithmetic unit in row 3, column 3 obtains the element in row 3, column 3 of the output matrix.

Then, the arithmetic unit in row 1, column 1 receives the row 1, column 4 element a14 of the first input matrix and compares a14 with the first preset constant (which may be set to 0) stored in the first constant register C0. By analogy, the arithmetic unit in row 3, column 3 obtains the element in row 3, column 6 of the output matrix. The results of the two passes are combined to obtain the final output matrix.
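The block-by-block schedule of Example 3 can be sketched as follows. The function name tiled_relu and its parameters are hypothetical; the sketch assumes a single coefficient a for every element (a = 0 gives ReLU) and counts how many passes of the PU array are needed.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def tiled_relu(x: Matrix, tile_rows: int = 3, tile_cols: int = 3,
               a: float = 0.0, c0: float = 0.0) -> Tuple[Matrix, int]:
    """Hypothetical sketch of Example 3: process x block by block on a
    tile_rows x tile_cols PU array and combine the block results.

    Returns the output matrix and the number of passes over the array.
    """
    rows, cols = len(x), len(x[0])
    out: Matrix = [[0.0] * cols for _ in range(rows)]
    passes = 0
    for r0 in range(0, rows, tile_rows):
        for k0 in range(0, cols, tile_cols):
            passes += 1  # one pass of the PU array over this block
            for i in range(r0, min(r0 + tile_rows, rows)):
                for j in range(k0, min(k0 + tile_cols, cols)):
                    v = x[i][j]
                    out[i][j] = v if v >= c0 else a * v
    return out, passes
```

For a 3x6 input on a 3x3 array this performs two passes, one for columns 1 to 3 and one for columns 4 to 6, and writes both into the same output matrix, matching the combination step described above.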

It is to be understood that when the data arithmetic circuit contains a plurality of arithmetic units, it can apply the ReLU function, the Leaky ReLU function or the PReLU function to the elements of a matrix. The data operation circuit can, of course, be applied to neural network computation.

It should be understood that the above examples are given only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to list all embodiments exhaustively here. Obvious variations or modifications derived therefrom remain within the scope of the invention.

In the data arithmetic circuit provided by the embodiment of the invention, the arithmetic units can compute the ReLU function, the Leaky ReLU function or the PReLU function of a single number or a matrix with one instruction. Compared with combining multiple separate functional unit circuits under step-by-step control of a computer program, this has four advantages. First, multiple sets of control circuits need not be designed separately, which reduces the device area and cost. Second, fewer sets of control circuitry operate, which reduces power consumption. Third, because the arithmetic unit computes the ReLU function, the Leaky ReLU function or the PReLU function directly, the final result is obtained with one instruction rather than through multiple operations, which saves computation time. Fourth, the arithmetic unit can apply the ReLU function, the Leaky ReLU function or the PReLU function to a matrix simply by configuring the corresponding registers, so the combination is very flexible and the device is highly versatile. The data arithmetic circuit thus has a compact structure and high arithmetic efficiency.

As shown in FIG. 6, the data processing apparatus includes a memory module 10, a decoder 20, an instruction fetching module 30, a control module 40 and one or more data operation circuits 50 provided in the above embodiments.

The memory module 10 is used for storing instructions and data.

The control module 40 is used for instructing the instruction fetching module 30 to fetch instructions from the memory module 10.

The decoder 20 is used for decoding the instructions and sending the decoded instructions to the data arithmetic circuit 50.

The data arithmetic circuit 50 is used for performing data operations based on the decoded instructions, fetching data from the memory module 10 and storing the operation results in the memory module 10; the data arithmetic circuit 50 includes a plurality of arithmetic units.

The data operation circuit 50 can connect different ports of the switch according to different instructions so as to carry out the function of each instruction. In particular, the data arithmetic circuit 50 can apply the ReLU function, the Leaky ReLU function or the PReLU function to a matrix according to the corresponding instruction.

In one embodiment, the data operation circuit 50 is configured to perform data operation based on the decoded instruction, fetch data from a storage module, and store an operation result of the data in the storage module, and specifically includes:

each arithmetic unit in the data arithmetic circuit 50 performs data arithmetic based on the decoded instruction, takes out data from the memory module 10, and stores the arithmetic result of the data to the memory module 10.
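The cooperation of the modules in FIG. 6 can be illustrated with the small fetch, decode and execute loop below. Everything in it is an assumption made for illustration: instructions are kept as assembly-like text in a dictionary standing in for the memory module 10, register operands are simplified to names of data regions in that dictionary, and only the velkrelu.mf instruction described later is modelled. The names decode and run are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def decode(text: str) -> Tuple[str, str, str, str]:
    """Decoder 20 (hypothetical): split 'name rd, rs1, rs2' into its fields."""
    name, operands = text.split(maxsplit=1)
    rd, rs1, rs2 = (field.strip() for field in operands.split(","))
    return name, rd, rs1, rs2


def run(memory: Dict[str, Any]) -> None:
    """Control module 40 steps a program counter, fetch module 30 reads the
    next instruction from memory, and circuit 50 executes the decoded result."""
    program: List[str] = memory["program"]
    pc = 0
    while pc < len(program):
        name, rd, rs1, rs2 = decode(program[pc])  # fetch + decode
        if name != "velkrelu.mf":
            raise NotImplementedError(f"{name!r} is not modelled in this sketch")
        d = memory[rs2]  # coefficient read by the data operation circuit
        # each PU applies the comparator/switch/multiplier rule to one element
        memory[rd] = [[x if x >= 0 else d * x for x in row] for row in memory[rs1]]
        pc += 1
```

A real decoder would work on binary RISC-V encodings rather than text; the text form only keeps the sketch short.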

In one embodiment, because the amount of data processed by the ReLU function, the Leaky ReLU function or the PReLU function during neural network computation is large and usually organized as matrices, the instruction is designed as a matrix operation instruction to simplify operation of the data operation circuit. Such an instruction can be processed in parallel by the plurality of arithmetic units PU in the data operation circuit, thereby improving operational efficiency.

In this embodiment, the ReLU instruction is encoded according to the RISC-V rules and uses the custom portion of the standard RISC-V major opcode map, namely the "custom-3 (1111011)" opcode. The instruction format also follows the RISC-V instruction format; the different instruction formats are as follows:
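As a sketch of what placing an instruction in the custom-3 opcode space looks like, the function below packs fields into a 32-bit RISC-V R-type word (funct7, rs2, rs1, funct3, rd, opcode). That the velkrelu instructions use the R-type layout is an assumption suggested by their rd, rs1, rs2 operands; the funct7 and funct3 values that would distinguish velkrelu.mm, velkrelu.mv and velkrelu.mf are not given in this document, so any values passed for them are hypothetical.

```python
CUSTOM_3 = 0b1111011  # custom-3 major opcode (0x7B) in the RISC-V base opcode map


def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int,
                  opcode: int = CUSTOM_3) -> int:
    """Pack fields into a 32-bit RISC-V R-type word:
    funct7[31:25] rs2[24:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]."""
    for name, value, width in (("funct7", funct7, 7), ("rs2", rs2, 5),
                               ("rs1", rs1, 5), ("funct3", funct3, 3),
                               ("rd", rd, 5), ("opcode", opcode, 7)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return ((funct7 << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7) | opcode)


# Hypothetical example: rd = x1, rs1 = x2, rs2 = x3, funct7 = funct3 = 0.
print(hex(encode_r_type(0, 3, 2, 0, 1)))  # 0x3100fb
```

The low seven bits of every such word equal 1111011, which is what marks it as a custom-3 instruction to the decoder.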

In one embodiment, the instruction format may be: instruction name, destination address register, first source address register (the source matrix address register) and second source address register. The destination address register stores the address of the destination matrix, for example where the data obtained from the source matrix operation are stored; the first source address register is used for the source data, such as the source matrix; and the second source address register is used for the coefficients applied to the source data, such as the multiplication coefficients ai of the ReLU function, the Leaky ReLU function or the PReLU function.

In one embodiment, the instruction is velkrelu.mm (rd), (rs1), (rs2), where velkrelu is the extension name and mm denotes a matrix-matrix operation. velkrelu.mm is the instruction name, rd is the destination matrix address register, rs1 is the first source address register, which stores the source matrix, and rs2 is the second source address register, specifically a matrix address register, which stores the coefficient matrix applied to the source data.

In one embodiment, the first source address register rs1 stores a source matrix, which is a 4 × 4 matrix a; the second source address register rs2 stores a 4 × 4 matrix v of multiplication coefficients; and the arithmetic units PU form a 4 × 4 array b.

The matrix-matrix instruction works as follows. The data of the source matrix are fetched via the first source address register rs1, for example row by row, from elements a11 through a14 up to elements a41 through a44, and are assigned in order to the Rin1 registers of PU11, PU12, ..., PU44 in the PU array b. The data of the multiplication coefficient matrix v are fetched via the second source address register rs2, for example row by row, from elements v11 through v14 up to elements v41 through v44, and are assigned to the second constant registers C1 of PU11, PU12, ..., PU44 in the PU array b.

The comparator in each arithmetic unit compares the data in Rin1 with the comparison threshold in the first constant register C0. If the data in Rin1 are greater than or equal to the threshold, the arithmetic unit takes the data in Rin1 as the result of applying the ReLU function, the Leaky ReLU function or the PReLU function to that element. If the data in Rin1 are less than the threshold, the arithmetic unit multiplies the data in Rin1 by the value in its second constant register C1 to obtain the result. The results are stored in the storage area designated by the destination matrix address register rd, so that the ReLU function, the Leaky ReLU function or the PReLU function is applied element by element and a result matrix is obtained.
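The Python sketch below models only the data movement and per-PU rule of velkrelu.mm. The names pu_array and velkrelu_mm are hypothetical, matrices are plain Python lists rather than memory regions addressed through rd, rs1 and rs2, and the threshold C0 is fixed at 0 as in the examples.

```python
from typing import List

Matrix = List[List[float]]


def pu_array(rin1: Matrix, c1: Matrix, c0: float = 0.0) -> Matrix:
    """Each PU_ij outputs Rin1 if Rin1 >= C0, else Rin1 * C1 (the FIG. 4 rule)."""
    return [[x if x >= c0 else x * k for x, k in zip(xr, kr)]
            for xr, kr in zip(rin1, c1)]


def velkrelu_mm(a: Matrix, v: Matrix) -> Matrix:
    """Hypothetical behavioural sketch of velkrelu.mm: a_ij goes to Rin1 of
    PU_ij and v_ij goes to C1 of PU_ij; the returned matrix stands for the
    result written to the region designated by rd."""
    if len(a) != len(v) or any(len(ra) != len(rv) for ra, rv in zip(a, v)):
        raise ValueError("source and coefficient matrices must have the same shape")
    return pu_array(a, v)
```

Because every PU gets its own coefficient, this form can express a different slope per element, which is how a per-element PReLU coefficient ai would be supplied.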

In one embodiment, the instruction is velkrelu.mv (rd), (rs1), (rs2), where velkrelu is the extension name and mv denotes a matrix-vector operation. velkrelu.mv is the instruction name, rd is the destination matrix address register, rs1 is the first source address register, which stores the source matrix, and rs2 is the second source address register, specifically a vector address register, which stores the coefficient vector applied to the source data.

In one embodiment, the first source address register rs1 stores a source matrix, which is a 4 × 4 matrix a; the second source address register rs2 stores a vector v of four multiplication coefficients; and the arithmetic units PU form a 4 × 4 array b.

The matrix-vector instruction works as follows. The data of the source matrix are fetched via the first source address register rs1, for example row by row, from elements a11 through a14 up to elements a41 through a44, and are assigned in order to the Rin1 registers of PU11, PU12, ..., PU44 in the PU array b. The elements v1 through v4 of the coefficient vector v are fetched via the second source address register rs2, and each element is assigned to one row of the PU array b: element v1 goes to the second constant registers C1 of PU11 through PU14 in the first row, and so on, up to element v4, which goes to the second constant registers C1 of PU41 through PU44 in the fourth row.
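For velkrelu.mv, the only change from the matrix-matrix case is how the C1 registers are filled: coefficient v_i is copied into every PU of row i. The sketch below shows that broadcast under the same assumptions as before (hypothetical names, plain Python lists, C0 = 0); pu_array is repeated so the block is self-contained.

```python
from typing import List

Matrix = List[List[float]]


def pu_array(rin1: Matrix, c1: Matrix, c0: float = 0.0) -> Matrix:
    """Each PU_ij outputs Rin1 if Rin1 >= C0, else Rin1 * C1 (the FIG. 4 rule)."""
    return [[x if x >= c0 else x * k for x, k in zip(xr, kr)]
            for xr, kr in zip(rin1, c1)]


def velkrelu_mv(a: Matrix, v: List[float]) -> Matrix:
    """Hypothetical behavioural sketch of velkrelu.mv: row i of the PU array
    receives v_i in every C1 register."""
    if len(v) != len(a):
        raise ValueError("one coefficient per matrix row is required")
    c1 = [[v_i] * len(row) for row, v_i in zip(a, v)]  # broadcast v_i across row i
    return pu_array(a, c1)
```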

In one embodiment, the instruction is Velkrelu.mf (rd), (rs1), rs2, where Velkrelu denotes the extension, mf (matrix-constant) denotes an operation between a matrix and a constant, and Velkrelu.mf is the instruction name. rd is the destination matrix address register, rs1 is the source matrix address register, and rs2 is the constant address register.

In one embodiment, the source matrix is a 4 × 4 matrix a stored via the source matrix address register rs1, the constant is d, stored in the constant register rs2, and the data operation units PU form a 4 × 4 array b.

The matrix-constant instruction works as follows. The data of the source matrix is fetched via the source matrix address register rs1 in a given access order, for example row by row, from element a11 to element a14 through element a41 to element a44, and is allocated in sequence to the register Rin1 of PU11, PU12, …, PU44 in the data operation unit PU array b. The constant d is taken from the constant register rs2 and allocated to the second constant register C1 of every PU unit, PU11, PU12, …, PU44.

The comparator in each data operation unit compares the data in Rin1 with the comparison threshold in the first constant register C0. If the data in Rin1 is greater than or equal to the threshold, the operation unit outputs the data in Rin1 unchanged as the operation result. If the data in Rin1 is smaller than the threshold, the operation unit multiplies the data in Rin1 by the value in its second constant register C1 to obtain the operation result. The operation result is stored in the storage area designated by the destination matrix address register rd. In this way the ReLU function, the Leaky ReLU function or the PReLU function is applied to each element, and a result matrix is obtained.
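As a sketch under stated assumptions (the function name `velkrelu_mf` and a zero threshold are hypothetical), the matrix-constant variant differs from the matrix-vector variant only in that a single constant d is broadcast to C1 of every PU, so every element sees the same slope below the threshold:

```python
from typing import List

Matrix = List[List[float]]


def velkrelu_mf(a: Matrix, d: float, threshold: float = 0.0) -> Matrix:
    """Hypothetical model of Velkrelu.mf: constant d is loaded into C1 of every PU."""
    # Each element is Rin1 of its PU; all PUs share the same C1 = d.
    return [[x if x >= threshold else x * d for x in row] for row in a]


a = [
    [1.0, -2.0, 3.0, -4.0],
    [0.0, -1.0, 2.0, -8.0],
    [-0.5, 4.0, -6.0, 7.0],
    [9.0, -3.0, 0.5, -10.0],
]
r = velkrelu_mf(a, d=0.5)
```

Choosing d = 0 would reduce this to a plain element-wise ReLU over the whole matrix.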

In the above embodiments of the present invention, the instructions use the standard RISC-V instruction encoding and the standard RISC-V instruction format, which improves the versatility of the data operation circuit of the present invention.
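The concrete encoding is not given in this section. As a sketch, assuming the instructions use the standard 32-bit RISC-V R-type layout (funct7, rs2, rs1, funct3, rd, opcode) and sit in the custom-0 major opcode that the RISC-V specification reserves for custom extensions, the fields could be packed and unpacked as follows. The funct3 and funct7 values assigned to Velkrelu.mv and Velkrelu.mf are hypothetical placeholders, not values from the patent.

```python
CUSTOM_0 = 0b0001011  # RISC-V major opcode reserved for custom extensions

# Hypothetical field values; the patent does not disclose the actual encoding.
VELKRELU_FUNCT7 = 0b0000001
VELKRELU_MV_FUNCT3 = 0b000
VELKRELU_MF_FUNCT3 = 0b001

# (field name, bit offset, width) for the standard 32-bit R-type layout.
R_TYPE_FIELDS = (
    ("opcode", 0, 7),
    ("rd", 7, 5),
    ("funct3", 12, 3),
    ("rs1", 15, 5),
    ("rs2", 20, 5),
    ("funct7", 25, 7),
)


def encode_r_type(**fields: int) -> int:
    """Pack R-type fields into a 32-bit instruction word."""
    word = 0
    for name, offset, width in R_TYPE_FIELDS:
        value = fields[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << offset
    return word


def decode_r_type(word: int) -> dict:
    """Unpack a 32-bit instruction word into its R-type fields."""
    return {name: (word >> offset) & ((1 << width) - 1)
            for name, offset, width in R_TYPE_FIELDS}


# Velkrelu.mv with rd = x5, rs1 = x6, rs2 = x7 under the assumed encoding.
mv_word = encode_r_type(opcode=CUSTOM_0, rd=5, funct3=VELKRELU_MV_FUNCT3,
                        rs1=6, rs2=7, funct7=VELKRELU_FUNCT7)
```

Because both variants take the same three register operands, a decoder following this sketch would tell Velkrelu.mv and Velkrelu.mf apart by funct3 alone.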

An embodiment of the present invention also provides a chip including one or more of the data processing apparatuses of the above embodiments.

An embodiment of the present invention also provides a card board including one or more chips of the above embodiments.

An embodiment of the present invention also provides an electronic device including one or more chips of the above embodiments. The electronic device may be a computer or a mobile phone or the like.

It is to be understood that the above-described embodiments of the present invention merely illustrate or explain the principles of the invention and are not to be construed as limiting the invention. Therefore, any modification, equivalent replacement, improvement and the like made without departing from the spirit and scope of the present invention should be included in the protection scope of the present invention. Further, it is intended that the appended claims cover all such variations and modifications as fall within the scope and boundaries of the appended claims or the equivalents of such scope and boundaries.

Claims

1. A data arithmetic circuit, comprising: one or more arithmetic units;

the arithmetic unit includes an input variable register, a first constant register, a comparator, a switch, a second constant register, a multiplier and an output variable register;

the input variable register and the second constant register are respectively connected to two input ends of the multiplier;

the output end of the multiplier and the input variable register are respectively connected to the two input ends of the switch;

the input variable register and the first constant register are respectively connected to two input ends of the comparator; the output end of the comparator is connected to the control end of the switch;

the output end of the switch is connected to the output variable register.

2. The data arithmetic circuit of claim 1, wherein there are a plurality of arithmetic units, and the plurality of arithmetic units are arranged in an array.

3. A data processing apparatus, comprising a storage module, a decoder, an instruction fetch module, a control module and the data arithmetic circuit as claimed in claim 1 or 2;

the storage module is used for storing instructions and data;

the control module is used for instructing the instruction fetching module to fetch the instruction from the storage module;

the decoder is used for decoding the instruction and sending the decoded instruction to the data operation circuit;

the data operation circuit is used for performing data operation based on the decoded instruction, taking the data out of the storage module and storing the operation result of the data into the storage module.

4. The data processing apparatus according to claim 3, wherein the data operation circuit is configured to perform a data operation based on the decoded instruction, fetch the data from the storage module, and store an operation result of the data in the storage module, and specifically includes:

each arithmetic unit in the data arithmetic circuit performs data arithmetic based on the decoded instruction, takes out the data from the storage module, and stores an arithmetic result of the data to the storage module.

5. The data processing apparatus according to claim 3 or 4, wherein the instruction is encoded using the standard RISC-V instruction encoding.

6. The data processing apparatus of claim 5, wherein the format of the instruction is a standard RISC-V instruction format.

7. The data processing apparatus according to any one of claims 3 to 6, wherein the instruction is a matrix operation instruction comprising an instruction name, a destination address register, a first source address register and a second source address register; the first source address register is used for storing source data, and the second source address register is used for storing a coefficient applied to the source data.

8. A chip comprising one or more data processing devices according to any one of claims 3 to 7.

9. A card board comprising one or more chips according to claim 8.

10. An electronic device comprising one or more cards according to claim 9.