CN112906876A - Circuit for realizing activation function and processor comprising same - Google Patents

Circuit for realizing activation function and processor comprising same

Info

Publication number
CN112906876A
Authority
CN
China
Prior art keywords
data
division
unit
module
logistic regression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911133061.4A
Other languages
Chinese (zh)
Inventor
孙宇航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201911133061.4A priority Critical patent/CN112906876A/en
Publication of CN112906876A publication Critical patent/CN112906876A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Executing Machine-Instructions (AREA)

Abstract

The invention discloses an arithmetic circuit which is suitable for carrying out arithmetic on data. The arithmetic circuit comprises an exponential arithmetic unit, a division arithmetic unit, a logistic regression function arithmetic unit and a selection unit. The selection unit is suitable for selecting the operation output of the exponential operation unit, the division operation unit or the logistic regression function operation unit as an operation result according to the operation mode. The logistic regression function operation unit is coupled to the exponential operation unit and the division operation unit, and is suitable for performing exponential operation in the logistic regression function operation by using the exponential operation unit and performing division operation in the logistic regression function operation by using the division operation unit when performing the logistic regression function operation on the data. The invention also discloses a processor, a system on a chip and intelligent equipment comprising the arithmetic circuit.

Description

Circuit for realizing activation function and processor comprising same
Technical Field
The invention relates to the field of processors, in particular to a circuit design for neural network operation.
Background
In the field of artificial intelligence, various neural network models have been proposed to address issues in image, video, audio, and natural language processing, among others. Neural network models typically include multiple network layers, each layer having a large number of nodes, with values at each layer node being passed to nodes in the next layer and activated by values from nodes in the previous layer. Therefore, in the neural network model, a large number of activation function calculations are required.
In the prior art, activation functions are usually computed in software using only conventional processor instructions. When a large number of activation function calculations must be performed, this approach performs poorly and significantly slows both the training and the inference of the neural network.
Hardware acceleration schemes also exist in the prior art, but each targets only one or a few particular activation functions; no hardware acceleration scheme has been seen that integrates the operations of various activation functions and provides multiple activation function operations at the same time.
Therefore, a new design scheme of an arithmetic circuit is needed, which can provide hardware acceleration processing of efficient activation function operation.
Disclosure of Invention
To this end, the present invention provides a new operational circuit, processor and system on a chip in an attempt to solve or at least alleviate at least one of the problems identified above.
According to an aspect of the present invention, there is provided an arithmetic circuit adapted to operate on data. The arithmetic circuit comprises an exponential operation unit, a division operation unit, a logistic regression function operation unit and a selection unit. The exponential operation unit is adapted to perform an exponential operation on data; the division operation unit is adapted to perform a division operation on data; the logistic regression (sigmoid) function operation unit is adapted to perform a logistic regression function operation on the data; and the selection unit is adapted to select, according to the operation mode, the operation output of the exponential operation unit, the division operation unit or the logistic regression function operation unit as the operation result. The logistic regression function operation unit is coupled to the exponential operation unit and the division operation unit and, when performing the logistic regression function operation on the data, is adapted to use the exponential operation unit for the exponential operation and the division operation unit for the division operation that the logistic regression function operation involves.
Optionally, in the arithmetic circuit according to the present invention, the exponential operation unit includes: an exponent data input module adapted to receive data to be subjected to the exponential operation; an exponent selection module adapted to determine the interval to which the data belongs according to the size of the data; an exponent parameter module adapted to provide parameters corresponding to the determined interval, the parameters being those required for a Taylor expansion of the exponential operation; an exponent multiplication module adapted to perform multiplication operations using the provided parameters and the data so as to carry out the Taylor expansion on the data; and an exponent result output module adapted to obtain the Taylor expansion result from the exponent multiplication module so as to provide the exponential operation result.
Optionally, in the arithmetic circuit according to the present invention, the exponential operation unit further includes an exponent operation control module adapted to instruct the exponential operation unit to start exponential operation processing on the received data when a valid input indication signal is received, and to generate, after a predetermined operation period of the exponential operation unit has elapsed, a valid output indication signal indicating that the exponential operation result of the data is provided at the exponent result output module.
Optionally, in the arithmetic circuit according to the present invention, the division operation unit includes: a division data input module adapted to receive data to be subjected to the division operation; a first division parameter selection module adapted to determine a first division parameter according to the size of the data; a first division shift module adapted to shift the data according to the first division parameter; a second division parameter selection module adapted to select a second or a third division parameter based on the data; a multiplication module adapted to multiply the output of the first division shift module by the output of the second division parameter selection module; a second division shift module adapted to shift the output of the multiplication module; and a division result output module adapted to obtain the output of the second division shift module as the result of the division operation on the data.
Optionally, in the arithmetic circuit according to the present invention, the division operation unit further includes a division control module adapted to instruct the division operation unit to start division processing on the received data when a valid input indication signal is received, and to generate, after a predetermined operation period of the division operation unit has elapsed, a valid output indication signal indicating that the division result of the data is provided at the division result output module.
Optionally, in the arithmetic circuit according to the present invention, the logistic regression function operation unit includes: a logistic regression operation data input module adapted to receive data to be subjected to the logistic regression function operation; a first logistic regression operation module coupled to the exponential operation unit and adapted to send the data to the exponential operation unit for the exponential operation and to obtain the exponential operation result; a second logistic regression operation module coupled to the division operation unit and adapted to send the exponential operation result obtained by the first logistic regression operation module to the division operation unit to obtain a division operation result, and to multiply the exponential operation result by the division operation result to obtain a multiplication result; a logistic regression shift module adapted to shift the multiplication result of the second logistic regression operation module; and a logistic regression operation result output module adapted to obtain the shift output of the logistic regression shift module as the result of the logistic regression function operation on the data.
Optionally, in the arithmetic circuit according to the present invention, the logistic regression function operation unit further includes a logistic regression operation control module adapted to instruct the logistic regression function operation unit to start logistic regression operation processing on the received data when a valid input indication signal is received, and to generate, after a predetermined operation period of the logistic regression function operation unit has elapsed, a valid output indication signal indicating that the logistic regression operation result of the data is provided at the logistic regression operation result output module.
Optionally, the arithmetic circuit according to the present invention further includes a hyperbolic tangent (tanh) function arithmetic unit adapted to perform a hyperbolic tangent function operation on the data; the selection unit is suitable for selecting the operation output of the exponential operation unit, the division operation unit, the logistic regression function operation unit or the hyperbolic tangent function operation unit as an operation result according to an operation mode, wherein the hyperbolic tangent function operation unit is coupled to the logistic regression function operation unit and is suitable for performing the logistic regression function operation in the hyperbolic tangent function operation by using the logistic regression function operation unit when the hyperbolic tangent function operation is performed on data.
Optionally, in the arithmetic circuit according to the present invention, the hyperbolic tangent function operation unit includes: a hyperbolic tangent operation data input module adapted to receive data to be subjected to the hyperbolic tangent function operation; a hyperbolic tangent operation module coupled to the logistic regression function operation unit and adapted to perform inverse processing on the data to be operated on, to send the inverse-processed data to the logistic regression function operation unit for the logistic regression function operation, and to shift the result of the logistic regression function operation; and a hyperbolic tangent operation result output module adapted to obtain the shift output of the hyperbolic tangent operation module as the result of the hyperbolic tangent function operation on the data.
Optionally, in the arithmetic circuit according to the present invention, the hyperbolic tangent function operation unit further includes a hyperbolic tangent function operation control module adapted to instruct the hyperbolic tangent function operation unit to start hyperbolic tangent function operation processing on the received data when a valid input indication signal is received, and to generate, after a predetermined operation period of the hyperbolic tangent function operation unit has elapsed, a valid output indication signal indicating that the hyperbolic tangent function operation result of the data is provided at the hyperbolic tangent operation result output module.
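One way to see why a hyperbolic tangent unit can reuse a sigmoid unit is the identity tanh(x) = 1 - 2·g(-2x), where g is the sigmoid function. The Python sketch below checks that identity numerically. Reading "inverse processing" as negation and "shift" as multiplication by 2 is an assumption made for illustration; the text does not spell out the exact data path, and the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Reference sigmoid g(z) = 1 / (1 + e^-z), standing in for the sigmoid unit."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh_via_sigmoid(x: float) -> float:
    """tanh(x) built from one sigmoid evaluation (assumed data path, see lead-in)."""
    negated = -(2.0 * x)       # double and negate the input
    s = sigmoid(negated)       # reuse the sigmoid unit
    return 1.0 - 2.0 * s       # doubling is a 1-bit left shift in fixed point


if __name__ == "__main__":
    for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(x, tanh_via_sigmoid(x), math.tanh(x))
```

In a fixed-point circuit the two multiplications by 2 would be single-bit shifts, which is consistent with the shift step the unit is described as performing.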
Optionally, the arithmetic circuit according to the present invention further includes a linear rectification function (ReLU) arithmetic unit adapted to perform a linear rectification function operation on the data, and the selection unit is adapted to select an operation output of the exponential arithmetic unit, the division arithmetic unit, the logistic regression function arithmetic unit, the hyperbolic tangent function arithmetic unit, or the linear rectification function (ReLU) arithmetic unit as the operation result according to the operation mode.
Optionally, in the arithmetic circuit according to the present invention, the linear rectification function operation unit includes a linear rectification selection unit adapted to output 0 as the operation result when the value of the data to be subjected to the linear rectification function operation is less than 0, and to output the data itself as the operation result when the value of the data is not less than 0.
Optionally, in the arithmetic circuit according to the present invention, a most significant bit of 1 in the binary representation of the data indicates that the value of the data is less than 0.
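The sign-bit test described above can be sketched in a few lines of Python. The 16-bit word width, the two's-complement encoding, and the helper names are assumptions chosen for illustration; the text only states that a most significant bit of 1 marks a negative value.

```python
def to_word(value: int, width: int = 16) -> int:
    """Encode a signed integer as an unsigned two's-complement bit pattern."""
    return value & ((1 << width) - 1)


def relu_word(word: int, width: int = 16) -> int:
    """ReLU on a two's-complement word: output 0 if the sign bit is set."""
    word &= (1 << width) - 1               # keep only `width` bits
    sign_bit = (word >> (width - 1)) & 1   # most significant bit
    return 0 if sign_bit else word


if __name__ == "__main__":
    for v in (-5, 0, 7):
        print(v, relu_word(to_word(v)))
```

No comparator or arithmetic is needed: the selection reduces to a multiplexer driven by one bit of the input.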
According to another aspect of the invention, there is provided a processor comprising an arithmetic circuit according to the invention.
According to yet another aspect of the invention, a system on a chip is provided, comprising an instruction processing apparatus or a processor according to the invention.
According to yet another aspect of the invention, there is also provided a smart device comprising a system on chip according to the invention.
According to the scheme of the invention, by specially designing the exponential operation unit and the division operation unit, the design of the activation function (such as a logistic regression (sigmoid) function) operation unit based on the exponential operation and the division operation can be simplified, and the application range of the circuit is increased.
In addition, according to the scheme of the present invention, the exponential operation unit divides the input range into a plurality of intervals according to the size of the data to be subjected to the exponential operation, and performs a Taylor expansion with different parameters in each interval to approximate the exponential operation result, providing a high-precision and high-efficiency design for the exponential operation circuit.
In addition, according to the scheme of the invention, other operation circuits of the activation function (such as a hyperbolic tangent tanh function and a linear rectification ReLU function) can be provided, and the circuits can also utilize the designed operation circuits of exponential operation, division operation and logistic regression (sigmoid) function, so that the circuit design is simplified.
Thus, according to the scheme of the invention, hardware implementation of a plurality of activation functions can be integrated into one circuit unit, repeated design of the circuit is reduced, and the integration level of the circuit is improved. The integration of the arithmetic circuit into a processing chip such as a system on chip (SoC) can not only significantly improve the speed of the chip for performing neural network computation, but also significantly reduce the energy consumption, so that the processing chip is more suitable for being used on edge and end computing devices such as intelligent devices and IoT devices with higher requirements on energy consumption.
Drawings
To the accomplishment of the foregoing and related ends, certain illustrative aspects are described herein in connection with the following description and the annexed drawings, which are indicative of various ways in which the principles disclosed herein may be practiced, and all aspects and equivalents thereof are intended to be within the scope of the claimed subject matter. The above and other objects, features and advantages of the present disclosure will become more apparent from the following detailed description read in conjunction with the accompanying drawings. Throughout this disclosure, like reference numerals generally refer to like parts or elements.
FIG. 1 shows a schematic diagram of a circuit design of an operational circuit according to one embodiment of the invention;
FIG. 2 shows a schematic diagram of a circuit design of an exponential operation unit according to one embodiment of the invention;
FIG. 3 shows a schematic diagram of a circuit design of a divide operation unit according to one embodiment of the invention;
FIG. 4 shows a schematic diagram of a circuit design of a logistic regression (sigmoid) function operation unit according to one embodiment of the invention;
FIG. 5 shows a schematic diagram of a circuit design of a hyperbolic tangent function arithmetic unit in accordance with one embodiment of the present invention;
FIG. 6 shows a schematic diagram of a circuit design of a linear rectification function arithmetic unit according to one embodiment of the present invention;
FIG. 7 shows a schematic diagram of an instruction processing apparatus according to an embodiment of the invention;
FIG. 8 shows a schematic diagram of a processor, according to an embodiment of the invention; and
FIG. 9 shows a schematic diagram of a system on chip (SoC) according to one embodiment of the invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
FIG. 1 shows a schematic diagram of a circuit design of an arithmetic circuit 100 according to one embodiment of the invention. The circuit design shown in FIG. 1 is the overall design of the arithmetic circuit 100. As shown in FIG. 1, the arithmetic circuit 100 includes an exponent operation unit 110, a division operation unit 120, a logistic regression (sigmoid) function operation unit 130, and a selection unit 140. After receiving data to be operated on at the data input terminal data_in 150, the arithmetic circuit 100 determines the type of operation to perform according to the received operation mode (mode_sel) and sends the data to the corresponding operation unit (110, 120, or 130). The operation results of the respective operation units are sent to the selection unit 140. The selection unit 140 selects the operation result of the operation unit corresponding to the operation mode as the final operation result and sends it to the data output terminal data_out 160.
Optionally, the arithmetic circuit 100 further comprises a control unit FSM 170, which controls the execution process of the arithmetic circuit 100. Upon receiving the valid input indication signal valid_in, the FSM 170 determines that data to be operated on has been sent to the data input terminal 150 and instructs the arithmetic circuit 100 to start processing that data. At the same time, the FSM 170 starts counting operation cycles. Because different operation units take different numbers of cycles, the FSM 170 generates the valid output indication signal valid_out once it determines that the cycle count of the operation unit corresponding to the operation mode has been reached, which means that the operation has completed and the result is available at the data output terminal 160. Other modules coupled to the arithmetic circuit 100 can therefore read the operation result for the input data_in from the data output terminal data_out 160 when they receive valid_out.
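The valid_in / valid_out handshake can be modeled as a small cycle counter. The Python sketch below is a behavioral illustration only: the class name, the `tick` interface, and the latency table are hypothetical, and only the 23-cycle figure for the exponent unit comes from the description; the ReLU latency is made up for the example.

```python
class ControlFSM:
    """Behavioral model of a control unit that counts cycles per operation mode."""

    def __init__(self, latency_by_mode: dict):
        self.latency_by_mode = latency_by_mode  # cycles needed by each unit
        self.remaining = 0
        self.busy = False

    def tick(self, valid_in: bool = False, mode_sel: str = "") -> bool:
        """Advance one clock cycle and return the value of valid_out."""
        if self.busy:
            self.remaining -= 1
            if self.remaining == 0:
                self.busy = False
                return True           # result is now available at data_out
            return False
        if valid_in:
            # Data is in place: start counting the cycles of the selected unit.
            self.remaining = self.latency_by_mode[mode_sel]
            self.busy = True
        return False


if __name__ == "__main__":
    fsm = ControlFSM({"exp": 23, "relu": 1})   # "relu": 1 is an assumed value
    fsm.tick(valid_in=True, mode_sel="exp")
    cycles = 0
    while not fsm.tick():
        cycles += 1
    print("valid_out asserted after", cycles + 1, "cycles")
```

Because the latency is looked up from the operation mode, one counter can serve every operation unit, which matches the role the FSM 170 plays for the whole circuit.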
As shown in FIG. 1, the logistic regression (sigmoid) function operation unit 130 is coupled to the exponent operation unit 110 and the division operation unit 120. When the sigmoid function operation is performed on data, the parts of it that involve exponential and division operations may be carried out using the exponential operation provided by the exponent operation unit 110 and the division operation provided by the division operation unit 120.
In the present invention, the logistic regression function refers to the sigmoid function, and the two terms are used interchangeably throughout. The graph of the sigmoid function is an S-shaped (sigmoid) curve, and the function maps its input to a value between 0 and 1. Sigmoid functions are therefore often used as activation functions in neural networks, where the activation function of a node defines the output of the node for a given input or set of inputs.
For an input z, the sigmoid function g(z) can be expressed as:
$$g(z) = \frac{1}{1 + e^{-z}}$$
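To illustrate how a sigmoid can be assembled from one exponential and one division, the Python sketch below uses the algebraically equivalent form g(z) = e^z / (1 + e^z), computed as the exponential result multiplied by a reciprocal. The function name, the callable parameters, and the choice to pass 1 + e^z to the reciprocal are assumptions for illustration; they are not taken from the circuit figures.

```python
import math


def sigmoid_exp_div(z, exp_unit=math.exp, recip_unit=lambda a: 1.0 / a):
    """Sigmoid from one exponential and one reciprocal (assumed decomposition)."""
    e = exp_unit(z)                  # exponential operation
    return e * recip_unit(1.0 + e)   # division as reciprocal, then a multiply


if __name__ == "__main__":
    for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(z, sigmoid_exp_div(z), 1.0 / (1.0 + math.exp(-z)))
```

The two callables stand in for the shared exponent and division units, so swapping in an approximate implementation of either one changes the sigmoid result without touching the composition.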
FIG. 2 shows a schematic diagram of a circuit design of the exponential operation unit 110 in the operation circuit 100 according to an embodiment of the present invention. In the circuit design shown in FIG. 2, the result of the exponential operation is approximated by a Taylor expansion. A Taylor expansion, also known as Taylor's formula, uses information about a function at a given point to describe the function's values near that point. If the function is sufficiently smooth and its derivatives of each order at the point are known, Taylor's formula uses these derivative values as coefficients of a polynomial that approximates the function in a neighborhood of the point. That is, a Taylor expansion with suitable parameters can be used as the result of the exponential operation.
According to an embodiment of the present invention, the exponential operation on an input can be performed using an 8th-order Taylor expansion, that is:
m1=1+(input-a);
m2=(input-a)^2;
m3=(input-a)^3;
m4=(input-a)^4;
m5=(input-a)^5;
m6=(input-a)^6;
m7=(input-a)^7;
m8=(input-a)^8;
y=A1*(m1+B1*m2+C1*m3+D1*m4+E1*m5+F1*m6+G1*m7+H1*m8)
where a is the starting point of the interval containing the input, and A1, B1, C1, D1, E1, F1, G1, and H1 are the parameters used in that interval.
To provide high accuracy, the input range may be divided into intervals, with the Taylor expansion performed using different parameters in each interval. According to one embodiment of the present invention, the input range may be divided into 18 intervals, and an 8th-order Taylor expansion is performed in each interval.
It is understood that the invention is not limited to a particular number of intervals or a particular order of the Taylor expansion, both of which can be varied according to the required precision of the exponential operation without departing from the scope of the invention.
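The interval-based expansion can be sketched in Python as follows. Several details are assumptions, since the text gives neither the interval boundaries nor the parameter values: the input range [-9, 9) split into 18 unit-width intervals, the scale A1 = e^a, and the coefficients 1/k! (so B1 = 1/2!, C1 = 1/3!, and so on) are simply the natural Taylor choices. The polynomial is evaluated with Horner's rule rather than with the explicit powers m2 to m8.

```python
import math

ORDER = 8                                  # 8th-order expansion
LOW, HIGH, N_INTERVALS = -9.0, 9.0, 18     # assumed input range and split
WIDTH = (HIGH - LOW) / N_INTERVALS

# Per-interval table: (start point a, scale A = e^a). In hardware these
# would be precomputed constants; math.exp is used here only to fill the table.
TABLE = [(LOW + i * WIDTH, math.exp(LOW + i * WIDTH)) for i in range(N_INTERVALS)]
# Taylor coefficients 1/k! for k = 0..8 (assumed values for B1..H1).
COEFFS = [1.0 / math.factorial(k) for k in range(ORDER + 1)]


def exp_piecewise(x: float) -> float:
    """Approximate e^x with an interval lookup followed by a Taylor polynomial."""
    if not LOW <= x < HIGH:
        raise ValueError("input outside the tabulated range")
    idx = min(int((x - LOW) // WIDTH), N_INTERVALS - 1)   # interval selection
    a, scale = TABLE[idx]                                 # parameter lookup
    t = x - a                                             # (input - a)
    acc = 0.0
    for c in reversed(COEFFS):                            # Horner evaluation
        acc = acc * t + c
    return scale * acc


if __name__ == "__main__":
    for x in (-8.5, -1.25, 0.0, 0.999, 8.75):
        print(x, exp_piecewise(x), math.exp(x))
```

With unit-width intervals, (input - a) stays below 1, so the truncated series loses little accuracy; narrower intervals or a higher order would trade table size or multiplications for precision.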
As shown in FIG. 2, the exponent operation unit 110 includes a data input module data_in 112, a selection module SEL 114, an exponent parameter module 116, a multiplication operation module 118, and a result output module 119.
The data input module data_in 112 receives data to be subjected to the exponential operation. The selection module SEL 114, which is coupled to the data input module 112, then determines the interval in which the data lies according to the size of the data, for example, which of the 18 intervals mentioned above the data belongs to.
Subsequently, the exponent parameter module 116 selects the parameters corresponding to the determined interval, such as the value a and the parameters A1, B1, C1, D1, E1, F1, G1, and H1 mentioned above.
The multiplication operation module 118 performs the multiplication operations required for the Taylor expansion, together with the corresponding addition operations. According to an embodiment of the present invention, the multiplication operation module 118 may include 4 multipliers, so that 4 multiplications can be performed in parallel in one operation cycle, improving execution efficiency. The invention is not limited by the number of multipliers, which can be increased or decreased according to actual needs without departing from the scope of the invention.
When the multiplication operation module 118 has calculated the result of the Taylor expansion, i.e., the result of the exponential operation, it outputs the operation result data_out at the result output module 119.
Optionally, as shown in FIG. 2, the exponent operation unit 110 further includes a control circuit. The control circuit includes a valid input indication signal valid_in, an operation control module FSM 111, and a valid output indication signal valid_out. After sending the data to be subjected to the exponential operation to the data input module data_in 112, an external module coupled to the exponent operation unit 110 sets valid_in to indicate that the data is in place. After determining that valid_in is set, the operation control module 111 instructs the exponent operation unit 110 to start the exponential operation on the data at data_in and simultaneously starts counting operation cycles, i.e., clock cycles of the circuit. The exponent operation unit 110 takes a specific number of clock cycles to perform the exponential operation. When the operation control module 111 determines that this number of clock cycles has been reached, the exponential operation on the data has completed and the result is available at data_out 119. The operation control module 111 therefore outputs the valid output indication signal, i.e., sets valid_out. On finding that valid_out is set, the external module can determine that the operation result is ready and obtain the exponential operation result from data_out 119.
According to an embodiment of the present invention, when the multiplication operation module 118 employs 4 multipliers, and only one multiplication can be completed in one clock cycle, the exponent operation unit 110 needs 23 clock cycles to complete one exponential operation. It should be noted that the present invention is not limited thereto, and the number of clock cycles required by the exponent operation unit 110 may be set differently according to the order of the Taylor expansion and the number of multipliers.
FIG. 3 shows a schematic diagram of a circuit design of the division operation unit 120 in the operation circuit 100 according to an embodiment of the present invention. In the circuit design shown in FIG. 3, the Newton-Raphson method is used to perform the division calculation. Division by a is rewritten as multiplication by the reciprocal U = 1/a, and U is found as the root of the function
$$f(U) = \frac{1}{U} - a,$$
that is, the value of U at which the function value is 0.
The Newton-Raphson update for this function can be written as
$$U_{n+1} = U_n\,(2 - a\,U_n).$$
This update is iterated; U_{n+1} converges rapidly toward U_n, and once the two are close, the value of U_n is already very close to 1/a, which yields the result of the division operation.
According to one embodiment of the invention, the algorithm for performing the division calculation is as follows:
let n=floor(log2(a));
a=a*2^-n;
x=x0;
loop 5 times:
x=x*(2-a*x);
loop end;
x=x*2^-n;
return x;
That is, in the circuit design shown in FIG. 3, the division result is obtained with a number of multiplication and shift operations using the Newton-Raphson method. It should be noted that the invention is not limited by the specific parameters n and x0 or by the number of iterations used in the method, and all ways in which multiplication and shift operations can be used to obtain a division result are within the scope of the invention.
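The algorithm above can be written as a short floating-point Python sketch. It is an illustration under stated assumptions rather than the circuit itself: the initial value x0 = 0.75 is assumed for every input, `math.frexp` is used to obtain n = floor(log2(a)) exactly, and `math.ldexp` stands in for the final shift. The function name is hypothetical.

```python
import math


def reciprocal_newton(a: float, x0: float = 0.75, iterations: int = 5) -> float:
    """Approximate 1/a for a > 0 using only multiplications and power-of-two scaling."""
    if a <= 0.0:
        raise ValueError("this sketch handles positive inputs only")
    mantissa, exponent = math.frexp(a)   # a = mantissa * 2**exponent, mantissa in [0.5, 1)
    n = exponent - 1                     # equals floor(log2(a))
    a_norm = 2.0 * mantissa              # a * 2**-n, normalized into [1, 2)
    x = x0
    for _ in range(iterations):
        x = x * (2.0 - a_norm * x)       # Newton-Raphson update
    return math.ldexp(x, -n)             # x * 2**-n, the final shift


if __name__ == "__main__":
    for a in (0.001, 0.3, 1.0, 1.999, 3.0, 1000.0):
        print(a, reciprocal_newton(a), 1.0 / a)
```

Normalizing a into [1, 2) is what makes a fixed initial value workable: the error 1 - a·x squares on every iteration, so starting from an error of at most 0.5 in magnitude, five iterations leave a relative error of roughly 0.5^32.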
As shown in FIG. 3, the division operation unit 120 includes a data input module data_in 121, a first parameter selection module SEL 122, a first shift module 123, a second parameter selection module 124, a multiplication operation module 125, a second shift module 126, and a result output module data_out 127.
The data input module data_in 121 receives data to be divided. The first parameter selection module SEL 122, which is coupled to data_in 121, then determines the shift parameter n according to the size of the data to be divided. For example, n may be set equal to floor(log2(a)), where a is the data to be divided.
The first shift module 123 shifts a according to the shift parameter n determined by the first parameter selection module 122; for example, in the binary case, it shifts a to the right by n bits to obtain the value a = a × 2^-n.
The second parameter selection module 124 determines an initial parameter according to the value of the data a to be divided, for example, one of coe _0 and coe _1 may be selected as an initial value of the iteration. According to one embodiment of the invention, coe _0 may be set to 0 and coe _1 may be set to 0.75, coe _1 being selected when the a value is greater than 0.5, otherwise coe _0 is selected as the iteration initial value.
The multiplication module 125 operates on the data shifted by the first shift module 123 and the initial value selected by the second parameter selection module 124, for example by iterating according to the Newton-Raphson method, to obtain a result after a plurality of iterations. For example, in the above example, the multiplication x = x × (2 - a × x) is iterated a plurality of times (e.g., 5 iterations) in the multiplication module 125 to obtain the value of x after the iterations.
Subsequently, in the second shift module 126, the iteration result output by the multiplication module 125 is shifted again to obtain the final division result. For example, in the above example, the operation x = x × 2^(-n), i.e., a shift to the right by n bits, is performed to obtain the final x, which is the division result and is output to the result output module 127 as data_out.
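As a non-limiting illustration, the shift, multiply and shift sequence described for modules 122 to 126 may be sketched in software as follows. This is a minimal floating-point sketch and not the fixed-point circuit itself; the function name nr_reciprocal, the use of a single initial value 0.75 in place of the selection between coe_0 and coe_1, and the default of 5 iterations are assumptions made for the example.

```python
import math


def nr_reciprocal(a: float, iterations: int = 5, x0: float = 0.75) -> float:
    """Approximate 1/a for a > 0 using only multiply, subtract and power-of-two scaling.

    Hypothetical helper for illustration; x0 and iterations are assumed values.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    # frexp returns a = m * 2**e with m in [0.5, 1), so floor(log2(a)) = e - 1.
    _, e = math.frexp(a)
    n = e - 1
    a_scaled = math.ldexp(a, -n)      # first shift: a * 2^-n, now in [1, 2)
    x = x0                            # initial value of the iteration (assumption)
    for _ in range(iterations):
        x = x * (2.0 - a_scaled * x)  # Newton-Raphson step for the reciprocal
    return math.ldexp(x, -n)          # second shift: x * 2^-n approximates 1/a


if __name__ == "__main__":
    for value in (0.1, 1.0, 3.0, 1000.0):
        print(value, nr_reciprocal(value), 1.0 / value)
```

With the scaled input in [1, 2) and an initial value of 0.75, the quantity 1 - a × x starts with magnitude at most 0.5 and is squared by each step, which is why a small fixed number of iterations suffices in this sketch.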
Optionally, as shown in fig. 3, the division operation unit 120 further includes a control circuit. The control circuit includes a valid input indication signal valid_in, an operation control module FSM 128, and a valid output indication signal valid_out. After transmitting the data to be divided to the data input module data_in 121, the external module coupled to the division operation unit 120 sets valid_in to indicate that the data is in place. After determining that valid_in is set, the operation control module 128 instructs the division operation unit 120 to start the division operation on the data at data_in and simultaneously starts counting operation cycles, i.e., clock cycles of the circuit. As described above, the division operation unit 120 completes a division in a specific number of clock cycles. When the operation control module 128 determines that this number of clock cycles has been reached, the division operation on the data has been completed and the division result is available at data_out 127. The operation control module 128 therefore sets the valid output indication signal valid_out. On finding that valid_out is set, the external module can determine that the operation result is ready and obtain the division result from data_out 127.
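The valid_in / valid_out handshake described above may be illustrated by a small cycle-level software model. This is only a sketch under stated assumptions: the class name FixedLatencyController, the tick method and the latency of 3 cycles used in the usage example are hypothetical, since the number of clock cycles is not given here.

```python
class FixedLatencyController:
    """Software model of a cycle-counting control module (hypothetical names)."""

    def __init__(self, latency_cycles: int):
        self.latency = latency_cycles  # assumed fixed operation latency in cycles
        self.busy = False
        self.count = 0

    def tick(self, valid_in: bool) -> bool:
        """Advance one clock cycle and return valid_out for that cycle."""
        valid_out = False
        if self.busy:
            self.count += 1
            if self.count >= self.latency:
                # The counted cycles have elapsed: the result is at data_out.
                self.busy = False
                valid_out = True
        elif valid_in:
            # Data is in place: start the operation and the cycle counter.
            self.busy = True
            self.count = 0
        return valid_out


if __name__ == "__main__":
    ctrl = FixedLatencyController(latency_cycles=3)
    trace = [ctrl.tick(True)] + [ctrl.tick(False) for _ in range(4)]
    print(trace)
```

In this model valid_out is raised for a single cycle; whether the circuit holds the signal longer is not specified and is left open.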
Fig. 4 shows a schematic diagram of a circuit design of a logistic regression (sigmoid) function operation unit 130 according to an embodiment of the present invention.
The sigmoid function is an activation function. For an input z, the sigmoid function g(z) can be expressed as:
g(z) = 1/(1 + e^(-z)) = e^z/(1 + e^z)
which includes an exponential operation on z and a corresponding division operation. Accordingly, the sigmoid function operation unit 130 performs the exponential and division operations in the sigmoid function using the exponent operation unit 110 and the division operation unit 120.
As shown in fig. 4, the sigmoid function operation unit 130 includes a data input module data_in 131, a first sigmoid operation module 132, a second sigmoid operation module 133, a shift module 134, and a result output module data_out 135.
The data input module 131 receives the data on which the sigmoid operation is to be performed. The first sigmoid operation module 132 is coupled to the data input module 131 and the exponent operation unit 110; it receives data from the data input module 131, sends the data to the exponent operation unit 110 for the exponential operation, and obtains the exponential operation result from the exponent operation unit 110. The circuit design of the exponent operation unit 110 has been described above with reference to fig. 2. According to one embodiment, the first sigmoid operation module 132 transmits data to the data input module 112 of the exponent operation unit 110 while setting the valid input indication signal valid_in of the exponent operation unit 110, so that the exponent operation unit 110 starts an exponential operation on the data and sets the valid output indication signal valid_out when the exponential operation result is obtained. The first sigmoid operation module 132 obtains the exponential operation result from the result output unit 119 of the exponent operation unit 110 upon receiving the valid_out signal. After obtaining the exponential operation result e^z, the first sigmoid operation module 132 performs an addition to compute the term 1 + e^z of the sigmoid function.
The second sigmoid operation module 133 is coupled to the first sigmoid operation module 132 and the division operation unit 120, and is configured to send the value 1 + e^z obtained by the first sigmoid operation module 132 to the division operation unit 120 to obtain the division result 1/(1 + e^z). The circuit design of the division operation unit 120 has been described above with reference to fig. 3. According to one embodiment, the second sigmoid operation module 133 transmits data to the data input module 121 of the division operation unit 120 while setting the valid input indication signal valid_in of the division operation unit 120. In this way, the division operation unit 120 starts a division operation on the data and sets the valid output indication signal valid_out when the division result is obtained. The second sigmoid operation module 133 obtains the division result from the result output unit 127 of the division operation unit 120 upon receiving the valid_out signal.
The second sigmoid operation module 133 further multiplies the exponential operation result e^z by the division result to obtain the sigmoid function operation result.
Optionally, according to one embodiment, the operation results of the exponent operation unit 110 and the division operation unit 120 are both 32-bit numbers, so their product is a 64-bit result. To obtain a final 32-bit result, the shift module 134 is coupled to the second sigmoid operation module 133 and shifts the multiplication result of the second sigmoid operation module 133 to obtain the final 32-bit sigmoid function operation result, which is output to the result output module 135 as the result of performing the sigmoid function operation on the data.
It should be noted that the shifting module 134 is optional, and the number of shifts is also optional, and whether to employ the shifting module 134 and the number of shifts in the shifting module 134 may be selected according to the actual requirements of the sigmoid function operation without departing from the scope of the present invention.
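The data path of modules 132 to 134 (addition, division, multiplication, then a shift back to 32 bits) may be sketched as follows. The fixed-point format is not stated in the description, so the Q16.16 layout (FRAC_BITS = 16), the helper names and the use of floating-point library calls as stand-ins for the exponent operation unit 110 and the division operation unit 120 are all assumptions of this example.

```python
import math

FRAC_BITS = 16          # assumed Q16.16 format; not specified in the description
MASK32 = 0xFFFFFFFF     # keeps values within 32 bits


def to_fixed(value: float) -> int:
    """Convert a non-negative float to assumed 32-bit Q16.16 fixed point."""
    return int(round(value * (1 << FRAC_BITS))) & MASK32


def to_float(q: int) -> float:
    """Convert assumed Q16.16 fixed point back to float."""
    return q / (1 << FRAC_BITS)


def sigmoid_fixed(z: float) -> int:
    ez = to_fixed(math.exp(z))                # stand-in for exponent unit 110: e^z
    denom = (ez + to_fixed(1.0)) & MASK32     # module 132: 1 + e^z
    recip = to_fixed(1.0 / to_float(denom))   # stand-in for division unit 120
    product = ez * recip                      # module 133: up to 64 bits wide
    return (product >> FRAC_BITS) & MASK32    # module 134: shift back to 32 bits


if __name__ == "__main__":
    for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
        print(z, to_float(sigmoid_fixed(z)), 1.0 / (1.0 + math.exp(-z)))
```

The product of two values with 16 fractional bits carries 32 fractional bits, so a right shift by FRAC_BITS restores the assumed 32-bit format; a different format would call for a different shift amount, in line with the optional shift count noted above.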
Optionally, as shown in fig. 4, the sigmoid function operation unit 130 further includes a control circuit. The control circuit includes a valid input indication signal valid_in, an operation control module FSM 136, and a valid output indication signal valid_out. After transmitting the data on which the sigmoid operation is to be performed to the data input module data_in 131, the external module coupled to the sigmoid function operation unit 130 sets valid_in to indicate that the data is in place. After determining that valid_in is set, the operation control module 136 instructs the sigmoid function operation unit 130 to start the sigmoid function operation on the data at data_in and simultaneously starts counting operation cycles, i.e., clock cycles of the circuit. As described above, the sigmoid function operation unit 130 completes a sigmoid function operation in a specific number of clock cycles. When the operation control module 136 determines that this number of clock cycles has been reached, the sigmoid function operation on the data has been completed and the sigmoid function operation result is available at data_out 135. The operation control module 136 therefore sets the valid output indication signal valid_out. On finding that valid_out is set, the external module can determine that the operation result is ready and obtain the sigmoid function operation result from data_out 135.
The circuit design of the arithmetic circuit 100 according to the invention is described above with reference to fig. 1-4. The arithmetic circuit 100 provides hardware implementation for sigmoid function operation, and relatively independent exponential operation and division operation modules are additionally designed, so that the implementation logic of the sigmoid function operation is simplified, and the method can be easily expanded to other function implementation fields requiring exponential and division operation.
In addition, in the arithmetic circuit 100, the Taylor expansion is implemented in hardware, the data is divided into a plurality of intervals, and different Taylor expansion parameters are designed for each interval. The exponential operation can therefore be performed with high efficiency and high accuracy, which significantly improves the hardware execution efficiency of the exponential operation.
In addition, optionally, according to an embodiment of the present invention, the arithmetic circuit 100 further includes other activation function operation circuits. For example, as shown in fig. 1, the arithmetic circuit 100 further includes a hyperbolic tangent (tanh) function operation unit 180. The tanh function is another activation function, and therefore, in the arithmetic circuit 100, the tanh function operation unit 180 is arranged in parallel with the other operation units, such as the sigmoid function operation unit 130, the exponent operation unit 110, and the division operation unit 120. The arithmetic circuit 100 may select, according to the operation mode, whether the operation is performed by the hyperbolic tangent function operation unit 180, and the selection unit 140 may select the operation result output by the hyperbolic tangent function operation unit 180 as the final operation result according to the operation mode (mode_sel).
In consideration of the correlation between the hyperbolic tangent function and the sigmoid function, the hyperbolic tangent function operation unit 180 is coupled to the sigmoid function operation unit 130, and performs the hyperbolic tangent function operation using the sigmoid function operation performed by the sigmoid function operation unit 130.
Fig. 5 shows a schematic circuit design diagram of an operation unit of a hyperbolic tangent (tanh) function according to an embodiment of the present invention.
The hyperbolic tangent function g(z) can be described as:
g(z) = tanh(z) = (e^z - e^(-z))/(e^z + e^(-z))
Thus, the relationship between the hyperbolic tangent function tanh(z) and the sigmoid function sigmoid(z) is as follows:
tanh(z)=2*sigmoid(2z)-1
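For completeness, the relationship can be checked by a short derivation, written here in LaTeX and assuming the standard definition sigmoid(z) = e^z/(1 + e^z):

```latex
\begin{aligned}
2\,\mathrm{sigmoid}(2z) - 1
  &= \frac{2e^{2z}}{1 + e^{2z}} - 1
   = \frac{2e^{2z} - \left(1 + e^{2z}\right)}{1 + e^{2z}}
   = \frac{e^{2z} - 1}{e^{2z} + 1} \\
  &= \frac{e^{z} - e^{-z}}{e^{z} + e^{-z}}
   = \tanh(z)
\end{aligned}
```

The last step divides the numerator and the denominator by e^z.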
As shown in fig. 5, the hyperbolic tangent function operation unit 180 includes a data input module data_in 181, a hyperbolic tangent (tanh) operation module 182, and a result output module data_out 183.
The data input module data_in 181 receives the data on which the tanh function operation is to be performed. The hyperbolic tangent operation module 182 is coupled to the data input module 181 and the sigmoid function operation unit 130. It receives the data to be operated on from the data input module 181, preprocesses the data according to the relationship between tanh and sigmoid, sends the preprocessed data to the sigmoid function operation unit 130 for the sigmoid function operation, post-processes the operation result obtained, and sends the post-processed result to the result output module data_out 183 as the tanh function operation result.
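The pre-processing and post-processing around the sigmoid function operation may be sketched as follows, under the assumption that they follow the identity tanh(z) = 2*sigmoid(2z) - 1 directly: the input is doubled before the sigmoid operation, and the result is doubled and reduced by 1 afterwards. The function names are hypothetical, and the floating-point sigmoid is only a stand-in for the sigmoid function operation unit 130.

```python
import math


def sigmoid(z: float) -> float:
    """Stand-in for sigmoid function operation unit 130, using e^z / (1 + e^z)."""
    ez = math.exp(z)
    return ez * (1.0 / (1.0 + ez))


def tanh_via_sigmoid(z: float) -> float:
    pre = 2.0 * z            # assumed pre-processing: 2z (a 1-bit left shift in fixed point)
    s = sigmoid(pre)         # sigmoid function operation on the pre-processed data
    return 2.0 * s - 1.0     # assumed post-processing: double, then subtract 1


if __name__ == "__main__":
    for z in (-3.0, -0.5, 0.0, 0.5, 3.0):
        print(z, tanh_via_sigmoid(z), math.tanh(z))
```

Because both doublings are multiplications by a power of two, the only non-trivial hardware reused in this sketch is the sigmoid path itself.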
According to one embodiment, the hyperbolic tangent operation module 182 performs inverse phase processing on the data, sends the data after inverse phase processing to the sigmoid function operation unit 130 to perform sigmoid function operation to obtain an operation result, and performs shift processing on the obtained sigmoid function operation result so as to send the shift output to the result output module 183 as a tanh function operation result.
The circuit design of the sigmoid function operation unit 130 has been described above with reference to fig. 4. According to one embodiment, the hyperbolic tangent operation module 182 transmits data to the data input module 131 of the sigmoid function operation unit 130 while setting the valid input indication signal valid_in of the sigmoid function operation unit 130, so that the sigmoid function operation unit 130 starts a sigmoid function operation on the data and sets the valid output indication signal valid_out when the sigmoid function operation result is obtained. The hyperbolic tangent operation module 182 obtains the sigmoid function operation result from the result output unit 135 of the sigmoid function operation unit 130 upon receiving the valid_out signal.
Optionally, as shown in fig. 5, the tanh function operation unit 180 further includes a control circuit. The control circuit includes a valid input indication signal valid_in, an operation control module FSM 184, and a valid output indication signal valid_out. After transmitting the data on which the tanh function operation is to be performed to the data input module data_in 181, the external module coupled to the tanh function operation unit 180 sets valid_in to indicate that the data is in place. After determining that valid_in is set, the operation control module 184 instructs the tanh function operation unit 180 to start the tanh function operation on the data at data_in and simultaneously starts counting operation cycles, i.e., clock cycles of the circuit. As described above, the tanh function operation unit 180 completes a tanh function operation in a specific number of clock cycles. When the operation control module 184 determines that this number of clock cycles has been reached, the tanh function operation on the data has been completed and the tanh function operation result is available at data_out 183. The operation control module 184 therefore sets the valid output indication signal valid_out. On finding that valid_out is set, the external module can determine that the operation result is ready and obtain the tanh function operation result from data_out 183.
In addition, optionally, according to an embodiment of the present invention, the arithmetic circuit 100 further includes other activation function operation circuits. For example, as shown in fig. 1, the arithmetic circuit 100 further includes a linear rectification function (ReLU) operation unit 190. The ReLU function is another activation function, and therefore, in the arithmetic circuit 100, the ReLU function operation unit 190 is arranged in parallel with the other operation units, such as the tanh function operation unit 180, the sigmoid function operation unit 130, the exponent operation unit 110, and the division operation unit 120. The arithmetic circuit 100 may select, according to the operation mode, whether the operation is performed by the ReLU function operation unit 190, and the selection unit 140 may select the operation result output by the ReLU function operation unit 190 as the final operation result according to the operation mode (mode_sel).
Fig. 6 shows a schematic diagram of a circuit design of a linear rectification function (ReLU) operation unit 190 according to an embodiment of the present invention. For a given input z, ReLU provides the following functional calculation:
ReLU(z) = max(0, z), i.e., ReLU(z) = z when z ≥ 0 and ReLU(z) = 0 when z < 0
For this, as shown in fig. 6, the linear rectification function operation unit 190 includes a data input module data_in 192, a selection module 194, and a result output module 196.
The data input module 192 receives data to be subjected to a ReLU operation. The selection module 194 is coupled to the data input module 192 and determines the output of the selection module 194 based on the size of the data. Specifically, when the value of the data received by the data input module 192 is less than 0, a value of 0 is output; and outputting the data when the value of the data is not less than 0. The result output module 196 is coupled to the selection module 194 and obtains the output of the selection module 194 as the result of the ReLU operation.
According to one embodiment of the present invention, the data to be subjected to the ReLU operation is signed data, so the most significant bit of its binary representation is the sign bit: when the most significant bit is 1, the data is less than 0; otherwise the data is not less than 0. Therefore, when the data is a 32-bit binary number, the most significant bit of the data, i.e., data_in[31], may be transmitted to the selection module 194 as the selection value, and the selection module 194 selects, according to the value of this bit, whether to output data_in itself or the 32-bit all-zero binary number 32'b0.
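The sign-bit selection may be illustrated with a few lines of software. This is a sketch that assumes a 32-bit two's-complement representation; the function name relu32 is hypothetical.

```python
def relu32(data_in: int) -> int:
    """Select between data_in and 32'b0 using only the sign bit data_in[31]."""
    data_in &= 0xFFFFFFFF           # treat the input as a 32-bit pattern
    sign = (data_in >> 31) & 1      # data_in[31], the selection value
    return 0 if sign else data_in   # sign set: output zero; otherwise pass through


if __name__ == "__main__":
    minus_three = -3 & 0xFFFFFFFF   # two's-complement bit pattern of -3
    print(hex(relu32(5)), hex(relu32(minus_three)))
```

No comparator is needed in this scheme: a single bit of the input drives the selection.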
With the arithmetic circuit 100 according to the present invention, hardware resource usage is greatly reduced because the various functions share hardware resources. The exponent operation unit 110 has extremely high precision. The sigmoid function operation unit 130 and the hyperbolic tangent function operation unit 180 are built on the exponent operation unit 110, and their precision depends on that of the exponent operation unit 110. Precision loss is therefore avoided: the precision of the sigmoid function and of the hyperbolic tangent function is the same as that of the exponential function, and the errors are negligible.
The following table compares the execution efficiency of the arithmetic circuit 100 with that of an implementation using conventional instructions, at the same precision and in a processing chip with the same processing performance:
[Table image BDA0002278851710000141; table contents not reproduced in the text]
It can be seen that the arithmetic circuit 100 according to the present invention can significantly improve the efficiency of performing the various activation function operations. According to an embodiment of the present invention, the arithmetic circuit 100 can be integrated into a processing chip employed in an intelligent device, an AIoT device, an IoT device, or the like, so that when artificial intelligence processing is required in these edge devices or terminal devices, higher execution efficiency can be achieved with lower energy consumption.
The circuit design of the arithmetic circuit 100 has been described above in connection with fig. 1-6. The arithmetic circuit 100 may be integrated into an instruction processing apparatus such as a processor core as a hardware implementation of various exponential operations, division operations, and activation function operations.
FIG. 7 is a schematic diagram of an instruction processing apparatus 700 in which the arithmetic circuit 100 is included, according to one embodiment of the present invention. In some embodiments, instruction processing apparatus 700 may be a processor, a processor core of a multi-core processor, or a processing element in an electronic system.
As shown in FIG. 7, instruction processing apparatus 700 includes an instruction fetch unit 730. Instruction fetch unit 730 may fetch instructions to be processed from cache 710, storage 720, or other sources and send them to decode unit 740. Instructions fetched by instruction fetch unit 730 include, but are not limited to, high-level machine instructions, macro instructions, or the like. The instruction processing apparatus 700 performs certain functions by executing these instructions.
Decode unit 740 receives incoming instructions from instruction fetch unit 730 and decodes these instructions to generate low-level micro-operations, microcode entry points, micro-instructions, or other low-level instructions or control signals, which reflect or are derived from the received instructions. The low-level instructions or control signals may operate at a low level (e.g., circuit level or hardware level) to implement the operation of high-level instructions. Decode unit 740 may be implemented using a variety of different mechanisms. Examples of suitable mechanisms include, but are not limited to, microcode, look-up tables, hardware implementations, and Programmable Logic Arrays (PLAs). The present invention is not limited to any particular mechanism for implementing decode unit 740, and any mechanism that can implement decode unit 740 is within the scope of the present invention.
According to one embodiment, in the instruction processing apparatus 700, a specific instruction set is customized, including specific instructions defined for various functions that can be executed in the arithmetic circuit 100, such as an exponential function and various activation functions. The instruction processing apparatus 700 further includes a calculation acceleration enabling unit 780. The calculation acceleration enabling unit 780 is used to control whether the operation circuit 100 is used to perform the exponential function and the activation function operations.
The decode unit 740 decodes instructions that handle the exponent function and various activation functions, and decodes these instructions into a conventional instruction set for execution by a conventional instruction execution unit if the computation acceleration enable unit 780 instructs not to activate the arithmetic circuit 100 to perform hardware acceleration operations. Whereas if the compute acceleration enable unit 780 indicates that hardware acceleration operations are to be initiated, the decode unit 740 decodes these instructions into specialized instructions for execution by specialized instruction execution units.
These decoded instructions are then sent to execution unit 750 and executed by execution unit 750. Execution unit 750 includes circuitry operable to execute instructions. Execution unit 750, when executing these instructions, receives data input from and generates data output to register set 770, cache 710, and/or storage 720. According to one embodiment, the execution unit 750 is further coupled to the arithmetic circuit 100 for performing operations of the exponential function and other activation functions by the arithmetic circuit 100.
In one embodiment, the register set 770 includes architectural registers, also referred to as registers. Unless specified otherwise or clearly evident, the phrases architectural register, register set, and register are used herein to refer to a register that is visible (e.g., software visible) to software and/or programmers and/or that is specified by a macro-instruction to identify an operand. These registers are different from other non-architected registers in a given microarchitecture (e.g., temporary registers, reorder buffers, retirement registers, etc.). According to one embodiment, the register set 770 may include a set of vector registers 775, each of which may be 512 bits, 256 bits, or 128 bits wide, or may use a different vector width. Optionally, the register set 770 may also include a set of general purpose registers 776. The general purpose registers 776 may be used when an execution unit executes an instruction, for example to store jump conditions, store instruction operation results, store addresses of data to be accessed, store data read from cache 710 and/or storage 720, and so forth.
Execution unit 750 may include a number of specific instruction execution units 750a, 750b … 750c, etc., such as, for example, an arithmetic unit, an Arithmetic Logic Unit (ALU), an integer unit, a floating point unit, a data access unit, etc., and may execute different types of instructions, respectively.
According to one embodiment of the present invention, the instruction execution unit 750a is coupled to the operation circuit 100, and when performing the operation of the exponential function and various activation functions, sends data required for the operation and a corresponding calculation mode to the operation circuit 100, and obtains an execution result of the operation circuit 100 as a corresponding operation result.
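The interaction in which an execution unit passes data together with a calculation mode and reads back the selected result may be pictured with the following sketch. The Mode enumeration and its encodings, the function name arithmetic_circuit, the treatment of the division unit as a reciprocal, and the floating-point library calls standing in for the hardware units are all assumptions of this example; the actual mode_sel encoding is not given in the description.

```python
import math
from enum import Enum


class Mode(Enum):
    """Hypothetical encodings of the calculation mode (mode_sel)."""
    EXP = 0
    DIV = 1
    SIGMOID = 2
    TANH = 3
    RELU = 4


def arithmetic_circuit(mode: Mode, data: float) -> float:
    """Software stand-in: route data to one unit and select its result by mode."""
    units = {
        Mode.EXP: math.exp,                                  # exponent unit 110
        Mode.DIV: lambda a: 1.0 / a,                         # division unit 120 (reciprocal assumed)
        Mode.SIGMOID: lambda z: 1.0 / (1.0 + math.exp(-z)),  # sigmoid unit 130
        Mode.TANH: math.tanh,                                # tanh unit 180
        Mode.RELU: lambda z: z if z >= 0 else 0.0,           # ReLU unit 190
    }
    return units[mode](data)  # role of selection unit 140


if __name__ == "__main__":
    print(arithmetic_circuit(Mode.SIGMOID, 0.0))
    print(arithmetic_circuit(Mode.RELU, -2.0))
```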
To avoid obscuring the description, a relatively simple instruction processing apparatus 700 has been shown and described. It should be understood that instruction processing apparatus 700 may have a different form, for example, other embodiments of instruction processing apparatus or processors may have multiple cores, logical processors, or execution engines.
The calculation acceleration enabling unit 780 may be configured according to the actual application environment of the instruction processing apparatus 700. When the instruction processing apparatus 700 is mainly used for neural networks, the calculation acceleration enabling unit 780 may enable the arithmetic circuit 100 so that the exponential function and activation function operations are accelerated in hardware. When the instruction processing apparatus 700 is not used in the neural network field, the calculation acceleration enabling unit 780 may disable the arithmetic circuit 100 to avoid the additional energy consumption of the arithmetic circuit 100.
Processor cores may be implemented in different processors in different ways. For example, a processor core may be implemented as a general-purpose in-order core for general-purpose computing, a high-performance general-purpose out-of-order core for general-purpose computing, and a special-purpose core for graphics and/or scientific (throughput) computing. While a processor may be implemented as a CPU (central processing unit) that may include one or more general-purpose in-order cores and/or one or more general-purpose out-of-order cores, and/or as a coprocessor that may include one or more special-purpose cores. Such a combination of different processors may result in different computer system architectures. In one computer system architecture, the coprocessor is on a separate chip from the CPU. In another computer system architecture, the coprocessor is in the same package as the CPU but on a separate die. In yet another computer system architecture, coprocessors are on the same die as the CPU (in which case such coprocessors are sometimes referred to as special purpose logic such as integrated graphics and/or scientific (throughput) logic, or as special purpose cores). In yet another computer system architecture, referred to as a system on a chip, the described CPU (sometimes referred to as an application core or application processor), coprocessors and additional functionality described above may be included on the same die. Exemplary processor and computer architectures will be described subsequently with reference to fig. 8 and 9.
Fig. 8 shows a schematic diagram of a processor 1100 according to an embodiment of the invention. As shown in the solid-line boxes in fig. 8, according to one embodiment, processor 1100 includes a single core 1102A, a system agent unit 1110, and a bus controller unit 1116. As shown in the dashed-line boxes in fig. 8, according to another embodiment of the present invention, the processor 1100 may also include a plurality of cores 1102A-N, an integrated memory controller unit 1114 in the system agent unit 1110, and special-purpose logic 1108.
According to one embodiment, processor 1100 may be implemented as a Central Processing Unit (CPU), where dedicated logic 1108 is integrated graphics and/or scientific (throughput) logic (which may include one or more cores), and cores 1102A-N are one or more general-purpose cores (e.g., general-purpose in-order cores, general-purpose out-of-order cores, a combination of both). According to another embodiment, processor 1100 may be implemented as a coprocessor in which cores 1102A-N are a number of special purpose cores for graphics and/or science (throughput). According to yet another embodiment, processor 1100 may be implemented as a coprocessor in which cores 1102A-N are a plurality of general purpose in-order cores. Thus, the processor 1100 may be a general-purpose processor, coprocessor or special-purpose processor, such as, for example, a network or communication processor, compression engine, graphics processor, GPGPU (general purpose graphics processing unit), a high-throughput Many Integrated Core (MIC) coprocessor (including 30 or more cores), embedded processor, or the like. The processor may be implemented on one or more chips. Processor 1100 may be a part of, and/or may be implemented on, one or more substrates using any of a number of processing technologies, such as, for example, BiCMOS, CMOS, or NMOS.
The memory hierarchy includes one or more levels of cache within the cores, one or more shared cache units 1106, and external memory (not shown) coupled to the integrated memory controller unit 1114. The shared cache unit 1106 may include one or more mid-level caches, such as a level two (L2), a level three (L3), a level four (L4), or other levels of cache, a Last Level Cache (LLC), and/or combinations thereof. Although in one embodiment, the ring-based interconnect unit 1112 interconnects the integrated graphics logic 1108, the shared cache unit 1106, and the system agent unit 1110/integrated memory controller unit 1114, the invention is not so limited and any number of well-known techniques may be used to interconnect these units.
The system agent unit 1110 includes those components that coordinate and operate the cores 1102A-N. The system agent unit 1110 may include, for example, a Power Control Unit (PCU) and a display unit. The PCU may include the logic and components needed to adjust the power states of cores 1102A-N and integrated graphics logic 1108. The display unit is used to drive one or more externally connected displays.
The cores 1102A-N may have various core architectures and may be homogeneous or heterogeneous in terms of the architecture instruction set. That is, two or more of the cores 1102A-N may be capable of executing the same instruction set, while other cores may be capable of executing only a subset of the instruction set or a different instruction set.
FIG. 9 shows a schematic diagram of a system on chip (SoC) 1500 according to one embodiment of the invention. The system on chip shown in fig. 9 includes the processor 1100 shown in fig. 8, and components that are the same as those in fig. 8 therefore have the same reference numerals. As shown in fig. 9, the interconnect unit 1502 is coupled to an application processor 1510, a system agent unit 1110, a bus controller unit 1116, an integrated memory controller unit 1114, one or more coprocessors 1520, a Static Random Access Memory (SRAM) unit 1530, a Direct Memory Access (DMA) unit 1532, and a display unit 1540 for coupling to one or more external displays. The application processor 1510 includes a set of one or more cores 1102A-N and a shared cache unit 1106. The coprocessor 1520 includes integrated graphics logic, an image processor, an audio processor, and a video processor. In one embodiment, the coprocessor 1520 comprises a special-purpose processor, such as, for example, a network or communication processor, compression engine, GPGPU, a high-throughput MIC processor, embedded processor, or the like.
A system on chip (SoC) or a processor according to the present invention may be used in various smart devices to implement corresponding functions in the smart devices. Such smart devices include, but are not limited to, in-vehicle devices, smart speakers, smart display devices, IoT devices, mobile terminals, personal digital terminals, and the like.
It should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules or units or components of the devices in the examples disclosed herein may be arranged in a device as described in this embodiment or alternatively may be located in one or more devices different from the devices in this example. The modules in the foregoing examples may be combined into one module or may be further divided into multiple sub-modules.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features, but not other features, that are included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
As used herein, unless otherwise specified, the use of the ordinal adjectives "first", "second", "third", etc., to describe a common object merely indicates that different instances of like objects are being referred to, and is not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking, or in any other manner.
While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this description, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as described herein. Furthermore, it should be noted that the language used in the specification has been principally selected for readability and instructional purposes, and may not have been selected to delineate or circumscribe the inventive subject matter. Accordingly, many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the appended claims. The present invention has been disclosed in an illustrative rather than a restrictive sense, and the scope of the present invention is defined by the appended claims.

Claims (16)

1. An arithmetic circuit adapted to operate on data, comprising:
an exponential operation unit adapted to perform an exponential operation on the data;
a division operation unit adapted to perform division operation on the data;
a logistic regression (sigmoid) function operation unit adapted to perform logistic regression function operation on the data; and
a selection unit adapted to select an operation output of the exponential operation unit, the division operation unit, or the logistic regression function operation unit as an operation result according to an operation mode,
wherein the logistic regression function operation unit is coupled to the exponential operation unit and the division operation unit and is adapted, when performing the logistic regression function operation on the data, to perform the exponential operation within the logistic regression function operation by using the exponential operation unit and to perform the division operation within the logistic regression function operation by using the division operation unit.
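The data flow recited in claim 1 can be modelled in software as a small sketch. It is illustrative only: the `Mode` encoding, the function names, and the assumption that the division operation unit produces a reciprocal are not stated in the claims, and `math.exp` merely stands in for the hardware exponential operation unit.

```python
import math
from enum import Enum


class Mode(Enum):
    """Hypothetical encoding of the operation mode signal."""
    EXP = 0
    DIV = 1
    SIGMOID = 2


def exp_unit(x: float) -> float:
    # Stand-in for the exponential operation unit.
    return math.exp(x)


def div_unit(x: float) -> float:
    # Assumption: the division operation unit computes a reciprocal 1/x.
    return 1.0 / x


def sigmoid_unit(x: float) -> float:
    # Reuses the two shared units instead of owning its own exp/divide logic.
    e = exp_unit(x)
    return e * div_unit(1.0 + e)


def arithmetic_circuit(x: float, mode: Mode) -> float:
    # Selection unit: picks which unit's output becomes the operation result.
    outputs = {
        Mode.EXP: exp_unit,
        Mode.DIV: div_unit,
        Mode.SIGMOID: sigmoid_unit,
    }
    return outputs[mode](x)
```

The point of the arrangement is resource sharing: the sigmoid path adds no exponential or division logic of its own.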
2. The arithmetic circuit of claim 1, wherein the exponential operation unit comprises:
an exponent data input module adapted to receive data to be subjected to the exponential operation;
an exponent selection module adapted to determine, according to the magnitude of the data to be operated on, the interval to which the data belongs;
an exponent parameter module adapted to provide parameters corresponding to the determined interval, the parameters being those required for a Taylor expansion of the exponential operation;
an exponent multiplication module adapted to perform multiplication using the provided parameters and the data, so as to perform Taylor expansion processing on the data; and
an exponent result output module adapted to acquire the Taylor expansion processing result of the exponent multiplication module so as to provide the exponential operation result.
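The interval selection and per-interval Taylor parameters of claim 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed circuit: the input range [-8, 8), the unit-width intervals, the expansion order, and all names (`PARAMS`, `select_interval`, `exp_taylor`) are hypothetical, and floating point replaces whatever number format the hardware uses.

```python
import math

# Hypothetical design choices: range [-8, 8), unit-width intervals, order-3 expansion.
LO, HI, ORDER = -8, 8, 3

# Parameter table: for each interval [i, i+1), the expansion centre c and the
# Taylor coefficients of exp around c, namely exp(c) / k! for k = 0..ORDER.
PARAMS = []
for i in range(LO, HI):
    c = i + 0.5
    PARAMS.append((c, [math.exp(c) / math.factorial(k) for k in range(ORDER + 1)]))


def select_interval(x: float) -> int:
    # Selection step: map the input value to a table index, clamping out-of-range inputs.
    idx = math.floor(x) - LO
    return min(max(idx, 0), len(PARAMS) - 1)


def exp_taylor(x: float) -> float:
    # Parameter step: fetch the centre and coefficients for the selected interval.
    c, coeffs = PARAMS[select_interval(x)]
    d = x - c
    # Multiplication step: evaluate the polynomial with Horner's rule,
    # one multiply and one add per coefficient.
    acc = 0.0
    for a in reversed(coeffs):
        acc = acc * d + a
    return acc
```

With these choices the distance to the expansion centre never exceeds 0.5 inside the covered range, so a low-order expansion stays within well under one percent of the true value.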
3. The arithmetic circuit of claim 2, wherein the exponential operation unit further comprises:
an exponent operation control module adapted to instruct the exponential operation unit to start exponential operation processing on the received data upon receipt of a valid input indication signal, and to generate a valid output indication signal after a predetermined operation period of the exponential operation unit has elapsed, so as to indicate that the exponential operation result of the data is available at the exponent result output module.
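The valid-in/valid-out behaviour of the control module in claim 3 amounts to a fixed-latency handshake. The cycle-level sketch below is an assumption-laden model: the class name `ControlModule`, the `clock` method, and the latency of three cycles used in the usage note are hypothetical, and the claim does not say whether the unit is pipelined as modelled here.

```python
from collections import deque


class ControlModule:
    """Cycle-level model of a unit with a fixed operation period (hypothetical)."""

    def __init__(self, latency: int, op):
        self.op = op
        # One slot per cycle of latency; each slot holds (valid, value).
        self.pipe = deque([(False, None)] * latency)

    def clock(self, valid_in: bool, data=None):
        """Advance one cycle and return (valid_out, result)."""
        # A valid input starts an operation; otherwise a bubble enters the pipe.
        self.pipe.append((True, self.op(data)) if valid_in else (False, None))
        # The oldest slot leaves the pipe; it is valid exactly `latency` cycles
        # after the cycle in which its input was accepted.
        return self.pipe.popleft()
```

For example, with `ControlModule(3, lambda v: v * 2)`, asserting `valid_in` with data 5 in one cycle yields `(True, 10)` three cycles later and `(False, None)` in between.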
4. The arithmetic circuit of any one of claims 1-3, wherein the division operation unit comprises:
a division data input module adapted to receive data to be subjected to the division operation;
a first division parameter selection module adapted to determine a first division parameter according to the magnitude of the data to be subjected to the division operation;
a first division shift module adapted to shift the data to be subjected to the division operation according to the first division parameter;
a second division parameter selection module adapted to select a second division parameter or a third division parameter based on the data;
a multiplication module adapted to multiply the output of the first division shift module by the output of the second division parameter selection module;
a second division shift module adapted to shift the output of the multiplication module; and
a division result output module adapted to acquire the output of the second division shift module as the result of the division operation on the data.
5. The arithmetic circuit of claim 4, wherein the division operation unit further comprises:
a division control module adapted to instruct the division operation unit to start division processing on the received data upon receipt of a valid input indication signal, and to generate a valid output indication signal after a predetermined operation period of the division operation unit has elapsed, so as to indicate that the division result of the data is available at the division result output module.
6. The arithmetic circuit of any one of claims 1 to 5, wherein the logistic regression function operation unit comprises:
a logistic regression operation data input module adapted to receive data to be subjected to the logistic regression function operation;
a first logistic regression operation module coupled to the exponential operation unit and adapted to send the data to the exponential operation unit for the exponential operation and to obtain the exponential operation result;
a second logistic regression operation module coupled to the division operation unit and adapted to send the exponential operation result obtained by the first logistic regression operation module to the division operation unit to obtain a division operation result, and to multiply the exponential operation result by the division operation result to obtain a multiplication result;
a logistic regression shift module adapted to shift the multiplication result of the second logistic regression operation module; and
a logistic regression operation result output module adapted to acquire the shift output of the logistic regression shift module as the result of the logistic regression function operation on the data.
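The exponentiate, divide, multiply, and shift sequence of claim 6 can be sketched in fixed point. Several details here are assumptions rather than anything the claim states: a Q16 format (`FRAC = 16`), the placement of the "+1" of the sigmoid denominator inside the division step, and the reading of the final shift as rescaling the double-width product. `math.exp` again stands in for the exponential operation unit.

```python
import math

FRAC = 16          # hypothetical number of fractional bits (Q16 fixed point)
ONE = 1 << FRAC    # the value 1.0 in Q16


def exp_unit(x: float) -> int:
    # Stand-in for the exponential operation unit; returns exp(x) in Q16.
    return round(math.exp(x) * ONE)


def div_unit(e_fx: int) -> int:
    # Assumption: the division step yields 1 / (1 + e) in Q16.
    # The claim does not say where the "+1" is applied.
    return (ONE * ONE) // (ONE + e_fx)


def sigmoid_unit(x: float) -> int:
    e_fx = exp_unit(x)        # first module: exponential result
    q_fx = div_unit(e_fx)     # second module: division result
    product = e_fx * q_fx     # second module: Q16 * Q16 gives a Q32 product
    return product >> FRAC    # shift module: rescale back to Q16


def to_float(v_fx: int) -> float:
    return v_fx / ONE
```

Because the reciprocal is truncated to 16 fractional bits, accuracy degrades for large positive inputs; a real design would bound the input range or widen the intermediate format.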
7. The arithmetic circuit of claim 6, wherein the logistic regression function operation unit further comprises:
a logistic regression operation control module adapted to instruct the logistic regression function operation unit to start logistic regression function operation processing on the received data upon receipt of a valid input indication signal, and to generate a valid output indication signal after a predetermined operation period of the logistic regression function operation unit has elapsed, so as to indicate that the logistic regression operation result of the data is available at the logistic regression operation result output module.
8. The arithmetic circuit of any one of claims 1 to 7, further comprising a hyperbolic tangent (tanh) function operation unit adapted to perform a hyperbolic tangent function operation on the data, wherein:
the selection unit is adapted to select an operation output of the exponential operation unit, the division operation unit, the logistic regression function operation unit, or the hyperbolic tangent function operation unit as the operation result according to the operation mode; and
the hyperbolic tangent function operation unit is coupled to the logistic regression function operation unit and is adapted, when performing the hyperbolic tangent function operation on the data, to perform the logistic regression function operation within the hyperbolic tangent function operation by using the logistic regression function operation unit.
9. The arithmetic circuit of claim 8, wherein the hyperbolic tangent function operation unit comprises:
a hyperbolic tangent operation data input module adapted to receive data to be subjected to the hyperbolic tangent function operation;
a hyperbolic tangent operation module coupled to the logistic regression function operation unit and adapted to perform inverse processing on the data to be operated on, to send the inverse-processed data to the logistic regression function operation unit for the logistic regression function operation, and to perform shift processing on the result of the logistic regression function operation; and
a hyperbolic tangent operation result output module adapted to acquire the shift output of the hyperbolic tangent operation module as the result of the hyperbolic tangent function operation on the data.
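One way to read claim 9 is through the identity tanh(x) = 1 - 2·sigmoid(-2x), which needs only a negation, one sigmoid evaluation, and scaling by powers of two. The sketch below follows that reading. It is an assumption: the claim mentions only inverse processing, the sigmoid operation, and a shift, so treating "inverse processing" as negation, doubling the input, and subtracting from 1 are interpretive choices, and the function names are hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    # Stand-in for the logistic regression (sigmoid) function operation unit.
    return 1.0 / (1.0 + math.exp(-x))


def tanh_via_sigmoid(x: float) -> float:
    negated = -x                  # "inverse processing", read here as negation
    s = sigmoid(2.0 * negated)    # sigmoid(-2x); the factor 2 is a 1-bit left shift in fixed point
    shifted = 2.0 * s             # shift processing on the sigmoid result
    return 1.0 - shifted          # tanh(x) = 1 - 2 * sigmoid(-2x)
```

The equivalent form tanh(x) = 2·sigmoid(2x) - 1 would work without the negation; either way the tanh path reuses the sigmoid unit rather than duplicating exponential and division logic.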
10. The arithmetic circuit of claim 9, wherein the hyperbolic tangent function operation unit further comprises:
a hyperbolic tangent function operation control module adapted to instruct the hyperbolic tangent function operation unit to start hyperbolic tangent function operation processing on the received data upon receipt of a valid input indication signal, and to generate a valid output indication signal after a predetermined operation period of the hyperbolic tangent function operation unit has elapsed, so as to indicate that the hyperbolic tangent function operation result of the data is available at the hyperbolic tangent operation result output module.
11. The arithmetic circuit of any one of claims 1-10, further comprising a linear rectification function (ReLU) operation unit adapted to perform a linear rectification function operation on the data, wherein
the selection unit is adapted to select an operation output of the exponential operation unit, the division operation unit, the logistic regression function operation unit, the hyperbolic tangent function operation unit, or the linear rectification function operation unit as the operation result according to the operation mode.
12. The arithmetic circuit of claim 11, wherein the linear rectification function operation unit comprises:
a linear rectification selection unit adapted to output a value of 0 when the value of the data to be subjected to the linear rectification function operation is less than 0, and to output the data as the operation result of the linear rectification function operation when the value of the data is not less than 0.
13. The arithmetic circuit of claim 12, wherein a most significant bit of 1 in the binary representation of the data indicates that the value of the data is less than 0.
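The sign-bit test of claims 12 and 13 reduces the linear rectification function to a single-bit check and a selection. The sketch below assumes a 16-bit two's-complement word; the width and the names `WIDTH`, `to_word`, and `relu_bits` are hypothetical.

```python
WIDTH = 16  # hypothetical word width


def to_word(value: int, width: int = WIDTH) -> int:
    # Encode a signed integer as a two's-complement bit pattern of the given width.
    return value & ((1 << width) - 1)


def relu_bits(word: int, width: int = WIDTH) -> int:
    # The most significant bit is 1 exactly when the encoded value is negative.
    msb = (word >> (width - 1)) & 1
    # Selection: output 0 for negative data, otherwise pass the data through.
    return 0 if msb == 1 else word
```

No comparator or arithmetic is needed: the selection is driven directly by one bit of the input word.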
14. A processor, comprising:
the arithmetic circuit of any one of claims 1 to 13, adapted to receive data to be operated on and an operation mode signal, and to operate on the received data in accordance with the operation mode.
15. A system on a chip comprising the processor of claim 14.
16. A smart device comprising the system on a chip of claim 15.
CN201911133061.4A 2019-11-19 2019-11-19 Circuit for realizing activation function and processor comprising same Pending CN112906876A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911133061.4A CN112906876A (en) 2019-11-19 2019-11-19 Circuit for realizing activation function and processor comprising same

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911133061.4A CN112906876A (en) 2019-11-19 2019-11-19 Circuit for realizing activation function and processor comprising same

Publications (1)

Publication Number Publication Date
CN112906876A true CN112906876A (en) 2021-06-04

Family

ID=76103432

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911133061.4A Pending CN112906876A (en) 2019-11-19 2019-11-19 Circuit for realizing activation function and processor comprising same

Country Status (1)

Country Link
CN (1) CN112906876A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination