CN112862086A - Neural network operation processing method and device and computer readable medium - Google Patents

Neural network operation processing method and device and computer readable medium

Info

Publication number
CN112862086A
CN112862086A
Authority
CN
China
Prior art keywords
function
neural network
processing method
value
input value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011574026.9A
Other languages
Chinese (zh)
Inventor
李坤傧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Lanyang Intelligent Technology Co ltd
Original Assignee
Nanjing Lanyang Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Lanyang Intelligent Technology Co ltd filed Critical Nanjing Lanyang Intelligent Technology Co ltd
Priority to CN202011574026.9A priority Critical patent/CN112862086A/en
Publication of CN112862086A publication Critical patent/CN112862086A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a neural network operation processing method, a device and a computer readable medium, comprising the following steps: receiving the input values of two functions of the operation; checking the value range of the input value of at least one of the functions; and selecting one of the input values of the two functions to execute the operation according to the check result. The invention provides a neural network operation processing method and device that reduce power consumption, shorten processing time and improve performance, and allow hardware to be maximally reused across activation-normalization, normalization-activation and activation-weight architecture designs.

Description

Neural network operation processing method and device and computer readable medium
Technical Field
The invention discloses a neural network operation processing method, a neural network operation processing device and a computer readable medium, and relates to the technical fields of low-power design and neural network computation.
Background
With the rapid development of artificial intelligence technology, neural network computation has been widely and successfully applied in data-intensive fields such as images, speech and text. In specific application scenarios such as Deep Neural Networks (DNN) and Convolutional Neural Networks (CNN), multiplication is one of the most basic operations, so the power consumption and processing time of multiplication operations usually account for a large part of the total power consumption and processing time.
In the prior art, a common conventional method is to check whether one of the two inputs to the multiplication operation is zero. For example, the Eyeriss accelerator checks whether the input from the feature map is zero, to prevent the MAC datapath from switching when the input is zero. If the input is N-bit data, an N-bit comparator is required; the N-bit data in the Eyeriss accelerator is 16-bit data. When the input data size is a runtime variable, such as 16-bit, 8-bit, or 4-bit data, the comparator must also be capable of comparing variable bit-length data. See reference 1 for details: Y.-H. Chen, T. Krishna, J. S. Emer, V. Sze, "Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks," IEEE Journal of Solid-State Circuits (JSSC), ISSCC Special Issue, Vol. 52, No. 1, pp. 127-138, January 2017.
To achieve deeper neural networks, such as a 1001-layer network, and contrary to the traditional "post-activation" scheme, Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun proposed the "pre-activation" scheme for the weight layer in reference 2: "Identity Mappings in Deep Residual Networks". As shown in FIG. 1, where FIG. 1(a) is a "post-activation" diagram and FIG. 1(b) is a "pre-activation" diagram, the activation function ReLu (rectified linear unit) is executed after BN (batch normalization). In reference 2, the weight layer is called a convolutional layer. Each of the blocks shown in FIG. 1, such as BN, ReLu, Weight, and Addition, is a layer of the neural network model; "layer" and "function" are used interchangeably. Wherein:
ReLu function: f(x) = max(0, x); if the input x is negative, the output of ReLu is f(x) = 0, otherwise the output of ReLu is x.
BN (batch normalization): for specific details, see reference 3: Ioffe, S., Szegedy, C., Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In: ICML (2015).
Since there are other types of normalization, the calculation can be summarized as: g(x) = (x - μ) x λ. A simplified illustration is shown in FIG. 2. When A is the input (x - μ) and B is the normalized scaling factor λ, the output of the normalization-activation is f(C) = f(B x A), where f is the activation function (ReLu in this case).
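As a plain-software illustration of the normalization-activation computation above (a hedged sketch only, not the claimed hardware; the names norm_act, mu and lam are assumptions for this example):

    def norm_act(x, mu, lam):
        # A = x - mu is the mean-subtracted input; B = lam is the scaling factor.
        a = x - mu
        b = lam
        c = b * a              # C = B x A
        return max(0.0, c)     # f(C), with f = ReLu

The sign check described later predicts whether C will be negative without first computing the product.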
Furthermore, in conventional methods, such as the method used in Eyeriss, the PE result needs to be stored in a memory, such as the Global Buffer in Eyeriss; the data is then read from the memory, ReLu is executed, and the ReLu result is stored back in memory (e.g., DRAM in Eyeriss). Eyeriss performs run-length coding (RLC) on the ReLu results before storing them into DRAM. Eyeriss then reads back the compressed ReLu results, performs RLC decoding, stores the decoded results into the Global Buffer together with the filter weights, and finally performs multiplication or multiply-add/multiply-accumulate using a PE array with multiple MACs. This approach requires storing the ReLu results in memory and then reading them from memory to perform the "Weight layer" processing, which relatively lengthens the processing time.
In short, in the prior art, performing large numbers of multiplication calculations typically imposes heavy demands on computation, memory footprint and bandwidth, which places high requirements on hardware implementations of large-scale neural networks.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the defects of the prior art, the invention provides a neural network operation processing method, a device and a computer readable medium, which reduce the power consumption and processing time of multiplication operations by checking the sign or value-range information of at least one of the inputs.
The invention adopts the following technical scheme for solving the technical problems:
In a first aspect, the present invention discloses a neural network operation processing method, including: receiving input values of two functions of the operation; checking a value range of an input value of at least one of the functions; and determining the execution strategy of the function according to the check result.
Further, the checking the value range of the input value of the at least one function includes checking a sign of the input value of the at least one function, specifically checking sign information or a sign bit of the input value of the function; the operation comprises a multiplication operation.
Further, the determining the execution policy of the function includes: if the range of the input value of the checked function is a positive value, one of the two functions is executed in a first mode, otherwise, one of the two functions is executed in a second mode.
Executing one of the two functions in the first mode comprises: performing a multiplication operation.
Executing one of the two functions in the second mode comprises: assigning the output of one of the two functions to a set constant value.
Further, the method further comprises: predicting the range of the operation result by checking the value range of the input value of at least one function; if the range of the predicted operation result is a positive value, the function is executed, otherwise, the function is not executed.
The determining the execution policy of the function specifically includes: if the value range of the input value of the checked function indicates that the input value of the checked function is a negative value, the function is not executed; and executing the function if the value range of the input value of the checked function indicates that the input value of the checked function is a positive value.
If the function is not executed, the result is the constant 0.
The determining the execution policy of the function may also include: if the value range of the input value of the checked first function is a positive value, selecting a first coefficient as the input value of a second function; and if the range of the input value of the checked first function is a negative value, selecting the second coefficient as the input value of the second function.
The performing of the operation comprises multiplying the input value of the examined function with the selected coefficient.
Further, the executing of the function further comprises: if the value range of the input value of the checked first function is a positive value, selecting a first coefficient as an input and multiplying the input value of the first function by the selected first coefficient; if the range of the input value of the checked first function is negative, selecting a first coefficient and a second coefficient as inputs and multiplying the input value of the first function by the selected first coefficient and the selected second coefficient.
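For illustration only (a hedged sketch, not limiting the claims; select_and_multiply and its parameter names are invented for this example), the coefficient-selection policy above can be expressed as:

    def select_and_multiply(x, first_coeff, second_coeff):
        # Check the value range (sign) of the checked function's input x,
        # then multiply by the coefficient selected for the second function.
        coeff = first_coeff if x >= 0 else second_coeff
        return x * coeff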
As a preferred embodiment of the present invention, the multiplication operation is implemented as a multiplier, a shifter, a series of adders and shifters, or a series of AND gates and adders.
As a preferred embodiment of the present invention, the two functions are a normalization function and an activation function, respectively. Wherein the normalization function comprises: a layer normalization function, an instance normalization function, a group normalization function, or a switchable normalization function; the activation function includes a ReLu function, a PReLu function, a ReLu6 function, or a ReLuN function.
In a second aspect, the present invention also discloses a neural network operation processing apparatus, including a memory, a processor and a computer program stored on the memory and operable on the processor, where the processor includes a plurality of multipliers, and is characterized in that the processor implements the following steps when executing the computer program: receiving input values of two functions of the operation; checking a value range of an input value of at least one of the functions; and determining the execution strategy of the function according to the check result.
In a third aspect, the invention also discloses a computer readable medium having a non-volatile program code executable by a processor, the program code causing the processor to execute the neural network operation processing method.
Compared with the prior art, the invention adopting the technical scheme has the following technical effects: a neural network operation processing method and apparatus for reducing power consumption, shortening processing time and improving performance are provided.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the aforementioned and other objects, features and advantages of the invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
The accompanying drawings are described as follows.
Drawings
FIG. 1 is a schematic diagram of a "post-activation" scheme and a "pre-activation" scheme in the prior art.
Fig. 2 is a diagram illustrating the execution of the activation function ReLu after BN in the prior art.
FIG. 3 is a schematic of the process of the present invention.
Fig. 4 is a schematic diagram of a neg indicator.
Fig. 5 is a schematic diagram of an exemplary PE array.
Fig. 6 is an exemplary circuit schematic for skip detection.
FIG. 7 is a schematic diagram of the application of "ReLu-Weight" fusion.
FIG. 8 is a schematic diagram of the "ReLu-Weight" fusion method in the present invention.
FIG. 9 is another exemplary circuit schematic for skip detection.
FIG. 10 is a schematic diagram of the "PReLu-Weight" fusion method in the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
The technical scheme of the invention is further explained in detail below with reference to the accompanying drawings:
FIG. 3 shows a schematic diagram of the processing method of the present invention, where the normalization function is a BN function and the activation function is a ReLu function; the ReLu function may be replaced by other types of activation functions such as PReLu, ReLu6, and ReLuN, and the BN function may be replaced by other types of normalization functions such as layer normalization, instance normalization, group normalization, or switchable normalization.
The first design innovation point of the invention is as follows: when the activation function ReLu is executed after the BN function is executed, the proposed method is applied to reduce power consumption and processing time. The method comprises the following specific steps:
When applying the ReLu function to the multiplication result C = A x B, i.e., f(C) = f(A x B) as shown in FIG. 2, the proposed method checks the sign of A and the sign of B: if they differ, C will be negative and f(C) will be zero. Thus, unnecessary multiplication calculations can be avoided merely by checking the signs of the two inputs.
In one specific embodiment of the present invention, A = -1 and B = 7 are shown in 4-bit 2's complement form in the table below.
Then C = A x B = -7 and f(C) = 0. In 2's complement, the sign bit of A is 1'b1 and the sign bit of B is 1'b0; the two sign bits differ.
(Table: in 4-bit 2's complement, A = -1 is encoded as 1111 and B = 7 as 0111.)
As shown in FIG. 4, the exemplary detection circuit for checking whether C is negative (neg) can be implemented as a simple XOR gate. If the neg indicator is true (i.e., neg = 1'b1), indicating that C will be negative, the multiplication used to calculate C = A x B can be avoided or skipped to reduce power consumption.
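A behavioural sketch of this sign check (assuming ordinary signed integers; in hardware the check is just the XOR of the two sign bits shown in FIG. 4):

    def relu_mul(a, b):
        # neg indicator: XOR of the two sign bits.
        neg = (a < 0) ^ (b < 0)
        if neg:
            return 0       # multiplication avoided/skipped; f(C) forced to 0
        return a * b       # signs agree, so C = A x B is non-negative

For A = -1 and B = 7, neg is true and the multiplier never fires.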
In some implementations, the multiplication operation requires multiple cycles. For example, two-stage pipelined multipliers are implemented in Eyeriss, described in the background. Thus, avoiding the multiplication also shortens the processing time.
In another embodiment, a bit-serial multiplier is employed, so skipping the multiplication reduces the number of clock cycles the bit-serial multiplier needs to perform the operation. The bit-serial multiplier may be implemented with only a shifter, handling power-of-two multiplications; it may also employ an add-shift or AND-gate-and-shift architecture to handle arbitrary coefficients rather than only powers of two.
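A hedged sketch of such an add-shift bit-serial multiply (an unsigned operand b and the name bit_serial_mul are assumptions for this example); skipping the whole operation when neg is asserted saves all of its clock cycles, not just one:

    def bit_serial_mul(a, b, nbits=8):
        # One "clock cycle" per bit of b, using only an AND gate,
        # an adder and a shifter.
        acc = 0
        for i in range(nbits):
            if (b >> i) & 1:
                acc += a << i
        return acc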
In another embodiment, as shown in FIG. 5, a PE array with multiply or multiply-add/multiply-accumulate (MAC) units operating in parallel is used. If all the neg indicators (neg1, neg2, neg3, neg4) indicate that C1-C4 will be negative, so that f(C1)-f(C4) are all 0, processing time can be reduced since no multiply or MAC operation needs to be performed. If any neg indicator N indicates that its CN will be positive, the associated MAC unit still needs to compute CN, so the total processing time cannot be reduced; however, the power consumption of the other MAC units, whose neg indicators indicate negative results, can still be reduced.
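A software sketch of this PE-array behaviour (inputs_a, weights_b and acc are illustrative names; real hardware gates the MAC datapaths rather than branching):

    def pe_array_step(inputs_a, weights_b, acc):
        # One neg indicator per MAC lane, as in FIG. 5.
        negs = [(a < 0) ^ (b < 0) for a, b in zip(inputs_a, weights_b)]
        if all(negs):
            return acc                 # all lanes skip: the whole cycle is saved
        for i, (a, b) in enumerate(zip(inputs_a, weights_b)):
            if not negs[i]:            # lanes flagged neg stay idle (power saving)
                acc[i] += a * b
        return acc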
The example in the table below shows A = -1 and B = 7 in 4-bit sign-magnitude form. In this example, since C = -7, f(C) is 0.
(Table: in 4-bit sign-magnitude, A = -1 is encoded as 1001 and B = 7 as 0111.)
The present method can still use the simple XOR gate described above to generate the neg indicator from the sign bit of A and the sign bit of B. Numerical systems other than 2's complement and sign-magnitude may also be employed.
The proposed method can also be combined with a check to determine if the input is zero. An example of such a detection circuit is shown in fig. 6.
In another embodiment, input B is designed as an unsigned data type. In this case, if the circuit is designed specifically for this use case, the XOR gate can be omitted and the sign bit of A used directly as the neg indicator.
The second design innovation point of the invention is the optimization method proposed for the "pre-activation" of the weight layer, namely executing the activation function ReLu (rectified linear unit) after the BN function. Two pre-activation operations are shown in FIG. 7, namely BN1→ReLu1 and BN2→ReLu2, where BN1 and BN2 denote the first and second BN layers, and ReLu1 and ReLu2 denote the first and second ReLu layers, respectively.
We now introduce another optimization method for the second aspect of the invention. In conventional designs, layer Weight1 and layer BN2 can be fused so as to optimize the multiplication amount of the two layers. Applying our proposed method to layer ReLu2 and layer Weight2 not only reduces data access but also reduces the power consumption of executing multiplications.
As mentioned in the background of the present application, conventional methods, such as those used in Eyeriss, require storing PE results to memory (e.g., the Global Buffer in Eyeriss), reading the data from memory, executing ReLu, and then storing the ReLu results to memory (e.g., DRAM in Eyeriss). Eyeriss performs run-length coding (RLC) on the ReLu results before storing them into DRAM. Eyeriss then reads back the compressed ReLu results, performs RLC decoding, stores the decoded results into the Global Buffer together with the filter weights, and finally performs multiplication or multiply-add/multiply-accumulate using a PE array with multiple MACs.
The conventional approach performs the ReLu function after the multiplier-adder unit, while the present invention performs layer fusion of the "ReLu layer" and the "Weight layer". Furthermore, the method uses a very simple circuit to perform the ReLu function before the arithmetic unit, such as a multiplier or a multiplier-adder unit. With this fusion, the invention does not need to store the results of the ReLu function in memory and then read them back from memory to execute the "Weight layer" processing. The approach works equally well for other activation functions used in the design.
Consider a 1 x 1 convolution as a weight layer with a 4-channel M x N input feature map. The output feature for position (0,0) is:
y0(0,0) = w10*x0(0,0) + w11*x1(0,0) + w12*x2(0,0) + w13*x3(0,0);
where w10-w13 are the weights of the first kernel in this weight layer. Typically, a weight layer has multiple kernels.
x0(0,0), x1(0,0), x2(0,0), x3(0,0) are the input feature map data at position (0,0).
FIG. 8 illustrates this layer fusion for one embodiment, where the layer preceding the weight layer is the ReLu layer. A is the input of the ReLu layer, B is the weight of the Weight layer, and f() is the ReLu function. With such layer fusion, only a single memory access is needed to read A, and the ReLu result f(A) is provided directly as one of the inputs to the multiply or multiply-add/multiply-accumulate operation. FIG. 8 is a simplified diagram that does not show the accumulation portion of the weight layer. When either f(A) or B is zero, the multiplication step may be skipped.
As previously mentioned, the ReLu function is f(x) = max(0, x); that is, if the input x is negative, the output of the ReLu function is f(x) = 0, otherwise the output of the ReLu function is x.
Detecting whether f(A) is zero can be done by checking whether A is less than or equal to 0, as shown in FIG. 9(a). Another exemplary implementation of this detection re-uses the aforementioned XOR gate of FIG. 6 in the manner shown in FIG. 9(b). Furthermore, if the circuit is designed specifically for this use case, the XOR gate can be omitted and the sign bit of A used directly as the neg indicator.
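Tying this to the 1 x 1 convolution example above, a hedged sketch of the fused "ReLu-Weight" computation (fused_relu_weight and its argument names are invented for this illustration):

    def fused_relu_weight(a_vals, weights):
        # a_vals: pre-ReLu inputs x0..x3; weights: kernel weights w10..w13.
        y = 0
        for a, w in zip(a_vals, weights):
            if a <= 0 or w == 0:   # skip signal: f(a) = 0 or the weight is zero
                continue           # multiply skipped; no f(a) stored to memory
            y += w * a             # f(a) = a when a > 0
        return y

For position (0,0) this yields y0(0,0) directly from A and the weights, with no intermediate f(A) written to or read from memory.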
If there are multiple PEs, for example 8 PEs, then there will be skip0-skip7. If all of these skip signals indicate that an operation is to be skipped, we can not only reduce power consumption but also skip processing cycles. In this case, look-ahead circuits may be implemented to check the signs of several A values in advance. Since the sign-check circuit is very simple, a corresponding look-ahead circuit is also easy to implement.
For example, using A(s,t) to represent the input of the s-th PE in cycle t:
A(0,0), A(1,0), A(2,0), ..., A(7,0) represent inputs A of PE0-PE7 in cycle 0;
A(0,1), A(1,1), A(2,1), ..., A(7,1) represent inputs A of PE0-PE7 in cycle 1;
A(0,2), A(1,2), A(2,2), ..., A(7,2) represent inputs A of PE0-PE7 in cycle 2;
A(0,3), A(1,3), A(2,3), ..., A(7,3) represent inputs A of PE0-PE7 in cycle 3.
If all these skip signals indicate that the operation is to be skipped, four cycles can be saved, thereby improving performance.
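A minimal sketch of that look-ahead check over the 8-PE, 4-cycle window (the layout a_sched[t][s] = A(s,t) is an assumption of this example):

    def lookahead_skip(a_sched):
        # True when every scheduled input maps to f(A) = 0,
        # so all cycles in the window can be dropped.
        return all(a <= 0 for cycle in a_sched for a in cycle)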
In addition to ReLu, other types of activation functions may also benefit from this layer fusion, for example the parameterised ReLu (PReLu). In another embodiment of the present invention, a PReLu function is employed, wherein:
f(x) = max(αx, x), α ∈ [0, 1); or f(x) = max(αx, x), α ∈ (0, 1).
(Equivalently: f(x) = x for x ≥ 0; f(x) = αx for x < 0.)
In the above embodiment, the "PReLu-Weight" fusion calculation C = B x f(A) can be rewritten as:
C = B x A, for A ≥ 0;
C = α x B x A, for A < 0.
One implementation is shown in FIG. 10(a). α x B can be computed in the pipeline, thus eliminating the need to first store the intermediate result B x A to memory and read it back to execute the PReLu function, i.e., α x (B x A).
In another embodiment, since B and α are constants, α x B need not be calculated at runtime; it can be pre-calculated offline as B'. Furthermore, the sign of A can be checked first in order to read out either B or B' from the memory. As shown in FIG. 10(b), the data read from the memory is B or B', selected according to the sign of A. The neg and skip circuits shown in FIG. 9 can be used in this embodiment.
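A hedged sketch of this "PReLu-Weight" fusion with the offline-precomputed B' (the names below are illustrative):

    # Offline, once: b_prime = alpha * b (both are constants at inference time).
    def fused_prelu_weight(a, b, b_prime):
        coeff = b if a >= 0 else b_prime   # single memory read, chosen by sign of A
        return coeff * a                   # one multiply instead of two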
If the fusion method of the invention is not adopted, two multiplication operations are needed to obtain the final C, namely:
C = B x f(A), where
f(A) = A, for A ≥ 0;
f(A) = α x A, for A < 0.
Similarly, the PReLu function can be used for normalization-activation fusion. When A is the input and B is the normalized scaling factor, the output of normalization-activation is f(C) = f(B x A):
f(C) = B x A, for B x A ≥ 0;
f(C) = α x B x A, for B x A < 0.
On the other hand, the above-described fusion methods for "ReLu-Weight" and "PReLu-Weight" fusion can also be applied to ReLu-Normalization and PReLu-Normalization fusions.
Another benefit of the present invention is that hardware can be maximally reused across the activation-normalization, normalization-activation and activation-weight architecture designs.
The embodiments of the present invention have been described in detail with reference to the drawings, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the gist of the present invention. Although the present invention has been described with reference to the preferred embodiments, it should be understood that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (19)

1. A neural network operation processing method, characterized by comprising:
receiving input values of two functions of the operation;
checking the value range of the input value of at least one function;
and determining the execution strategy of the function according to the check result.
2. The neural network operation processing method of claim 1, wherein: the checking of the value range of the input value of the at least one function comprises checking the sign of the input value of the at least one function.
3. The neural network operation processing method of claim 1, wherein: the operation comprises a multiplication operation.
4. The neural network operation processing method of claim 1, wherein determining the execution strategy of the function includes:
if the range of the input value of the checked function is a positive value, one of the two functions is executed in a first mode, otherwise, one of the two functions is executed in a second mode.
5. The neural network operation processing method of claim 4, wherein the performing one of the two functions in the first mode comprises: a multiplication operation is performed.
6. The neural network operation processing method of claim 4, wherein the performing one of the two functions in the second mode includes: and assigning the output of one of the two functions as a set constant value.
7. The neural network arithmetic processing method of claim 1, wherein the method further comprises: predicting the range of the operation result by checking the value range of the input value of at least one function;
if the range of the predicted operation result is a positive value, the function is executed, otherwise, the function is not executed.
8. The neural network operation processing method of claim 1, wherein the determining an execution strategy of the function specifically includes:
if the value range of the input value of the checked function indicates that the input value of the checked function is a negative value, the function is not executed;
and executing the function if the value range of the input value of the checked function indicates that the input value of the checked function is a positive value.
9. The neural network operation processing method of claim 1, wherein the determining of the execution strategy of the function includes:
if the input value range of the checked first function is a positive value, selecting a first coefficient as an input parameter of a second function;
and if the input value range of the checked first function is a negative value, selecting a second coefficient as the input parameter of the second function.
10. The neural network operation processing method of claim 9, wherein the execution of the function includes multiplying the input value of the checked function by the selected coefficient.
11. The neural network operation processing method of claim 1, wherein the execution of the operation further comprises:
if the value range of the input value of the checked first function is a positive value, selecting a first coefficient as an input and multiplying the input value of the first function by the selected first coefficient;
if the range of the input value of the checked first function is negative, selecting a first coefficient and a second coefficient as inputs and multiplying the input value of the first function by the selected first coefficient and the selected second coefficient.
12. The neural network arithmetic processing method of claim 2, wherein: the checking of the value range of the input value of the at least one function comprises checking the sign information or the sign bit of the input value of the function.
13. The neural network operation processing method according to claim 7 or 8, wherein: if the operation is not executed, the result is the constant 0.
14. The neural network operation processing method of claim 3, wherein the multiplication operation is implemented as a multiplier, a shifter, a series of adders and shifters, or a series of AND gates and adders.
15. The neural network arithmetic processing method of claim 1, wherein the two functions are a normalization function and an activation function, respectively.
16. The neural network arithmetic processing method of claim 15, wherein the normalization function includes: a layer normalization function, an instance normalization function, a group normalization function, or a switchable normalization function.
17. The neural network arithmetic processing method of claim 15, wherein the activation function includes a ReLu function, a PReLu function, a ReLu6 function, or a ReLuN function.
18. A neural network arithmetic processing device, comprising a memory, a processor and a computer program stored on the memory and operable on the processor, the processor including a plurality of multipliers, wherein the processor implements the following steps when executing the computer program:
receiving input values of two functions of the operation;
checking a value range of an input value of at least one of the functions;
and determining the execution strategy of the function according to the check result.
19. A computer-readable medium having non-volatile program code executable by a processor, the program code causing the processor to perform the method of any of claims 1 to 17.
CN202011574026.9A 2020-12-25 2020-12-25 Neural network operation processing method and device and computer readable medium Pending CN112862086A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011574026.9A CN112862086A (en) 2020-12-25 2020-12-25 Neural network operation processing method and device and computer readable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011574026.9A CN112862086A (en) 2020-12-25 2020-12-25 Neural network operation processing method and device and computer readable medium

Publications (1)

Publication Number Publication Date
CN112862086A true CN112862086A (en) 2021-05-28

Family

ID=75997412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011574026.9A Pending CN112862086A (en) 2020-12-25 2020-12-25 Neural network operation processing method and device and computer readable medium

Country Status (1)

Country Link
CN (1) CN112862086A (en)

Similar Documents

Publication Publication Date Title
CN107608715B (en) Apparatus and method for performing artificial neural network forward operations
CN106951962B (en) Complex arithmetic unit, method and electronic device for neural network
US20240211252A1 (en) Computer processor for higher precision computations using a mixed-precision decomposition of operations
CN111213125B (en) Efficient direct convolution using SIMD instructions
CN108701250B (en) Data fixed-point method and device
US20210264273A1 (en) Neural network processor
EP3719639B1 (en) Systems and methods to perform floating-point addition with selected rounding
CN111915001B (en) Convolution calculation engine, artificial intelligent chip and data processing method
KR20080089313A (en) Method and apparatus for performing multiplicative functions
US10579338B2 (en) Apparatus and method for processing input operand values
CN113853601A (en) Apparatus and method for matrix operation
Li et al. Accelerating binarized neural networks via bit-tensor-cores in turing gpus
US20050172210A1 (en) Add-compare-select accelerator using pre-compare-select-add operation
US20140207838A1 (en) Method, apparatus and system for execution of a vector calculation instruction
US20230161555A1 (en) System and method performing floating-point operations
CN116795324A (en) Mixed precision floating-point multiplication device and mixed precision floating-point number processing method
CN112988110A (en) Floating point processing device and data processing method
WO2020161458A1 (en) Encoding special value in anchored-data element
CN112862086A (en) Neural network operation processing method and device and computer readable medium
CN111459548A (en) Dual load instruction
CN115713104A (en) Data processing circuit for neural network, neural network circuit and processor
CN115344826A (en) Computing device, operating method, and machine-readable storage medium
US6615228B1 (en) Selection based rounding system and method for floating point operations
Kageyama et al. Implementation of Floating‐Point Arithmetic Processing on Content Addressable Memory‐Based Massive‐Parallel SIMD matriX Core
US8180822B2 (en) Method and system for processing the booth encoding 33RD term

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination