CN110222815B - Configurable activation function device and method suitable for deep learning hardware accelerator - Google Patents

Configurable activation function device and method suitable for deep learning hardware accelerator

Info

Publication number
CN110222815B
CN110222815B
Authority
CN
China
Prior art keywords
unit
activation function
arithmetic unit
input
parameter
Prior art date
Legal status
Active
Application number
CN201910344947.7A
Other languages
Chinese (zh)
Other versions
CN110222815A (en)
Inventor
沈沙
沈松剑
李毅
Current Assignee
Hefei Kuxin Microelectronics Co ltd
Original Assignee
Shanghai Artosyn Microelectronic Co ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Artosyn Microelectronic Co ltd filed Critical Shanghai Artosyn Microelectronic Co ltd
Priority to CN201910344947.7A
Publication of CN110222815A
Application granted
Publication of CN110222815B
Legal status: Active
Anticipated expiration

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/048 - Activation functions
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Complex Calculations (AREA)
  • Advance Control (AREA)

Abstract

The invention provides a configurable activation function device and method suitable for a deep learning hardware accelerator. The configurable activation function device comprises: a first arithmetic unit, whose input is connected to a signed integer input data source to obtain signed integer input data; a multiplexer, whose two inputs are connected to the output of the first arithmetic unit and to the signed integer input data source, respectively; a second arithmetic unit, whose input is connected to the output of the multiplexer; a rectified linear unit, whose input is connected to the output of the second arithmetic unit; and a third arithmetic unit, whose input is connected to the output of the rectified linear unit. The invention provides a hardware acceleration unit supporting multiple activation function operations and can also support batch normalization operations. The input and output data are integer data with a precision of up to 32 bits, and intermediate calculation results have a precision of up to 64 bits.

Description

Configurable activation function device and method suitable for deep learning hardware accelerator
Technical Field
The invention relates to the technical field of electronic circuits, and in particular to a configurable activation function hardware structure with 64-bit precision suitable for a deep learning hardware accelerator, and to an implementation method thereof.
Background
Deep learning is a field of machine learning closely related to artificial intelligence; its aim is to build neural networks that simulate the learning and analysis processes of the human brain. The main idea of deep learning is to stack multiple layers, taking the output of a lower layer as the input of a higher layer; a multilayer perceptron with several hidden layers is an embodiment of such a deep learning structure. In this way, deep learning can discover distributed feature representations of data by combining lower-level features into more abstract, higher-level attributes. How to achieve higher precision in deep learning operations is a difficult problem facing many engineers.
CN109389212A discloses a reconfigurable activation-quantization-pooling system oriented to low-bit-width convolutional neural networks; it targets low-precision (4-bit or lower) convolutional networks and cannot meet the market's demand for high precision.
Disclosure of Invention
In view of the defects in the prior art, the present invention aims to provide a configurable activation function device and method suitable for a deep learning hardware accelerator.
The invention provides a configurable activation function device suitable for a deep learning hardware accelerator, which comprises:
a first arithmetic unit: its input is connected to a signed integer input data source to obtain signed integer input data, and it performs operations according to the operation parameters;
a multiplexer: its two inputs are connected to the output of the first arithmetic unit and to the signed integer input data source, respectively, and one input is selected and passed to the output according to a preset requirement;
a second arithmetic unit: its input is connected to the output of the multiplexer, and it performs operations according to the operation parameters;
a rectified linear unit: its input is connected to the output of the second arithmetic unit, and it performs a rectified linear operation on the result of the second arithmetic unit;
a third arithmetic unit: its input is connected to the output of the rectified linear unit, and it performs operations according to the operation parameters.
Preferably, the first arithmetic unit, the second arithmetic unit, and the third arithmetic unit each include:
an adder: adds the signed integer input data and an operation parameter;
a multiplier: its input is connected to the output of the adder and multiplies the addition result by an operation parameter;
an arithmetic shifter: its input is connected to the output of the multiplier and arithmetically shifts the multiplication result.
Preferably, the signed integer input data provided by the signed integer input data source is 32-bit signed integer data.
Preferably, the operation parameters include:
bias parameters: transmitted to the adders of the first arithmetic unit, the second arithmetic unit, and the third arithmetic unit, with a bit width of 32 bits;
slope and offset parameters: transmitted to the multipliers of the first arithmetic unit, the second arithmetic unit, and the third arithmetic unit, with a bit width of 64 bits.
Preferably, the bias parameters are stored in a first on-chip SRAM cache, and the slope and offset parameters are stored in a second on-chip SRAM cache.
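For illustration, the datapath of one such arithmetic unit (adder, multiplier, arithmetic shifter) can be modeled in C as follows. This is a minimal behavioral sketch, assuming that the 32-bit bias parameter feeds the adder, that a 32-bit slope value and a shift amount are supplied to the multiplier and shifter, and that the shifted 64-bit product is truncated to a 32-bit signed result; the function and parameter names are illustrative and do not appear in the patent.

```c
#include <stdint.h>

/* Behavioral sketch of one arithmetic unit (adder -> multiplier -> arithmetic shifter).
 * Assumptions for illustration: the bias feeds the adder, a 32-bit slope and a shift
 * amount drive the multiplier and shifter, and the shifted 64-bit product is truncated
 * back to a 32-bit signed result. On common platforms, >> of a signed value is an
 * arithmetic (sign-preserving) shift. */
static int32_t arith_unit(int32_t x,     /* 32-bit signed integer input       */
                          int32_t bias,  /* 32-bit bias parameter (adder)     */
                          int32_t slope, /* slope parameter (multiplier)      */
                          int shift)     /* arithmetic right-shift amount     */
{
    int32_t sum      = x + bias;                          /* 32-bit addition          */
    int64_t product  = (int64_t)sum * (int64_t)slope;     /* 64-bit intermediate      */
    int64_t shifted  = product >> shift;                  /* arithmetic shift         */
    return (int32_t)shifted;                              /* 32-bit output            */
}
```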
The invention also provides a configurable activation function method suitable for a deep learning hardware accelerator, which uses the above configurable activation function device and comprises the following steps:
Step 1: selecting a calculation operation according to the type of the activation function currently to be calculated;
Step 2: if the current activation function needs to accumulate the bias parameters, loading the bias parameters from outside into the first on-chip SRAM cache; if the current activation function does not need to accumulate a bias value, filling the entire first on-chip SRAM cache with 0 values;
Step 3: if the calculation operation selected in Step 1 is the rectified linear unit, filling the entire second on-chip SRAM cache with 0 values and then proceeding to Step 4; otherwise, loading the slope and offset parameters from outside into the second on-chip SRAM cache and then proceeding to Step 4;
Step 4: if the current calculation requires a batch normalization operation and the batch normalization operation must be completed before the activation function calculation, the calculation result of the first arithmetic unit is output to the multiplexer, the multiplexer passes it to the second arithmetic unit, and the method proceeds to Step 5; if the current calculation does not require a batch normalization operation, the multiplexer passes the signed integer input data to the second arithmetic unit and the method proceeds to Step 5;
Step 5: the second arithmetic unit adds the bias parameter and the output of the multiplexer, the addition result is multiplied by the slope and offset parameters, and the product is shifted by the arithmetic shifter and output to the rectified linear unit;
Step 6: if the type of the activation function currently being calculated is the rectified linear unit, denoting the input data as x and the output data as f(x), the rectified linear unit performs the following operation on the signed integer input data x:
f(x) = x, if x > 0; f(x) = 0, if x ≤ 0;
if the type of the activation function currently being calculated is not the rectified linear unit, the rectified linear unit performs the following operation on the signed integer input data x:
f(x) = x;
Step 7: if the current calculation requires a batch normalization operation and the batch normalization operation must be completed after the activation function calculation, the third arithmetic unit completes the addition, multiplication, and shift operations required by the batch normalization operation, and the result is taken as the final output data; if the current calculation does not require a batch normalization operation, the output of the rectified linear unit is taken directly as the final output data.
Preferably, the calculation operation in Step 1 includes:
a rectified linear unit, a parametric rectified linear unit, a leaky rectified linear unit, an exponential linear unit, a sigmoid activation function, and a hyperbolic tangent activation function.
Preferably, in Step 5, the addition of the bias parameter and the output of the multiplexer in the second arithmetic unit produces a 32-bit sum, the multiplication of this sum by the slope and offset parameters produces a 64-bit product, and the arithmetic shifter shifts the product to obtain a 32-bit output.
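As a hypothetical worked example of these bit widths (the numeric values are illustrative and not taken from the patent), a scale factor of 0.25 can be encoded as a slope of 16384 with a right shift of 16, since 16384 / 2^16 = 0.25:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Hypothetical example of the Step-5 fixed-point datapath (values illustrative):
     * a scale factor of 0.25 is encoded as slope = 16384 with a right shift of 16. */
    int32_t x    = 1000;                   /* multiplexer output              */
    int32_t bias = 24;                     /* 32-bit bias parameter           */
    int32_t sum  = x + bias;               /* 32-bit sum: 1024                */
    int64_t prod = (int64_t)sum * 16384;   /* 64-bit product: 16777216        */
    int32_t out  = (int32_t)(prod >> 16);  /* 32-bit output: 256 (= 1024 / 4) */
    printf("%d\n", out);
    return 0;
}
```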
Preferably, the first arithmetic unit, the second arithmetic unit, and the third arithmetic unit each include:
an adder: adds the signed integer input data and an operation parameter;
a multiplier: its input is connected to the output of the adder and multiplies the addition result by an operation parameter;
an arithmetic shifter: its input is connected to the output of the multiplier and arithmetically shifts the multiplication result.
Preferably, the bias parameter is 32 bits wide, and the slope and offset parameters are 64 bits wide.
Compared with the prior art, the invention has the following beneficial effects:
the invention can support hardware accelerating units of various activating function operations, and the supported activating function types comprise: modified Linear Unit (ReLu), Parametric modified Linear Unit (Parametric ReLu), leakage modified Linear Unit (leak ReLu), Exponential Linear Unit (ELU), sigmoid activation function (sigmoid), hyperbolic tangent activation function (Tanh). Batch Normalization operations (Batch Normalization) may also be supported. The input and output data are integer data, the highest precision of the input and output data can reach 32 bits, and the precision of the intermediate calculation result can reach 64 bits.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:
FIG. 1 is a schematic structural diagram of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific embodiments. The following embodiments will assist those skilled in the art in further understanding the invention, but do not limit the invention in any way. It should be noted that various changes and modifications that would be obvious to those skilled in the art can be made without departing from the spirit of the invention, and all such changes and modifications fall within the scope of the present invention.
As shown in fig. 1, the present embodiment provides a configurable activation function apparatus suitable for a deep learning hardware accelerator, including:
first arithmetic unit 1: its input is connected to a signed integer input data source to obtain signed integer input data, and it performs operations according to the operation parameters;
multiplexer 2: its two inputs are connected to the output of the first arithmetic unit and to the signed integer input data source, respectively, and one input is selected and passed to the output according to a preset requirement;
second arithmetic unit 3: its input is connected to the output of the multiplexer, and it performs operations according to the operation parameters;
rectified linear unit 4: its input is connected to the output of the second arithmetic unit, and it performs a rectified linear operation on the result of the second arithmetic unit;
third arithmetic unit 5: its input is connected to the output of the rectified linear unit, and it performs operations according to the operation parameters.
In this embodiment, the first arithmetic unit, the second arithmetic unit, and the third arithmetic unit each include:
adder 6: adds the signed integer input data and an operation parameter; it takes two 32-bit signed integers as input, performs the 32-bit signed addition, and outputs a 32-bit signed integer;
multiplier 7: its input is connected to the output of the adder and multiplies the addition result by an operation parameter; it takes two 32-bit signed integers as input and outputs a 64-bit signed integer;
arithmetic shifter 8: its input is connected to the output of the multiplier and arithmetically shifts the multiplication result; its input is a 64-bit signed integer and its output is a 32-bit signed integer.
In fig. 1, X is 32-bit signed input data and Y is 32-bit signed output data, and the data stream moves in the direction of the arrows in the figure.
The operation parameters comprise:
bias parameters: stored in the first on-chip SRAM cache and transmitted to the adders of the first arithmetic unit, the second arithmetic unit, and the third arithmetic unit; the bit width is 32 bits and the depth is 1024;
slope and offset parameters: stored in the second on-chip SRAM cache and transmitted to the multipliers of the first arithmetic unit, the second arithmetic unit, and the third arithmetic unit; the bit width is 64 bits and the depth is 64.
The working principle of the invention is as follows:
Step 1: according to the type of the activation function currently to be calculated, one of the following calculation operations is selected:
a) rectified linear unit (ReLU);
b) parametric rectified linear unit (Parametric ReLU, PReLU);
c) leaky rectified linear unit (Leaky ReLU);
d) exponential linear unit (ELU);
e) sigmoid activation function (Sigmoid);
f) hyperbolic tangent activation function (Tanh);
Step 2: if the current activation function needs to accumulate the bias parameters, loading the bias parameters from outside into the first on-chip SRAM cache; if the current activation function does not need to accumulate a bias value, filling the entire first on-chip SRAM cache with 0 values;
Step 3: if the calculation operation selected in Step 1 is the rectified linear unit, filling the entire second on-chip SRAM cache with 0 values and then proceeding to Step 4; otherwise, loading the slope and offset parameters from outside into the second on-chip SRAM cache and then proceeding to Step 4;
Step 4: if the current calculation requires a batch normalization operation and the batch normalization operation must be completed before the activation function calculation, the calculation result of the first arithmetic unit is output to the multiplexer, the multiplexer passes it to the second arithmetic unit, and the method proceeds to Step 5; if the current calculation does not require a batch normalization operation, the multiplexer passes the signed integer input data to the second arithmetic unit and the method proceeds to Step 5;
Step 5: the second arithmetic unit adds the bias parameter and the output of the multiplexer, the addition result is multiplied by the slope and offset parameters, and the product is shifted by the arithmetic shifter and output to the rectified linear unit;
Step 6: if the type of the activation function currently being calculated is the rectified linear unit, denoting the input data as x and the output data as f(x), the rectified linear unit performs the following operation on the signed integer input data x:
f(x) = x, if x > 0; f(x) = 0, if x ≤ 0;
if the type of the activation function currently being calculated is not the rectified linear unit, the rectified linear unit performs the following operation on the signed integer input data x:
f(x) = x;
Step 7: if the current calculation requires a batch normalization operation and the batch normalization operation must be completed after the activation function calculation, the third arithmetic unit completes the addition, multiplication, and shift operations required by the batch normalization operation, and the result is taken as the final output data; if the current calculation does not require a batch normalization operation, the output of the rectified linear unit is taken directly as the final output data.
After these seven steps, the calculation of the activation function and the batch normalization operation is complete; the input data is X and the output data is Y.
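The overall data flow of Steps 4 through 7 can be sketched behaviorally in C as follows. Only the ordering (optional batch normalization before the activation in the first unit, the second unit, the rectified linear unit, and optional batch normalization after the activation in the third unit) follows the text; the flag names, the parameter structure, and the pass-through behavior when a stage is unused are illustrative assumptions.

```c
#include <stdint.h>
#include <stdbool.h>

/* One arithmetic unit, as sketched earlier: ((x + bias) * slope) >> shift, 32-bit in/out. */
static int32_t arith_unit(int32_t x, int32_t bias, int32_t slope, int shift)
{
    return (int32_t)(((int64_t)(x + bias) * (int64_t)slope) >> shift);
}

/* Per-unit parameters; the names and packing are illustrative assumptions. */
typedef struct { int32_t bias; int32_t slope; int shift; } unit_params_t;

/* Behavioral sketch of Steps 4-7: optional batch normalization before the activation
 * (first unit), the second unit, the rectified linear unit, and optional batch
 * normalization after the activation (third unit). */
int32_t activation_pipeline(int32_t x,
                            bool bn_before, bool bn_after, bool apply_relu,
                            unit_params_t u1, unit_params_t u2, unit_params_t u3)
{
    /* Step 4: the multiplexer selects either the first unit's result or the raw input. */
    int32_t mux_out = bn_before ? arith_unit(x, u1.bias, u1.slope, u1.shift) : x;

    /* Step 5: the second unit adds the bias, multiplies by the slope, and shifts. */
    int32_t y = arith_unit(mux_out, u2.bias, u2.slope, u2.shift);

    /* Step 6: rectified linear unit, or pass-through f(x) = x otherwise. */
    if (apply_relu && y < 0)
        y = 0;

    /* Step 7: optional batch normalization after the activation (third unit). */
    if (bn_after)
        y = arith_unit(y, u3.bias, u3.slope, u3.shift);

    return y;
}
```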
The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims (9)

1. A configurable activation function apparatus adapted for use in a deep learning hardware accelerator, comprising:
a first arithmetic unit: its input is connected to a signed integer input data source to obtain signed integer input data, and it performs operations according to the operation parameters;
a multiplexer: its two inputs are connected to the output of the first arithmetic unit and to the signed integer input data source, respectively, and one input is selected and passed to the output according to a preset requirement;
a second arithmetic unit: its input is connected to the output of the multiplexer, and it performs operations according to the operation parameters;
a rectified linear unit: its input is connected to the output of the second arithmetic unit, and it performs a rectified linear operation on the result of the second arithmetic unit;
a third arithmetic unit: its input is connected to the output of the rectified linear unit, and it performs operations according to the operation parameters;
the operation parameters comprise a first operation parameter and a second operation parameter;
the first arithmetic unit, the second arithmetic unit, and the third arithmetic unit each include:
an adder: adds the signed integer input data and the first operation parameter;
a multiplier: its input is connected to the output of the adder and multiplies the addition result by the second operation parameter;
an arithmetic shifter: its input is connected to the output of the multiplier and arithmetically shifts the multiplication result.
2. The configurable activation function device of claim 1, wherein the signed integer input data provided by the signed integer input data source is 32-bit signed integer input data.
3. The configurable activation function device for a deep learning hardware accelerator of claim 1, wherein the first operation parameter comprises a first bias parameter, which is transmitted to the adders of the first arithmetic unit, the second arithmetic unit, and the third arithmetic unit and has a bit width of 32 bits;
the second operation parameter comprises a slope and a second offset parameter, which are transmitted to the multipliers of the first arithmetic unit, the second arithmetic unit, and the third arithmetic unit and have a bit width of 64 bits.
4. The configurable activation function device for a deep learning hardware accelerator of claim 3, wherein the first bias parameter is stored in a first on-chip SRAM cache, and the slope and the second offset parameter are stored in a second on-chip SRAM cache.
5. A configurable activation function method suitable for a deep learning hardware accelerator, which uses the configurable activation function device suitable for a deep learning hardware accelerator of claim 1 and comprises the following steps:
Step 1: selecting a calculation operation according to the type of the activation function currently to be calculated;
Step 2: if the current activation function needs to accumulate the first bias parameter, loading the first bias parameter from outside into the first on-chip SRAM cache; if the current activation function does not need to accumulate a bias value, filling the entire first on-chip SRAM cache with 0 values;
Step 3: if the calculation operation selected in Step 1 is the rectified linear unit, filling the entire second on-chip SRAM cache with 0 values; otherwise, loading the slope and the second offset parameter from outside into the second on-chip SRAM cache;
Step 4: if the current calculation requires a batch normalization operation and the batch normalization operation must be completed before the activation function calculation, the calculation result of the first arithmetic unit is output to the multiplexer and the multiplexer passes it to the second arithmetic unit; if the current calculation does not require a batch normalization operation, the multiplexer passes the signed integer input data to the second arithmetic unit;
Step 5: the second arithmetic unit adds the first bias parameter and the output of the multiplexer, the addition result is multiplied by the slope and the second offset parameter, and the product is shifted by the arithmetic shifter and output to the rectified linear unit;
Step 6: if the type of the activation function currently being calculated is the rectified linear unit, denoting the input data as x and the output data as f(x), the rectified linear unit performs the following operation on the signed integer input data x:
f(x) = x, if x > 0; f(x) = 0, if x ≤ 0;
if the type of the activation function currently being calculated is not the rectified linear unit, the rectified linear unit performs the following operation on the signed integer input data x:
f(x) = x;
Step 7: if the current calculation requires a batch normalization operation and the batch normalization operation must be completed after the activation function calculation, the third arithmetic unit completes the addition, multiplication, and shift operations required by the batch normalization operation, and the result is taken as the final output data; if the current calculation does not require a batch normalization operation, the output of the rectified linear unit is taken directly as the final output data.
6. The method of claim 5, wherein the calculation operation in Step 1 comprises:
a rectified linear unit, a parametric rectified linear unit, a leaky rectified linear unit, an exponential linear unit, a sigmoid activation function, and a hyperbolic tangent activation function.
7. The configurable activation function method for a deep learning hardware accelerator of claim 5, wherein in Step 5, the addition of the first bias parameter and the output of the multiplexer in the second arithmetic unit produces a 32-bit sum, the multiplication of this sum by the slope and the second offset parameter produces a 64-bit product, and the arithmetic shifter shifts the product to obtain a 32-bit output.
8. The configurable activation function method for a deep learning hardware accelerator of claim 5, wherein the operational parameters comprise a first operational parameter and a second operational parameter;
the first arithmetic unit, the second arithmetic unit, and the third arithmetic unit each include:
an adder: adds the signed integer input data and the first operation parameter;
a multiplier: its input is connected to the output of the adder and multiplies the addition result by the second operation parameter;
an arithmetic shifter: its input is connected to the output of the multiplier and arithmetically shifts the multiplication result.
9. The configurable activation function method for a deep learning hardware accelerator of claim 5, wherein the bit width of the first bias parameter is 32 bits, and the bit width of the slope and the second offset parameter is 64 bits.
CN201910344947.7A 2019-04-26 2019-04-26 Configurable activation function device and method suitable for deep learning hardware accelerator Active CN110222815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910344947.7A CN110222815B (en) 2019-04-26 2019-04-26 Configurable activation function device and method suitable for deep learning hardware accelerator

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910344947.7A CN110222815B (en) 2019-04-26 2019-04-26 Configurable activation function device and method suitable for deep learning hardware accelerator

Publications (2)

Publication Number Publication Date
CN110222815A CN110222815A (en) 2019-09-10
CN110222815B true CN110222815B (en) 2021-09-07

Family

ID=67819967

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910344947.7A Active CN110222815B (en) 2019-04-26 2019-04-26 Configurable activation function device and method suitable for deep learning hardware accelerator

Country Status (1)

Country Link
CN (1) CN110222815B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220114413A1 (en) * 2020-10-12 2022-04-14 Black Sesame International Holding Limited Integer-based fused convolutional layer in a convolutional neural network
CN112256094A (en) * 2020-11-13 2021-01-22 广东博通科技服务有限公司 Deep learning-based activation function device and use method thereof
CN114997391B (en) * 2022-08-02 2022-11-29 深圳时识科技有限公司 Leakage method in electronic nervous system, chip and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180121796A1 (en) * 2016-11-03 2018-05-03 Intel Corporation Flexible neural network accelerator and methods therefor
CN109102065A (en) * 2018-06-28 2018-12-28 广东工业大学 A kind of convolutional neural networks accelerator based on PSoC
CN109389212A (en) * 2018-12-30 2019-02-26 南京大学 A kind of restructural activation quantization pond system towards low-bit width convolutional neural networks

Also Published As

Publication number Publication date
CN110222815A (en) 2019-09-10

Similar Documents

Publication Publication Date Title
CN110222815B (en) Configurable activation function device and method suitable for deep learning hardware accelerator
US11574031B2 (en) Method and electronic device for convolution calculation in neural network
CN105760933A (en) Method and apparatus for fixed-pointing layer-wise variable precision in convolutional neural network
CN105955706A (en) Divider and division operation method
CN110109646A (en) Data processing method, device and adder and multiplier and storage medium
Grover et al. Design of FPGA based 32-bit Floating Point Arithmetic Unit and verification of its VHDL code using MATLAB
CN110765411A (en) Convolution operation data multiplexing device in convolution neural network
CN111428188A (en) Convolution operation method and device
JP4210378B2 (en) Galois field multiplier and Galois field multiplication method
CN112970036B (en) Convolutional block array for implementing neural network applications and methods of use thereof
CN111797985A (en) Convolution operation memory access optimization method based on GPU
CN104951279B (en) A kind of design method of the vectorization Montgomery modular multipliers based on NEON engines
CN115293978A (en) Convolution operation circuit and method, image processing apparatus
US9612800B2 (en) Implementing a square root operation in a computer system
US11494165B2 (en) Arithmetic circuit for performing product-sum arithmetic
US6138134A (en) Computational method and apparatus for finite field multiplication
KR20220114228A (en) Processor, method for operating the same, and electronic device including the same
Sakamoto et al. Efficient methods to generate constant sns with considering trade-off between error and overhead and its evaluation
CN113554163B (en) Convolutional neural network accelerator
CN107066235A (en) Computational methods and device
EP3686735B1 (en) Processor instructions to accelerate fec encoding and decoding
Menard et al. Exploiting reconfigurable SWP operators for multimedia applications
US9678714B2 (en) Check procedure for floating point operations
EP4057131A1 (en) Method of performing hardware efficient unbiased rounding of a number
CN116957018A (en) Method for realizing channel-by-channel convolution

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP02 Change in the address of a patent holder

Address after: Room 501, No.308 Songhu Road, Yangpu District, Shanghai 200082

Patentee after: SHANGHAI ARTOSYN MICROELECTRONIC Co.,Ltd.

Address before: Room 208, 234 Songhu Road, Yangpu District, Shanghai, 200082

Patentee before: SHANGHAI ARTOSYN MICROELECTRONIC Co.,Ltd.

CP03 Change of name, title or address

Address after: 230088 Building B2, Phase 3, Hefei Innovation Industrial Park, Intersection of Jiangjunling Road and Wanshui Road, High-tech Zone, Hefei City, Anhui Province

Patentee after: Hefei Kuxin Microelectronics Co.,Ltd.

Country or region after: China

Address before: Room 501, No.308 Songhu Road, Yangpu District, Shanghai 200082

Patentee before: SHANGHAI ARTOSYN MICROELECTRONIC Co.,Ltd.

Country or region before: China