CN110222815A - Configurable activation function device and method suitable for a deep learning hardware accelerator - Google Patents

Configurable activation function device and method suitable for a deep learning hardware accelerator

Info

Publication number
CN110222815A
Authority
CN
China
Prior art keywords
arithmetic unit
activation function
input
linear unit
deep learning
Prior art date
Legal status
Granted
Application number
CN201910344947.7A
Other languages
Chinese (zh)
Other versions
CN110222815B (en)
Inventor
沈沙
沈松剑
李毅
Current Assignee
Hefei Kuxin Microelectronics Co., Ltd.
Original Assignee
Shanghai Cool Core Microelectronics Co Ltd
Priority date: 2019-04-26
Filing date: 2019-04-26
Publication date: 2019-09-10
Application filed by Shanghai Cool Core Microelectronics Co., Ltd.
Priority to CN201910344947.7A
Publication of CN110222815A
Application granted
Publication of CN110222815B
Legal status: Active


Classifications

    • G06N: Computing arrangements based on specific computational models
    • G06N3/02: Neural networks
    • G06N3/045: Combinations of networks
    • G06N3/048: Activation functions
    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means


Abstract

The present invention provides a configurable activation function device and method suitable for deep learning hardware accelerators, comprising: a first arithmetic unit, whose input is connected to a signed-integer input data source and receives signed integer input data; a multiplexer, whose two inputs are connected respectively to the output of the first arithmetic unit and to the signed-integer input data source; a second arithmetic unit, whose input is connected to the output of the multiplexer; a rectified linear unit, whose input is connected to the output of the second arithmetic unit; and a third arithmetic unit, whose input is connected to the output of the rectified linear unit. The present invention provides a hardware acceleration unit that supports multiple activation function operations and also supports batch normalization. Input and output data are integers with a precision of up to 32 bits, and intermediate results have a precision of up to 64 bits.

Description

Configurable activation function device and method suitable for a deep learning hardware accelerator
Technical field
The present invention relates to the field of electronic circuits, and in particular to the hardware structure and implementation method of a configurable, 64-bit-precision activation function suitable for deep learning hardware accelerators.
Background technique
Deep learning is a field of machine learning closely related to artificial intelligence; its purpose is to build neural networks that simulate the learning and analysis processes of the human brain. The main idea of deep learning is to stack multiple layers, using the output of a lower layer as the input of the next higher layer; a multilayer perceptron with several hidden layers is one embodiment of such a deep learning structure. In this way, deep learning combines low-level features into more abstract high-level representations and thereby discovers distributed feature representations of the data. How to make deep learning computation more precise is a problem facing many engineers.
CN109389212A discloses a "reconfigurable activation and quantization pooling system for low-bit-width convolutional neural networks". That invention targets low-precision convolutional networks (with precision at or below 4 bits) and therefore cannot satisfy the market's demand for high precision.
Summary of the invention
In view of the defects in the prior art, the object of the present invention is to provide a configurable activation function device and method suitable for deep learning hardware accelerators.
A configurable activation function device suitable for a deep learning hardware accelerator provided according to the present invention comprises:
a first arithmetic unit: its input is connected to a signed-integer input data source, receives signed integer input data, and performs operations according to operational parameters;
a multiplexer: its two inputs are connected respectively to the output of the first arithmetic unit and to the signed-integer input data source, and it forwards one selected input to its output according to a predetermined requirement;
a second arithmetic unit: its input is connected to the output of the multiplexer, and it performs operations according to operational parameters;
a rectified linear unit: its input is connected to the output of the second arithmetic unit, and it applies the rectified-linear operation to the result of the second arithmetic unit;
a third arithmetic unit: its input is connected to the output of the rectified linear unit, and it performs operations according to operational parameters.
Preferably, the first arithmetic unit, the second arithmetic unit and the third arithmetic unit each comprise:
an adder: adds the signed integer input data to the operational parameter;
a multiplier: its input is connected to the output of the adder, and it multiplies the addition result by the operational parameter;
an arithmetic shifter: its input is connected to the output of the multiplier, and it applies an arithmetic shift to the multiplication result.
Preferably, the signed integer input data provided by the signed-integer input data source is 32-bit signed integer input data.
Preferably, the operational parameters include:
an offset parameter: delivered to the adder of the first arithmetic unit, the second arithmetic unit and the third arithmetic unit, with a bit width of 32 bits;
a slope-and-offset parameter: delivered to the multiplier of the first arithmetic unit, the second arithmetic unit and the third arithmetic unit, with a bit width of 64 bits.
Preferably, the offset parameter is stored in a first on-chip SRAM cache, and the slope-and-offset parameter is stored in a second on-chip SRAM cache.
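Taken together, each arithmetic unit therefore applies an affine transform followed by a shift. As a worked illustration only (the patent does not spell out how the slope term k and the shift amount s are packed into the 64-bit parameter word, so they are written separately here), with offset parameter b and input x:

    y = ((x + b) * k) >> s

For example, with x = 100, b = 28, k = 3 and s = 2, the adder outputs 128, the multiplier outputs 384, and the arithmetic shifter outputs 384 >> 2 = 96.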
A configurable activation function method suitable for a deep learning hardware accelerator provided according to the present invention uses the above configurable activation function device and comprises the following steps:
Step 1: select the computing operation according to the type of the activation function to be computed;
Step 2: if the current activation function requires an accumulated offset parameter, load the offset parameter from external memory into the first on-chip SRAM cache; if the current activation function does not require an accumulated bias, fill the entire SRAM of the first on-chip SRAM cache with zero values;
Step 3: if the computing operation selected in step 1 is the rectified linear unit, fill the entire SRAM of the second on-chip SRAM cache with zero values and then go to step 4; otherwise, load the slope-and-offset parameter from external memory into the second on-chip SRAM cache and then go to step 4;
Step 4: if the current computation requires batch normalization and the batch normalization must be completed before the activation function is computed, the result of the first arithmetic unit is output to the multiplexer, the multiplexer in turn feeds this result into the second arithmetic unit, and the method proceeds to step 5; if the current computation does not require batch normalization, the multiplexer feeds the signed integer input data into the second arithmetic unit and the method proceeds to step 5;
Step 5: the second arithmetic unit adds the offset parameter to the output of the multiplexer, multiplies the addition result by the slope-and-offset parameter, shifts the product in the arithmetic shifter, and outputs the result to the rectified linear unit;
Step 6: if the type of the activation function being computed is the rectified linear unit, denote the input data by x and the output data by f(x); the rectified linear unit then operates on the signed integer input data x as follows:
f(x) = x, if x ≥ 0; f(x) = 0, if x < 0;
if the type of the activation function being computed is not the rectified linear unit, the rectified linear unit passes the signed integer input data x through unchanged:
f(x) = x;
Step 7: if the current computation requires batch normalization and the batch normalization must be completed after the activation function is computed, the third arithmetic unit performs the addition, multiplication and shift operations required by batch normalization, and its result is the final output data; if the current computation does not require batch normalization, the output of the rectified linear unit is used directly as the final output data.
Preferably, the computing operations in step 1 include:
the rectified linear unit, the parametric rectified linear unit, the leaky rectified linear unit, the exponential linear unit, the sigmoid activation function and the hyperbolic tangent (tanh) activation function.
Preferably, in step 5 the second arithmetic unit adds the offset parameter to the output of the multiplexer to produce a 32-bit sum; the product of this sum and the slope-and-offset parameter is a 64-bit value, which is shifted by the arithmetic shifter to produce a 32-bit output.
Preferably, the first arithmetic unit, the second arithmetic unit and the third arithmetic unit each comprise:
an adder: adds the signed integer input data to the operational parameter;
a multiplier: its input is connected to the output of the adder, and it multiplies the addition result by the operational parameter;
an arithmetic shifter: its input is connected to the output of the multiplier, and it applies an arithmetic shift to the multiplication result.
Preferably, the bit width of the offset parameter is 32 bits, and the bit width of the slope-and-offset parameter is 64 bits.
Compared with the prior art, the present invention has the following beneficial effects:
The present invention provides a hardware acceleration unit that supports multiple activation function operations. The supported activation function types include: the rectified linear unit (Rectified Linear Unit, ReLU), the parametric rectified linear unit (Parametric ReLU, PReLU), the leaky rectified linear unit (Leaky ReLU), the exponential linear unit (Exponential Linear Unit, ELU), the sigmoid activation function, and the hyperbolic tangent (tanh) activation function. Batch normalization is also supported. Input and output data are integers; the precision of the input and output data is up to 32 bits, and the precision of intermediate results is up to 64 bits.
Detailed description of the invention
Other features, objects and advantages of the present invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:
Fig. 1 is the structural diagram of the present invention.
Specific embodiment
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art further understand the present invention, but do not limit the invention in any way. It should be pointed out that a person of ordinary skill in the art may make several changes and improvements without departing from the inventive concept; all of these belong to the protection scope of the present invention.
As shown in Fig. 1, the configurable activation function device suitable for a deep learning hardware accelerator provided in this embodiment comprises:
a first arithmetic unit 1: its input is connected to a signed-integer input data source, receives signed integer input data, and performs operations according to operational parameters;
a multiplexer 2: its two inputs are connected respectively to the output of the first arithmetic unit and to the signed-integer input data source, and it forwards one selected input to its output according to a predetermined requirement;
a second arithmetic unit 3: its input is connected to the output of the multiplexer, and it performs operations according to operational parameters;
a rectified linear unit 4: its input is connected to the output of the second arithmetic unit, and it applies the rectified-linear operation to the result of the second arithmetic unit;
a third arithmetic unit 5: its input is connected to the output of the rectified linear unit, and it performs operations according to operational parameters.
In this embodiment, the first arithmetic unit, the second arithmetic unit and the third arithmetic unit each comprise:
an adder 6: adds the signed integer input data to the operational parameter; its inputs are two 32-bit signed integers, it performs the addition of the two 32-bit signed integers, and its output is a 32-bit signed integer;
a multiplier 7: its input is connected to the output of the adder, and it multiplies the addition result by the operational parameter; its inputs are two 32-bit signed integers and its output is a 64-bit signed integer (the full product of two 32-bit signed integers always fits within 64 bits);
an arithmetic shifter 8: its input is connected to the output of the multiplier, and it applies an arithmetic shift to the multiplication result; its input is a 64-bit signed integer and its output is a 32-bit signed integer.
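For illustration, a minimal C sketch of this datapath is given below. It assumes two's-complement wraparound on the 32-bit addition and an arithmetic (sign-preserving) right shift; the function name and the separate shift-amount argument are illustrative, since the embodiment does not specify how the shift count is configured.

    #include <stdint.h>

    /* One arithmetic unit: adder (6), multiplier (7), arithmetic shifter (8).
     * Assumptions: the 32-bit addition wraps as in hardware, and right-shifting
     * a signed 64-bit value is arithmetic (true of common compilers, though
     * implementation-defined in ISO C). */
    static int32_t arithmetic_unit(int32_t x, int32_t offset, int32_t slope,
                                   unsigned shift)
    {
        int32_t sum = (int32_t)((uint32_t)x + (uint32_t)offset); /* 32b + 32b -> 32b */
        int64_t product = (int64_t)sum * (int64_t)slope;         /* 32b x 32b -> 64b */
        return (int32_t)(product >> shift);                      /* 64b -> 32b */
    }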
In Fig. 1, X is the 32-bit signed integer input data and Y is the 32-bit signed integer output data; data flows in the direction of the arrows in the figure.
The operational parameters include:
the offset parameter: stored in the first on-chip SRAM cache and delivered to the adders of the first arithmetic unit, the second arithmetic unit and the third arithmetic unit; bit width 32 bits, depth 1024;
the slope-and-offset parameter: stored in the second on-chip SRAM cache and delivered to the multipliers of the first arithmetic unit, the second arithmetic unit and the third arithmetic unit; bit width 64 bits, depth 64.
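A minimal sketch of these two parameter caches in C follows, together with the zero-fill used by steps 2 and 3 of the working principle below; the type and helper names are illustrative and not part of the patent.

    #include <stdint.h>
    #include <string.h>

    /* The two on-chip parameter caches of this embodiment. */
    typedef struct {
        int32_t offset[1024]; /* first SRAM: offset parameters, 32 bits wide, depth 1024 */
        int64_t slope[64];    /* second SRAM: slope-and-offset parameters, 64 bits wide, depth 64 */
    } param_srams;

    /* Per steps 2 and 3, an unused parameter SRAM is filled entirely with zero values. */
    static void clear_offset_sram(param_srams *p) { memset(p->offset, 0, sizeof p->offset); }
    static void clear_slope_sram(param_srams *p)  { memset(p->slope,  0, sizeof p->slope);  }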
The working principle of the present invention is as follows:
Step 1: according to the type of the activation function to be computed, select one of the following computing operations:
a) rectified linear unit (Rectified Linear Unit, ReLU);
b) parametric rectified linear unit (Parametric ReLU, PReLU);
c) leaky rectified linear unit (Leaky ReLU);
d) exponential linear unit (Exponential Linear Unit, ELU);
e) sigmoid activation function;
f) hyperbolic tangent (tanh) activation function;
Step 2: if the current activation function requires an accumulated offset parameter, load the offset parameter from external memory into the first on-chip SRAM cache; if the current activation function does not require an accumulated bias, fill the entire SRAM of the first on-chip SRAM cache with zero values;
Step 3: if the computing operation selected in step 1 is the rectified linear unit, fill the entire SRAM of the second on-chip SRAM cache with zero values and then go to step 4; otherwise, load the slope-and-offset parameter from external memory into the second on-chip SRAM cache and then go to step 4;
Step 4: if the current computation requires batch normalization and the batch normalization must be completed before the activation function is computed, the result of the first arithmetic unit is output to the multiplexer, the multiplexer in turn feeds this result into the second arithmetic unit, and the method proceeds to step 5; if the current computation does not require batch normalization, the multiplexer feeds the signed integer input data into the second arithmetic unit and the method proceeds to step 5;
Step 5: the second arithmetic unit adds the offset parameter to the output of the multiplexer, multiplies the addition result by the slope-and-offset parameter, shifts the product in the arithmetic shifter, and outputs the result to the rectified linear unit;
Step 6: if the type of the activation function being computed is the rectified linear unit, denote the input data by x and the output data by f(x); the rectified linear unit then operates on the signed integer input data x as follows:
f(x) = x, if x ≥ 0; f(x) = 0, if x < 0;
if the type of the activation function being computed is not the rectified linear unit, the rectified linear unit passes the signed integer input data x through unchanged:
f(x) = x;
Step 7: if the current computation requires batch normalization and the batch normalization must be completed after the activation function is computed, the third arithmetic unit performs the addition, multiplication and shift operations required by batch normalization, and its result is the final output data; if the current computation does not require batch normalization, the output of the rectified linear unit is used directly as the final output data.
Through the above seven steps, the computation of the activation function and of batch normalization is complete; the input data is X and the output data is Y.
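As an illustration only, steps 4 to 7 can be sketched in C on top of the arithmetic_unit() function given above; the unit_params type, the flag names and the identity parameter setting (offset 0, slope 1, shift 0) are assumptions made for this sketch, since the patent does not define a register-level interface.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct { int32_t offset; int32_t slope; unsigned shift; } unit_params;

    static int32_t arithmetic_unit(int32_t x, int32_t offset, int32_t slope,
                                   unsigned shift); /* defined in the sketch above */

    /* Steps 4 to 7 of the working principle. */
    static int32_t activation_pipeline(int32_t x,
                                       unit_params u1, unit_params u2, unit_params u3,
                                       bool bn_before, bool bn_after, bool is_relu)
    {
        /* Step 4: batch normalization before the activation runs on unit 1; the
         * multiplexer forwards either that result or the raw input to unit 2. */
        int32_t mux_out = bn_before ? arithmetic_unit(x, u1.offset, u1.slope, u1.shift) : x;

        /* Step 5: unit 2 adds the offset, multiplies by the slope and shifts. */
        int32_t act_in = arithmetic_unit(mux_out, u2.offset, u2.slope, u2.shift);

        /* Step 6: the rectified linear unit clamps negatives when ReLU is
         * selected; otherwise it passes the value through, f(x) = x. */
        int32_t relu_out = (is_relu && act_in < 0) ? 0 : act_in;

        /* Step 7: batch normalization after the activation runs on unit 3. */
        return bn_after ? arithmetic_unit(relu_out, u3.offset, u3.slope, u3.shift) : relu_out;
    }

For example, with u2 set to the identity configuration {0, 1, 0}, both batch-normalization flags cleared and is_relu set, the sketch computes Y = max(0, X).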
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the above particular embodiments; those skilled in the art may make various changes or modifications within the scope of the claims without affecting the substance of the present invention. In the absence of conflict, the embodiments of this application and the features in the embodiments may be combined with one another arbitrarily.

Claims (10)

1. A configurable activation function device suitable for a deep learning hardware accelerator, characterized by comprising:
a first arithmetic unit: its input is connected to a signed-integer input data source, receives signed integer input data, and performs operations according to operational parameters;
a multiplexer: its two inputs are connected respectively to the output of the first arithmetic unit and to the signed-integer input data source, and it forwards one selected input to its output according to a predetermined requirement;
a second arithmetic unit: its input is connected to the output of the multiplexer, and it performs operations according to operational parameters;
a rectified linear unit: its input is connected to the output of the second arithmetic unit, and it applies the rectified-linear operation to the result of the second arithmetic unit;
a third arithmetic unit: its input is connected to the output of the rectified linear unit, and it performs operations according to operational parameters.
2. The configurable activation function device suitable for a deep learning hardware accelerator according to claim 1, characterized in that the first arithmetic unit, the second arithmetic unit and the third arithmetic unit each comprise:
an adder: adds the signed integer input data to the operational parameter;
a multiplier: its input is connected to the output of the adder, and it multiplies the addition result by the operational parameter;
an arithmetic shifter: its input is connected to the output of the multiplier, and it applies an arithmetic shift to the multiplication result.
3. The configurable activation function device suitable for a deep learning hardware accelerator according to claim 1, characterized in that the signed integer input data provided by the signed-integer input data source is 32-bit signed integer input data.
4. The configurable activation function device suitable for a deep learning hardware accelerator according to claim 2, characterized in that the operational parameters include:
an offset parameter: delivered to the adder of the first arithmetic unit, the second arithmetic unit and the third arithmetic unit, with a bit width of 32 bits;
a slope-and-offset parameter: delivered to the multiplier of the first arithmetic unit, the second arithmetic unit and the third arithmetic unit, with a bit width of 64 bits.
5. The configurable activation function device suitable for a deep learning hardware accelerator according to claim 4, characterized in that the offset parameter is stored in a first on-chip SRAM cache and the slope-and-offset parameter is stored in a second on-chip SRAM cache.
6. A configurable activation function method suitable for a deep learning hardware accelerator, characterized in that it uses the configurable activation function device suitable for a deep learning hardware accelerator according to claim 1, and comprises the following steps:
Step 1: select the computing operation according to the type of the activation function to be computed;
Step 2: if the current activation function requires an accumulated offset parameter, load the offset parameter from external memory into the first on-chip SRAM cache; if the current activation function does not require an accumulated bias, fill the entire SRAM of the first on-chip SRAM cache with zero values;
Step 3: if the computing operation selected in step 1 is the rectified linear unit, fill the entire SRAM of the second on-chip SRAM cache with zero values and then go to step 4; otherwise, load the slope-and-offset parameter from external memory into the second on-chip SRAM cache and then go to step 4;
Step 4: if the current computation requires batch normalization and the batch normalization must be completed before the activation function is computed, the result of the first arithmetic unit is output to the multiplexer, the multiplexer in turn feeds this result into the second arithmetic unit, and the method proceeds to step 5; if the current computation does not require batch normalization, the multiplexer feeds the signed integer input data into the second arithmetic unit and the method proceeds to step 5;
Step 5: the second arithmetic unit adds the offset parameter to the output of the multiplexer, multiplies the addition result by the slope-and-offset parameter, shifts the product in the arithmetic shifter, and outputs the result to the rectified linear unit;
Step 6: if the type of the activation function being computed is the rectified linear unit, denote the input data by x and the output data by f(x); the rectified linear unit then operates on the signed integer input data x as follows:
f(x) = x, if x ≥ 0; f(x) = 0, if x < 0;
if the type of the activation function being computed is not the rectified linear unit, the rectified linear unit passes the signed integer input data x through unchanged:
f(x) = x;
Step 7: if the current computation requires batch normalization and the batch normalization must be completed after the activation function is computed, the third arithmetic unit performs the addition, multiplication and shift operations required by batch normalization, and its result is the final output data; if the current computation does not require batch normalization, the output of the rectified linear unit is used directly as the final output data.
7. The configurable activation function method suitable for a deep learning hardware accelerator according to claim 6, characterized in that the computing operations in step 1 include:
the rectified linear unit, the parametric rectified linear unit, the leaky rectified linear unit, the exponential linear unit, the sigmoid activation function and the hyperbolic tangent (tanh) activation function.
8. The configurable activation function method suitable for a deep learning hardware accelerator according to claim 6, characterized in that, in step 5, the second arithmetic unit adds the offset parameter to the output of the multiplexer to produce a 32-bit sum; the product of this sum and the slope-and-offset parameter is a 64-bit value, which is shifted by the arithmetic shifter to produce a 32-bit output.
9. The configurable activation function method suitable for a deep learning hardware accelerator according to claim 6, characterized in that the first arithmetic unit, the second arithmetic unit and the third arithmetic unit each comprise:
an adder: adds the signed integer input data to the operational parameter;
a multiplier: its input is connected to the output of the adder, and it multiplies the addition result by the operational parameter;
an arithmetic shifter: its input is connected to the output of the multiplier, and it applies an arithmetic shift to the multiplication result.
10. The configurable activation function method suitable for a deep learning hardware accelerator according to claim 6, characterized in that the bit width of the offset parameter is 32 bits and the bit width of the slope-and-offset parameter is 64 bits.
CN201910344947.7A, filed 2019-04-26 (priority 2019-04-26): Configurable activation function device and method suitable for deep learning hardware accelerator. Status: Active; granted as CN110222815B.

Priority Applications (1)

CN201910344947.7A: priority date 2019-04-26, filing date 2019-04-26. Configurable activation function device and method suitable for deep learning hardware accelerator.


Publications (2)

CN110222815A: published 2019-09-10
CN110222815B: granted 2021-09-07

Family

ID=67819967

Family Applications (1)

CN201910344947.7A (Active, granted as CN110222815B): Configurable activation function device and method suitable for deep learning hardware accelerator

Country Status (1)

CN: CN110222815B



Patent Citations (3)

* Cited by examiner, † Cited by third party
US20180121796A1 *: priority 2016-11-03, published 2018-05-03. Flexible neural network accelerator and methods therefor (Intel Corporation).
CN109102065A *: priority 2018-06-28, published 2018-12-28. A PSoC-based convolutional neural network accelerator.
CN109389212A *: priority 2018-12-30, published 2019-02-26. A reconfigurable activation and quantization pooling system for low-bit-width convolutional neural networks.

Cited By (4)

* Cited by examiner, † Cited by third party
CN113420788A *: priority 2020-10-12, published 2021-09-21. Integer-based fused convolution layer in a convolutional neural network and fused convolution method.
CN112256094A *: priority 2020-11-13, published 2021-01-22. Deep-learning-based activation function device and method of use.
CN114997391A *: priority 2022-08-02, published 2022-09-02. Leakage method in an electronic neural system, chip and electronic device.
CN114997391B: granted 2022-11-29. Leakage method in an electronic neural system, chip and electronic device.

Also Published As

CN110222815B: granted 2021-09-07


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant
CP02: Change in the address of a patent holder
Address after: Room 501, No. 308 Songhu Road, Yangpu District, Shanghai 200082
Patentee after: SHANGHAI ARTOSYN MICROELECTRONIC Co.,Ltd.
Address before: Room 208, No. 234 Songhu Road, Yangpu District, Shanghai 200082
Patentee before: SHANGHAI ARTOSYN MICROELECTRONIC Co.,Ltd.
CP03: Change of name, title or address
Address after: Building B2, Phase 3, Hefei Innovation Industrial Park, intersection of Jiangjunling Road and Wanshui Road, High-tech Zone, Hefei, Anhui 230088
Patentee after: Hefei Kuxin Microelectronics Co.,Ltd.
Country or region after: China
Address before: Room 501, No. 308 Songhu Road, Yangpu District, Shanghai 200082
Patentee before: SHANGHAI ARTOSYN MICROELECTRONIC Co.,Ltd.
Country or region before: China