CN109978143A - Stacked autoencoder based on SIMD architecture and encoding method - Google Patents

Stacked autoencoder based on SIMD architecture and encoding method - Download PDF

Info

Publication number
CN109978143A
CN109978143A (application CN201910251530.6A)
Authority
CN
China
Prior art keywords
layer
neural network
weight
SRAM
bias
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910251530.6A
Other languages
Chinese (zh)
Other versions
CN109978143B (en)
Inventor
李丽
马博涵
傅玉祥
张衡
李伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201910251530.6A priority Critical patent/CN109978143B/en
Publication of CN109978143A publication Critical patent/CN109978143A/en
Application granted granted Critical
Publication of CN109978143B publication Critical patent/CN109978143B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

A stacked autoencoder based on a SIMD architecture and an encoding method of the present invention. The autoencoder includes a DMA interface module, a neural network inference module and a neural network training module. The DMA interface module reads data from off-chip DDR in DMA mode, stores it in on-chip SRAM according to a partitioning scheme, and writes the final operation result back to DDR in DMA mode. The inference computation module of the neural network performs classification inference on new samples using the trained weights and biases. The training module of the neural network is mainly responsible for updating the weights and biases of the neural network layer by layer, working forward from the last layer of the neural network. The invention has the following advantages: the autoencoder places no limit on the number of neural network layers it supports, so it supports inference and training of large-scale neural networks, and it overlaps part of the computation time with the memory-access time through ping-pong operation; it therefore has good practical significance and broad application prospects.

Description

A stacked autoencoder based on a SIMD architecture and an encoding method
Technical field
The present invention relates to the field of hardware implementation of intelligent algorithms, and in particular to a stacked autoencoder based on a SIMD architecture and an encoding method.
Background art
Since the development of the electronic computer began in 1941, technology has been able to create machine intelligence. The term "artificial intelligence" (Artificial Intelligence) was proposed at the Dartmouth conference in 1956; since then, researchers have developed numerous theories and principles, and the concept of artificial intelligence has expanded accordingly. Before 2007, limited by factors such as the algorithms and data of the time, artificial intelligence did not yet place especially strong demands on chips, and general-purpose CPU chips could provide sufficient computing power. Later, driven by the rapid development of high-definition video and the gaming industry, graphics processing unit (GPU) chips developed rapidly. Because a GPU has more logic units for processing data and is a highly parallel architecture, it has an advantage over a CPU in processing graphics data and complex algorithms; and because AI deep-learning models have many parameters, large data scales and heavy computation, GPUs replaced CPUs as the mainstream AI chip for a period of time afterwards. In the great wave of artificial intelligence, many manufacturers have also implemented machine-learning algorithms on field-programmable gate arrays (FPGAs); thanks to their high flexibility, FPGAs have a huge market in the industrial internet and industrial-robotics fields. In addition to the GPU and FPGA acceleration chips for intelligent algorithms, Google has proposed the TPU, a special-purpose processor designed for specific intelligent algorithms, whose chip area is smaller and power consumption lower than those of FPGAs and GPUs.
Communication networks are the foundation of the artificial-intelligence boom. With the arrival of the 5G communication era, the interconnection of all things will generate massive data, and large-scale neural networks need powerful computing capability. As an important neural network algorithm, the stacked autoencoder algorithm is widely used in application scenarios such as face recognition and geographic-information mapping. Based on a reconfigurable intelligent acceleration core, the present invention proposes a hardware implementation of the stacked autoencoder algorithm on a SIMD architecture. Compared with hardware acceleration approaches such as GPUs and FPGAs, this implementation achieves high resource utilization and fast hardware execution. As a typical algorithm among intelligent algorithms, the implementation method has good reference value and broad application prospects.
Summary of the invention
The present invention aims to overcome the above deficiencies of the prior art, effectively reduce the training time of neural networks, make full use of storage resources, and accelerate the computation speed of training and inference, by providing a stacked autoencoder based on a SIMD architecture and an encoding method, which are specifically realized by the following technical scheme:
The stacked autoencoder based on a SIMD architecture is based on a neural network and includes:
a DMA interface module, which reads data from off-chip DDR in DMA mode, stores it in on-chip SRAM according to a partitioning scheme, and writes the final operation result back to DDR in DMA mode;
a neural network inference module, which uses the trained weights and biases to perform classification inference on new samples; and a neural network training module, which, after forward propagation of the training samples, back-propagates from the last layer of the neural network according to the gradient descent algorithm and updates the weights and biases of the neural network.
In a further design of the stacked autoencoder based on a SIMD architecture, the SRAM storing each layer of the neural network contains 4N source-data storage banks, and the SRAM is divided into four parts of N banks each, as follows (an illustrative software model of this partition is sketched after the list):
the first part of the SRAM stores the input x_j;
the second and third parts of the SRAM store the weights W_ij;
the fourth part of the SRAM stores the calculation result of each layer of the neural network;
a constant memory stores the bias b_i.
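For illustration only, the following is a minimal Python sketch of how the four-way SRAM partition described above could be modeled in software. The class name SramPartition, the method names and the word-packing parameter are assumptions for this sketch, not part of the patent; only the four-region split (inputs, two ping-pong weight regions, results) and the separate constant memory for biases come from the text.

    import numpy as np

    class SramPartition:
        """Software model (an assumption) of 4N banks split into four regions of N banks."""

        def __init__(self, n_banks_per_part, bank_depth, words_per_addr=2):
            # region 0: inputs x_j; regions 1-2: ping-pong weight buffers W_ij;
            # region 3: per-layer results; biases b_i live in a separate constant memory
            shape = (n_banks_per_part, bank_depth * words_per_addr)
            self.part = [np.zeros(shape, dtype=np.float32) for _ in range(4)]
            self.const_mem = np.zeros(0, dtype=np.float32)

        def load_inputs(self, x):
            self.part[0].flat[:x.size] = x              # first part: inputs x_j

        def load_weights(self, w, region):
            assert region in (1, 2)                     # second/third part alternate (ping-pong)
            self.part[region].flat[:w.size] = w.ravel()

        def store_result(self, h):
            self.part[3].flat[:h.size] = h              # fourth part: layer output

        def load_biases(self, b):
            self.const_mem = np.asarray(b, dtype=np.float32)  # constant memory for b_i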
Based on the above stacked autoencoder based on a SIMD architecture, an encoding method of the stacked autoencoder based on a SIMD architecture is provided. The method includes an algorithm inference process and an algorithm training process. The algorithm inference process includes:
Step 1-1) initialize the inputs x_j of all neurons of the first layer, the biases b_i, and the weights W_ij between the first neuron of the first layer and all neurons of the second layer of the neural network;
Step 1-2) compute the output of the first neuron of the second layer of the neural network according to formula (1); the multiply-accumulate is performed by a 32-way parallel multiply-add tree structure; after the computation is completed, move the weights W_ij of the second neuron into the third part of the SRAM;

    a_i = Σ_j W_ij · x_j + b_i,  h_i = s(a_i)    (1)

In formula (1), h_i denotes the calculation result of each layer of the neural network, a_i denotes the multiply-accumulate of the weights with the inputs, and s(·) denotes the sigmoid activation function;
Step 1-3) move in weights by ping-pong operation, complete the output computation of the second layer of the neural network, and store the calculation results in the fourth part of the SRAM;
Step 1-4) use the output of the second layer of the neural network as the input of the third layer, compute the output of the third layer of the neural network, and store it in the first part of the SRAM, overwriting the previous contents;
Step 1-5) following this access and computation pattern, obtain the result of the last layer of the neural network, read the result from the SRAM, and write it back to DDR in DMA mode; an illustrative software sketch of steps 1-1 to 1-5 is given below;
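For illustration only, a minimal Python sketch of the inference steps 1-1 to 1-5 above. The layer-by-layer loop, formula (1) and the overwrite of the previous layer's input come from the text; the function and variable names are assumptions.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def sae_inference(x, weights, biases):
        """weights[l] and biases[l] map layer l to layer l+1 (illustrative interface)."""
        h = x
        for W, b in zip(weights, biases):
            a = W @ h + b      # formula (1): a_i = sum_j W_ij * x_j + b_i
            h = sigmoid(a)     # h_i = s(a_i); overwrites the previous layer's input region
        return h               # last-layer result, written back to DDR in the hardware

In the hardware, each a_i is produced by the 32-way parallel multiply-add tree, and the weight rows are moved into the SRAM by ping-pong DMA while the previous neuron is being computed.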
The algorithm training process includes forward propagation and backpropagation. The forward propagation includes the following steps:
Step 2-1-1) initialize the inputs x_j of the first layer, the biases b_i, and the weights W_ij of the first neuron of the first layer;
Step 2-1-2) compute the output of the first neuron of the second layer according to a_i = Σ_j W_ij · x_j + b_i and h_i = s(a_i); the multiply-accumulate is performed by the 32-way parallel multiply-add tree structure; after the computation is completed, move the weights W_ij of the second neuron into the third part of the SRAM and compute the output result of the second neuron;
Step 2-1-3) move in weights by ping-pong operation until the outputs of the 512 neurons of the second layer of the neural network have been computed, store them in the fourth part of the SRAM, and write the data back to DDR in DMA mode;
Step 2-1-4) use the output of the second layer of the neural network as the input of the third layer, compute the output of the third layer of the neural network, and store it in the first part of the SRAM, overwriting the previous contents;
Step 2-1-5) after completing the above steps, obtain the result of the last layer of the neural network, read it from the SRAM, and write it back to DDR in DMA mode;
In the backpropagation, the label data is denoted Std and the delta is denoted delta; the backpropagation specifically includes the following steps:
Step 2-2-1) read the label data Std of the neural network from DDR in DMA mode, and subtract it from the computed last-layer data of the neural network to obtain the error delta of the last layer of the neural network;
Step 2-2-2) read in the transposed weights W_ji of each neuron of the second-to-last layer of the neural network in DMA ping-pong mode, store the weights W_ji in the second and third parts of the SRAM, and update the biases and weights according to formula (2) until the weights and biases of the last layer have been updated;
After the update is completed, the results overwrite the parts of the SRAM where the original weights and biases were stored, and the updated biases and weights are written to DDR in DMA mode;
Step 2-2-3) compute the delta of the preceding layer in the same way, compute and update the weights and biases, and write the updated biases and weights to DDR in DMA mode;
Step 2-2-4) propagate layer by layer toward the front of the network, update the weights and biases of all layers of the neural network, and write them back to DDR, completing one training iteration of the neural network; an illustrative sketch of this update is given after these steps.
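As an illustration of steps 2-2-1 to 2-2-4: formula (2) itself is not reproduced in this text, so the Python sketch below assumes the standard gradient-descent update for a sigmoid network; the learning rate lr and all function and variable names are assumptions.

    import numpy as np

    def backward_update(acts, weights, biases, std, lr=0.01):
        """acts[k] is the output of layer k from forward propagation (acts[0] = input);
        weights[l] and biases[l] map layer l to layer l+1. Updates run from the last
        layer toward the first, mirroring steps 2-2-1 to 2-2-4."""
        # step 2-2-1: last-layer error delta (sigmoid derivative assumed)
        delta = (acts[-1] - std) * acts[-1] * (1.0 - acts[-1])
        for l in reversed(range(len(weights))):
            h_prev = acts[l]
            # delta of the preceding layer, taken before the weights are overwritten
            delta_prev = (weights[l].T @ delta) * h_prev * (1.0 - h_prev) if l > 0 else None
            # assumed form of formula (2): plain gradient descent on weights and biases
            weights[l] -= lr * np.outer(delta, h_prev)
            biases[l] -= lr * delta
            delta = delta_prev                        # steps 2-2-3/2-2-4: move toward the front
        return weights, biases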
In a further design of the encoding method of the stacked autoencoder based on a SIMD architecture, in step 1-5), if the total number of layers of the neural network is odd, the result of the last layer is read from the first part of the SRAM; if the total number of layers of the neural network is even, the result of the last layer is read from the fourth part of the SRAM. The advantages of the present invention are as follows:
The stacked autoencoder based on a SIMD architecture of the present invention places no limit on the number of neural network layers it supports, so it supports inference and training of large-scale neural networks; by ping-pong operation it overlaps part of the computation time with the memory-access time. It therefore has good practical significance and broad application prospects.
Description of the drawings
Fig. 1 is a schematic diagram of a single autoencoder in the stacked autoencoder algorithm.
Fig. 2 is a schematic diagram of multiple single autoencoders stacked to form the overall stacked autoencoder.
Fig. 3 is a flow chart of the encoding method of the stacked autoencoder based on a SIMD architecture.
Fig. 4 is a schematic diagram of the implementation of the computation in the inference part of the stacked autoencoder algorithm and in the forward-propagation part of training.
Fig. 5 is a schematic diagram of the storage scheme of the stacked autoencoder algorithm.
Detailed description of the embodiments
The technical solution of the present invention is described in detail below with reference to the accompanying drawings.
As shown in Fig. 1, the autoencoder of this embodiment is divided into an input layer, a hidden layer and an output layer. Multiple single autoencoders are stacked to form the stacked autoencoder shown in Fig. 2, which consists of one input layer, multiple hidden layers and one output layer; whether a Softmax classifier is needed at the end is determined by actual requirements.
The autoencoder mainly consists of a DMA interface module, a neural network inference module and a neural network training module. The present invention maximizes resource usage by ping-pong buffering the operation result of every layer of the neural network and the weights of every neuron of every layer, while moving data and consolidating calculation results according to the SRAM partitioning, which improves the computation speed of the algorithm.
An embodiment of the present invention is described in detail below; a cycle-accurate system-level model built in the SystemC language was used for verification. In the embodiment, the neural network has 7 layers, and the numbers of neurons per layer from front to back are 1024, 512, 256, 128, 256, 512 and 1024. Data such as the inputs, weights and biases of the neural network are 32-bit floating-point numbers in the IEEE 754 standard. The computing array consists of 4 PEs (Processing Elements, each containing 4 complex multipliers, 4 complex adders, 1 real adder, 1 real multiplier and 1 transcendental-function unit), corresponding to 32 banks; assuming each bank has a depth of 4K and a bit width of 64, one bank address stores 2 source data. The technical solution of the present invention is further introduced below with this embodiment and with reference to the accompanying drawings; a short arithmetic sketch of the implied on-chip capacity follows.
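A short arithmetic check (Python) of the on-chip storage implied by the assumed figures above; the grouping of banks per PE is not specified in the text and is not needed for the totals.

    BANKS = 32              # banks served by the 4-PE computing array
    BANK_DEPTH = 4 * 1024   # addresses per bank (assumed depth 4K)
    BANK_WIDTH = 64         # bits per bank address
    WORD_BITS = 32          # IEEE 754 single-precision source data

    words_per_addr = BANK_WIDTH // WORD_BITS      # 2 source data per bank address
    words_per_bank = BANK_DEPTH * words_per_addr  # 8192 floats per bank
    total_words = BANKS * words_per_bank          # 262144 floats (1 MiB) on chip
    print(words_per_addr, words_per_bank, total_words)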
The flow chart of the hardware implementation of the algorithm is shown in Fig. 3. Before the algorithm starts, the weights of all layers and between layers must first be transposed and stored in DDR, so that they can be used by training to update the weights. The detailed steps of training and inference are as follows:
The inference process of the stacked autoencoder algorithm is as follows:
S1: initialize the inputs x_j of the 1024 neurons of the first layer, the biases b_i, and the weights W_ij between the first neuron of the first layer and the 512 neurons of the second layer of the neural network. As shown in Fig. 5, the inputs x_j are stored in banks 0-7, the weights in banks 8-15, and the biases b_i in the constant memory.
S2: compute the output of the first neuron of the second layer according to a_i = Σ_j W_ij · x_j + b_i and h_i = s(a_i); the overall hardware structure of the multiply-accumulate is shown in Fig. 4, and the computation is performed by the 32-way parallel multiply-add tree structure. After the computation is completed, move the weights W_ij of the second neuron into the third part, bank_3.
S3: weights are moved in by ping-pong operation, and the output computation of the second layer of the neural network is completed and stored in the fourth part of the SRAM, bank_4.
S4: the output of the second layer of the neural network is used as the input of the third layer; compute the output of the third layer of the neural network and store it in the first part of the SRAM, bank_1, overwriting the previous contents.
S5: following this access and computation pattern, obtain the result of the last layer of the neural network, read the result from the SRAM and write it back to DDR in DMA mode (if the total number of layers of the neural network is odd, it is read from the first part of the SRAM, bank_1; if the total number of layers is even, it is read from the fourth part of the SRAM, bank_4). A software sketch of the ping-pong weight movement used in S2-S3 follows.
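The following Python sketch illustrates, under the same assumptions as the earlier sketches, the ping-pong movement of weight rows between the second and third SRAM regions (bank_2/bank_3) while the current neuron's output is being computed. The per-neuron prefetch granularity is an assumption.

    import numpy as np

    def layer_forward_pingpong(x, W, b):
        """Compute one layer neuron by neuron; the next neuron's weight row is
        'prefetched' into the alternate buffer, modeling bank_2/bank_3."""
        n_out = W.shape[0]
        buf = [W[0].copy(), None]            # bank_2 holds the first neuron's weights
        h = np.empty(n_out, dtype=W.dtype)
        for i in range(n_out):
            cur, nxt = i % 2, (i + 1) % 2
            if i + 1 < n_out:
                buf[nxt] = W[i + 1].copy()   # DMA prefetch overlaps with the computation
            a = buf[cur] @ x + b[i]          # 32-way multiply-add tree in the hardware
            h[i] = 1.0 / (1.0 + np.exp(-a))
        return h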
The training process of the stacked autoencoder algorithm is as follows:
The training phase of the algorithm is divided into forward propagation and backpropagation. The only difference between forward propagation and the inference phase is that the calculation result of every layer must be written back to DDR in DMA mode so that it can be used by backpropagation; backpropagation uses the gradient descent algorithm.
Forward propagation:
S1: initialize the inputs x_j of the first layer, the biases b_i, and the weights W_ij of the first neuron of the first layer.
S2: compute the output of the first neuron of the second layer according to a_i = Σ_j W_ij · x_j + b_i and h_i = s(a_i); the overall hardware structure of the multiply-accumulate is shown in Fig. 4, and the computation is performed by the 32-way parallel multiply-add tree structure. After the computation is completed, move the weights W_ij of the second neuron into the third part, bank_3, and compute the output result of the second neuron.
S3: weights are moved in by ping-pong operation until the outputs of the 512 neurons of the second layer of the neural network have been computed; they are stored in the fourth part of the SRAM, bank_4 (banks 24-31), and the data are written back to DDR in DMA mode.
S4: the output of the second layer of the neural network is used as the input of the third layer; compute the output of the third layer of the neural network and store it in the first part of the SRAM, bank_1 (banks 0-7), overwriting the previous contents.
S5: following this access and computation pattern, obtain the result of the last layer of the neural network, i.e. the seventh layer, read the result from the SRAM and write it back to DDR in DMA mode. The total number of layers of the neural network in this example is 7, an odd number, so the result is read from the first part of the SRAM, bank_1 (banks 0-7). A minimal sketch of this training-time forward pass, which keeps every layer's output for backpropagation, follows.
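For reference, a minimal Python sketch of this training-time forward pass; it differs from the inference sketch given earlier only in that every layer's output is kept (written back to DDR in the hardware) for later use by backpropagation. The names are illustrative.

    import numpy as np

    def forward_keep_activations(x, weights, biases):
        acts = [x]                                 # acts[0] is the network input
        for W, b in zip(weights, biases):
            a = W @ acts[-1] + b                   # 32-way multiply-add tree in the hardware
            acts.append(1.0 / (1.0 + np.exp(-a)))  # each layer's result is also written to DDR
        return acts                                # kept for the backward pass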
Backpropagation (gradient descent):
The label data is denoted Std and the delta is denoted delta.
S6: read the label data Std of the neural network from DDR in DMA mode and subtract it from the computed data of the last layer of the neural network, i.e. the seventh layer, to obtain the error delta of the last layer.
S7: read in the transposed weights W_ji of each neuron of the second-to-last layer of the neural network in DMA ping-pong mode, store them in the second and third parts of the SRAM (bank_2 and bank_3), and update the weights and biases of the last layer according to the bias and weight update method.
After the update is completed, the results overwrite the parts of the SRAM where the original weights and biases were stored, and the updated biases and weights are written to DDR in DMA mode.
S8: compute the delta of the preceding layer in the same way, compute and update the weights and biases, and write them to DDR in the same way.
S9: propagate layer by layer toward the front of the network, update the weights and biases of all layers of the neural network, and write them back to DDR, completing one training iteration of the neural network.
The present invention stores the inputs and weights of the stacked autoencoder algorithm in different regions of the partitioned SRAM, so the variables required by the computation can be accessed without conflict; through ping-pong operation and time-multiplexing of the computing resources, the computation process of the algorithm is executed quickly, greatly improving resource utilization and hardware execution speed. The implementation therefore has broad application prospects.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any changes or substitutions that can readily be conceived by anyone skilled in the art within the technical scope disclosed by the present invention shall be covered by the scope of protection of the present invention. Therefore, the scope of protection of the present invention shall be defined by the scope of protection of the claims.

Claims (4)

1. A stacked autoencoder based on a SIMD architecture, based on a neural network, characterized by comprising:
a DMA interface module, which reads data from off-chip DDR in DMA mode, stores it in on-chip SRAM according to a partitioning scheme, and writes the final operation result back to DDR in DMA mode;
a neural network inference module, which uses the trained weights and biases to perform classification inference on new samples; and a neural network training module, which, after forward propagation of the training samples, back-propagates from the last layer of the neural network according to the gradient descent algorithm and updates the weights and biases of the neural network.
2. The stacked autoencoder based on a SIMD architecture according to claim 1, characterized in that the SRAM storing each layer of the neural network contains 4N source-data storage banks, and the SRAM is divided into four parts of N banks each, as follows:
the first part of the SRAM stores the input x_j;
the second and third parts of the SRAM store the weights W_ij;
the fourth part of the SRAM stores the calculation result of each layer of the neural network;
a constant memory stores the bias b_i.
3. An encoding method of the stacked autoencoder based on a SIMD architecture according to any one of claims 1-2, characterized by comprising an algorithm inference process and an algorithm training process, the algorithm inference process comprising:
Step 1-1) initialize the inputs x_j of all neurons of the first layer, the biases b_i, and the weights W_ij between the first neuron of the first layer and all neurons of the second layer of the neural network;
Step 1-2) compute the output of the first neuron of the second layer of the neural network according to formula (1); the multiply-accumulate is performed by a 32-way parallel multiply-add tree structure; after the computation is completed, move the weights W_ij of the second neuron into the third part of the SRAM;

    a_i = Σ_j W_ij · x_j + b_i,  h_i = s(a_i)    (1)

In formula (1), h_i denotes the calculation result of each layer of the neural network, a_i denotes the multiply-accumulate of the weights with the inputs, and s(·) denotes the sigmoid activation function;
Step 1-3) move in weights by ping-pong operation, complete the output computation of the second layer of the neural network, and store the calculation results in the fourth part of the SRAM;
Step 1-4) use the output of the second layer of the neural network as the input of the third layer, compute the output of the third layer of the neural network, and store it in the first part of the SRAM, overwriting the previous contents;
Step 1-5) following this access and computation pattern, obtain the result of the last layer of the neural network, read the result from the SRAM, and write it back to DDR in DMA mode;
The algorithm training process includes forward propagation and backpropagation, and the forward propagation includes the following steps:
Step 2-1-1) initialize the inputs x_j of the first layer, the biases b_i, and the weights W_ij of the first neuron of the first layer;
Step 2-1-2) compute the output of the first neuron of the second layer according to a_i = Σ_j W_ij · x_j + b_i and h_i = s(a_i); the multiply-accumulate is performed by the 32-way parallel multiply-add tree structure; after the computation is completed, move the weights W_ij of the second neuron into the third part of the SRAM and compute the output result of the second neuron;
Step 2-1-3) move in weights by ping-pong operation until the outputs of the 512 neurons of the second layer of the neural network have been computed, store them in the fourth part of the SRAM, and write the data back to DDR in DMA mode;
Step 2-1-4) use the output of the second layer of the neural network as the input of the third layer, compute the output of the third layer of the neural network, and store it in the first part of the SRAM, overwriting the previous contents;
Step 2-1-5) after completing the above steps, obtain the result of the last layer of the neural network, read it from the SRAM, and write it back to DDR in DMA mode;
In the backpropagation, the label data is denoted Std and the delta is denoted delta, and the backpropagation specifically comprises the following steps:
Step 2-2-1) read the label data Std of the neural network from DDR in DMA mode, and subtract it from the computed last-layer data of the neural network to obtain the error delta of the last layer of the neural network;
Step 2-2-2) read in the transposed weights W_ji of each neuron of the second-to-last layer of the neural network in DMA ping-pong mode, store the weights W_ji in the second and third parts of the SRAM, and update the biases and weights according to formula (2) until the weights and biases of the last layer have been updated;
After the update is completed, the results overwrite the parts of the SRAM where the original weights and biases were stored, and the updated biases and weights are written to DDR in DMA mode;
Step 2-2-3) compute the delta of the preceding layer in the same way, compute and update the weights and biases, and write the updated biases and weights to DDR in DMA mode;
Step 2-2-4) propagate layer by layer toward the front of the network, update the weights and biases of all layers of the neural network, and write them back to DDR, completing one training iteration of the neural network.
4. The encoding method of the stacked autoencoder based on a SIMD architecture according to claim 3, characterized in that in step 1-5), if the total number of layers of the neural network is odd, the result of the last layer is read from the first part of the SRAM; if the total number of layers of the neural network is even, the result of the last layer is read from the fourth part of the SRAM.
CN201910251530.6A 2019-03-29 2019-03-29 Stack type self-encoder based on SIMD architecture and encoding method Active CN109978143B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910251530.6A CN109978143B (en) 2019-03-29 2019-03-29 Stack type self-encoder based on SIMD architecture and encoding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910251530.6A CN109978143B (en) 2019-03-29 2019-03-29 Stack type self-encoder based on SIMD architecture and encoding method

Publications (2)

Publication Number Publication Date
CN109978143A true CN109978143A (en) 2019-07-05
CN109978143B CN109978143B (en) 2023-07-18

Family

ID=67081767

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910251530.6A Active CN109978143B (en) 2019-03-29 2019-03-29 Stack type self-encoder based on SIMD architecture and encoding method

Country Status (1)

Country Link
CN (1) CN109978143B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110138259A1 (en) * 2009-12-03 2011-06-09 Microsoft Corporation High Performance Digital Signal Processing In Software Radios
CN106991477A (en) * 2016-01-20 2017-07-28 南京艾溪信息科技有限公司 A kind of artificial neural network compression-encoding device and method
CN108446766A (en) * 2018-03-21 2018-08-24 北京理工大学 A kind of method of quick trained storehouse own coding deep neural network

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022013648A1 (en) * 2020-07-13 2022-01-20 International Business Machines Corporation Methods for detecting and monitoring bias in software application using artificial intelligence and devices thereof
GB2611981A (en) * 2020-07-13 2023-04-19 Ibm Methods for detecting and monitoring bias in software application using artificial intelligence and devices thereof
US11861513B2 (en) 2020-07-13 2024-01-02 International Business Machines Corporation Methods for detecting and monitoring bias in a software application using artificial intelligence and devices thereof
CN114202067A (en) * 2021-11-30 2022-03-18 山东产研鲲云人工智能研究院有限公司 Bandwidth optimization method for convolutional neural network accelerator and related equipment

Also Published As

Publication number Publication date
CN109978143B (en) 2023-07-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant