CN107316078A - Apparatus and method for performing artificial neural network self-learning operation - Google Patents
Apparatus and method for performing artificial neural network self-learning operation
- Publication number
- CN107316078A CN107316078A CN201610267211.0A CN201610267211A CN107316078A CN 107316078 A CN107316078 A CN 107316078A CN 201610267211 A CN201610267211 A CN 201610267211A CN 107316078 A CN107316078 A CN 107316078A
- Authority
- CN
- China
- Prior art keywords
- computing module
- vector
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an apparatus and method for performing an artificial neural network self-learning operation. The apparatus includes an instruction storage unit, a controller unit, a data access unit, an interconnection module, a master operation module, and multiple slave operation modules. The present invention can perform self-learning pre-training of a multilayer neural network according to a layer-by-layer training method: for each layer of the network, the operation iterates until the weight update value falls below a certain threshold, at which point the self-learning pre-training of that layer is complete. Each iteration is divided into four stages: the first three stages respectively compute the first-order hidden-layer intermediate value, the first-order visible-layer intermediate value, and the second-order hidden-layer intermediate value, and the last stage uses the intermediate values of the first three stages to update the weights.
Description
Technical field
The present invention relates to artificial neural network technology, and more particularly to an apparatus and method for performing an artificial neural network self-learning operation.
Background
Multilayer artificial neural networks are widely used in fields such as pattern recognition, image processing, function approximation, and optimization. In recent years, owing to their high recognition accuracy and good parallelizability, multilayer artificial networks have received increasingly broad attention from academia and industry.
The typical training method for a multilayer artificial neural network is the back-propagation (BP) algorithm, a representative supervised learning method. Training requires a large number of labeled samples, and the cost of collecting such samples is very high. Moreover, during training the error-correction signal weakens as the number of propagated layers increases, training easily converges to a local minimum, and convergence is slow. Therefore, first pre-training the network parameters with a fast-converging self-learning algorithm that requires no labeled training samples, and then fine-tuning the multilayer neural network with back-propagation training, has become a new research focus. Among these steps, the self-learning operation used as pre-training is particularly important.
One known method of supporting the multilayer artificial neural network self-learning operation is to use a general-purpose processor. This method executes general-purpose instructions through a general-purpose register file and general-purpose functional units to support the above algorithm. One disadvantage of this method is that the operational performance of a single general-purpose processor is low and cannot meet the performance requirements of typical multilayer artificial neural network operations. When multiple general-purpose processors execute in parallel, the communication between them in turn becomes a performance bottleneck. In addition, a general-purpose processor must decode the multilayer artificial neural network pre-training operation into a long sequence of arithmetic and memory-access instructions, and this front-end decoding incurs a large power overhead.
Another known method of supporting multilayer artificial neural network pre-training is to use a graphics processing unit (GPU). This method executes general-purpose SIMD instructions through a general-purpose register file and general-purpose stream processing units to support the above algorithm. Because the GPU is a device dedicated to graphics operations and scientific computing, it provides no dedicated support for multilayer artificial neural network operations, so a large amount of front-end decoding work is still needed before the multilayer artificial neural network operation can execute, which brings a large additional overhead. Moreover, the GPU has only a small on-chip cache, so the model data (weights) of the multilayer artificial neural network must be carried on and off chip repeatedly; off-chip bandwidth becomes the main performance bottleneck while also bringing a huge power overhead.
Summary of the invention
The problems to be solved by the present invention are that, in the prior art, general-purpose processors (GPU, CPU) perform multilayer neural network pre-training through a long series of simple arithmetic and memory-access operations, so the front-end decoding power overhead is large; that the data-access overhead of existing general-purpose processors is large; and that the operational performance of a single general-purpose processor is low.
The present invention proposes an apparatus for performing an artificial neural network self-learning operation, including an instruction storage unit, a controller unit, a data access unit, an interconnection module, a master operation module, and multiple slave operation modules, wherein: the instruction storage unit is configured to read in instructions through the data access unit and cache the read instructions; the controller unit is configured to read instructions from the instruction storage unit, decode each instruction into control signals that control the behavior of the interconnection module, the master operation module, and the slave operation modules, and then distribute the respective control signals to the respective modules; the data access unit is configured to access external address space and complete the loading and storing of data; the interconnection module, which has different topology implementations, is configured to distribute the input vector of the master operation module to the multiple slave operation modules, and to merge the calculation results of the slave operation modules and return the result to the master operation module; the master operation module is configured to perform the activation function and Gibbs sampling on the intermediate values returned by the interconnection module, and to update the biases of the activation function; and the slave operation modules are configured to perform the dot-product operation of the input vector with the respective weight matrix, the product operation of the respective component scalar of the input vector with the respective weight matrix, and the weight matrix update.
According to an embodiment of the present invention, the master operation module includes an operation unit, a data dependence judging unit, and a storage unit, wherein the storage unit is configured to cache the input data and output data used by the master operation module during calculation, the operation unit is configured to complete the operations of the master operation module, and the data dependence judging unit is the port through which the operation unit reads and writes the storage unit, ensuring read-write consistency of the data in the storage unit.
According to an embodiment of the present invention, the data dependence judging unit is configured to judge whether a dependence exists between the data of a control signal that has not yet been executed and a control signal in the process of being executed; if not, this group of control signals is allowed to issue immediately, otherwise this group of control signals is allowed to issue only after all control signals on which it depends have completed execution.
According to an embodiment of the present invention, the data dependence judging unit is further configured to send read data to the slave operation modules through the interconnection module.
According to an embodiment of the present invention, each slave operation module includes an operation unit, a data dependence judging unit, a first storage unit, a second storage unit, and a third storage unit, wherein the operation unit is configured to receive the control signals sent by the controller unit and perform arithmetic and logic operations; the data dependence judging unit is configured to monitor read and write operations on the cache units to ensure that the reads and writes are free of consistency conflicts; the first storage unit is configured to cache the input vectors and calculation results of the neurons; the second storage unit is configured to cache the weight data needed by the slave operation module during calculation; and the third storage unit is configured to cache the weight gradient data needed by the corresponding slave operation module during the weight update.
The present invention also proposes a method for performing a layer-by-layer artificial neural network self-learning operation. The artificial neural network includes multiple neurons in two or more layers. The self-learning pre-training of the artificial neural network is performed layer by layer, and for each layer the pre-training is divided into four stages:
In the first stage, the input neuron vector v₀ and the weight matrix W undergo a dot-product operation to obtain the local induced field; after a nonlinear activation-function transformation, the local induced field then undergoes Gibbs sampling to yield the first-order hidden-layer intermediate value h₀.
In the second stage, the transpose Wᵀ of the weight matrix and the first-order hidden-layer intermediate value h₀ first undergo a dot-product operation; after a nonlinear activation-function transformation, the local induced field then undergoes Gibbs sampling to yield the first-order visible-layer intermediate value v₁.
In the third stage, the first-order visible-layer intermediate value v₁ and the weight matrix W undergo a dot-product operation to obtain the local induced field, which after a nonlinear activation-function transformation yields the second-order hidden-layer intermediate value h₁.
In the fourth stage, the weights are updated according to the following formulas:

$$\vec{W} \leftarrow \vec{W} - \epsilon\left(\vec{h}_0 \times \vec{v}_0^{\,T} - \vec{h}_1 \times \vec{v}_1^{\,T}\right) \quad\cdots\cdots(1)$$
$$\vec{b} \leftarrow \vec{b} - \epsilon\left(\vec{h}_0 - \vec{h}_1\right) \quad\cdots\cdots(2)$$
$$\vec{c} \leftarrow \vec{c} - \epsilon\left(\vec{v}_0 - \vec{v}_1\right) \quad\cdots\cdots(3)$$

where the vector b is the bias added to the partial dot-product sums of the vector and the weight matrix before the activation functions of the above first and third stages, and the vector c is the bias of the second stage; "×" denotes the outer product of the vectors, and ε is the learning rate.
Compared with the prior art, the present invention optimizes the instructions for multilayer neural network pre-training: the processor can complete the pre-training learning of one neural network layer with a single instruction, simplifying the front-end decoding overhead of general-purpose processor instructions. Meanwhile, the present invention includes a master operation module, multiple slave operation modules, and a large amount of distributed on-chip storage, which alleviates memory-access overhead, so the neural network pre-training operation can be performed in parallel without frequent off-chip data accesses. All in all, the performance-to-power ratio of the present invention is far higher than that of a general-purpose processor.
The present invention can be applied in the following (non-exhaustive) scenarios: data processing, and all kinds of electronic products such as robots, computers, printers, scanners, telephones, tablet computers, intelligent terminals, mobile phones, driving recorders, navigators, sensors, cameras, cloud servers, video cameras, projectors, watches, earphones, mobile storage, and wearable devices; all kinds of vehicles such as aircraft, ships, and automobiles; all kinds of household appliances such as televisions, air conditioners, microwave ovens, refrigerators, rice cookers, humidifiers, washing machines, electric lights, gas stoves, and range hoods; and all kinds of medical equipment including nuclear magnetic resonance instruments, B-mode ultrasound scanners, and electrocardiographs.
Brief description of the drawings
For a more complete understanding of the present invention and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows an example block diagram of the overall structure of the apparatus for performing the artificial neural network self-learning pre-training according to an embodiment of the present invention.
Fig. 2 schematically shows an H-tree implementation of the interconnection module in the apparatus for performing the artificial neural network self-learning pre-training according to an embodiment of the present invention.
Fig. 3 shows an example block diagram of the structure of the master operation module in the apparatus for performing the artificial neural network self-learning pre-training according to an embodiment of the present invention.
Fig. 4 shows an example block diagram of the structure of a slave operation module in the apparatus for performing the artificial neural network self-learning pre-training according to an embodiment of the present invention.
Fig. 5 shows an example block diagram of the first and third stages of the neural network self-learning pre-training process according to an embodiment of the present invention.
Fig. 6 shows an example block diagram of the second stage of the neural network self-learning pre-training process according to an embodiment of the present invention.
Fig. 7 shows an example flowchart of the fourth stage of the neural network self-learning pre-training process according to an embodiment of the present invention.
Fig. 8 shows an example flowchart of one iteration of single-layer neural network self-learning pre-training according to an embodiment of the present invention.
In all of the figures, the same devices, components, units, and so on are denoted by the same reference numerals.
Detailed description
Other aspects, advantages, and salient features of the present invention will become apparent to those skilled in the art from the following detailed description of exemplary embodiments of the invention taken in conjunction with the accompanying drawings.
In the present invention, the terms "comprising" and "containing" and their derivatives mean inclusion without limitation; the term "or" is inclusive, meaning and/or.
In this specification, the various embodiments used below to describe the principles of the present invention are illustrative only and should not be construed in any way as limiting the scope of the invention. The following description with reference to the accompanying drawings is intended to aid a complete understanding of the exemplary embodiments of the invention defined by the claims and their equivalents. The description includes a variety of specific details to aid understanding, but these details are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the present invention. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness. Throughout the accompanying drawings, the same reference numerals are used for the same functions and operations.
In the self-learning pre-training of a multilayer artificial neural network according to an embodiment of the present invention, the artificial neural network includes multiple neurons in two or more layers. The self-learning pre-training of the artificial neural network proceeds layer by layer, training from the first layer up to the last layer. For each layer, the pre-training is divided into four stages:
In the first stage, the input neuron vector v₀ first undergoes a dot-product operation with the weight matrix W to obtain the local induced field; after a nonlinear activation-function transformation, the local induced field then undergoes Gibbs sampling to yield the first-order hidden-layer intermediate value h₀.
In the second stage, the transpose Wᵀ of the weight matrix and the first-order hidden-layer intermediate value h₀ first undergo a dot-product operation; after a nonlinear activation-function transformation, the local induced field then undergoes Gibbs sampling to yield the first-order visible-layer intermediate value v₁.
The third stage is similar to the first stage, except that its input is the first-order visible-layer intermediate value v₁, it computes the second-order hidden-layer intermediate value h₁, and no Gibbs sampling is needed beforehand.
In the fourth stage, the weights are updated according to the following formulas:

$$\vec{W} \leftarrow \vec{W} - \epsilon\left(\vec{h}_0 \times \vec{v}_0^{\,T} - \vec{h}_1 \times \vec{v}_1^{\,T}\right) \quad\cdots\cdots(1)$$
$$\vec{b} \leftarrow \vec{b} - \epsilon\left(\vec{h}_0 - \vec{h}_1\right) \quad\cdots\cdots(2)$$
$$\vec{c} \leftarrow \vec{c} - \epsilon\left(\vec{v}_0 - \vec{v}_1\right) \quad\cdots\cdots(3)$$

where the vector b is the bias added to the partial dot-product sums of the vector and the weight matrix before the activation functions of the above first and third stages, and the vector c is the bias of the second stage; "×" denotes the outer product of the vectors, and ε is the learning rate.
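As a concrete illustration of these four stages, the following is a minimal NumPy sketch of one pre-training iteration. The sigmoid activation and binary Gibbs sampling are illustrative assumptions (the text does not fix either), and W is oriented (hidden × visible) so the outer products match formula (1). It models only the arithmetic, not the master/slave hardware pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sample(p):
    # Sample binary units from their activation probabilities.
    return (rng.random(p.shape) < p).astype(p.dtype)

def pretrain_iteration(v0, W, b, c, epsilon=0.1):
    # Stage 1: local induced field -> activation -> Gibbs sampling -> h0.
    h0 = gibbs_sample(sigmoid(W @ v0 + b))
    # Stage 2: transpose of W against h0 -> activation -> Gibbs sampling -> v1.
    v1 = gibbs_sample(sigmoid(W.T @ h0 + c))
    # Stage 3: same as stage 1 applied to v1, but without Gibbs sampling -> h1.
    h1 = sigmoid(W @ v1 + b)
    # Stage 4: weight and bias updates, signs as written in formulas (1)-(3).
    W -= epsilon * (np.outer(h0, v0) - np.outer(h1, v1))
    b -= epsilon * (h0 - h1)
    c -= epsilon * (v0 - v1)
    return W, b, c

# Usage: one layer with 6 visible and 4 hidden units.
W = rng.normal(scale=0.1, size=(4, 6))
b, c = np.zeros(4), np.zeros(6)
v0 = rng.integers(0, 2, size=6).astype(float)
W, b, c = pretrain_iteration(v0, W, b, c)
```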
Fig. 1 shows an example block diagram of the overall structure of the apparatus for performing the artificial neural network self-learning pre-training according to the present invention. As shown in Fig. 1, the apparatus includes an instruction storage unit 1, a controller unit 2, a data access unit 3, an interconnection module 4, a master operation module 5, and multiple slave operation modules 6. The instruction storage unit 1, the controller unit 2, the data access unit 3, the interconnection module 4, the master operation module 5, and the slave operation modules 6 may all be implemented by hardware circuits (for example, an application-specific integrated circuit, ASIC).
The instruction storage unit 1 reads in instructions through the data access unit 3 and caches the read instructions.
The controller unit 2 reads instructions from the instruction storage unit 1, translates them into control signals that control the behavior of the other modules, and sends them to the other modules such as the data access unit 3, the master operation module 5, and the slave operation modules 6.
The data access unit 3 can access the external address space and read and write data directly to each cache unit inside the apparatus, completing the loading and storing of data.
Fig. 2 schematically shows the structure of the interconnection module 4. The interconnection module 4 forms the data path between the master operation module 5 and the multiple slave operation modules 6 and can take different structures. In one example, the interconnection is a binary-tree path made up of multiple nodes: each node passes the upstream data identically to its two downstream nodes, merges the data returned by the two downstream nodes, and returns the result to the upstream node. For example, during the first and third stages of the neural network self-learning operation, the input vector in the master operation module 5 is sent to each slave operation module 6 through the interconnection module 4; after the calculation of the slave operation modules 6 completes, the neuron values output by each slave operation module are combined stage by stage in the interconnection module into a complete vector composed of local induced fields, which, as the intermediate result vector, is returned to the master operation module 5 for the activation function and, as needed, Gibbs sampling. During the second stage, the first-order hidden-layer intermediate value vector h₀ in the master operation module 5 is sent to each slave operation module 6 through the interconnection module 4; after the calculation of the slave operation modules 6 completes, the vectors returned by the two downstream nodes are summed into one vector at the current node and returned to the upstream node, and the intermediate result vector is returned to the master operation module 5 for the activation function and Gibbs sampling.
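The following toy sketch illustrates this stage-by-stage tree merge in software; it is a model of the dataflow, not the hardware H-tree itself. Pairwise summation models the second stage, and concatenation models how the first and third stages assemble output components. It assumes the number of slave modules is a power of two, as in a complete binary tree.

```python
from typing import Callable, List
import numpy as np

def tree_merge(parts: List[np.ndarray],
               combine: Callable[[np.ndarray, np.ndarray], np.ndarray]) -> np.ndarray:
    # Merge pairs level by level, as each upstream node does with its two children.
    # Assumes len(parts) is a power of two (a complete binary tree of slaves).
    level = parts
    while len(level) > 1:
        level = [combine(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Stage-2 style merge: partial local-induced-field vectors are summed.
partials = [np.ones(4) * i for i in range(8)]        # one vector per slave module
total = tree_merge(partials, lambda a, b: a + b)     # -> elementwise sum over slaves

# Stage-1/3 style merge: per-slave output components are concatenated.
components = [np.array([float(i)]) for i in range(8)]
full_vector = tree_merge(components, lambda a, b: np.concatenate([a, b]))
```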
Fig. 3 shows an example block diagram of the structure of the master operation module 5 in the apparatus for performing the artificial neural network pre-training operation according to the present invention. As shown in Fig. 3, the master operation module 5 includes an operation unit 51, a data dependence judging unit 52, and a storage unit 53.
The storage unit 53 caches the input and output data used by the master operation module 5 during calculation; the operation unit 51 completes the various operation functions of the master operation module 5; and the data dependence judging unit 52 is the port through which the operation unit 51 reads and writes the storage unit 53, ensuring read-write consistency of the data in the storage unit. Specifically, the data dependence judging unit 52 judges whether a dependence exists between the data of a control signal that has not yet been executed and a control signal in the process of being executed; if not, the group of control signals is allowed to issue immediately, otherwise it must wait until all control signals on which it depends have completed execution. For example, all control signals sent to the data dependence judging unit 52 are stored in an instruction queue inside the unit; in this queue, if the range of data read by a read instruction conflicts with the range of data written by a write instruction earlier in the queue, the read instruction may execute only after the write instruction it depends on has executed. Meanwhile, the data dependence judging unit 52 is also responsible for sending read data to the slave operation modules through the interconnection module 4, while the output data of the slave operation modules 6 is sent directly to the operation unit 51 through the interconnection module 4. The instructions output by the controller unit 2 are sent to the operation unit 51 and the data dependence judging unit 52 to control their behavior.
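Here is a small sketch of the range-overlap check described above, under the assumption that each queued control signal can be summarized by an address range; the Access type and queue layout are illustrative, and the check covers only the read-after-write case that the text describes.

```python
from dataclasses import dataclass

@dataclass
class Access:
    is_write: bool
    start: int      # first address touched
    length: int     # number of addresses touched

def overlaps(a: Access, b: Access) -> bool:
    return a.start < b.start + b.length and b.start < a.start + a.length

def may_issue(pending: Access, queue: list[Access]) -> bool:
    # 'queue' holds earlier control signals that have not yet completed.
    # A read conflicts with any earlier write whose data range it touches.
    for earlier in queue:
        if earlier.is_write and not pending.is_write and overlaps(pending, earlier):
            return False
    return True

# Example: a read of addresses [8, 16) must wait on an unfinished write of [12, 20).
queue = [Access(is_write=True, start=12, length=8)]
print(may_issue(Access(is_write=False, start=8, length=8), queue))  # False
```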
Fig. 4 shows an example block diagram of the structure of a slave operation module 6 in the apparatus for performing the artificial neural network pre-training according to the present invention. As shown in Fig. 4, each slave operation module 6 includes an operation unit 61, a data dependence judging unit 62, a first storage unit 63, a second storage unit 64, and a third storage unit 65.
The operation unit 61 receives the control signals sent by the controller unit 2 and performs arithmetic and logic operations.
The data dependence judging unit 62 is responsible for the read and write operations on the cache units during calculation and ensures that the reads and writes are free of consistency conflicts. For example, all control signals sent to the data dependence judging unit 62 are stored in an instruction queue inside the unit; in this queue, if the range of data read by a read instruction conflicts with the range of data written by a write instruction earlier in the queue, the read instruction may execute only after the write instruction it depends on has executed.
The first storage unit 63 caches the input neuron vector v₀, the first-order hidden-layer intermediate value h₀, the first-order visible-layer intermediate value v₁, and the second-order hidden-layer intermediate value h₁ in each stage of processing, as well as the dot-product results of the input vectors and the weight matrix computed in each stage.
The second storage unit 64 caches the weight data needed by this slave operation module 6 during calculation. Each slave operation module stores only the columns of the weight matrix corresponding to the scalar data stored by that slave operation module 6.
The third storage unit 65 caches the weight gradient data needed by the corresponding slave operation module during the weight update. Each weight gradient datum stored by a slave operation module 6 corresponds to the weight datum it stores.
The slave operation modules 6 implement the first half of the pipeline of the first three stages of the artificial neural network self-learning pre-training, as well as the weight update of formula (1) in the last stage.
Taking the pre-training of an artificial neural network deep belief network (DBN) as an example, in the first three stages the multiplication of the weight matrix W (or its transpose Wᵀ) with the input neuron vector can be divided into independent parallel computing subtasks. In the first and third stages, each slave operation module 6 uses the same input vector values and the weights corresponding to different components of the output vector to perform dot-product multiplications, respectively obtaining the partial sums corresponding to different components of the output vector; after repeated accumulation, the partial sums for the respective output components are obtained. Each slave operation module 6 only needs to compute the local induced field of the output neuron values corresponding to that module. The different local induced field components are combined stage by stage in the interconnection module 4 into a complete local induced field vector, which is transmitted to the master operation module for the activation function and subsequent sampling. In the second stage, each slave operation module 6 computes the product of the corresponding partial scalar of the input first-order hidden-layer intermediate value vector h₀ with the corresponding column of the weight matrix W; each resulting output vector is a partial sum of the final result awaiting accumulation, and these partial sums are added pairwise, stage by stage, in the interconnection module to obtain the final result. In other words, each slave operation module 6 computes partial sums of the local induced field of the output first-order visible-layer vector, and all the partial sums complete their summation in the interconnection module 4 to obtain the final local induced field. The intermediate values computed in the first three stages are used to update the weights: the master operation module 5 performs subsequent operations based on the outputs of the first three stages to derive the weight update values. In the last stage, the weight update that the slave operation modules 6 perform according to formula (1) can also be divided into three sub-steps:
1. Each slave operation module 6 computes the product intermediate values of the input first-order hidden-layer intermediate value vector h₀ and the corresponding partial scalars of the input neuron vector v₀;
2. Each slave operation module 6 computes the product of the input second-order hidden-layer intermediate value vector h₁ and the corresponding partial scalars of the first-order visible-layer vector v₁, and computes the difference between the intermediate value vector of the first sub-step and this product;
3. Each slave operation module 6 computes the product of the difference obtained in the second sub-step and the learning rate to obtain the weight update value, which then undergoes a vector subtraction with the weight W to obtain the updated weight.
It is worth noting that the above three sub-steps are only one example of how the slave operation modules 6 update the weights; an implementer may fine-tune the details. For example, the product calculation of the first sub-step and that of the second sub-step may be interchanged, or the multiplication by the learning rate in the third sub-step may be moved forward to the second sub-step or even split across the first two sub-steps.
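As an illustration of this partitioning for the first and third stages, the following NumPy sketch assigns each slave module a block of weight-matrix columns, i.e. a subset of output components. The sequential loop stands in for genuinely parallel hardware, and the even block split is an assumption about how columns are assigned.

```python
import numpy as np

def stage1_local_field(W: np.ndarray, v0: np.ndarray, n_slaves: int) -> np.ndarray:
    # W has shape (n_visible, n_hidden); column j holds the weights of output
    # neuron j, matching "each slave stores the corresponding columns".
    col_blocks = np.array_split(W, n_slaves, axis=1)
    parts = [v0 @ block for block in col_blocks]   # per-slave dot products
    return np.concatenate(parts)                   # tree merge by concatenation

rng = np.random.default_rng(1)
W = rng.normal(size=(6, 8))
v0 = rng.normal(size=6)
assert np.allclose(stage1_local_field(W, v0, n_slaves=4), v0 @ W)
```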
According to an embodiment of the present invention, an instruction set for performing the artificial neural network forward operation on the aforementioned apparatus is also provided. The instruction set includes the CONFIG instruction, the COMPUTE instruction, the I/O instruction, the NOP instruction, the JUMP instruction, and the MOVE instruction, wherein:
the CONFIG instruction configures, before the calculation of each layer of the artificial neural network starts, the various constants needed by the current layer's calculation;
the COMPUTE instruction completes the arithmetic and logic calculation of each layer of the artificial neural network;
the I/O instruction reads in, from the external address space, the input data needed by the calculation, and stores data back to the external space after the calculation completes;
the NOP instruction is responsible for emptying the control signals currently filling all internal control-signal buffer queues, ensuring that all instructions before the NOP instruction have finished executing; the NOP instruction itself contains no operation;
the JUMP instruction is responsible for the jump of the next instruction address that the controller will read from the instruction storage unit, and is used to implement jumps in the control flow;
the MOVE instruction is responsible for carrying data at one address of the apparatus's internal address space to another address of the internal address space; this process is independent of the operation unit and occupies no operation-unit resources during execution.
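Purely for illustration, the six instruction types can be modeled as a tiny Python enumeration; the encoding below is an assumption (the patent specifies no binary format), and the example program mirrors the S1-S11 flow described later: load data, then alternate CONFIG/COMPUTE through the four stages of one layer.

```python
from enum import Enum, auto

class Op(Enum):
    CONFIG = auto()   # set per-layer constants (precision, activation table, ...)
    COMPUTE = auto()  # run one stage of the layer's arithmetic
    IO = auto()       # move data between external space and on-chip storage
    NOP = auto()      # drain control-signal queues; no operation itself
    JUMP = auto()     # redirect the next instruction address (control flow)
    MOVE = auto()     # copy data within the internal address space

one_layer_program = [
    (Op.IO, "load operation instructions"), (Op.IO, "load master-module data"),
    (Op.IO, "load weight matrix"),
    (Op.CONFIG, "stage 1"), (Op.COMPUTE, "stage 1"),
    (Op.CONFIG, "stage 2"), (Op.COMPUTE, "stage 2"),
    (Op.CONFIG, "stage 3"), (Op.COMPUTE, "stage 3"),
    (Op.COMPUTE, "stage 4"),
]
```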
Fig. 5 shows an example block diagram of the first and third stages of the neural network self-learning pre-training process according to an embodiment of the present invention. In the different slave operation modules 6, the input vector broadcast by the interconnection module 4 undergoes a dot-product operation with the weight vector of that slave operation module 6, yielding the corresponding partial sums of the local induced field of the output neuron values; all these output local induced fields form the intermediate result vector. This intermediate result vector, after the bias vector is added and the activation operation is performed, yields the final output neuron vector of this layer of the neural network. The formula is described as out = f(w*in + b), where out is the output vector, in is the input vector, b is the bias vector, w is the weight matrix, and f is the activation function. The weight vector of each slave operation module 6 is the column vector of the weight matrix corresponding to that slave operation module 6. The interconnection module 4 sends the input vector [I0, ..., In] to all slave operation units, where it is temporarily stored in the first storage unit. The i-th slave operation unit computes the dot product of its corresponding weight vector [Wi0, ..., Win] with the input vector. The results output by the slave operation units are combined by the interconnection module 4 into a complete local induced field vector and returned to the master operation module 5, where the activation function operation and possibly Gibbs sampling are performed to obtain the final output vector [O0, O1, ..., On].
Fig. 6 shows an example block diagram of the second stage of the neural network self-learning pre-training process according to an embodiment of the present invention. The process of computing the output first-order visible-layer vector v₁ is as follows: the interconnection module 4 broadcasts the first-order hidden-layer vector value, and each slave operation module 6 takes the product of the corresponding partial scalar h0i of h₀ with the corresponding row [Wi0, ..., Win] of the weight matrix W; each resulting output vector is one partial sum of the local induced field of the first-order visible-layer vector awaiting accumulation, and these partial sums are added pairwise, stage by stage, in the interconnection module 4 to obtain the final local induced field. The computed local induced field is returned to the master operation module 5, where the activation function operation and possibly Gibbs sampling are performed to obtain the final output first-order visible-layer vector v₁.
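A matching NumPy sketch of the stage-2 partitioning, under the illustrative assumption of one weight column per slave module: each slave scales its column by the broadcast scalar of h₀, and the partial vectors are summed, which reproduces the matrix-vector product.

```python
import numpy as np

def stage2_local_field(W: np.ndarray, h0: np.ndarray) -> np.ndarray:
    # W has shape (n_visible, n_hidden); slave i holds column i and scales it
    # by the corresponding scalar of the broadcast h0.
    parts = [W[:, i] * h0[i] for i in range(len(h0))]  # one partial sum per slave
    return np.sum(parts, axis=0)                       # tree merge by summation

rng = np.random.default_rng(2)
W = rng.normal(size=(6, 4))
h0 = rng.normal(size=4)
assert np.allclose(stage2_local_field(W, h0), W @ h0)
```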
Fig. 7 shows a flowchart of the fourth stage of the neural network self-learning pre-training process according to an embodiment of the present invention. In the last stage, the weight update that the slave operation modules 6 perform according to formula (1) can again be divided into three sub-steps:
1. Each slave operation module 6 computes the product intermediate values of the input first-order hidden-layer intermediate value vector h₀ and the corresponding partial scalars of the input neuron vector v₀, and caches them to the third storage unit shown in Fig. 4; this sub-step is similar to the block diagram of the second stage shown in Fig. 6, except that its inputs are the first-order hidden-layer intermediate value vector h₀ and the input neuron vector v₀;
2. Each slave operation module 6 computes the product of the input second-order hidden-layer intermediate value vector h₁ and the corresponding partial scalars of the first-order visible-layer vector v₁, computes the difference with the intermediate value vector of the first sub-step, and caches it to the third storage unit shown in Fig. 4;
3. Each slave operation module 6 computes the product of the difference obtained in the second sub-step and the learning rate to obtain the weight update value, which then undergoes a vector subtraction with the weight W to obtain the updated weight.
It is worth noting that the above three sub-steps are only one example of how the slave operation modules 6 update the weights; an implementer may fine-tune the details. For example, the product calculation of the first sub-step and that of the second sub-step may be interchanged, or the multiplication by the learning rate in the third sub-step may be moved forward to the second sub-step or even split across the first two sub-steps.
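The following NumPy sketch traces the three sub-steps for a single slave's column of the weight matrix. The per-column slicing is an assumption about how work is divided, W is oriented (hidden × visible) to match formula (1), and the subtraction order (sub-step 1 minus sub-step 2) follows formula (1); this is an illustration, not the hardware implementation.

```python
import numpy as np

def stage4_update_column(W, h0, h1, v0, v1, j, epsilon):
    # Sub-step 1: product of h0 with this slave's scalar of v0 (cached value).
    grad_cache = h0 * v0[j]
    # Sub-step 2: product of h1 with this slave's scalar of v1, then the
    # difference with the cached sub-step-1 intermediate value.
    grad_cache = grad_cache - h1 * v1[j]
    # Sub-step 3: scale by the learning rate and subtract from the weights.
    W[:, j] = W[:, j] - epsilon * grad_cache
    return W

# Check one full update against formula (1) applied directly.
rng = np.random.default_rng(3)
h0, h1 = rng.random(4), rng.random(4)
v0, v1 = rng.random(6), rng.random(6)
W = rng.normal(size=(4, 6))
W_ref = W - 0.1 * (np.outer(h0, v0) - np.outer(h1, v1))
for j in range(6):
    W = stage4_update_column(W, h0, h1, v0, v1, j, 0.1)
assert np.allclose(W, W_ref)
```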
Fig. 8 shows a flowchart of a single-layer artificial neural network self-learning pre-training operation according to one embodiment. Since multilayer artificial neural network self-learning pre-training can adopt the layer-by-layer training approach, the pre-training of a multilayer artificial neural network can invoke this flow repeatedly. The flowchart describes the process of implementing the single-layer neural network self-learning pre-training operation shown in Fig. 4 using the apparatus and instruction set of the present invention.
In step S1, an I/O instruction is pre-stored at the first address of the instruction storage unit 1.
In step S2, the operation starts. The controller unit 2 reads this I/O instruction from the first address of the instruction storage unit 1, and according to the translated control signal, the data access unit 3 reads all the corresponding artificial neural network operation instructions from the external address space and caches them in the instruction storage unit 1.
In step S3, the controller unit 2 then reads in the next I/O instruction from the instruction storage unit, and according to the translated control signal, the data access unit 3 reads all the data needed by the master operation module 5 (e.g., including the input neuron vector v₀, the activation function interpolation table, the learning rate, and the biases) from the external address space into the storage unit 53 of the master operation module 5.
In step S4, the controller unit 2 then reads in the next I/O instruction from the instruction storage unit, and according to the translated control signal, the data access unit 3 reads the weight matrix data needed by the slave operation modules 6 from the external address space.
In step S5, the controller unit 2 then reads in the next CONFIG instruction from the instruction storage unit, and according to the translated control signal, the apparatus configures the various constants needed by the first-stage calculation of this layer of the neural network. For example, the operation units 51 and 61 configure the values of their internal registers according to the parameters in the control signal; the parameters include, for example, the precision setting of this layer's calculation and the data of the activation function.
In step S6, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit, and according to the translated control signal, the calculation of the first stage starts. The master operation module 5 first sends the input neuron vector v₀ to each slave operation module 6 through the interconnection module 4, where it is saved to the first storage unit 63 of the slave operation module 6. The operation unit 61 of each slave operation module 6 reads the weight vector (the column vector of the weight matrix corresponding to that slave operation module 6) from the second storage unit 64, reads the input neuron vector v₀ from the first storage unit, completes the dot-product operation of the weight vector and the input neuron vector v₀, and returns the intermediate result through the interconnection module. In the interconnection module 4, the intermediate results returned by the slave operation modules 6 are combined stage by stage into a complete local induced field vector. The master operation module 5 obtains the return value of the interconnection module 4, reads the bias vector from the storage unit 53 according to the control signal translated from the COMPUTE instruction, adds it to the vector returned by the interconnection module 4, then activates the sum, performs Gibbs sampling, and writes the final first-order hidden-layer vector h₀ back to the storage unit 53.
In step S7, the controller unit 2 then reads in the next CONFIG instruction from the instruction storage unit, and according to the translated control signal, the apparatus configures the various constants needed by the second-stage calculation of this layer of the neural network.
In step S8, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit, and according to the translated control signal, the calculation of the second stage starts. The master operation module 5 first sends the first-order hidden-layer vector h₀ to each slave operation module 6 through the interconnection module 4, where it is saved to the first storage unit 63 of the slave operation module 6. The operation unit 61 of each slave operation module 6 reads the weight vector (the column vector of the weight matrix corresponding to that slave operation module 6) from the second storage unit 64, selects the corresponding scalar of the first-order hidden-layer vector h₀ from the first storage unit, completes the product operation of the weight vector and the corresponding scalar of h₀, and returns the intermediate result through the interconnection module. In the interconnection module 4, the intermediate results returned by the slave operation modules 6 are summed stage by stage into a complete local induced field vector. The master operation module 5 obtains the return value of the interconnection module 4, reads the bias vector from the storage unit 53 according to the control signal translated from the COMPUTE instruction, adds it to the vector returned by the interconnection module 4, then activates the sum, performs Gibbs sampling, and writes the final first-order visible-layer vector v₁ back to the storage unit 53.
In step S9, the controller unit 2 then reads in the next CONFIG instruction from the instruction storage unit, and according to the translated control signal, the apparatus configures the various constants needed by the third-stage calculation of this layer of the neural network. This configuration is basically the same as that of the first stage, except that one additional parameter, the learning rate, must be configured.
In step S10, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit, and according to the translated control signal, the calculation of the third stage starts. The master operation module 5 first sends the first-order visible-layer vector v₁ to each slave operation module 6 through the interconnection module 4, where it is saved to the first storage unit 63 of the slave operation module 6. The operation unit 61 of each slave operation module 6 reads the first-order visible-layer vector v₁ from the first storage unit, completes the dot-product operation of the weight vector and the first-order visible-layer vector v₁, and returns the intermediate result through the interconnection module. In the interconnection module 4, the intermediate results returned by the slave operation modules 6 are combined stage by stage into a complete local induced field vector. The master operation module 5 obtains the return value of the interconnection module 4, reads the bias vector from the storage unit 53 according to the control signal translated from the COMPUTE instruction, adds it to the vector returned by the interconnection module 4, then activates the sum, and writes the final second-order hidden-layer vector h₁ back to the storage unit 53.
In step S11, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit, and according to the translated control signal, the calculation of the fourth stage starts. In the first sub-step, the master operation module 5 first sends the input neuron vector v₀ and the first-order hidden-layer vector h₀ to each slave operation module 6 through the interconnection module 4, where they are saved to the weight-gradient cache unit 65 of the slave operation module 6. In the second sub-step, the operation unit 61 of each slave operation module 6 reads the first-order hidden-layer vector h₀ from the first storage unit, selects the corresponding component of the input neuron vector v₀, completes the product operation of the first-order hidden-layer vector h₀ and the corresponding input-neuron component, performs a vector subtraction between this intermediate result and the intermediate value cached in the previous sub-step read from the weight-gradient cache unit 65, and caches the computed intermediate result to the weight-gradient cache unit 65. In the last sub-step, the operation unit 61 of each slave operation module 6 reads the intermediate value of the previous sub-step from the weight-gradient cache unit 65, multiplies it by the learning rate to obtain the weight update value, reads the corresponding weight from the weight cache unit 64, performs a vector subtraction of the weight and the weight update value to obtain the updated weight, and caches it back to the weight cache unit 64. At this point, one self-learning pre-training iteration of the single-layer neural network is complete. Through repeated iterative learning, once the weights reach a certain convergence criterion (the weight update value is less than some threshold), the pre-training of the single-layer neural network ends, and the pre-training of the next layer of the neural network can begin.
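Tying the steps together, here is a sketch of the layer-level driver loop implied by this paragraph, reusing the pretrain_iteration sketch given earlier; the convergence threshold, the max-iteration guard, and the batch loop are illustrative assumptions rather than part of the described apparatus.

```python
import numpy as np

def pretrain_layer(v0_batch, W, b, c, epsilon=0.1, threshold=1e-3, max_iters=10000):
    # Repeat single-layer iterations until the weight update value falls
    # below the threshold, then this layer's pre-training is complete.
    for _ in range(max_iters):
        W_old = W.copy()
        for v0 in v0_batch:                        # iterate over training vectors
            W, b, c = pretrain_iteration(v0, W, b, c, epsilon)
        if np.max(np.abs(W - W_old)) < threshold:  # convergence criterion
            break
    return W, b, c
```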
By using the apparatus and instruction set for performing the artificial neural network self-learning pre-training operation, the problems of insufficient CPU and GPU operational performance and large front-end decoding overhead are solved, and support for the multilayer artificial neural network forward operation is effectively improved.
By using dedicated on-chip caches for the multilayer artificial neural network forward operation, the reusability of the input neurons and the weight data is fully exploited, avoiding repeatedly reading these data from memory, reducing memory access bandwidth, and preventing memory bandwidth from becoming a bottleneck of multilayer artificial neural network forward operation performance.
The processes or methods depicted in the preceding figures may be performed by processing logic comprising hardware (e.g., circuits, dedicated logic, etc.), firmware, software (e.g., software embodied on a non-transitory computer-readable medium), or a combination of the two. Although the processes or methods are described above in a certain order of operations, it should be understood that some of the described operations may be performed in a different order. Moreover, some operations may be performed in parallel rather than sequentially.
In the foregoing specification, embodiments of the present invention have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made to each embodiment without departing from the broader spirit and scope of the invention as set forth in the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.
Claims (6)
1. An apparatus for performing an artificial neural network self-learning operation, including an instruction storage unit, a controller unit, a data access unit, an interconnection module, a master operation module, and multiple slave operation modules, wherein:
the instruction storage unit is configured to read in instructions through the data access unit and cache the read instructions;
the controller unit is configured to read instructions from the instruction storage unit, decode each instruction into control signals that control the behavior of the interconnection module, the master operation module, and the slave operation modules, and then distribute the respective control signals to the respective modules;
the data access unit is configured to access external address space and complete the loading and storing of data;
the interconnection module, which has different topology implementations, is configured to distribute the input vector of the master operation module to the multiple slave operation modules, and to merge the calculation results of the slave operation modules and return the result to the master operation module;
the master operation module is configured to perform the activation function and Gibbs sampling on the intermediate values returned by the interconnection module, and to update the biases of the activation function;
the slave operation modules are configured to perform the dot-product operation of the input vector with the respective weight matrix, the product operation of the respective component scalar of the input vector with the respective weight matrix, and the weight matrix update.
2. The apparatus for performing an artificial neural network self-learning operation according to claim 1, wherein the master operation module includes an operation unit, a data dependence judging unit, and a storage unit, wherein
the storage unit is configured to cache the input data and output data used by the master operation module during calculation,
the operation unit is configured to complete the operations of the master operation module, and
the data dependence judging unit is the port through which the operation unit reads and writes the storage unit, ensuring read-write consistency of the data in the storage unit.
3. The apparatus for performing an artificial neural network self-learning operation according to claim 2, wherein the data dependence judging unit is configured to judge whether a dependence exists between the data of a control signal that has not yet been executed and a control signal in the process of being executed; if not, this group of control signals is allowed to issue immediately, otherwise this group of control signals is allowed to issue only after all control signals on which it depends have completed execution.
4. The apparatus for performing an artificial neural network self-learning operation according to claim 3, wherein the data dependence judging unit is further configured to send read data to the slave operation modules through the interconnection module.
5. The apparatus for performing an artificial neural network self-learning operation according to claim 1, wherein each slave operation module includes an operation unit, a data dependence judging unit, a first storage unit, a second storage unit, and a third storage unit, wherein
the operation unit is configured to receive the control signals sent by the controller unit and perform arithmetic and logic operations;
the data dependence judging unit is configured to monitor read and write operations on the storage units to ensure that the reads and writes are free of consistency conflicts;
the first storage unit is configured to cache the input vectors and calculation results of the neurons;
the second storage unit is configured to cache the weight data needed by the slave operation module during calculation;
the third storage unit is configured to cache the weight gradient data needed by the corresponding slave operation module during the weight update.
6. A method for performing a layer-by-layer artificial neural network self-learning operation, the artificial neural network including multiple neurons in two or more layers, wherein the self-learning pre-training of the artificial neural network adopts layer-by-layer training, and for each layer the pre-training is divided into four stages:
a first stage, in which the input neuron vector v₀ and the weight matrix W undergo a dot-product operation to obtain the local induced field, and the local induced field, after a nonlinear activation-function transformation, undergoes Gibbs sampling to yield the first-order hidden-layer intermediate value h₀;
a second stage, in which the transpose Wᵀ of the weight matrix and the first-order hidden-layer intermediate value h₀ first undergo a dot-product operation, and the local induced field, after a nonlinear activation-function transformation, undergoes Gibbs sampling to yield the first-order visible-layer intermediate value v₁;
a third stage, in which the first-order visible-layer intermediate value v₁ and the weight matrix W undergo a dot-product operation to obtain the local induced field, and the local induced field, after a nonlinear activation-function transformation, yields the second-order hidden-layer intermediate value h₁;
a fourth stage, in which the weights are updated according to the following formulas:
$$\vec{W} \leftarrow \vec{W} - \epsilon\left(\vec{h}_0 \times \vec{v}_0^{\,T} - \vec{h}_1 \times \vec{v}_1^{\,T}\right) \quad\cdots\cdots(1)$$
$$\vec{b} \leftarrow \vec{b} - \epsilon\left(\vec{h}_0 - \vec{h}_1\right) \quad\cdots\cdots(2)$$
$$\vec{c} \leftarrow \vec{c} - \epsilon\left(\vec{v}_0 - \vec{v}_1\right) \quad\cdots\cdots(3)$$
wherein the vector b is the bias added to the partial dot-product sums of the vector and the weight matrix before the activation functions of the above first and third stages, and the vector c is the bias of the second stage; "×" denotes the outer product of the vectors, and ε is the learning rate.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610267211.0A CN107316078B (en) | 2016-04-27 | 2016-04-27 | Apparatus and method for performing artificial neural network self-learning operation |
CN201910402047.3A CN110188870B (en) | 2016-04-27 | 2016-04-27 | Apparatus and method for performing artificial neural network self-learning operation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610267211.0A CN107316078B (en) | 2016-04-27 | 2016-04-27 | Apparatus and method for performing artificial neural network self-learning operation |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910402047.3A Division CN110188870B (en) | 2016-04-27 | 2016-04-27 | Apparatus and method for performing artificial neural network self-learning operation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107316078A true CN107316078A (en) | 2017-11-03 |
CN107316078B CN107316078B (en) | 2021-05-07 |
Family
ID=60185046
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910402047.3A Active CN110188870B (en) | 2016-04-27 | 2016-04-27 | Apparatus and method for performing artificial neural network self-learning operation |
CN201610267211.0A Active CN107316078B (en) | 2016-04-27 | 2016-04-27 | Apparatus and method for performing artificial neural network self-learning operation |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910402047.3A Active CN110188870B (en) | 2016-04-27 | 2016-04-27 | Apparatus and method for performing artificial neural network self-learning operation |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN110188870B (en) |
Cited By (46)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108108189A (en) * | 2017-12-15 | 2018-06-01 | 北京中科寒武纪科技有限公司 | A kind of computational methods and Related product |
CN108364065A (en) * | 2018-01-19 | 2018-08-03 | 上海兆芯集成电路有限公司 | Adopt the microprocessor of booth multiplication |
CN108710958A (en) * | 2018-05-16 | 2018-10-26 | 北京旋极信息技术股份有限公司 | A kind of prediction health control method and device, computer readable storage medium |
CN108763360A (en) * | 2018-05-16 | 2018-11-06 | 北京旋极信息技术股份有限公司 | A kind of sorting technique and device, computer readable storage medium |
CN108859477A (en) * | 2018-07-05 | 2018-11-23 | 吉林工程技术师范学院 | A kind of children's literature book binder and its control method |
CN109542837A (en) * | 2018-11-30 | 2019-03-29 | 上海寒武纪信息科技有限公司 | Operation method, device and Related product |
CN109754062A (en) * | 2017-11-07 | 2019-05-14 | 上海寒武纪信息科技有限公司 | The execution method and Related product of convolution extended instruction |
CN109784125A (en) * | 2017-11-10 | 2019-05-21 | 福州瑞芯微电子股份有限公司 | Deep learning network processing device, method and image processing unit |
CN109902811A (en) * | 2017-12-11 | 2019-06-18 | 北京中科寒武纪科技有限公司 | Neural network computing device and method |
CN109919313A (en) * | 2019-01-31 | 2019-06-21 | 华为技术有限公司 | Gradient transmission method and distributed training system |
CN109961136A (en) * | 2017-12-14 | 2019-07-02 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and related product |
CN109978160A (en) * | 2019-03-25 | 2019-07-05 | 北京中科寒武纪科技有限公司 | Configuration device and method of artificial intelligence processor, and related product |
CN109993289A (en) * | 2017-12-30 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and related product |
CN109993290A (en) * | 2017-12-30 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and related product |
CN110059809A (en) * | 2018-10-10 | 2019-07-26 | 北京中科寒武纪科技有限公司 | Computing device and related product |
CN110147249A (en) * | 2018-02-12 | 2019-08-20 | 上海寒武纪信息科技有限公司 | Network model calculation method and device |
CN110163350A (en) * | 2018-02-13 | 2019-08-23 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110163361A (en) * | 2018-02-13 | 2019-08-23 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110163349A (en) * | 2018-02-12 | 2019-08-23 | 上海寒武纪信息科技有限公司 | Network model calculation method and device |
CN110196734A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Computing device and related product |
CN110197270A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110197271A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110197269A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110197272A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110197268A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110197273A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110276447A (en) * | 2018-03-14 | 2019-09-24 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110383300A (en) * | 2018-02-13 | 2019-10-25 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110472734A (en) * | 2018-05-11 | 2019-11-19 | 上海寒武纪信息科技有限公司 | Computing device and related product |
CN110806903A (en) * | 2018-08-01 | 2020-02-18 | 珠海格力电器股份有限公司 | Method and device for determining configuration parameters of an electric cooker |
CN111047045A (en) * | 2018-10-12 | 2020-04-21 | 中科寒武纪科技股份有限公司 | Distribution system and method for machine learning operation |
CN111079908A (en) * | 2018-10-18 | 2020-04-28 | 上海寒武纪信息科技有限公司 | Network-on-chip data processing method, storage medium, computer device and apparatus |
CN111178492A (en) * | 2018-11-09 | 2020-05-19 | 中科寒武纪科技股份有限公司 | Computing device, related product and computing method for executing artificial neural network model |
CN111260046A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN111258641A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN111461340A (en) * | 2020-03-10 | 2020-07-28 | 北京百度网讯科技有限公司 | Weight matrix updating method and device and electronic equipment |
CN112329619A (en) * | 2020-11-04 | 2021-02-05 | 济南博观智能科技有限公司 | Face recognition method and device, electronic equipment and readable storage medium |
CN112805727A (en) * | 2018-10-08 | 2021-05-14 | 深爱智能科技有限公司 | Artificial neural network operation acceleration device for distributed processing, artificial neural network acceleration system using same, and method for accelerating artificial neural network |
CN114071781A (en) * | 2021-11-16 | 2022-02-18 | 杭州电子科技大学 | Wireless local area network medium access control method |
US11620130B2 (en) | 2018-02-13 | 2023-04-04 | Shanghai Cambricon Information Technology Co., Ltd | Computing device and method |
US11630666B2 (en) | 2018-02-13 | 2023-04-18 | Shanghai Cambricon Information Technology Co., Ltd | Computing device and method |
US11651202B2 (en) | 2017-12-30 | 2023-05-16 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
US11704544B2 (en) | 2017-12-30 | 2023-07-18 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
US11797467B2 (en) | 2018-10-18 | 2023-10-24 | Shanghai Cambricon Information Technology Co., Ltd. | Data processing device with transmission circuit |
US11847554B2 (en) | 2019-04-18 | 2023-12-19 | Cambricon Technologies Corporation Limited | Data processing method and related products |
CN113807510B (en) * | 2017-12-30 | 2024-05-10 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related products |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111080400B (en) * | 2019-11-25 | 2023-04-18 | 中山大学 | Commodity recommendation method and system based on gated graph convolutional network, and storage medium |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729678B (en) * | 2013-12-12 | 2016-10-05 | 中国科学院信息工程研究所 | Internet water army detection method and system based on improved DBN model |
CN104157290B (en) * | 2014-08-19 | 2017-10-24 | 大连理工大学 | Speaker recognition method based on deep learning |
CN104182772B (en) * | 2014-08-19 | 2017-10-24 | 大连理工大学 | Gesture recognition method based on deep learning |
CN105184366B (en) * | 2015-09-15 | 2018-01-09 | 中国科学院计算技术研究所 | Time-multiplexed general neural network processor |
2016
- 2016-04-27 CN CN201910402047.3A patent/CN110188870B/en active Active
- 2016-04-27 CN CN201610267211.0A patent/CN107316078B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104732274A (en) * | 2015-03-10 | 2015-06-24 | 华南理工大学 | Intelligent computer |
CN104757992A (en) * | 2015-03-16 | 2015-07-08 | 广东工业大学 | Cardiac sound diagnostic system based on deep belief network and diagnostic method |
CN105117706A (en) * | 2015-08-28 | 2015-12-02 | 小米科技有限责任公司 | Image processing method and apparatus and character recognition method and apparatus |
CN105488565A (en) * | 2015-11-17 | 2016-04-13 | 中国科学院计算技术研究所 | Calculation apparatus and method for accelerator chip accelerating deep neural network algorithm |
CN105447569A (en) * | 2015-12-18 | 2016-03-30 | 北京柏惠维康科技有限公司 | Breast cancer cell characteristic analysis system based on deep learning |
Non-Patent Citations (1)
Title |
---|
YUNJI CHEN et al.: "DaDianNao: A Machine-Learning Supercomputer", International Symposium on Microarchitecture *
Cited By (105)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109754062A (en) * | 2017-11-07 | 2019-05-14 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related product |
CN109754062B (en) * | 2017-11-07 | 2024-05-14 | 上海寒武纪信息科技有限公司 | Execution method of convolution expansion instruction and related product |
CN109784125A (en) * | 2017-11-10 | 2019-05-21 | 福州瑞芯微电子股份有限公司 | Deep learning network processing device, method and image processing unit |
CN109902814A (en) * | 2017-12-11 | 2019-06-18 | 北京中科寒武纪科技有限公司 | Neural network computing module and method |
CN109902814B (en) * | 2017-12-11 | 2020-01-17 | 中科寒武纪科技股份有限公司 | Neural network operation module and method |
CN111738431B (en) * | 2017-12-11 | 2024-03-05 | 中科寒武纪科技股份有限公司 | Neural network computing device and method |
US11657258B2 (en) | 2017-12-11 | 2023-05-23 | Cambricon Technologies Corporation Limited | Neural network calculation apparatus and method |
CN109902811A (en) * | 2017-12-11 | 2019-06-18 | 北京中科寒武纪科技有限公司 | Neural network computing device and method |
US11803735B2 (en) | 2017-12-11 | 2023-10-31 | Cambricon Technologies Corporation Limited | Neural network calculation apparatus and method |
CN111738431A (en) * | 2017-12-11 | 2020-10-02 | 中科寒武纪科技股份有限公司 | Neural network operation device and method |
CN109961136B (en) * | 2017-12-14 | 2020-05-19 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
CN109961136A (en) * | 2017-12-14 | 2019-07-02 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and related product |
CN108108189A (en) * | 2017-12-15 | 2018-06-01 | 北京中科寒武纪科技有限公司 | Computing method and related product |
CN109993290A (en) * | 2017-12-30 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and related product |
CN113807510A (en) * | 2017-12-30 | 2021-12-17 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
US11734548B2 (en) | 2017-12-30 | 2023-08-22 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
US11710031B2 (en) | 2017-12-30 | 2023-07-25 | Cambricon Technologies Corporation Limited | Parallel processing circuits for neural networks |
CN113807510B (en) * | 2017-12-30 | 2024-05-10 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related products |
US11651202B2 (en) | 2017-12-30 | 2023-05-16 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
CN109993289A (en) * | 2017-12-30 | 2019-07-09 | 北京中科寒武纪科技有限公司 | Integrated circuit chip device and related product |
CN109993290B (en) * | 2017-12-30 | 2021-08-06 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
US11704544B2 (en) | 2017-12-30 | 2023-07-18 | Cambricon Technologies Corporation Limited | Integrated circuit chip device and related product |
CN109993289B (en) * | 2017-12-30 | 2021-09-21 | 中科寒武纪科技股份有限公司 | Integrated circuit chip device and related product |
CN108364065B (en) * | 2018-01-19 | 2020-09-11 | 上海兆芯集成电路有限公司 | Microprocessor for Booth multiplication |
CN108364065A (en) * | 2018-01-19 | 2018-08-03 | 上海兆芯集成电路有限公司 | Microprocessor for Booth multiplication |
CN110163349A (en) * | 2018-02-12 | 2019-08-23 | 上海寒武纪信息科技有限公司 | Network model calculation method and device |
CN110147249B (en) * | 2018-02-12 | 2021-02-09 | 上海寒武纪信息科技有限公司 | Network model calculation method and device |
CN110163349B (en) * | 2018-02-12 | 2021-03-23 | 上海寒武纪信息科技有限公司 | Network model calculation method and device |
CN110147249A (en) * | 2018-02-12 | 2019-08-20 | 上海寒武纪信息科技有限公司 | Network model calculation method and device |
CN110163363A (en) * | 2018-02-13 | 2019-08-23 | 上海寒武纪信息科技有限公司 | Computing device and method |
US11704125B2 (en) | 2018-02-13 | 2023-07-18 | Cambricon (Xi'an) Semiconductor Co., Ltd. | Computing device and method |
US11663002B2 (en) | 2018-02-13 | 2023-05-30 | Shanghai Cambricon Information Technology Co., Ltd | Computing device and method |
US11709672B2 (en) | 2018-02-13 | 2023-07-25 | Shanghai Cambricon Information Technology Co., Ltd | Computing device and method |
CN110383300A (en) * | 2018-02-13 | 2019-10-25 | 上海寒武纪信息科技有限公司 | Computing device and method |
US11720357B2 (en) | 2018-02-13 | 2023-08-08 | Shanghai Cambricon Information Technology Co., Ltd | Computing device and method |
CN110163362B (en) * | 2018-02-13 | 2020-12-11 | 上海寒武纪信息科技有限公司 | Computing device and method |
US11630666B2 (en) | 2018-02-13 | 2023-04-18 | Shanghai Cambricon Information Technology Co., Ltd | Computing device and method |
US11620130B2 (en) | 2018-02-13 | 2023-04-04 | Shanghai Cambricon Information Technology Co., Ltd | Computing device and method |
CN110163362A (en) * | 2018-02-13 | 2019-08-23 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110163357A (en) * | 2018-02-13 | 2019-08-23 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110163360A (en) * | 2018-02-13 | 2019-08-23 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110163356A (en) * | 2018-02-13 | 2019-08-23 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110163360B (en) * | 2018-02-13 | 2021-06-25 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110163361B (en) * | 2018-02-13 | 2021-06-25 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110163350B (en) * | 2018-02-13 | 2021-06-08 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110163363B (en) * | 2018-02-13 | 2021-05-11 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110163361A (en) * | 2018-02-13 | 2019-08-23 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110163350A (en) * | 2018-02-13 | 2019-08-23 | 上海寒武纪信息科技有限公司 | Computing device and method |
US11740898B2 (en) | 2018-02-13 | 2023-08-29 | Shanghai Cambricon Information Technology Co., Ltd | Computing device and method |
CN110163356B (en) * | 2018-02-13 | 2020-10-09 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110383300B (en) * | 2018-02-13 | 2024-03-05 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110197270A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110197271A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110197270B (en) * | 2018-02-27 | 2020-10-30 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110197268A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN111767998A (en) * | 2018-02-27 | 2020-10-13 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN111767998B (en) * | 2018-02-27 | 2024-05-14 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related products |
CN111767996A (en) * | 2018-02-27 | 2020-10-13 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110197273A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110197273B (en) * | 2018-02-27 | 2020-08-25 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110197269A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
TWI786255B (en) * | 2018-02-27 | 2022-12-11 | 大陸商寒武紀(西安)集成電路有限公司 | Integrated circuit chip device, chip, intelligent device, and computing method of neural network |
CN110197272A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN110196734A (en) * | 2018-02-27 | 2019-09-03 | 上海寒武纪信息科技有限公司 | Computing device and related product |
CN111767996B (en) * | 2018-02-27 | 2024-03-05 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related products |
CN110197271B (en) * | 2018-02-27 | 2020-10-27 | 上海寒武纪信息科技有限公司 | Integrated circuit chip device and related product |
CN111626413A (en) * | 2018-03-14 | 2020-09-04 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110276447A (en) * | 2018-03-14 | 2019-09-24 | 上海寒武纪信息科技有限公司 | Computing device and method |
CN110472734A (en) * | 2018-05-11 | 2019-11-19 | 上海寒武纪信息科技有限公司 | Computing device and related product |
CN110472734B (en) * | 2018-05-11 | 2024-03-29 | 上海寒武纪信息科技有限公司 | Computing device and related product |
CN108710958A (en) * | 2018-05-16 | 2018-10-26 | 北京旋极信息技术股份有限公司 | Predictive health management method and device and computer readable storage medium |
CN108710958B (en) * | 2018-05-16 | 2022-04-15 | 北京旋极信息技术股份有限公司 | Predictive health management method and device and computer readable storage medium |
CN108763360A (en) * | 2018-05-16 | 2018-11-06 | 北京旋极信息技术股份有限公司 | Classification method and device and computer readable storage medium |
CN108859477A (en) * | 2018-07-05 | 2018-11-23 | 吉林工程技术师范学院 | Children's literature book binder and control method thereof |
CN110806903A (en) * | 2018-08-01 | 2020-02-18 | 珠海格力电器股份有限公司 | Method and device for determining configuration parameters of an electric cooker |
CN112805727A (en) * | 2018-10-08 | 2021-05-14 | 深爱智能科技有限公司 | Artificial neural network operation acceleration device for distributed processing, artificial neural network acceleration system using same, and method for accelerating artificial neural network |
CN110059809B (en) * | 2018-10-10 | 2020-01-17 | 中科寒武纪科技股份有限公司 | Computing device and related product |
CN110059809A (en) * | 2018-10-10 | 2019-07-26 | 北京中科寒武纪科技有限公司 | Computing device and related product |
CN111047045A (en) * | 2018-10-12 | 2020-04-21 | 中科寒武纪科技股份有限公司 | Distribution system and method for machine learning operation |
CN111079908A (en) * | 2018-10-18 | 2020-04-28 | 上海寒武纪信息科技有限公司 | Network-on-chip data processing method, storage medium, computer device and apparatus |
US11880330B2 (en) | 2018-10-18 | 2024-01-23 | Shanghai Cambricon Information Technology Co., Ltd. | Network-on-chip data processing method and device |
US11971836B2 (en) | 2018-10-18 | 2024-04-30 | Shanghai Cambricon Information Technology Co., Ltd. | Network-on-chip data processing method and device |
US11960431B2 (en) | 2018-10-18 | 2024-04-16 | Guangzhou University | Network-on-chip data processing method and device |
US11797467B2 (en) | 2018-10-18 | 2023-10-24 | Shanghai Cambricon Information Technology Co., Ltd. | Data processing device with transmission circuit |
CN111079908B (en) * | 2018-10-18 | 2024-02-13 | 上海寒武纪信息科技有限公司 | Network-on-chip data processing method, storage medium, computer device and apparatus |
US11809360B2 (en) | 2018-10-18 | 2023-11-07 | Shanghai Cambricon Information Technology Co., Ltd. | Network-on-chip data processing method and device |
US11841816B2 (en) | 2018-10-18 | 2023-12-12 | Shanghai Cambricon Information Technology Co., Ltd. | Network-on-chip data processing method and device |
US11880329B2 (en) | 2018-10-18 | 2024-01-23 | Shanghai Cambricon Information Technology Co., Ltd. | Arbitration based machine learning data processor |
US11868299B2 (en) | 2018-10-18 | 2024-01-09 | Shanghai Cambricon Information Technology Co., Ltd. | Network-on-chip data processing method and device |
US11880328B2 (en) | 2018-10-18 | 2024-01-23 | Shanghai Cambricon Information Technology Co., Ltd. | Network-on-chip data processing method and device |
CN111178492B (en) * | 2018-11-09 | 2020-12-11 | 安徽寒武纪信息科技有限公司 | Computing device, related product and computing method for executing artificial neural network model |
CN111178492A (en) * | 2018-11-09 | 2020-05-19 | 中科寒武纪科技股份有限公司 | Computing device, related product and computing method for executing artificial neural network model |
CN111258641A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN109542837A (en) * | 2018-11-30 | 2019-03-29 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN111260046A (en) * | 2018-11-30 | 2020-06-09 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN111260046B (en) * | 2018-11-30 | 2022-12-02 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN111258641B (en) * | 2018-11-30 | 2022-12-09 | 上海寒武纪信息科技有限公司 | Operation method, device and related product |
CN109919313A (en) * | 2019-01-31 | 2019-06-21 | 华为技术有限公司 | Gradient transmission method and distributed training system |
CN109978160A (en) * | 2019-03-25 | 2019-07-05 | 北京中科寒武纪科技有限公司 | Configuration device and method of artificial intelligence processor, and related product |
US11847554B2 (en) | 2019-04-18 | 2023-12-19 | Cambricon Technologies Corporation Limited | Data processing method and related products |
CN111461340B (en) * | 2020-03-10 | 2023-03-31 | 北京百度网讯科技有限公司 | Weight matrix updating method and device and electronic equipment |
CN111461340A (en) * | 2020-03-10 | 2020-07-28 | 北京百度网讯科技有限公司 | Weight matrix updating method and device and electronic equipment |
CN112329619A (en) * | 2020-11-04 | 2021-02-05 | 济南博观智能科技有限公司 | Face recognition method and device, electronic equipment and readable storage medium |
CN114071781B (en) * | 2021-11-16 | 2024-04-12 | 杭州电子科技大学 | Wireless local area network medium access control method |
CN114071781A (en) * | 2021-11-16 | 2022-02-18 | 杭州电子科技大学 | Wireless local area network medium access control method |
Also Published As
Publication number | Publication date |
---|---|
CN110188870B (en) | 2021-10-12 |
CN110188870A (en) | 2019-08-30 |
CN107316078B (en) | 2021-05-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107316078A (en) | Apparatus and method for performing artificial neural network self-learning operation | |
EP3451157B1 (en) | Device and method for performing forward operation of convolutional neural network | |
US10713568B2 (en) | Apparatus and method for executing reversal training of artificial neural network | |
CN107341547B (en) | Apparatus and method for performing convolutional neural network training | |
US20200097806A1 (en) | Processing method and accelerating device | |
CN111860811B (en) | Device and method for executing full-connection layer forward operation of artificial neural network | |
CN110929863B (en) | Apparatus and method for performing LSTM operations | |
CN107341541A (en) | Apparatus and method for performing fully connected layer neural network training | |
CN107301454B (en) | Artificial neural network reverse training device and method supporting discrete data representation | |
CN107301453B (en) | Artificial neural network forward operation device and method supporting discrete data representation | |
CN107832844A (en) | Information processing method and related product | |
WO2017185347A1 (en) | Apparatus and method for executing recurrent neural network and lstm computations | |
CN106991476A (en) | Apparatus and method for performing artificial neural network forward operation | |
EP3564863B1 (en) | Apparatus for executing lstm neural network operation, and operational method | |
CN108320018B (en) | Artificial neural network operation device and method | |
EP3451240A1 (en) | Apparatus and method for performing auto-learning operation of artificial neural network | |
EP3444758B1 (en) | Discrete data representation-supporting apparatus and method for back-training of artificial neural network | |
CN108960415B (en) | Processing apparatus and processing system | |
CN111178492A (en) | Computing device, related product and computing method for executing artificial neural network model | |
CN109146069B (en) | Arithmetic device, arithmetic method, and chip |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing
Applicant after: Zhongke Cambrian Technology Co., Ltd.
Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing
Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.
GR01 | Patent grant | ||