CN107316078A - Apparatus and method for performing artificial neural network self-learning operation

Apparatus and method for performing artificial neural network self-learning operation

Info

Publication number
CN107316078A
CN107316078A
Authority
CN
China
Prior art keywords
computing module
vector
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610267211.0A
Other languages
Chinese (zh)
Other versions
CN107316078B (en)
Inventor
李震
郭崎
陈云霁
陈天石
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Beijing Zhongke Cambrian Technology Co Ltd
Original Assignee
Beijing Zhongke Cambrian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Cambrian Technology Co Ltd filed Critical Beijing Zhongke Cambrian Technology Co Ltd
Priority to CN201610267211.0A priority Critical patent/CN107316078B/en
Priority to CN201910402047.3A priority patent/CN110188870B/en
Publication of CN107316078A publication Critical patent/CN107316078A/en
Application granted granted Critical
Publication of CN107316078B publication Critical patent/CN107316078B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/06 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 - Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Abstract

The invention discloses an apparatus and method for performing an artificial neural network self-learning operation. The apparatus includes an instruction storage unit, a controller unit, a data access unit, an interconnection module, a master computing module and a plurality of slave computing modules. The invention performs self-learning pre-training of a multi-layer neural network in a layer-by-layer manner: for each layer of the network, the computation is iterated until the weight update falls below a certain threshold, at which point the self-learning pre-training of that layer is complete. Each iteration is divided into four stages: the first three stages respectively compute the first-order hidden layer intermediate values, the first-order visible layer intermediate values and the second-order hidden layer intermediate values, and the last stage updates the weights using the intermediate values of the first three stages.

Description

Apparatus and method for performing artificial neural network self-learning operation
Technical field
The present invention relates to artificial neural network technology, and in particular to an apparatus and method for performing an artificial neural network self-learning operation.
Background
Multi-layer artificial neural networks are widely used in fields such as pattern recognition, image processing, function approximation and optimization. In recent years, owing to their high recognition accuracy and good parallelism, multi-layer artificial neural networks have received increasing attention from both academia and industry.
A typical training method for multi-layer artificial neural networks is the back-propagation (BP) algorithm. This method is representative of supervised learning: it requires a large number of labeled training samples during training, and the cost of collecting such samples is very high. Moreover, during training the error-correction signal weakens as the number of layers it propagates through increases, so training tends to converge to local minima and converges slowly. Therefore, a new focus has emerged: first pre-training the network parameters with a self-learning algorithm that converges quickly and does not require labeled training samples, and then fine-tuning the multi-layer neural network with back-propagation training. The self-learning operation used for pre-training is therefore particularly important.
A known method of supporting the multi-layer artificial neural network self-learning operation is to use a general-purpose processor. This method supports the above algorithm by executing general-purpose instructions with general-purpose register files and general-purpose functional units. One disadvantage of this method is that the computational performance of a single general-purpose processor is low and cannot meet the performance requirements of common multi-layer artificial neural network operations. When multiple general-purpose processors execute in parallel, the communication between them in turn becomes a performance bottleneck. In addition, a general-purpose processor has to decode the multi-layer artificial neural network pre-training operation into a long sequence of arithmetic and memory-access instructions, and this front-end decoding incurs a large power overhead.
Another known method of supporting multi-layer artificial neural network pre-training is to use a graphics processing unit (GPU). This method supports the above algorithm by executing general-purpose SIMD instructions with general-purpose register files and general-purpose stream processing units. Since the GPU is a device dedicated to graphics, image and scientific computation, it provides no dedicated support for multi-layer artificial neural network operations; a large amount of front-end decoding work is still required before the multi-layer artificial neural network operation can be executed, which brings considerable overhead. Furthermore, the GPU has only a small on-chip cache, so the model data (weights) of the multi-layer artificial neural network must be carried on and off chip repeatedly; off-chip bandwidth becomes the main performance bottleneck and also brings a huge power overhead.
Summary of the invention
The problems to be solved by the present invention are that, in the prior art, a general-purpose processor (CPU, GPU) performing multi-layer neural network pre-training must carry out a long series of simple arithmetic and memory-access operations, so the front-end decoding power overhead is large, the data-access overhead of existing general-purpose processors is high, and the computational performance of a single general-purpose processor is low.
The present invention proposes an apparatus for performing an artificial neural network self-learning operation, including an instruction storage unit, a controller unit, a data access unit, an interconnection module, a master computing module and a plurality of slave computing modules, wherein: the instruction storage unit is configured to read in instructions through the data access unit and cache the read instructions; the controller unit is configured to read instructions from the instruction storage unit, decode each instruction into control signals that control the behavior of the interconnection module, the master computing module and the slave computing modules, and then distribute the respective control signals to the respective modules; the data access unit is configured to access an external address space and complete the loading and storing of data; the interconnection module, which may have different topological implementations, is configured to distribute the input vector of the master computing module to the plurality of slave computing modules, and to merge the computation results of the slave computing modules before returning them to the master computing module; the master computing module is configured to apply an activation function and Gibbs sampling to the intermediate values returned by the interconnection module, and to update the bias of the activation function; each slave computing module is configured to perform the dot-product operation between the input vector and the corresponding weight matrix, the product operation between the corresponding component scalar of the input vector and the corresponding weight matrix, and the updating of the weight matrix.
According to an embodiment of the present invention, the master computing module includes an arithmetic unit, a data dependency judging unit and a storage unit, wherein the storage unit is configured to cache the input data and output data used by the master computing module during computation, the arithmetic unit is configured to complete the computations of the master computing module, and the data dependency judging unit is the port through which the arithmetic unit reads and writes the storage unit, ensuring read-write consistency of the data in the storage unit.
According to an embodiment of the present invention, the data dependency judging unit is configured to judge whether a dependency exists between the data of a control signal that has not yet been executed and the data of a control signal that is being executed; if not, the group of control signals is allowed to be issued immediately; otherwise, the group of control signals is allowed to be issued only after all the control signals on which it depends have completed execution.
According to an embodiment of the present invention, the data dependency judging unit is further configured to send read data to the slave computing modules through the interconnection module.
According to an embodiment of the present invention, each slave computing module includes an arithmetic unit, a data dependency judging unit, a first storage unit, a second storage unit and a third storage unit, wherein the arithmetic unit is configured to receive the control signals sent by the controller unit and perform arithmetic and logic operations; the data dependency judging unit is configured to monitor read and write operations on the storage units to ensure that no consistency conflict exists; the first storage unit is configured to cache the input vectors and computation results of the neurons; the second storage unit is configured to cache the weight data needed by the slave computing module during computation; and the third storage unit is configured to cache the weight gradient data needed by the slave computing module during weight updating.
The present invention also proposes a method of performing a layer-by-layer artificial neural network self-learning operation. The artificial neural network includes a plurality of neurons in two or more layers. The self-learning pre-training of the artificial neural network is performed layer by layer, and for each layer the pre-training is divided into four stages:
In the first stage, the input neuron vector v0 and the weight matrix W undergo a dot-product operation to obtain the local induced field; after a non-linear activation-function transformation, the local induced field is further processed by Gibbs sampling to obtain the first-order hidden layer intermediate value h0;
In the second stage, the transposed weight matrix W^T first undergoes a dot-product operation with the first-order hidden layer intermediate value h0; after a non-linear activation-function transformation, the local induced field is further processed by Gibbs sampling to obtain the first-order visible layer intermediate value v1;
In the third stage, the first-order visible layer intermediate value v1 is taken as input and undergoes a dot-product operation with the weight matrix W to obtain the local induced field; after a non-linear activation-function transformation, the local induced field yields the second-order hidden layer intermediate value h1;
In the fourth stage, the weights are updated according to the following formulas:
W ← W - ε(h0 × v0^T - h1 × v1^T)    (1)
b ← b - ε(h0 - h1)    (2)
c ← c - ε(v0 - v1)    (3)
where the vector b is the bias added to the dot-product of the vector and the weight matrix before the activation function in the first and third stages, the vector c is the corresponding bias in the second stage, "×" denotes the cross-multiplication (outer product) of the vectors, and ε is the learning rate.
Compared with the prior art, the present invention optimizes the instructions for multi-layer neural network pre-training: the processor can complete the pre-training learning of one layer of the neural network with a single instruction, which simplifies the front-end decoding overhead of general-purpose processor instructions. Meanwhile, the present invention includes a master computing module, a plurality of slave computing modules and a large amount of distributed on-chip storage, which relieves the memory-access overhead, so that the neural network pre-training operation can be performed in parallel without frequent off-chip data access. In summary, the performance-per-watt of the present invention is far higher than that of a general-purpose processor.
The present invention can be applied in the following (non-limiting) scenarios: data processing, robots, computers, printers, scanners, telephones, tablet computers, intelligent terminals, mobile phones, driving recorders, navigators, sensors, webcams, cloud servers, cameras, video cameras, projectors, watches, earphones, mobile storage, wearable devices and other electronic products; aircraft, ships, vehicles and other means of transport; televisions, air conditioners, microwave ovens, refrigerators, electric rice cookers, humidifiers, washing machines, electric lamps, gas stoves, range hoods and other household appliances; and various medical devices including nuclear magnetic resonance (NMR) machines, B-mode ultrasound scanners and electrocardiographs.
Brief description of the drawings
For a more complete understanding of the present invention and its advantages, reference is now made to the following description taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows an example block diagram of the overall structure of an apparatus for performing artificial neural network self-learning pre-training according to an embodiment of the present invention.
Fig. 2 schematically illustrates an H-tree implementation of the interconnection module in the apparatus for performing artificial neural network self-learning pre-training according to an embodiment of the present invention.
Fig. 3 shows an example block diagram of the structure of the master computing module in the apparatus for performing artificial neural network self-learning pre-training according to an embodiment of the present invention.
Fig. 4 shows an example block diagram of the structure of a slave computing module in the apparatus for performing artificial neural network self-learning pre-training according to an embodiment of the present invention.
Fig. 5 shows an example block diagram of the first and third stages of the neural network self-learning pre-training process according to an embodiment of the present invention.
Fig. 6 shows an example block diagram of the second stage of the neural network self-learning pre-training process according to an embodiment of the present invention.
Fig. 7 shows an example flowchart of the fourth stage of the neural network self-learning pre-training process according to an embodiment of the present invention.
Fig. 8 shows an example flowchart of one iteration of single-layer neural network self-learning pre-training according to an embodiment of the present invention.
In all of the figures, identical devices, components and units are denoted by the same reference numerals.
Detailed description of the embodiments
Other aspects, advantages and salient features of the present invention will become apparent to those skilled in the art from the following detailed description of exemplary embodiments of the present invention, taken with reference to the accompanying drawings.
In the present invention, the terms "include" and "comprise" and their derivatives are intended to be inclusive and non-limiting; the term "or" is inclusive, meaning and/or.
In this specification, the various embodiments used below to describe the principles of the present invention are illustrative only and should not be construed in any way as limiting the scope of the invention. The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of the exemplary embodiments of the invention as defined by the claims and their equivalents. The description includes various specific details to assist understanding, but these details are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications may be made to the embodiments described herein without departing from the scope and spirit of the invention. In addition, descriptions of well-known functions and structures are omitted for clarity and conciseness. Throughout the drawings, the same reference numerals are used for the same functions and operations.
In the self-learning pre-training of a multi-layer artificial neural network according to an embodiment of the present invention, the artificial neural network includes a plurality of neurons in two or more layers. The self-learning pre-training of the artificial neural network is performed layer by layer, training from the first layer up to the last layer. For each layer, the pre-training is divided into the following four stages; a software sketch of one complete iteration is given after the stage descriptions:
In the first stage, the input neuron vector v0 first undergoes a dot-product operation with the weight matrix W to obtain the local induced field; after a non-linear activation-function transformation, the local induced field is further processed by Gibbs sampling to obtain the first-order hidden layer intermediate value h0;
In the second stage, the transposed weight matrix W^T first undergoes a dot-product operation with the first-order hidden layer intermediate value h0; after a non-linear activation-function transformation, the local induced field is further processed by Gibbs sampling to obtain the first-order visible layer intermediate value v1;
The third stage is similar to the first stage, except that its input is the first-order visible layer intermediate value v1 and that the second-order hidden layer intermediate value h1 is computed without prior Gibbs sampling;
In the fourth stage, the weights are updated according to the following formulas:
W ← W - ε(h0 × v0^T - h1 × v1^T)    (1)
b ← b - ε(h0 - h1)    (2)
c ← c - ε(v0 - v1)    (3)
where the vector b is the bias added to the dot-product of the vector and the weight matrix before the activation function in the first and third stages, the vector c is the corresponding bias in the second stage, "×" denotes the cross-multiplication (outer product) of the vectors, and ε is the learning rate.
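As a point of reference for the four stages above, the following sketch reproduces one such pre-training iteration for a single layer in NumPy on a general-purpose processor. It only illustrates the arithmetic that the apparatus performs in hardware; the array shapes, the sigmoid activation and the helper names (sigmoid, gibbs_sample, cd1_iteration) are assumptions made for the example, not details taken from the patent.

import numpy as np

def sigmoid(x):
    # activation function applied to the local induced field
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sample(p, rng):
    # Gibbs sampling: draw a binary state from the activation probabilities
    return (rng.random(p.shape) < p).astype(p.dtype)

def cd1_iteration(W, b, c, v0, lr, rng):
    # One four-stage iteration; W: (hidden, visible), b: hidden bias, c: visible bias.
    h0 = gibbs_sample(sigmoid(W @ v0 + b), rng)            # stage 1
    v1 = gibbs_sample(sigmoid(W.T @ h0 + c), rng)          # stage 2
    h1 = sigmoid(W @ v1 + b)                               # stage 3, no Gibbs sampling
    W = W - lr * (np.outer(h0, v0) - np.outer(h1, v1))     # formula (1)
    b = b - lr * (h0 - h1)                                 # formula (2)
    c = c - lr * (v0 - v1)                                 # formula (3)
    return W, b, c

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 6))
b, c = np.zeros(4), np.zeros(6)
v0 = rng.random(6)
W, b, c = cd1_iteration(W, b, c, v0, lr=0.01, rng=rng)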
Fig. 1 shows an example block diagram of the overall structure of the apparatus for performing artificial neural network self-learning pre-training according to the present invention. As shown in Fig. 1, the apparatus includes an instruction storage unit 1, a controller unit 2, a data access unit 3, an interconnection module 4, a master computing module 5 and a plurality of slave computing modules 6. The instruction storage unit 1, the controller unit 2, the data access unit 3, the interconnection module 4, the master computing module 5 and the slave computing modules 6 may all be implemented by hardware circuits (for example, application-specific integrated circuits, ASICs).
The instruction storage unit 1 reads in instructions through the data access unit 3 and caches the read instructions.
The controller unit 2 reads instructions from the instruction storage unit 1, translates them into control signals that control the behavior of the other modules, and sends the signals to the other modules, such as the data access unit 3, the master computing module 5 and the slave computing modules 6.
The data access unit 3 can access the external address space and directly read and write data to each cache unit inside the apparatus, completing the loading and storing of data.
Fig. 2 schematically illustrates the structure of the interconnection module 4. The interconnection module 4 forms the data path between the master computing module 5 and the plurality of slave computing modules 6, and may have different structures. In one implementation, the interconnection is a binary-tree path composed of multiple nodes: each node passes the data from its upstream node identically to its two downstream nodes, merges the data returned by the two downstream nodes, and returns the result to the upstream node. For example, during the first and third stages of the neural network self-learning operation, the input vector in the master computing module 5 is sent to each slave computing module 6 through the interconnection module 4; after the computation of the slave computing modules 6 is completed, the output neuron values of the slave computing modules are combined stage by stage in the interconnection module into a complete vector composed of local induced fields, which is returned as an intermediate result vector to the master computing module 5 for the activation function and, as required, Gibbs sampling. During the second stage, the first-order hidden layer intermediate value vector h0 in the master computing module 5 is sent to each slave computing module 6 through the interconnection module 4; after the computation of the slave computing modules 6 is completed, the vectors returned by the two downstream nodes are summed into one vector at the current node and returned to the upstream node, and the intermediate result vector is returned to the master computing module 5 for the activation function and Gibbs sampling.
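The gather behaviour of this binary tree can be modelled in software as below. This is only an illustration of the two merge modes just described (component-wise concatenation of the local induced field in the first and third stages, element-wise summation in the second stage); the function names and the number of slave modules are assumptions for the example.

import numpy as np

def tree_merge(parts, combine):
    # Each node merges the results of its two downstream nodes and passes
    # the merged result to its upstream node, stage by stage.
    level = list(parts)
    while len(level) > 1:
        level = [combine(level[i], level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

# Stages 1 and 3: every slave returns the local induced field component(s) it owns;
# the tree concatenates them into one complete vector for the master module.
slave_outputs = [np.array([0.3]), np.array([-1.2]), np.array([0.7]), np.array([0.1])]
local_field_13 = tree_merge(slave_outputs, lambda a, b: np.concatenate([a, b]))

# Stage 2: every slave returns a partial vector; the tree sums them element-wise.
partial_sums = [np.full(6, float(i)) for i in range(4)]
local_field_2 = tree_merge(partial_sums, lambda a, b: a + b)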
Fig. 3 shows an example block diagram of the structure of the master computing module 5 in the apparatus for performing the artificial neural network pre-training operation according to the present invention. As shown in Fig. 3, the master computing module 5 includes an arithmetic unit 51, a data dependency judging unit 52 and a storage unit 53.
The storage unit 53 is used to cache the input data and output data used by the master computing module 5 during computation; the arithmetic unit 51 completes the various computing functions of the master computing module 5; and the data dependency judging unit 52 is the port through which the arithmetic unit 51 reads and writes the storage unit 53, ensuring read-write consistency of the data in the storage unit. Specifically, the data dependency judging unit 52 judges whether a dependency exists between the data of a control signal that has not yet been executed and the data of a control signal that is being executed; if not, the group of control signals is allowed to be issued immediately; otherwise, it is issued only after all the control signals it depends on have completed execution. For example, all control signals sent to the data dependency judging unit 52 are stored in an instruction queue inside the data dependency judging unit 52; in this queue, if the range of data read by a read instruction conflicts with the range of data written by a write instruction earlier in the queue, the read instruction can be executed only after the write instruction it depends on has been executed. Meanwhile, the data dependency judging unit 52 is also responsible for sending read data to the slave computing modules through the interconnection module 4, and the output data of the slave computing modules 6 is sent directly to the arithmetic unit 51 through the interconnection module 4. The instructions output by the controller unit 2 are sent to the arithmetic unit 51 and the data dependency judging unit 52 to control their behavior.
Fig. 4 shows an example block diagram of the structure of a slave computing module 6 in the apparatus for performing artificial neural network pre-training according to the present invention. As shown in Fig. 4, each slave computing module 6 includes an arithmetic unit 61, a data dependency judging unit 62, a first storage unit 63, a second storage unit 64 and a third storage unit 65.
The arithmetic unit 61 receives the control signals sent by the controller unit 2 and performs arithmetic and logic operations.
The data dependency judging unit 62 is responsible for the read and write operations on the cache units during computation, and ensures that no consistency conflict exists in reading from and writing to the cache units. For example, all control signals sent to the data dependency judging unit 62 are stored in an instruction queue inside the data dependency judging unit 62; in this queue, if the range of data read by a read instruction conflicts with the range of data written by a write instruction earlier in the queue, the read instruction can be executed only after the write instruction it depends on has been executed.
The first storage unit 63 caches the input neuron vector v0 of each stage, the first-order hidden layer intermediate value h0, the first-order visible layer intermediate value v1, the second-order hidden layer intermediate value h1, and the dot-product results of the input vector and the weight matrix computed in each stage.
The second storage unit 64 caches the weight data needed by this slave computing module 6 during computation. Each slave computing module stores only the part of the weight matrix that corresponds to the scalar data handled by that slave computing module 6.
The third storage unit 65 caches the weight gradient data needed by the slave computing module during weight updating. The weight gradient data stored by each slave computing module 6 corresponds to the weight data it stores.
The slave computing modules 6 implement the first half of the pipeline in the first three stages of the artificial neural network self-learning pre-training, as well as the weight update of formula (1) in the last stage.
Taking the pre-training of an artificial neural network deep belief network (DBN) as an example, in the first three stages the multiplication of the weight matrix W (or its transpose W^T) with the input neuron vector can be divided into independent, parallel computing subtasks. In the first and third stages, each slave computing module 6 uses the same input vector but the weights corresponding to different components of the output vector to perform dot-product multiplications, obtaining the partial sums corresponding to different components of the output vector; after repeated accumulation, each slave computing module obtains its corresponding output component, and these components are combined stage by stage in the interconnection module 4 into a complete local induced field vector. Each slave computing module 6 only needs to compute the local induced field of the output neuron value corresponding to that module. The different local induced field components are combined stage by stage in the interconnection module 4 into a complete local induced field vector, which is transmitted to the master computing module for the activation function and subsequent sampling. In the second stage, each slave computing module 6 computes the product of the corresponding component scalar of the input first-order hidden layer intermediate value vector h0 and the corresponding row of the weight matrix; each resulting output vector is a partial sum of the final result to be accumulated, and these partial sums are added pairwise, stage by stage, in the interconnection module to obtain the final result. Each slave computing module 6 thus computes a partial sum of the local induced field of the output first-order visible layer vector, and the summation of all partial sums is completed in the interconnection module 4 to obtain the final local induced field. The intermediate values computed in the first three stages are used to update the weights, and the master computing module 5 performs subsequent operations based on the outputs of the first three stages to obtain the weight update values. In the last stage, the weight update performed by the slave computing modules 6 according to formula (1) can likewise be divided into three sub-steps:
1. Each slave computing module 6 computes the product of the input first-order hidden layer intermediate value vector h0 and the corresponding component scalar of the input neuron vector, obtaining an intermediate value;
2. Each slave computing module 6 computes the product of the input hidden layer intermediate value vector h1 and the corresponding component scalar of the first-order visible layer vector v1, and computes the difference between the intermediate value vector of the first sub-step and this product;
3. Each slave computing module 6 computes the product of the difference obtained in the second sub-step and the learning rate to obtain the weight update value, and then performs a vector subtraction with the weight W to obtain the updated weight.
It is worth noting that the above three sub-steps are only one example of how a slave computing module 6 updates the weights; the practitioner may fine-tune the details, for example by exchanging the product computation of the first sub-step with that of the second sub-step, or by moving the multiplication by the learning rate in the third sub-step forward into the second sub-step, or even splitting it across the first two sub-steps. An illustrative sketch of the three sub-steps is given below.
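As an illustration of these three sub-steps, the sketch below shows what a single slave computing module contributes when it holds one column of a (hidden x visible) weight matrix W; the column-per-slave convention, the argument layout and the function name are assumptions made for the example rather than details fixed by the patent.

import numpy as np

def slave_update_column(W_col, h0, v0_j, h1, v1_j, lr):
    step1 = h0 * v0_j            # sub-step 1: h0 times the corresponding input scalar
    diff = step1 - h1 * v1_j     # sub-step 2: subtract h1 times the visible-layer scalar
    return W_col - lr * diff     # sub-step 3: scale by the learning rate, update the column

rng = np.random.default_rng(1)
W_col = 0.1 * rng.standard_normal(4)     # the column cached in the second storage unit
h0, h1 = rng.random(4), rng.random(4)    # hidden-layer intermediate values of this iteration
new_col = slave_update_column(W_col, h0, v0_j=0.8, h1=h1, v1_j=0.5, lr=0.01)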
According to an embodiment of the present invention, an instruction set for performing the artificial neural network forward operation on the aforementioned apparatus is also provided. The instruction set includes a CONFIG instruction, a COMPUTE instruction, an IO instruction, a NOP instruction, a JUMP instruction and a MOVE instruction, each described below and summarised in the sketch that follows this list:
The CONFIG instruction configures, before the computation of each layer of the artificial neural network starts, the various constants needed by the current layer's computation;
The COMPUTE instruction completes the arithmetic and logic computation of each layer of the artificial neural network;
The IO instruction reads in, from the external address space, the input data needed by the computation, and stores data back to the external space after the computation is completed;
The NOP instruction is responsible for clearing the control signals currently held in all internal control-signal cache queues, ensuring that all instructions before the NOP instruction have finished; the NOP instruction itself does not contain any operation;
The JUMP instruction is responsible for jumping the address of the next instruction to be read by the controller from the instruction storage unit, in order to implement jumps in the control flow;
The MOVE instruction is responsible for carrying data at a certain address in the apparatus's internal address space to another address in the internal address space; this process is independent of the arithmetic unit and does not occupy its resources during execution.
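The role of the six instruction types, and the order in which a single-layer pre-training iteration issues them (compare steps S1 to S11 below), can be summarised as follows; the enum representation and the stage tags are illustrative only, since the patent does not fix an encoding.

from enum import Enum, auto

class Opcode(Enum):
    CONFIG = auto()   # configure the constants needed by the current layer/stage
    COMPUTE = auto()  # perform the arithmetic/logic computation of one stage
    IO = auto()       # load data from, or store data to, the external address space
    NOP = auto()      # drain all pending control-signal queues; carries no operation itself
    JUMP = auto()     # redirect the controller's next instruction address (control flow)
    MOVE = auto()     # copy data between internal addresses without using the arithmetic unit

# One single-layer pre-training iteration as an instruction stream:
program = [
    (Opcode.IO, "load instructions"), (Opcode.IO, "load master data"), (Opcode.IO, "load weights"),
    (Opcode.CONFIG, "stage 1"), (Opcode.COMPUTE, "stage 1"),
    (Opcode.CONFIG, "stage 2"), (Opcode.COMPUTE, "stage 2"),
    (Opcode.CONFIG, "stage 3"), (Opcode.COMPUTE, "stage 3"),
    (Opcode.COMPUTE, "stage 4: weight update"),
]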
Fig. 5 shows an example block diagram of the first and third stages of the neural network self-learning pre-training process according to an embodiment of the present invention. In the different slave computing modules 6, the input vector broadcast by the interconnection module 4 undergoes a dot-product operation with the weight vector of each slave computing module 6, giving the partial sum of the local induced field of the corresponding output neuron value; all these output local induced field values form an intermediate result vector. The intermediate result vector, after a bias vector is added and an activation operation is applied, gives the final output neuron vector of this layer of the neural network; the formula is out = f(w*in + b), where out is the output vector, in is the input vector, b is the bias vector, w is the weight matrix and f is the activation function. The weight vector of each slave computing module 6 is the column vector of the weight matrix corresponding to that slave computing module 6. The interconnection module 4 sends the input vector [I0, ..., In] to all the slave arithmetic units, where it is temporarily stored in the first storage unit. The i-th slave arithmetic unit computes the dot-product of its corresponding weight vector [Wi0, ..., Win] and the input vector. The results output by the slave arithmetic units are combined by the interconnection module 4 into a complete local induced field vector and returned to the master arithmetic unit 5, where the activation function operation and possibly Gibbs sampling are performed, yielding the final output vector [O0, O1, ..., On].
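A minimal software model of this first/third-stage data flow is given below, assuming one weight vector [Wi0, ..., Win] per slave module; the broadcast, the per-slave dot product and the tree combination are collapsed into plain NumPy, and the function names are illustrative.

import numpy as np

def stage_1_or_3(weight_vectors, in_vec, bias, activation, sample=None):
    # The interconnection module broadcasts the input vector to every slave module;
    # slave i computes the dot product with its own weight vector [Wi0, ..., Win].
    partial_fields = [w_i @ in_vec for w_i in weight_vectors]
    # The interconnection module combines the per-slave results into the complete
    # local induced field vector and returns it to the master computing module.
    local_field = np.array(partial_fields)
    # The master module adds the bias, applies the activation and optionally samples.
    out = activation(local_field + bias)
    return sample(out) if sample is not None else out

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(0)
weight_vectors = [rng.standard_normal(6) for _ in range(4)]   # one per slave module
h0 = stage_1_or_3(weight_vectors, in_vec=rng.random(6), bias=np.zeros(4), activation=sigmoid,
                  sample=lambda p: (rng.random(p.shape) < p).astype(float))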
Fig. 6 shows an example block diagram of the second stage of the neural network self-learning pre-training process according to an embodiment of the present invention. In the process of computing the output first-order visible layer vector v1, the interconnection module 4 broadcasts the first-order hidden layer vector value; each slave computing module 6 takes the product of the corresponding component scalar h0i of h0 and the corresponding row [Wi0, ..., Win] of the weight matrix W, and each resulting output vector is one of the partial sums, to be accumulated, of the local induced field of the first-order visible layer vector. These partial sums are added pairwise, stage by stage, in the interconnection module 4 to obtain the final local induced field. The computed local induced field is returned to the master arithmetic unit 5, where the activation function operation and possibly Gibbs sampling are performed, yielding the final output first-order visible layer vector v1.
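For contrast with the first and third stages, the second-stage decomposition can be modelled as below: every slave multiplies its scalar h0i by its weight vector, and the partial vectors are accumulated on the way back through the interconnection module; again the names are illustrative, not taken from the patent.

import numpy as np

def stage_2(weight_vectors, h0, bias_c, activation, sample):
    # Slave i computes h0[i] * [Wi0, ..., Win]; each result is a partial sum of the
    # visible-layer local induced field, accumulated pairwise in the interconnection module.
    local_field = sum(h0[i] * w_i for i, w_i in enumerate(weight_vectors))
    # The master module adds the visible-layer bias, activates and performs Gibbs sampling.
    return sample(activation(local_field + bias_c))

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(0)
weight_vectors = [rng.standard_normal(6) for _ in range(4)]
v1 = stage_2(weight_vectors, h0=rng.random(4), bias_c=np.zeros(6), activation=sigmoid,
             sample=lambda p: (rng.random(p.shape) < p).astype(float))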
Fig. 7 shows a flowchart of the fourth stage of the neural network self-learning pre-training process according to an embodiment of the present invention. In the last stage, the weight update performed by the slave computing modules 6 according to formula (1) can likewise be divided into three sub-steps:
1. Each slave computing module 6 computes the product of the input first-order hidden layer intermediate value vector h0 and the corresponding component scalar of the input neuron vector, and caches the intermediate value to the third storage unit shown in Fig. 4; this sub-step is similar to the block diagram of the second stage shown in Fig. 6, except that its inputs are the first-order hidden layer intermediate value vector h0 and the input neuron vector;
2. Each slave computing module 6 computes the product of the input hidden layer intermediate value vector h1 and the corresponding component scalar of the first-order visible layer vector v1, computes the difference from the intermediate value vector of the first sub-step, and caches it to the third storage unit shown in Fig. 4;
3. Each slave computing module 6 computes the product of the difference obtained in the second sub-step and the learning rate to obtain the weight update value, and then performs a vector subtraction with the weight W to obtain the updated weight.
It is worth noting that the above three sub-steps are only one example of how a slave computing module 6 updates the weights; the practitioner may fine-tune the details, for example by exchanging the product computation of the first sub-step with that of the second sub-step, or by moving the multiplication by the learning rate in the third sub-step forward into the second sub-step, or even splitting it across the first two sub-steps.
Fig. 8 shows a flowchart of a single-layer artificial neural network self-learning pre-training operation according to one embodiment. Since the self-learning pre-training of a multi-layer artificial neural network can be carried out in a layer-by-layer training manner, the pre-training of a multi-layer artificial neural network can be implemented by invoking this flow multiple times. The flowchart describes the process of using the apparatus and instruction set of the present invention to implement the single-layer neural network self-learning pre-training operation shown in Fig. 4.
In step S1, an IO instruction is pre-stored at the first address of the instruction storage unit 1.
In step S2, the operation starts: the controller unit 2 reads this IO instruction from the first address of the instruction storage unit 1, and according to the decoded control signal, the data access unit 3 reads all the corresponding artificial neural network operation instructions from the external address space and caches them in the instruction storage unit 1.
In step S3, the controller unit 2 then reads in the next IO instruction from the instruction storage unit; according to the decoded control signal, the data access unit 3 reads all the data needed by the master computing module 5 (including, for example, the input neuron vector v0, the activation function interpolation table, the learning rate and the biases) from the external address space into the storage unit 53 of the master computing module 5.
In step S4, the controller unit 2 then reads in the next IO instruction from the instruction storage unit; according to the decoded control signal, the data access unit 3 reads the weight matrix data needed by the slave computing modules 6 from the external address space.
In step S5, the controller unit 2 then reads in the next CONFIG instruction from the instruction storage unit; according to the decoded control signal, the apparatus configures the various constants needed by the first-stage computation of this layer of the neural network. For example, the arithmetic units 51 and 61 configure the values of their internal registers according to the parameters in the control signal; the parameters include, for example, the precision setting of this layer's computation and the activation function data.
In step S6, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit; according to the decoded control signal, the first-stage computation starts. The master computing module 5 first sends the input neuron vector v0 to each slave computing module 6 through the interconnection module 4, where it is saved to the first storage unit 63 of the slave computing module 6. The arithmetic unit 61 of each slave computing module 6 reads the weight vector (the column vector of the weight matrix corresponding to that slave computing module 6) from the second storage unit 64, reads the input neuron vector v0 from the first storage unit, completes the dot-product operation of the weight vector and the input neuron vector v0, and returns the intermediate result through the interconnection module. In the interconnection module 4, the intermediate results returned by the slave computing modules 6 are combined stage by stage into a complete local induced field vector. The master computing module 5 obtains the return value of the interconnection module 4, reads the bias vector from the storage unit 53 according to the control signal decoded from the COMPUTE instruction, adds it to the vector returned by the interconnection module 4, applies the activation to the sum, performs Gibbs sampling, and writes the final first-order hidden layer vector h0 back to the storage unit 53.
In step S7, the controller unit 2 then reads in the next CONFIG instruction from the instruction storage unit; according to the decoded control signal, the apparatus configures the various constants needed by the second-stage computation of this layer of the neural network.
In step S8, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit; according to the decoded control signal, the second-stage computation starts. The master computing module 5 first sends the first-order hidden layer vector h0 to each slave computing module 6 through the interconnection module 4, where it is saved to the first storage unit 63 of the slave computing module 6. The arithmetic unit 61 of each slave computing module 6 reads the weight vector (the column vector of the weight matrix corresponding to that slave computing module 6) from the second storage unit 64, takes the corresponding scalar of the first-order hidden layer vector h0 from the first storage unit, completes the product operation of the weight vector and the corresponding scalar of h0, and returns the intermediate result through the interconnection module. In the interconnection module 4, the intermediate results returned by the slave computing modules 6 are summed stage by stage into a complete local induced field vector. The master computing module 5 obtains the return value of the interconnection module 4, reads the bias vector from the storage unit 53 according to the control signal decoded from the COMPUTE instruction, adds it to the vector returned by the interconnection module 4, applies the activation to the sum, performs Gibbs sampling, and writes the final first-order visible layer vector v1 back to the storage unit 53.
In step S9, the controller unit 2 then reads in the next CONFIG instruction from the instruction storage unit; according to the decoded control signal, the apparatus configures the various constants needed by the third-stage computation of this layer of the neural network. This configuration is basically identical to that of the first stage, except that one additional parameter, the learning rate, needs to be configured.
In step S10, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit; according to the decoded control signal, the third-stage computation starts. The master computing module 5 first sends the first-order visible layer vector v1 to each slave computing module 6 through the interconnection module 4, where it is saved to the first storage unit 63 of the slave computing module 6. Each slave computing module reads the first-order visible layer vector v1 from the first storage unit, completes the dot-product operation of its weight vector and the first-order visible layer vector v1, and returns the intermediate result through the interconnection module. In the interconnection module 4, the intermediate results returned by the slave computing modules 6 are combined stage by stage into a complete local induced field vector. The master computing module 5 obtains the return value of the interconnection module 4, reads the bias vector from the storage unit 53 according to the control signal decoded from the COMPUTE instruction, adds it to the vector returned by the interconnection module 4, applies the activation to the sum, and writes the final second-order hidden layer vector h1 back to the storage unit 53.
In step S11, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit; according to the decoded control signal, the fourth-stage computation starts. In the first sub-step, the master computing module 5 first sends the input neuron vector and the first-order hidden layer vector to each slave computing module 6 through the interconnection module 4, where they are saved to the weight gradient cache unit 65 of the slave computing module 6. In the second sub-step, the arithmetic unit 61 of each slave computing module 6 reads the hidden layer vector from the first storage unit together with the corresponding component of the input neuron vector, completes the product operation of the hidden layer vector and the corresponding component of the input neuron vector, performs a vector subtraction between this intermediate result and the intermediate value of the previous sub-step read from the weight gradient cache unit 65, and caches the computed intermediate result to the weight gradient cache unit 65. In the last sub-step, the arithmetic unit 61 of each slave computing module 6 reads the intermediate value of the previous sub-step from the weight gradient cache unit 65, multiplies it by the learning rate to obtain the weight update value, reads the corresponding weight from the weight cache unit 64, performs a vector subtraction between the weight and the weight update value to obtain the updated weight, and caches it back to the weight cache unit 64. In this way, one self-learning pre-training iteration of the single-layer neural network is completed; after multiple iterations of learning, when the weights reach a certain convergence criterion (the weight update value is smaller than a certain threshold), the pre-training of the single-layer neural network ends, and the pre-training of the next layer of the neural network can begin.
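Putting steps S1 to S11 together, the whole single-layer pre-training loop can be sketched as follows; it reuses the cd1_iteration helper from the earlier sketch, and the convergence test on the largest weight change as well as the random choice of input vector are assumptions made for the example.

import numpy as np

def pretrain_layer(W, b, c, inputs, lr, threshold, rng, max_iters=1000):
    # Repeat the four-stage iteration until the weight update falls below the threshold.
    for _ in range(max_iters):
        v0 = inputs[rng.integers(len(inputs))]        # one input neuron vector
        W_new, b, c = cd1_iteration(W, b, c, v0, lr, rng)
        if np.max(np.abs(W_new - W)) < threshold:     # convergence criterion on the update
            return W_new, b, c
        W = W_new
    return W, b, c

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 6))
inputs = [rng.random(6) for _ in range(10)]
W, b, c = pretrain_layer(W, np.zeros(4), np.zeros(6), inputs, lr=0.01, threshold=1e-4, rng=rng)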
By using the apparatus and instruction set for performing the artificial neural network self-learning pre-training operation, the problems of insufficient CPU and GPU computational performance and large front-end decoding overhead are solved, and support for the multi-layer artificial neural network forward operation is effectively improved.
By using dedicated on-chip caches for the multi-layer artificial neural network forward operation, the reusability of the input neurons and the weight data is fully exploited, repeated reading of these data from memory is avoided, the memory access bandwidth is reduced, and the memory bandwidth is prevented from becoming a bottleneck of the multi-layer artificial neural network forward operation performance.
The processes or methods depicted in the preceding figures may be performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), firmware, software (e.g., software embodied on a non-transitory computer-readable medium), or a combination of both. Although the processes or methods are described above in terms of certain sequential operations, it should be understood that some of the described operations may be performed in a different order. Moreover, some operations may be performed in parallel rather than sequentially.
In the foregoing specification, embodiments of the present invention have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made to each embodiment without departing from the broader spirit and scope of the invention as set forth in the appended claims. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

Claims (6)

1. An apparatus for performing an artificial neural network self-learning operation, comprising an instruction storage unit, a controller unit, a data access unit, an interconnection module, a master computing module and a plurality of slave computing modules, wherein:
the instruction storage unit is configured to read in instructions through the data access unit and to cache the read instructions;
the controller unit is configured to read instructions from the instruction storage unit, to decode each instruction into control signals that control the behavior of the interconnection module, the master computing module and the slave computing modules, and then to distribute the respective control signals to the respective modules;
the data access unit is configured to access an external address space and to complete the loading and storing of data;
the interconnection module, which may have different topological implementations, is configured to distribute the input vector of the master computing module to the plurality of slave computing modules, and to merge the computation results of the slave computing modules before returning them to the master computing module;
the master computing module is configured to apply an activation function and Gibbs sampling to the intermediate values returned by the interconnection module, and to update the bias of the activation function;
each slave computing module is configured to perform the dot-product operation between the input vector and the corresponding weight matrix, the product operation between the corresponding component scalar of the input vector and the corresponding weight matrix, and the updating of the weight matrix.
2. The apparatus for performing an artificial neural network self-learning operation according to claim 1, wherein the master computing module comprises an arithmetic unit, a data dependency judging unit and a storage unit, wherein
the storage unit is configured to cache the input data and output data used by the master computing module during computation,
the arithmetic unit is configured to complete the computations of the master computing module, and
the data dependency judging unit is the port through which the arithmetic unit reads and writes the storage unit, and is configured to ensure read-write consistency of the data in the storage unit.
3. The apparatus for performing an artificial neural network self-learning operation according to claim 2, wherein the data dependency judging unit is configured to judge whether a dependency exists between the data of a control signal that has not yet been executed and the data of a control signal that is being executed, and if not, to allow the group of control signals to be issued immediately, and otherwise to allow the group of control signals to be issued only after all the control signals on which it depends have completed execution.
4. The apparatus for performing an artificial neural network self-learning operation according to claim 3, wherein the data dependency judging unit is further configured to send read data to the slave computing modules through the interconnection module.
5. The apparatus for performing an artificial neural network self-learning operation according to claim 1, wherein each slave computing module comprises an arithmetic unit, a data dependency judging unit, a first storage unit, a second storage unit and a third storage unit, wherein
the arithmetic unit is configured to receive the control signals sent by the controller unit and to perform arithmetic and logic operations;
the data dependency judging unit is configured to monitor read and write operations on the storage units to ensure that no consistency conflict exists in reading from and writing to the storage units;
the first storage unit is configured to cache the input vectors and computation results of the neurons;
the second storage unit is configured to cache the weight data needed by the slave computing module during computation;
the third storage unit is configured to cache the weight gradient data needed by the slave computing module during weight updating.
6. A method of performing a layer-by-layer artificial neural network self-learning operation, the artificial neural network comprising a plurality of neurons in two or more layers, the self-learning pre-training of the artificial neural network being performed layer by layer, wherein for each layer the pre-training is divided into four stages:
in the first stage, the input neuron vector v0 and the weight matrix W undergo a dot-product operation to obtain the local induced field, and the local induced field, after a non-linear activation-function transformation, is further processed by Gibbs sampling to obtain the first-order hidden layer intermediate value h0;
in the second stage, the transposed weight matrix W^T and the first-order hidden layer intermediate value h0 first undergo a dot-product operation, and the local induced field, after a non-linear activation-function transformation, is further processed by Gibbs sampling to obtain the first-order visible layer intermediate value v1;
in the third stage, the first-order visible layer intermediate value v1 and the weight matrix W undergo a dot-product operation to obtain the local induced field, and the local induced field, after a non-linear activation-function transformation, yields the second-order hidden layer intermediate value h1;
in the fourth stage, the weights are updated according to the following formulas:
W ← W - ε(h0 × v0^T - h1 × v1^T)    (1)
b ← b - ε(h0 - h1)    (2)
c ← c - ε(v0 - v1)    (3)
wherein the vector b is the bias added to the dot-product of the vector and the weight matrix before the activation function in the first and third stages, the vector c is the corresponding bias in the second stage, "×" denotes the cross-multiplication (outer product) of the vectors, and ε is the learning rate.
CN201610267211.0A 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation Active CN107316078B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610267211.0A CN107316078B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation
CN201910402047.3A CN110188870B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610267211.0A CN107316078B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201910402047.3A Division CN110188870B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation

Publications (2)

Publication Number Publication Date
CN107316078A true CN107316078A (en) 2017-11-03
CN107316078B CN107316078B (en) 2021-05-07

Family

ID=60185046

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910402047.3A Active CN110188870B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation
CN201610267211.0A Active CN107316078B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910402047.3A Active CN110188870B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation

Country Status (1)

Country Link
CN (2) CN110188870B (en)

Cited By (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108189A (en) * 2017-12-15 2018-06-01 北京中科寒武纪科技有限公司 A kind of computational methods and Related product
CN108364065A (en) * 2018-01-19 2018-08-03 上海兆芯集成电路有限公司 Adopt the microprocessor of booth multiplication
CN108710958A (en) * 2018-05-16 2018-10-26 北京旋极信息技术股份有限公司 A kind of prediction health control method and device, computer readable storage medium
CN108763360A (en) * 2018-05-16 2018-11-06 北京旋极信息技术股份有限公司 A kind of sorting technique and device, computer readable storage medium
CN108859477A (en) * 2018-07-05 2018-11-23 吉林工程技术师范学院 A kind of children's literature book binder and its control method
CN109542837A (en) * 2018-11-30 2019-03-29 上海寒武纪信息科技有限公司 Operation method, device and Related product
CN109754062A (en) * 2017-11-07 2019-05-14 上海寒武纪信息科技有限公司 The execution method and Related product of convolution extended instruction
CN109784125A (en) * 2017-11-10 2019-05-21 福州瑞芯微电子股份有限公司 Deep learning network processing device, method and image processing unit
CN109902811A (en) * 2017-12-11 2019-06-18 北京中科寒武纪科技有限公司 Neural network computing device and method
CN109919313A (en) * 2019-01-31 2019-06-21 华为技术有限公司 A kind of method and distribution training system of gradient transmission
CN109961136A (en) * 2017-12-14 2019-07-02 北京中科寒武纪科技有限公司 Integrated circuit chip device and Related product
CN109978160A (en) * 2019-03-25 2019-07-05 北京中科寒武纪科技有限公司 Configuration device, method and the Related product of artificial intelligence process device
CN109993289A (en) * 2017-12-30 2019-07-09 北京中科寒武纪科技有限公司 Integrated circuit chip device and Related product
CN109993290A (en) * 2017-12-30 2019-07-09 北京中科寒武纪科技有限公司 Integrated circuit chip device and Related product
CN110059809A (en) * 2018-10-10 2019-07-26 北京中科寒武纪科技有限公司 A kind of computing device and Related product
CN110147249A (en) * 2018-02-12 2019-08-20 上海寒武纪信息科技有限公司 A kind of calculation method and device of network model
CN110163350A (en) * 2018-02-13 2019-08-23 上海寒武纪信息科技有限公司 A kind of computing device and method
CN110163361A (en) * 2018-02-13 2019-08-23 上海寒武纪信息科技有限公司 A kind of computing device and method
CN110163349A (en) * 2018-02-12 2019-08-23 上海寒武纪信息科技有限公司 A kind of calculation method and device of network model
CN110196734A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 A kind of computing device and Related product
CN110197270A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110197271A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110197269A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110197272A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110197268A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110197273A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110276447A (en) * 2018-03-14 2019-09-24 上海寒武纪信息科技有限公司 A kind of computing device and method
CN110383300A (en) * 2018-02-13 2019-10-25 上海寒武纪信息科技有限公司 A kind of computing device and method
CN110472734A (en) * 2018-05-11 2019-11-19 上海寒武纪信息科技有限公司 A kind of computing device and Related product
CN110806903A (en) * 2018-08-01 2020-02-18 珠海格力电器股份有限公司 Configuration parameter determining method and device of electric cooker
CN111047045A (en) * 2018-10-12 2020-04-21 中科寒武纪科技股份有限公司 Distribution system and method for machine learning operation
CN111079908A (en) * 2018-10-18 2020-04-28 上海寒武纪信息科技有限公司 Network-on-chip data processing method, storage medium, computer device and apparatus
CN111178492A (en) * 2018-11-09 2020-05-19 中科寒武纪科技股份有限公司 Computing device, related product and computing method for executing artificial neural network model
CN111260046A (en) * 2018-11-30 2020-06-09 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111258641A (en) * 2018-11-30 2020-06-09 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111461340A (en) * 2020-03-10 2020-07-28 北京百度网讯科技有限公司 Weight matrix updating method and device and electronic equipment
CN112329619A (en) * 2020-11-04 2021-02-05 济南博观智能科技有限公司 Face recognition method and device, electronic equipment and readable storage medium
CN112805727A (en) * 2018-10-08 2021-05-14 深爱智能科技有限公司 Artificial neural network operation acceleration device for distributed processing, artificial neural network acceleration system using same, and method for accelerating artificial neural network
CN114071781A (en) * 2021-11-16 2022-02-18 杭州电子科技大学 Wireless local area network medium access control method
US11620130B2 (en) 2018-02-13 2023-04-04 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11630666B2 (en) 2018-02-13 2023-04-18 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11651202B2 (en) 2017-12-30 2023-05-16 Cambricon Technologies Corporation Limited Integrated circuit chip device and related product
US11704544B2 (en) 2017-12-30 2023-07-18 Cambricon Technologies Corporation Limited Integrated circuit chip device and related product
US11797467B2 (en) 2018-10-18 2023-10-24 Shanghai Cambricon Information Technology Co., Ltd. Data processing device with transmission circuit
US11847554B2 (en) 2019-04-18 2023-12-19 Cambricon Technologies Corporation Limited Data processing method and related products
CN113807510B (en) * 2017-12-30 2024-05-10 中科寒武纪科技股份有限公司 Integrated circuit chip device and related products

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111080400B (en) * 2019-11-25 2023-04-18 中山大学 Commodity recommendation method and system based on gate control graph convolution network and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732274A (en) * 2015-03-10 2015-06-24 华南理工大学 Intelligent computer
CN104757992A (en) * 2015-03-16 2015-07-08 广东工业大学 Heart sound diagnosis system and diagnosis method based on a deep belief network
CN105117706A (en) * 2015-08-28 2015-12-02 小米科技有限责任公司 Image processing method and apparatus and character recognition method and apparatus
CN105447569A (en) * 2015-12-18 2016-03-30 北京柏惠维康科技有限公司 Breast cancer cell characteristic analysis system based on deep learning
CN105488565A (en) * 2015-11-17 2016-04-13 中国科学院计算技术研究所 Calculation apparatus and method for accelerator chip accelerating deep neural network algorithm

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729678B (en) * 2013-12-12 2016-10-05 中国科学院信息工程研究所 Internet water-army (spammer) detection method and system based on an improved DBN model
CN104157290B (en) * 2014-08-19 2017-10-24 大连理工大学 Speaker recognition method based on deep learning
CN104182772B (en) * 2014-08-19 2017-10-24 大连理工大学 Gesture recognition method based on deep learning
CN105184366B (en) * 2015-09-15 2018-01-09 中国科学院计算技术研究所 Time-multiplexed general-purpose neural network processor

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732274A (en) * 2015-03-10 2015-06-24 华南理工大学 Intelligent computer
CN104757992A (en) * 2015-03-16 2015-07-08 广东工业大学 Heart sound diagnosis system and diagnosis method based on a deep belief network
CN105117706A (en) * 2015-08-28 2015-12-02 小米科技有限责任公司 Image processing method and apparatus and character recognition method and apparatus
CN105488565A (en) * 2015-11-17 2016-04-13 中国科学院计算技术研究所 Calculation apparatus and method for accelerator chip accelerating deep neural network algorithm
CN105447569A (en) * 2015-12-18 2016-03-30 北京柏惠维康科技有限公司 Breast cancer cell characteristic analysis system based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUNJI CHEN et al.: "DaDianNao: A Machine-Learning Supercomputer", International Symposium on Microarchitecture *

Cited By (105)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754062A (en) * 2017-11-07 2019-05-14 上海寒武纪信息科技有限公司 Execution method of convolution expansion instruction and related product
CN109754062B (en) * 2017-11-07 2024-05-14 上海寒武纪信息科技有限公司 Execution method of convolution expansion instruction and related product
CN109784125A (en) * 2017-11-10 2019-05-21 福州瑞芯微电子股份有限公司 Deep learning network processing device, method and image processing unit
CN109902814A (en) * 2017-12-11 2019-06-18 北京中科寒武纪科技有限公司 Neural network computing module and method
CN109902814B (en) * 2017-12-11 2020-01-17 中科寒武纪科技股份有限公司 Neural network operation module and method
CN111738431B (en) * 2017-12-11 2024-03-05 中科寒武纪科技股份有限公司 Neural network computing device and method
US11657258B2 (en) 2017-12-11 2023-05-23 Cambricon Technologies Corporation Limited Neural network calculation apparatus and method
CN109902811A (en) * 2017-12-11 2019-06-18 北京中科寒武纪科技有限公司 Neural network computing device and method
US11803735B2 (en) 2017-12-11 2023-10-31 Cambricon Technologies Corporation Limited Neural network calculation apparatus and method
CN111738431A (en) * 2017-12-11 2020-10-02 中科寒武纪科技股份有限公司 Neural network operation device and method
CN109961136B (en) * 2017-12-14 2020-05-19 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
CN109961136A (en) * 2017-12-14 2019-07-02 北京中科寒武纪科技有限公司 Integrated circuit chip device and Related product
CN108108189A (en) * 2017-12-15 2018-06-01 北京中科寒武纪科技有限公司 A kind of computational methods and Related product
CN109993290A (en) * 2017-12-30 2019-07-09 北京中科寒武纪科技有限公司 Integrated circuit chip device and Related product
CN113807510A (en) * 2017-12-30 2021-12-17 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
US11734548B2 (en) 2017-12-30 2023-08-22 Cambricon Technologies Corporation Limited Integrated circuit chip device and related product
US11710031B2 (en) 2017-12-30 2023-07-25 Cambricon Technologies Corporation Limited Parallel processing circuits for neural networks
CN113807510B (en) * 2017-12-30 2024-05-10 中科寒武纪科技股份有限公司 Integrated circuit chip device and related products
US11651202B2 (en) 2017-12-30 2023-05-16 Cambricon Technologies Corporation Limited Integrated circuit chip device and related product
CN109993289A (en) * 2017-12-30 2019-07-09 北京中科寒武纪科技有限公司 Integrated circuit chip device and Related product
CN109993290B (en) * 2017-12-30 2021-08-06 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
US11704544B2 (en) 2017-12-30 2023-07-18 Cambricon Technologies Corporation Limited Integrated circuit chip device and related product
CN109993289B (en) * 2017-12-30 2021-09-21 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
CN108364065B (en) * 2018-01-19 2020-09-11 上海兆芯集成电路有限公司 Microprocessor for booth multiplication
CN108364065A (en) * 2018-01-19 2018-08-03 上海兆芯集成电路有限公司 Microprocessor for Booth multiplication
CN110163349A (en) * 2018-02-12 2019-08-23 上海寒武纪信息科技有限公司 A kind of calculation method and device of network model
CN110147249B (en) * 2018-02-12 2021-02-09 上海寒武纪信息科技有限公司 Network model calculation method and device
CN110163349B (en) * 2018-02-12 2021-03-23 上海寒武纪信息科技有限公司 Network model calculation method and device
CN110147249A (en) * 2018-02-12 2019-08-20 上海寒武纪信息科技有限公司 A kind of calculation method and device of network model
CN110163363A (en) * 2018-02-13 2019-08-23 上海寒武纪信息科技有限公司 A kind of computing device and method
US11704125B2 (en) 2018-02-13 2023-07-18 Cambricon (Xi'an) Semiconductor Co., Ltd. Computing device and method
US11663002B2 (en) 2018-02-13 2023-05-30 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11709672B2 (en) 2018-02-13 2023-07-25 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
CN110383300A (en) * 2018-02-13 2019-10-25 上海寒武纪信息科技有限公司 A kind of computing device and method
US11720357B2 (en) 2018-02-13 2023-08-08 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
CN110163362B (en) * 2018-02-13 2020-12-11 上海寒武纪信息科技有限公司 Computing device and method
US11630666B2 (en) 2018-02-13 2023-04-18 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
US11620130B2 (en) 2018-02-13 2023-04-04 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
CN110163362A (en) * 2018-02-13 2019-08-23 上海寒武纪信息科技有限公司 A kind of computing device and method
CN110163357A (en) * 2018-02-13 2019-08-23 上海寒武纪信息科技有限公司 A kind of computing device and method
CN110163360A (en) * 2018-02-13 2019-08-23 上海寒武纪信息科技有限公司 A kind of computing device and method
CN110163356A (en) * 2018-02-13 2019-08-23 上海寒武纪信息科技有限公司 A kind of computing device and method
CN110163360B (en) * 2018-02-13 2021-06-25 上海寒武纪信息科技有限公司 Computing device and method
CN110163361B (en) * 2018-02-13 2021-06-25 上海寒武纪信息科技有限公司 Computing device and method
CN110163350B (en) * 2018-02-13 2021-06-08 上海寒武纪信息科技有限公司 Computing device and method
CN110163363B (en) * 2018-02-13 2021-05-11 上海寒武纪信息科技有限公司 Computing device and method
CN110163361A (en) * 2018-02-13 2019-08-23 上海寒武纪信息科技有限公司 A kind of computing device and method
CN110163350A (en) * 2018-02-13 2019-08-23 上海寒武纪信息科技有限公司 A kind of computing device and method
US11740898B2 (en) 2018-02-13 2023-08-29 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
CN110163356B (en) * 2018-02-13 2020-10-09 上海寒武纪信息科技有限公司 Computing device and method
CN110383300B (en) * 2018-02-13 2024-03-05 上海寒武纪信息科技有限公司 Computing device and method
CN110197270A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110197271A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110197270B (en) * 2018-02-27 2020-10-30 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN110197268A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN111767998A (en) * 2018-02-27 2020-10-13 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN111767998B (en) * 2018-02-27 2024-05-14 上海寒武纪信息科技有限公司 Integrated circuit chip device and related products
CN111767996A (en) * 2018-02-27 2020-10-13 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN110197273A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110197273B (en) * 2018-02-27 2020-08-25 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN110197269A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
TWI786255B (en) * 2018-02-27 2022-12-11 大陸商寒武紀(西安)集成電路有限公司 Integrated circuit chip device, chip, intelligent device, and computing method of neural network
CN110197272A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 Integrated circuit chip device and Related product
CN110196734A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 A kind of computing device and Related product
CN111767996B (en) * 2018-02-27 2024-03-05 上海寒武纪信息科技有限公司 Integrated circuit chip device and related products
CN110197271B (en) * 2018-02-27 2020-10-27 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN111626413A (en) * 2018-03-14 2020-09-04 上海寒武纪信息科技有限公司 Computing device and method
CN110276447A (en) * 2018-03-14 2019-09-24 上海寒武纪信息科技有限公司 A kind of computing device and method
CN110472734A (en) * 2018-05-11 2019-11-19 上海寒武纪信息科技有限公司 A kind of computing device and Related product
CN110472734B (en) * 2018-05-11 2024-03-29 上海寒武纪信息科技有限公司 Computing device and related product
CN108710958A (en) * 2018-05-16 2018-10-26 北京旋极信息技术股份有限公司 Predictive health management method and device, and computer readable storage medium
CN108710958B (en) * 2018-05-16 2022-04-15 北京旋极信息技术股份有限公司 Predictive health management method and device and computer readable storage medium
CN108763360A (en) * 2018-05-16 2018-11-06 北京旋极信息技术股份有限公司 A kind of sorting technique and device, computer readable storage medium
CN108859477A (en) * 2018-07-05 2018-11-23 吉林工程技术师范学院 A kind of children's literature book binder and its control method
CN110806903A (en) * 2018-08-01 2020-02-18 珠海格力电器股份有限公司 Configuration parameter determining method and device of electric cooker
CN112805727A (en) * 2018-10-08 2021-05-14 深爱智能科技有限公司 Artificial neural network operation acceleration device for distributed processing, artificial neural network acceleration system using same, and method for accelerating artificial neural network
CN110059809B (en) * 2018-10-10 2020-01-17 中科寒武纪科技股份有限公司 Computing device and related product
CN110059809A (en) * 2018-10-10 2019-07-26 北京中科寒武纪科技有限公司 A kind of computing device and Related product
CN111047045A (en) * 2018-10-12 2020-04-21 中科寒武纪科技股份有限公司 Distribution system and method for machine learning operation
CN111079908A (en) * 2018-10-18 2020-04-28 上海寒武纪信息科技有限公司 Network-on-chip data processing method, storage medium, computer device and apparatus
US11880330B2 (en) 2018-10-18 2024-01-23 Shanghai Cambricon Information Technology Co., Ltd. Network-on-chip data processing method and device
US11971836B2 (en) 2018-10-18 2024-04-30 Shanghai Cambricon Information Technology Co., Ltd. Network-on-chip data processing method and device
US11960431B2 (en) 2018-10-18 2024-04-16 Guangzhou University Network-on-chip data processing method and device
US11797467B2 (en) 2018-10-18 2023-10-24 Shanghai Cambricon Information Technology Co., Ltd. Data processing device with transmission circuit
CN111079908B (en) * 2018-10-18 2024-02-13 上海寒武纪信息科技有限公司 Network-on-chip data processing method, storage medium, computer device and apparatus
US11809360B2 (en) 2018-10-18 2023-11-07 Shanghai Cambricon Information Technology Co., Ltd. Network-on-chip data processing method and device
US11841816B2 (en) 2018-10-18 2023-12-12 Shanghai Cambricon Information Technology Co., Ltd. Network-on-chip data processing method and device
US11880329B2 (en) 2018-10-18 2024-01-23 Shanghai Cambricon Information Technology Co., Ltd. Arbitration based machine learning data processor
US11868299B2 (en) 2018-10-18 2024-01-09 Shanghai Cambricon Information Technology Co., Ltd. Network-on-chip data processing method and device
US11880328B2 (en) 2018-10-18 2024-01-23 Shanghai Cambricon Information Technology Co., Ltd. Network-on-chip data processing method and device
CN111178492B (en) * 2018-11-09 2020-12-11 安徽寒武纪信息科技有限公司 Computing device, related product and computing method for executing artificial neural network model
CN111178492A (en) * 2018-11-09 2020-05-19 中科寒武纪科技股份有限公司 Computing device, related product and computing method for executing artificial neural network model
CN111258641A (en) * 2018-11-30 2020-06-09 上海寒武纪信息科技有限公司 Operation method, device and related product
CN109542837A (en) * 2018-11-30 2019-03-29 上海寒武纪信息科技有限公司 Operation method, device and Related product
CN111260046A (en) * 2018-11-30 2020-06-09 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111260046B (en) * 2018-11-30 2022-12-02 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111258641B (en) * 2018-11-30 2022-12-09 上海寒武纪信息科技有限公司 Operation method, device and related product
CN109919313A (en) * 2019-01-31 2019-06-21 华为技术有限公司 Gradient transmission method and distributed training system
CN109978160A (en) * 2019-03-25 2019-07-05 北京中科寒武纪科技有限公司 Configuration device and method for an artificial intelligence processor, and related product
US11847554B2 (en) 2019-04-18 2023-12-19 Cambricon Technologies Corporation Limited Data processing method and related products
CN111461340B (en) * 2020-03-10 2023-03-31 北京百度网讯科技有限公司 Weight matrix updating method and device and electronic equipment
CN111461340A (en) * 2020-03-10 2020-07-28 北京百度网讯科技有限公司 Weight matrix updating method and device and electronic equipment
CN112329619A (en) * 2020-11-04 2021-02-05 济南博观智能科技有限公司 Face recognition method and device, electronic equipment and readable storage medium
CN114071781B (en) * 2021-11-16 2024-04-12 杭州电子科技大学 Wireless local area network medium access control method
CN114071781A (en) * 2021-11-16 2022-02-18 杭州电子科技大学 Wireless local area network medium access control method

Also Published As

Publication number Publication date
CN110188870B (en) 2021-10-12
CN110188870A (en) 2019-08-30
CN107316078B (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN107316078A (en) Apparatus and method for performing artificial neural network self study computing
EP3451157B1 (en) Device and method for performing forward operation of convolutional neural network
US10713568B2 (en) Apparatus and method for executing reversal training of artificial neural network
CN107341547B (en) Apparatus and method for performing convolutional neural network training
US20200097806A1 (en) Processing method and accelerating device
CN111860811B (en) Device and method for executing full-connection layer forward operation of artificial neural network
CN110929863B (en) Apparatus and method for performing LSTM operations
CN107341541A (en) 2017-11-10 Apparatus and method for performing fully connected layer neural network training
CN107301454B (en) Artificial neural network reverse training device and method supporting discrete data representation
CN107301453B (en) Artificial neural network forward operation device and method supporting discrete data representation
CN107832844A (en) A kind of information processing method and Related product
WO2017185347A1 (en) Apparatus and method for executing recurrent neural network and lstm computations
CN106991476A (en) Apparatus and method for performing artificial neural network forward operation
EP3564863B1 (en) Apparatus for executing lstm neural network operation, and operational method
CN108320018B (en) Artificial neural network operation device and method
EP3451240A1 (en) Apparatus and method for performing auto-learning operation of artificial neural network
EP3444758B1 (en) Discrete data representation-supporting apparatus and method for back-training of artificial neural network
CN108960415B (en) Processing apparatus and processing system
CN111178492A (en) Computing device, related product and computing method for executing artificial neural network model
CN109146069B (en) Arithmetic device, arithmetic method, and chip

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.

GR01 Patent grant