CN107316078B - Apparatus and method for performing artificial neural network self-learning operation - Google Patents

Apparatus and method for performing artificial neural network self-learning operation

Info

Publication number
CN107316078B
CN107316078B (application CN201610267211.0A)
Authority
CN
China
Prior art keywords
neural network
instruction
unit
artificial neural
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610267211.0A
Other languages
Chinese (zh)
Other versions
CN107316078A (en)
Inventor
李震
郭崎
陈云霁
陈天石
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cambricon Technologies Corp Ltd
Original Assignee
Cambricon Technologies Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cambricon Technologies Corp Ltd filed Critical Cambricon Technologies Corp Ltd
Priority to CN201910402047.3A priority Critical patent/CN110188870B/en
Priority to CN201610267211.0A priority patent/CN107316078B/en
Publication of CN107316078A publication Critical patent/CN107316078A/en
Application granted granted Critical
Publication of CN107316078B publication Critical patent/CN107316078B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Image Analysis (AREA)

Abstract

An apparatus and method for performing artificial neural network self-learning operations includes a controller unit, an interconnection module, a master operation module, and a plurality of slave operation modules. Following a layer-by-layer training scheme, the self-learning pre-training of a multi-layer neural network is completed by pre-training each layer in turn: a layer is iterated over multiple operation passes until its weight update is smaller than a certain threshold, after which pre-training proceeds to the next layer. Within each iteration, the first three stages respectively compute the first-order visible layer intermediate value and the second-order hidden layer intermediate value, and the last stage updates the weights using the intermediate values from the first three stages.

Description

Apparatus and method for performing artificial neural network self-learning operation
Technical Field
The present disclosure relates to artificial neural network technology, and in particular, to an apparatus and method for performing artificial neural network self-learning operations.
Background
Multilayer artificial neural networks are widely used in pattern recognition, image processing, function approximation, optimization computation, and related fields. In recent years they have attracted increasing attention from academia and industry owing to their high recognition accuracy and good parallelizability.
A typical training method for multi-layer artificial neural networks is the back-propagation (BP) algorithm. As a representative supervised-learning method it requires a large number of labeled training samples, yet collecting such samples is expensive. Moreover, during training the error-correction signal shrinks as the number of layers it propagates through grows, training tends to converge to local minima, and convergence is slow. Therefore, pre-training the network parameters with a self-learning algorithm, which converges quickly and needs no labeled training samples, and then fine-tuning the multi-layer network with back-propagation training has become a new research focus. The self-learning operation used for this pre-training is thus particularly important.
One known approach to supporting multi-layer artificial neural network self-learning operations is to use a general-purpose processor, which supports the above algorithm by executing general instructions through a general-purpose register file and general-purpose functional units. One disadvantage of this approach is that the operation performance of a single general-purpose processor is low and cannot meet the performance requirements of typical multi-layer artificial neural network operations; when multiple general-purpose processors execute in parallel, communication between them becomes a performance bottleneck. In addition, a general-purpose processor must decode the multi-layer artificial neural network pre-training operation into a long sequence of arithmetic and memory-access instructions, so the processor front-end decoding incurs a large power overhead.
Another known approach to supporting multi-layer artificial neural network pre-training is to use a graphics processing unit (GPU), which supports the above algorithm by executing general-purpose SIMD instructions through a general-purpose register file and general-purpose stream processors. Because the GPU is a device dedicated to graphics, image, and scientific computation, it has no special support for multi-layer artificial neural network operations, and a large amount of front-end decoding work is still needed to perform them, bringing substantial extra overhead. In addition, the GPU has only a small on-chip cache, so the model data (weights) of the multi-layer artificial neural network must be carried from off-chip repeatedly; off-chip bandwidth becomes the main performance bottleneck and causes a huge power consumption overhead.
Disclosure of Invention
The present disclosure aims to solve the problems of the prior art described above: pre-training a multi-layer neural network on general-purpose processors (CPU, GPU) requires a long sequence of simple arithmetic and memory-access operations, the front-end decoding power overhead is high, the data-access overhead of a conventional general-purpose processor is large, and the operation performance of a single general-purpose processor is low.
The present disclosure proposes a device for performing an artificial neural network self-learning operation, comprising an instruction storage unit, a controller unit, a data access unit, an interconnection module, a master operation module, and a plurality of slave operation modules, wherein: the instruction storage unit is used for reading in instructions through the data access unit and caching the read instructions; the controller unit is used for reading an instruction from the instruction storage unit, decoding it into control signals that control the behavior of the interconnection module, the master operation module, and the slave operation modules, and then distributing the respective control signals to those modules; the data access unit is used for accessing the external address space and completing the loading and storing of data; the interconnection module, which may be implemented with different topologies, is used for distributing the input vector of the master operation module to the plurality of slave operation modules, combining the calculation results of the slave operation modules, and returning the combined result to the master operation module; the master operation module is used for applying the activation function and Gibbs sampling to the intermediate value returned by the interconnection module and for updating the bias of the activation function; and each slave operation module is used for performing the dot-product operation of the input vector with the corresponding weight matrix, performing the product operation of the corresponding component scalar of the input vector with the corresponding weight matrix, and updating the weight matrix.
According to a specific embodiment of the present disclosure, the master operation module includes an operation unit, a data dependency relationship determination unit, and a storage unit, where the storage unit is configured to cache the input data and output data used by the master operation module in the calculation process, the operation unit is configured to complete the operations of the master operation module, and the data dependency relationship determination unit serves as the port through which the operation unit reads and writes the storage unit and is configured to ensure read-write consistency of the data in the storage unit.
According to a specific embodiment of the present disclosure, the data dependency relationship determination unit is configured to determine whether a dependency exists between the data of a control signal that has not yet been executed and a control signal currently being executed; if not, the set of control signals is allowed to issue immediately, otherwise it must wait until all control signals on which it depends have completed before it is allowed to issue.
According to a specific embodiment of the present disclosure, the data dependency relationship determination unit is further configured to send the read data to the slave computing module through the interconnection module.
According to a specific embodiment of the present disclosure, each slave operation module includes an operation unit, a data dependency relationship determination unit, a first storage unit, a second storage unit, and a third storage unit, wherein the operation unit is configured to receive a control signal sent by the controller unit and perform an arithmetic logic operation; the data dependency relationship judging unit is used for monitoring the read-write operation of the cache unit so as to ensure that consistency conflict does not exist in the read-write operation of the cache unit; the first storage unit is used for caching input vectors and calculation results of the neurons; the second storage unit is used for caching weight data required by the slave operation module in the calculation process; the third storage unit is used for caching weight gradient data required by the corresponding slave operation module in the process of updating the weight.
The present disclosure also provides a method for performing a layer-by-layer self-learning operation of an artificial neural network, the artificial neural network comprising a plurality of neurons arranged in two or more layers, the self-learning pre-training of the artificial neural network employing layer-by-layer training, the pre-training being divided into four stages for each layer:
In the first stage, the input neuron vector v^(0) and the weight vector matrix W are subjected to a dot-product operation to obtain a local induced field; the local induced field undergoes a nonlinear transformation by an activation function, and Gibbs sampling is then applied to obtain the first-order hidden layer intermediate value h^(0);
In the second stage, the transpose W^T of the weight vector matrix and the transpose of the first-order hidden layer intermediate value h^(0) are first subjected to a dot-product operation; the resulting local induced field undergoes the nonlinear transformation of the activation function, and Gibbs sampling is then applied to obtain the first-order visible layer intermediate value v^(1);
In the third stage, the first-order visible layer intermediate value v^(1) and the weight vector matrix W are subjected to a dot-product operation to obtain a local induced field, which undergoes the nonlinear transformation of the activation function to obtain the second-order hidden layer intermediate value h^(1);
In the fourth stage, the weights are updated according to the following formulas:
W ← W + ε ( h^(0) × v^(0) - h^(1) × v^(1) )    (1)
b_h ← b_h + ε ( h^(0) - h^(1) )    (2)
b_v ← b_v + ε ( v^(0) - v^(1) )    (3)
where the vector b_h is the bias added to the dot product of the vector and the weight matrix before the activation function is applied in the first and third stages, the vector b_v is the bias used in the second stage, "×" denotes the cross (outer) product of the vectors, and ε is the learning rate.
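By way of illustration only, the four stages can be written as the following minimal Python/numpy sketch of one pre-training iteration for a single layer. The sigmoid activation and the Bernoulli-style Gibbs sampling are assumptions (the disclosure only specifies an activation function and Gibbs sampling in general); the symbols follow the notation used above.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_sample(p, rng):
    # Assumption: binary units sampled with probability p (the disclosure
    # only states that Gibbs sampling follows the activation function).
    return (rng.random(p.shape) < p).astype(p.dtype)

def pretrain_step(v0, W, b_h, b_v, lr, rng):
    """One four-stage self-learning iteration for a single layer.

    v0  : input neuron vector
    W   : weight matrix, one weight vector per hidden/output component
    b_h : bias added before the activation in stages 1 and 3
    b_v : bias added before the activation in stage 2
    lr  : learning rate epsilon
    """
    # Stage 1: dot product, activation, Gibbs sampling -> h0
    h0 = gibbs_sample(sigmoid(W @ v0 + b_h), rng)
    # Stage 2: transpose of W with h0, activation, Gibbs sampling -> v1
    v1 = gibbs_sample(sigmoid(W.T @ h0 + b_v), rng)
    # Stage 3: like stage 1 with v1 as input, but no Gibbs sampling -> h1
    h1 = sigmoid(W @ v1 + b_h)
    # Stage 4: formulas (1)-(3); "x" is the outer product of the two vectors
    W = W + lr * (np.outer(h0, v0) - np.outer(h1, v1))
    b_h = b_h + lr * (h0 - h1)
    b_v = b_v + lr * (v0 - v1)
    return W, b_h, b_v
```

Here W is stored with one weight vector per output (hidden) component, which matches the per-slave partition of the weight matrix described later.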
Compared with the prior art, the present disclosure optimizes the multi-layer neural network pre-training instructions: the processor can complete pre-training learning of one layer of the neural network with only one instruction, which reduces the front-end instruction decoding overhead of a general-purpose processor. At the same time, with a master operation module, multiple slave operation modules, and a large amount of distributed on-chip storage to relieve memory-access overhead, the device can execute the neural network pre-training operation in parallel without frequent off-chip data access. In summary, the performance-to-power ratio of the present disclosure is much higher than that of a general-purpose processor.
The present disclosure may be applied in the following (including but not limited to) scenarios: the system comprises various electronic products such as a data processing device, a robot, a computer, a printer, a scanner, a telephone, a tablet computer, an intelligent terminal, a mobile phone, a driving recorder, a navigator, a sensor, a camera, a cloud server, a camera, a video camera, a projector, a watch, an earphone, a mobile storage device and a wearable device; various vehicles such as airplanes, ships, vehicles, and the like; various household appliances such as televisions, air conditioners, microwave ovens, refrigerators, electric cookers, humidifiers, washing machines, electric lamps, gas stoves, range hoods and the like; and various medical devices including nuclear magnetic resonance apparatuses, B-ultrasonic apparatuses, electrocardiographs and the like.
Drawings
For a more complete understanding of the present disclosure and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1 illustrates an example block diagram of the overall structure of an apparatus for performing artificial neural network self-learning pre-training in accordance with an embodiment of the present disclosure.
FIG. 2 schematically illustrates an H-tree structured implementation of interconnect modules in an apparatus for performing artificial neural network self-learning pre-training in accordance with an embodiment of the present disclosure.
FIG. 3 illustrates an example block diagram of a structure of a main operation module in an apparatus for performing artificial neural network self-learning pre-training in accordance with an embodiment of the present disclosure.
FIG. 4 illustrates an example block diagram of a slave operational module structure in an apparatus for performing artificial neural network self-learning pre-training in accordance with an embodiment of the present disclosure.
FIG. 5 illustrates an example block diagram of the first and third stages of a neural network self-learning pre-training process in accordance with an embodiment of this disclosure.
FIG. 6 illustrates an example block diagram of a second stage of a neural network self-learning pre-training process in accordance with an embodiment of this disclosure.
FIG. 7 illustrates an example flow diagram of a fourth stage of a neural network self-learning pre-training process in accordance with an embodiment of the present disclosure.
FIG. 8 illustrates an example flow diagram of a single-layer neural network self-learning pre-training iteration in accordance with an embodiment of the present disclosure.
Like devices, components, units, etc. are designated with like reference numerals throughout the drawings.
Detailed Description
Other aspects, advantages, and salient features of the disclosure will become apparent to those skilled in the art from the following detailed description, which, taken in conjunction with the annexed drawings, discloses exemplary embodiments of the disclosure.
In the present disclosure, the terms "include" and "comprise," as well as derivatives thereof, mean inclusion without limitation; the term "or" is inclusive, meaning and/or.
In this specification, the various embodiments described below which are used to describe the principles of the present disclosure are by way of illustration only and should not be construed in any way to limit the scope of the invention. The following description with reference to the accompanying drawings is provided to assist in a comprehensive understanding of exemplary embodiments of the present disclosure as defined by the claims and their equivalents. The following description includes various specific details to aid understanding, but such details are to be regarded as illustrative only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Moreover, descriptions of well-known functions and constructions are omitted for clarity and conciseness. Moreover, throughout the drawings, the same reference numerals are used for similar functions and operations.
The self-learning pre-training of a multilayer artificial neural network according to an embodiment of the present disclosure applies to an artificial neural network comprising a plurality of neurons arranged in two or more layers. The self-learning pre-training of the artificial neural network adopts layer-by-layer training, proceeding from the first layer to the last layer. For each layer, the pre-training is divided into four stages:
In the first stage, the input neuron vector v^(0) and the weight vector matrix W are first subjected to a dot-product operation to obtain a local induced field; the local induced field undergoes the nonlinear transformation of an activation function, and Gibbs sampling is then applied to obtain the first-order hidden layer intermediate value h^(0);
In the second stage, the transpose W^T of the weight vector matrix and the transpose of the first-order hidden layer intermediate value h^(0) are first subjected to a dot-product operation; the resulting local induced field undergoes the nonlinear transformation of the activation function, and Gibbs sampling is then applied to obtain the first-order visible layer intermediate value v^(1);
The third stage is similar to the first stage, except that its input is the first-order visible layer intermediate value v^(1) and no Gibbs sampling is applied when computing the second-order hidden layer intermediate value h^(1);
In the fourth stage, the weights are updated according to the following formulas:
W ← W + ε ( h^(0) × v^(0) - h^(1) × v^(1) )    (1)
b_h ← b_h + ε ( h^(0) - h^(1) )    (2)
b_v ← b_v + ε ( v^(0) - v^(1) )    (3)
where the vector b_h is the bias added to the dot product of the vector and the weight matrix before the activation function is applied in the first and third stages, the vector b_v is the bias used in the second stage, "×" denotes the cross (outer) product of the vectors, and ε is the learning rate.
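A minimal sketch of the layer-by-layer scheme described above, reusing pretrain_step and sigmoid from the earlier sketch; the random sample selection and the convergence test on the maximum absolute weight update are assumptions, since the disclosure only requires that the weight update fall below a certain threshold.

```python
import numpy as np

def pretrain_layers(samples, layer_sizes, lr=0.01, tol=1e-4, max_iters=100000, seed=0):
    """Greedy layer-by-layer pre-training: each layer is iterated until its
    weight update falls below the threshold, then its hidden activations
    become the next layer's input."""
    rng = np.random.default_rng(seed)
    params = []
    data = np.asarray(samples, dtype=float)
    for n_visible, n_hidden in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = 0.01 * rng.standard_normal((n_hidden, n_visible))
        b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)
        for _ in range(max_iters):
            v0 = data[rng.integers(len(data))]
            W_new, b_h, b_v = pretrain_step(v0, W, b_h, b_v, lr, rng)
            done = np.abs(W_new - W).max() < tol   # weight update below threshold
            W = W_new
            if done:
                break
        params.append((W, b_h, b_v))
        data = sigmoid(data @ W.T + b_h)           # input for the next layer
    return params
```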
FIG. 1 illustrates an example block diagram of the overall structure of an apparatus for performing artificial neural network self-learning pre-training in accordance with this disclosure. As shown in fig. 1, the apparatus includes an instruction storage unit 1, a controller unit 2, a data access unit 3, an interconnection module 4, a master operation module 5, and a plurality of slave operation modules 6. The instruction storage unit 1, the controller unit 2, the data access unit 3, the interconnect module 4, the master operation module 5 and the slave operation module 6 may all be implemented by hardware circuits (e.g., application specific integrated circuits ASIC).
The instruction storage unit 1 reads in instructions through the data access unit 3 and buffers the read instructions.
The controller unit 2 reads the instruction from the instruction storage unit 1, translates the instruction into a control signal for controlling the behavior of other modules, and sends the control signal to other modules such as the data access unit 3, the master operation module 5, the slave operation module 6, and the like.
The data access unit 3 can access and store an external address space, and directly read and write data to each cache unit in the device to finish the loading and storage of the data.
FIG. 2 schematically shows the structure of the interconnection module 4. The interconnection module 4 constitutes the data path between the master operation module 5 and the plurality of slave operation modules 6 and may take different structures. In this embodiment the interconnection is a binary tree path formed by a plurality of nodes: each node sends upstream data identically to its two downstream nodes, and merges the data returned by the two downstream nodes before returning it to its upstream node. For example, in the first and third stages of the neural network self-learning operation, the input vector in the master operation module 5 is sent to every slave operation module 6 through the interconnection module 4; after each slave operation module 6 finishes its calculation, the neuron values output by the slave operation modules are spliced, stage by stage in the interconnection module, into a complete vector of local induced fields, which is returned to the master operation module 5 as the intermediate result vector for the activation function and, as required, Gibbs sampling. In the second stage, the first-order hidden layer intermediate value vector h^(0) in the master operation module 5 is sent to each slave operation module 6 through the interconnection module 4; after each slave operation module 6 finishes its calculation, the vectors returned by the two downstream nodes are added into one vector at the current node and returned to the upstream node, and the resulting vector is returned to the master operation module 5 as the intermediate result vector for the activation function and Gibbs sampling.
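A minimal sketch of the two combine modes of such a binary-tree interconnection; the function name and list-based interface are illustrative assumptions, while the pairwise, level-by-level behaviour follows the description above.

```python
import numpy as np

def htree_combine(partials, mode):
    """Combine per-slave results level by level, as the binary-tree nodes do:
    'concat' splices local induced field components (stages 1 and 3),
    'sum' adds partial sums pairwise (stage 2)."""
    level = [np.atleast_1d(p) for p in partials]
    while len(level) > 1:
        merged = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 1:
                merged.append(pair[0])               # odd node passes through
            elif mode == "concat":
                merged.append(np.concatenate(pair))  # splice left + right
            else:
                merged.append(pair[0] + pair[1])     # add the two partial sums
        level = merged
    return level[0]
```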
Fig. 3 shows an example block diagram of the structure of the main operation module 5 in an apparatus for performing an artificial neural network pre-training operation according to the present disclosure. As shown in fig. 3, the main operation block 5 includes an operation unit 51, a data dependency relationship judgment unit 52, and a storage unit 53.
The storage unit 53 is used for caching the input data and output data used by the master operation module 5 in the calculation process; the operation unit 51 performs the various operation functions of the master operation module 5; and the data dependency relationship determination unit 52 is the port through which the operation unit 51 reads and writes the storage unit 53, ensuring read-write consistency of the data in the storage unit. Specifically, the data dependency relationship determination unit 52 determines whether a dependency exists between the data of a control signal that has not yet been executed and a control signal currently being executed; if not, the control signal is allowed to issue immediately, otherwise it must wait until all control signals on which it depends have completed before it is allowed to issue. For example, all control signals sent to the data dependency unit 52 are stored in an instruction queue inside the data dependency unit 52; in this queue, if the read data range of a read instruction conflicts with the write data range of a write instruction earlier in the queue, the read instruction must wait until the write instruction it depends on has been executed. Meanwhile, the data dependency relationship determination unit 52 is also responsible for sending read data to the slave operation modules through the interconnection module 4, and the output data of the slave operation modules 6 is sent directly to the operation unit 51 through the interconnection module 4. The instruction output by the controller unit 2 is sent to the operation unit 51 and the data dependency relationship determination unit 52 to control their behavior.
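The read-after-write check described above can be modelled with the following illustrative sketch; the class name, the queue interface, and the address-range representation are assumptions and merely stand in for the hardware's control-signal queue.

```python
from collections import deque

class DependencyQueue:
    """Control signals enter in order; a read may issue only after every
    earlier write whose data range overlaps the read range has retired."""

    def __init__(self):
        self.pending = deque()               # (op, start, end), oldest first

    def can_issue(self, op, start, end):
        if op == "read":
            for prev_op, s, e in self.pending:
                if prev_op == "write" and start <= e and s <= end:
                    return False             # overlaps an earlier write: wait
        return True

    def issue(self, op, start, end):
        if self.can_issue(op, start, end):
            self.pending.append((op, start, end))
            return True
        return False                         # caller retries after retire()

    def retire(self):
        if self.pending:
            self.pending.popleft()           # oldest control signal completes
```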
Fig. 4 shows an example block diagram of the structure of the slave operational module 6 in an apparatus for performing artificial neural network pre-training according to the present disclosure. As shown in fig. 4, each slave operation module 6 includes an operation unit 61, a data dependency relationship judgment unit 62, a first storage unit 63, a second storage unit 64, and a third storage unit 65.
The arithmetic unit 61 receives the control signal from the controller unit 2 and performs arithmetic logic operation.
The data dependency relationship determination unit 62 is responsible for reading and writing operations on the cache unit in the calculation process. The data dependency judgment unit 62 ensures that there is no consistency conflict for the reading and writing of the cache unit. For example, all control signals to the data dependency unit 62 are stored in an instruction queue within the data dependency unit 62, in which queue a read data range of a read instruction must wait until the dependent write instruction is executed if it conflicts with a write data range of a write instruction located earlier in the queue.
The first storage unit 63 buffers, during the various stages, the input neuron vector v^(0), the first-order hidden layer intermediate value h^(0), the first-order visible layer intermediate value v^(1), and the second-order hidden layer intermediate value h^(1), as well as the dot-product results of the input vector and the weight matrix computed in each stage.
The second storage unit 64 buffers the weight data required by the slave operation module 6 in the calculation process. For each slave, only the column of the weight matrix corresponding to the scalar data stored by the slave 6 is stored.
The third storage unit 65 buffers weight gradient data required by the corresponding slave operation module in the process of updating the weights. Each weight gradient data stored in the slave operation module 6 corresponds to the weight data stored therein.
In the artificial neural network self-learning pre-training process, the slave operation modules 6 implement the parallelizable first half of each of the first three stages, as well as the weight update of formula (1) in the last stage.
Taking the pre-training of a Deep Belief Network (DBN), an artificial neural network, as an example, in the first three stages the operation between the weight matrix W (or its transpose W^T) and the input neuron vector can be divided into uncorrelated, parallel computing subtasks. In the first and third stages, the slave operation modules 6 each perform dot-product operations between the same input vector and the weights corresponding to a different component of the output vector, obtaining the partial sums of their respective output components; after several accumulations each slave operation module obtains the local induced field of its own output component, so each slave operation module 6 only needs to compute the local induced field of the output neuron value assigned to it. The different local induced field components are spliced step by step in the interconnection module 4 into a complete local induced field vector, which is transmitted to the master operation module for the activation function and subsequent sampling. In the second stage, each slave operation module 6 computes only the product of the corresponding partial scalar of the input first-order hidden layer intermediate value vector h^(0) with the corresponding column of the weight matrix W; each resulting output vector is a partial sum, to be accumulated, of the final result, and these partial sums are added pairwise, step by step, in the interconnection module 4 to obtain the final result. That is, each slave operation module 6 computes partial sums of the local induced field of the output first-order visible layer vector, and all the partial sums are summed in the interconnection module 4 to obtain the final local induced field. The intermediate values computed in the first three stages are used for updating the weights, and the master operation module 5 performs subsequent operations on the outputs of the first three stages to obtain the weight update values. In the last stage, each slave operation module 6 updates the weights according to formula (1), which can likewise be divided into three small steps:
1. each slave operation module 6 computes the product of the corresponding partial scalar of the input first-order hidden layer intermediate value vector h^(0) with the input neuron vector v^(0), as an intermediate value;
2. each slave operation module 6 computes the product of the corresponding partial scalar of the second-order hidden layer intermediate value vector h^(1) with the first-order visible layer intermediate value vector v^(1), and computes the vector difference between this product and the intermediate value of the first small step;
3. each slave operation module 6 multiplies the difference of the second small step by the learning rate to obtain the weight update value, and then performs vector subtraction between the weight W it stores and the weight update value to obtain the updated weight.
Note that the three small steps described above are only one example of how the slave operation modules 6 update the weights, and the details may be fine-tuned: for example, the product computed in the first small step and the product computed in the second small step may be interchanged, or the multiplication by the learning rate in the third small step may be moved forward into the second small step or even split across the first two small steps. A minimal sketch of this per-slave update is given below.
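The sketch below is illustrative only: W_i stands for the weight slice held by one slave operation module 6 (paired with output component i), and the scalar/vector argument names are assumptions.

```python
def slave_update_slice(W_i, v0, v1, h0_i, h1_i, lr):
    """Three small steps executed inside one slave operation module on the
    weight slice W_i it stores (paired with output component i)."""
    step1 = h0_i * v0                 # step 1: product, cached as intermediate
    diff = h1_i * v1 - step1          # step 2: product and vector difference
    # step 3: scale by the learning rate and subtract from the stored weights;
    # this equals W_i + lr*(h0_i*v0 - h1_i*v1), i.e. row i of formula (1).
    return W_i - lr * diff
```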
According to an embodiment of the present disclosure, there is also provided an instruction set for performing an artificial neural network forward operation on the aforementioned apparatus. The instruction set comprises a CONFIG instruction, a COMPUTE instruction, an IO instruction, a NOP instruction, a JUMP instruction and a MOVE instruction, wherein:
configuring various constants required by calculation of a current layer by the CONFIG instruction before calculation of each layer of artificial neural network is started;
the COMPUTE instruction completes the arithmetic logic calculation of each layer of artificial neural network;
the IO instruction reads input data required by calculation from an external address space and stores the data back to the external space after the calculation is finished;
the NOP instruction is responsible for flushing the control signals currently loaded in all control-signal cache queues inside the device, ensuring that all instructions before the NOP instruction have completed; the NOP instruction itself does not contain any operation;
the JUMP instruction is responsible for the JUMP of the next instruction address to be read from the instruction storage unit by the controller and is used for realizing the JUMP of a control flow;
the MOVE instruction is responsible for carrying data at one address in the internal address space of the device to another address in the internal address space of the device, and the process is independent of the arithmetic unit and does not occupy the resources of the arithmetic unit in the execution process.
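By way of orientation only, a hypothetical control stream for pre-training one layer might look as follows; the mnemonics are the six instruction types listed above, while the operand descriptions are illustrative placeholders that mirror steps S1-S11 described later.

```python
# Hypothetical single-layer pre-training program; operands are placeholders.
LAYER_PROGRAM = [
    ("IO",      "load the layer's instructions from the external address space"),
    ("IO",      "load input neuron vector, activation table, learning rate, biases"),
    ("IO",      "load the weight matrix into the slave operation modules"),
    ("CONFIG",  "constants for stage 1"), ("COMPUTE", "stage 1 -> h(0)"),
    ("CONFIG",  "constants for stage 2"), ("COMPUTE", "stage 2 -> v(1)"),
    ("CONFIG",  "constants for stage 3"), ("COMPUTE", "stage 3 -> h(1)"),
    ("COMPUTE", "stage 4: update weights and biases"),
    ("IO",      "store the updated weights back to the external address space"),
]
```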
FIG. 5 illustrates an example block diagram of the first and third stages of a neural network self-learning pre-training process in accordance with an embodiment of this disclosure. In the different slave operation modules 6, the input vector broadcast by the interconnection module 4 undergoes a dot-product operation with the weight vector of each slave operation module 6 to obtain the partial sum of the local induced field of the corresponding output neuron value; all the output local induced field values form an intermediate result vector, which after the addition of the bias vector and the activation operation yields the final output neuron vector of this layer of the neural network. The formula is described as out = f(w*in + b), where out is the output vector, in is the input vector, b is the bias vector, w is the weight matrix, and f is the activation function. The weight vector of each slave operation module 6 is the column vector of the weight matrix corresponding to that slave operation module 6. The interconnection module 4 sends the input vector [I0, …, In] to all slave operation units, where it is temporarily stored in the first storage unit. The i-th slave operation unit computes the dot product of its corresponding weight vector [Wi0, …, Win] with the input vector. The results output by the slave operation units are spliced through the interconnection module 4 into a complete local induced field vector and returned to the master operation module 5, where the activation function operation and, if required, Gibbs sampling are performed to obtain the final output vector [O0, O1, …, On].
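A sketch of this first/third-stage decomposition across the slave operation modules, reusing sigmoid, gibbs_sample, and htree_combine from the earlier sketches; the per-slave Python loop merely stands in for the parallel hardware.

```python
import numpy as np

def stage_1_or_3(W, x, b, rng=None):
    """out = f(w*in + b): each slave i computes the dot product of its weight
    vector W[i, :] with the broadcast input; the interconnection splices the
    components; the master adds the bias, activates, and samples if required."""
    partials = [np.dot(W[i, :], x) for i in range(W.shape[0])]   # one per slave
    field = htree_combine(partials, mode="concat") + b           # splice + bias
    out = sigmoid(field)
    return gibbs_sample(out, rng) if rng is not None else out    # stage 3: no sampling
```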
FIG. 6 illustrates an example block diagram of the second stage of a neural network self-learning pre-training process in accordance with an embodiment of this disclosure. To compute the output first-order visible layer vector v^(1), the interconnection module 4 broadcasts the first-order hidden layer intermediate value vector h^(0); each slave operation module 6 multiplies its corresponding partial scalar h0_i of h^(0) with the corresponding column [Wi0, …, Win] of the weight matrix W, and each resulting output vector is a partial sum, to be accumulated, of the local induced field of the first-order visible layer vector. These partial sums are added pairwise, step by step, in the interconnection module 4 to obtain the final local induced field, which is returned to the master operation module 5; there the activation function operation and, if required, Gibbs sampling are performed to obtain the final output first-order visible layer vector v^(1).
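The corresponding second-stage sketch under the same assumptions and helpers as above: each slave operation module contributes a scalar-times-weight-vector partial sum, and the interconnection adds the partials pairwise.

```python
import numpy as np

def stage_2(W, h0, b_v, rng):
    """Each slave i multiplies its scalar h0[i] by its stored weight vector
    W[i, :]; the interconnection sums the partials pairwise; the master adds
    the bias, activates, and Gibbs-samples. Equivalent to f(W.T @ h0 + b_v)."""
    partials = [h0[i] * W[i, :] for i in range(W.shape[0])]   # one per slave
    field = htree_combine(partials, mode="sum") + b_v
    return gibbs_sample(sigmoid(field), rng)
```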
FIG. 7 shows a flowchart of the fourth stage of a neural network self-learning pre-training process in accordance with an embodiment of the present disclosure. In this last stage, the slave operation modules 6 update the weights according to formula (1) in three small steps:
1. each slave operation module 6 computes the product of the corresponding partial scalar of the input first-order hidden layer intermediate value vector h^(0) with the input neuron vector v^(0) and caches the intermediate value in the third storage unit shown in FIG. 4; this small step is similar to the second-stage block diagram shown in FIG. 6, except that its inputs are the first-order hidden layer intermediate value vector h^(0) and the input neuron vector v^(0);
2. each slave operation module 6 computes the product of the corresponding partial scalar of the second-order hidden layer intermediate value vector h^(1) with the first-order visible layer intermediate value vector v^(1), computes the vector difference with the intermediate value of the first small step, and caches the result in the third storage unit shown in FIG. 4;
3. each slave operation module 6 multiplies the difference of the second small step by the learning rate to obtain the weight update value, and then performs vector subtraction between the weight W it stores and the weight update value to obtain the updated weight.
Note that the three small steps described above are only one example of how the slave operation modules 6 update the weights, and the details may be fine-tuned: for example, the product computed in the first small step and the product computed in the second small step may be interchanged, or the multiplication by the learning rate in the third small step may be moved forward into the second small step or even split across the first two small steps.
FIG. 8 illustrates a flow diagram of a single-layer artificial neural network self-learning pre-training operation according to an embodiment. Since the multi-layer artificial neural network self-learning pre-training is implemented layer by layer, this flow may be invoked multiple times for multi-layer pre-training. The flowchart describes the process of implementing a single-layer neural network self-learning pre-training operation of the type shown in FIG. 4 using the apparatus and instruction set of the present disclosure.
In step S1, an IO instruction is pre-stored at the first address of instruction cache unit 1.
In step S2, the operation starts, the controller unit 2 reads the IO instruction from the first address of the instruction cache unit 1, and according to the translated control signal, the data access unit 3 reads all corresponding artificial neural network operation instructions from the external address space and caches them in the instruction storage unit 1.
In step S3, the controller unit 2 reads in the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit 3 reads all the data required by the master operation module 5 (e.g., the input neuron vector v^(0), the activation function interpolation table, the learning rate, the biases, and so on) from the external address space into the storage unit 53 of the master operation module 5.
In step S4, the controller unit 2 then reads in the next IO instruction from the instruction storage unit, and according to the decoded control signal, the data access unit 3 reads the weight matrix data required by the slave operation modules 6 from the external address space.
At step S5, the controller unit 2 then reads in the next CONFIG instruction from the instruction storage unit, and based on the translated control signal, the device configures the various constants required for the first stage calculation of the layer neural network. For example, the arithmetic units 51, 61 configure the values of the unit internal registers according to parameters in the control signals, such as the precision setting of the calculation of the layer, the data of the activation function.
In step S6, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit and, based on the decoded control signal, starts the first-stage calculation. The master operation module 5 first sends the input neuron vector v^(0) through the interconnection module 4 to each slave operation module 6, where it is stored in the first storage unit 63 of the slave operation module 6. The operation unit 61 of the slave operation module 6 reads its weight vector (the column vector of the weight matrix corresponding to that slave operation module 6) from the second storage unit 64 and the input neuron vector v^(0) from the first storage unit, completes the dot-product operation of the weight vector and the input neuron vector v^(0), and returns the intermediate result through the interconnection module. In the interconnection module 4, the intermediate results returned by the slave operation modules 6 are spliced step by step into a complete local induced field vector. The master operation module 5 obtains the value returned by the interconnection module 4, reads the bias vector from the storage unit 53 according to the control signal decoded from the COMPUTE instruction, adds it to the vector returned by the interconnection module 4, activates the sum, performs Gibbs sampling, and writes the final first-order hidden layer intermediate value vector h^(0) back to the storage unit 53.
The controller unit 2 then reads in the next CONFIG instruction from the instruction storage unit at step S7, and based on the translated control signal, the device configures the various constants required for the second stage calculation of the layer neural network.
In step S8, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit and, based on the decoded control signal, starts the second-stage calculation. The master operation module 5 first sends the first-order hidden layer intermediate value vector h^(0) through the interconnection module 4 to each slave operation module 6, where it is stored in the first storage unit 63 of the slave operation module 6. The operation unit 61 of the slave operation module 6 reads its weight vector (the column vector of the weight matrix corresponding to that slave operation module 6) from the second storage unit 64, selects the corresponding scalar of the first-order hidden layer vector h^(0) from the first storage unit, completes the product operation of the weight vector and the corresponding scalar of h^(0), and returns the intermediate result through the interconnection module. In the interconnection module 4, the intermediate results returned by the slave operation modules 6 are added step by step into a complete local induced field vector. The master operation module 5 obtains the value returned by the interconnection module 4, reads the bias vector from the storage unit 53 according to the control signal decoded from the COMPUTE instruction, adds it to the vector returned by the interconnection module 4, activates the sum, performs Gibbs sampling, and writes the final first-order visible layer intermediate value vector v^(1) back to the storage unit 53.
At step S9, the controller unit 2 then reads in the next CONFIG instruction from the instruction storage unit, and based on the translated control signal, the device configures the various constants required for the third stage calculation of the layer of neural network. The configuration of the layer is basically the same as that of the first stage, but one more learning rate parameter is required to be configured.
In step S10, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit and, based on the decoded control signal, starts the third-stage calculation. The master operation module 5 first sends the first-order visible layer intermediate value vector v^(1) through the interconnection module 4 to each slave operation module 6, where it is stored in the first storage unit 63 of the slave operation module 6. The operation unit 61 of the slave operation module 6 reads the first-order visible layer vector v^(1) from the first storage unit, completes the dot-product operation of its weight vector and the first-order visible layer vector v^(1), and returns the intermediate result through the interconnection module. In the interconnection module 4, the intermediate results returned by the slave operation modules 6 are spliced step by step into a complete local induced field vector. The master operation module 5 obtains the value returned by the interconnection module 4, reads the bias vector from the storage unit 53 according to the control signal decoded from the COMPUTE instruction, adds it to the vector returned by the interconnection module 4, activates the sum, and writes the final second-order hidden layer intermediate value vector h^(1) back to the storage unit 53.
In step S11, the controller unit 2 then reads in the next COMPUTE instruction from the instruction storage unit and, based on the decoded control signal, starts the fourth-stage calculation. In the first small step, the master operation module 5 first sends the input neuron vector v^(0) and the first-order hidden layer vector h^(0) through the interconnection module 4 to each slave operation module 6, which computes the product of its corresponding scalar of h^(0) with v^(0) and stores the intermediate value in the weight gradient cache unit 65 of the slave operation module 6. In the second small step, the operation unit 61 of the slave operation module 6 reads the second-order hidden layer vector h^(1) from the first storage unit and selects the corresponding component of the first-order visible layer vector v^(1), completes the product of the corresponding scalar of h^(1) and v^(1), performs vector subtraction between this intermediate result and the intermediate value of the previous small step read from the weight gradient cache unit 65, and caches the computed intermediate result in the weight gradient cache unit 65. In the last small step, the operation unit 61 of the slave operation module 6 reads from the weight gradient cache unit 65 the weight update value obtained by multiplying the intermediate value of the previous small step by the learning rate, reads the corresponding weight from the weight cache unit 64, performs vector subtraction to obtain the updated weight, and caches the updated weight back into the weight cache unit 64. This completes one self-learning pre-training iteration of the single-layer neural network; after multiple iterations, once the weights reach a given convergence criterion (the weight update value is smaller than a certain threshold), pre-training of this layer is finished and pre-training of the next layer of the neural network can begin.
By adopting the device and the instruction set for executing the artificial neural network self-learning pre-training operation, the problems of insufficient operation performance of a CPU and a GPU and high front-end decoding overhead are solved. The support for the forward operation of the multilayer artificial neural network is effectively improved.
By adopting the special on-chip cache for the forward operation of the multilayer artificial neural network, the reusability of input neurons and weight data is fully mined, the data are prevented from being read to the memory repeatedly, the memory access bandwidth is reduced, and the problem that the memory bandwidth becomes the bottleneck of the forward operation performance of the multilayer artificial neural network is avoided.
Each function/unit/module/submodule in the present disclosure may be hardware, for example, the hardware may be a circuit including a digital circuit, an analog circuit, and the like. Physical implementations of hardware structures include, but are not limited to, physical devices including, but not limited to, transistors, memristors, and the like. The computing module in the computing device may be any suitable hardware processor, such as a CPU, GPU, FPGA, DSP, ASIC, and the like. The memory unit may be any suitable magnetic or magneto-optical storage medium, such as RRAM, DRAM, SRAM, EDRAM, HBM, HMC, etc.
It will be clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be performed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules to perform all or part of the above described functions.
The processes or methods depicted in the preceding figures may be performed by processing logic that comprises hardware (e.g., circuitry, dedicated logic, etc.), firmware, software (e.g., software embodied on a non-transitory computer readable medium), or a combination of both. Although the processes or methods are described above in terms of some sequential operations, it should be understood that some of the operations described may be performed in a different order. Further, some operations may be performed in parallel rather than sequentially.
In the foregoing specification, embodiments of the present disclosure have been described with reference to specific exemplary embodiments thereof. It will be evident that various modifications may be made thereto without departing from the broader spirit and scope of the disclosure as set forth in the following claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (20)

1. An apparatus for performing artificial neural network self-learning operations, comprising a controller unit, an interconnection module, a master operation module, and a plurality of slave operation modules, wherein:
the controller unit is used for reading an instruction, decoding the instruction into control signals for controlling the behaviors of the interconnection module, the main operation module and the slave operation module, and then distributing the respective control signals to the modules;
the interconnection module has different topology realization and is used for distributing the input vector of the master operation module to the plurality of slave operation modules and combining the calculation results of the slave operation modules and returning the combined calculation results to the master operation module;
the main operation module comprises: the activation function arithmetic unit is used for carrying out activation function arithmetic on the intermediate value returned by the interconnection module; the sampling arithmetic unit is used for carrying out Gibbs sampling on the operation result of the activation function; the adder is used for updating the offset of the sampling result;
the slave operation module is used for performing dot product operation on the input vector and the corresponding weight matrix, performing product operation on a corresponding component scalar in the input vector and the corresponding weight matrix, and updating the weight matrix;
the artificial neural network comprises a plurality of neurons with two or more layers, the self-learning pre-training of the artificial neural network adopts layer-by-layer training, and for each layer of neurons, the pre-training of the artificial neural network comprises the following steps:
in the first stage, in the slave operation modules, the input neuron vector v^(0) broadcast by the interconnection module and the weight vector matrix W are subjected to a dot-product operation to obtain a local induced field, and after the local induced field undergoes the nonlinear transformation of an activation function, Gibbs sampling is applied to obtain the first-order hidden layer intermediate value h^(0).
2. The apparatus for performing artificial neural network self-learning operations of claim 1, further comprising:
the instruction storage unit is used for reading in the instructions through the data access unit and caching the read instructions;
and the data access unit is used for accessing the external address space and finishing the loading and the storing of the data.
3. The apparatus for performing artificial neural network self-learning operations of claim 1, wherein the instruction comprises a COMPUTE instruction.
4. The apparatus for performing artificial neural network self-learning operations of claim 1, wherein the instructions further comprise:
the CONFIG instruction is used for configuring various constants required by calculation of a current layer before calculation of each layer of artificial neural network starts;
a COMPUTE instruction for completing arithmetic logic calculation of each layer of artificial neural network;
the IO instruction is used for reading input data required by calculation from the external address space and storing the data back to the external space after the calculation is finished;
the NOP instruction is used for flushing the control signals currently loaded in all control-signal cache queues inside the device, ensuring that all instructions before the NOP instruction have completed; the NOP instruction itself does not contain any operation;
a JUMP instruction for the controller to JUMP to a next instruction address to be read from the instruction storage unit to realize a JUMP of a control flow;
the MOVE instruction is used for transporting data of a certain address in the internal address space of the device to another address in the internal address space of the device, is independent of the arithmetic unit, and does not occupy the resources of the arithmetic unit in the execution process.
5. The apparatus for performing artificial neural network self-learning operation according to claim 1, wherein the main operation module includes an operation unit, a data dependency judgment unit, and a storage unit, wherein,
the storage unit is used for caching input data and output data used by the main operation module in the calculation process,
the operation unit is used for completing the operation of the main operation module;
the data dependency relationship judging unit is the port through which the operation unit reads and writes the storage unit, and is used for ensuring read-write consistency of the data in the storage unit.
6. The apparatus of claim 5, wherein the data dependency relationship determining unit is configured to determine whether a dependency exists between the data of a control signal that has not yet been executed and a control signal that is currently being executed; if no dependency exists, the control signal is allowed to issue immediately; otherwise, the control signal is allowed to issue only after all control signals it depends on have completed.
7. The apparatus for performing artificial neural network self-learning operation according to claim 6, wherein the data dependency judgment unit is further configured to send the read data to the slave operation modules through the interconnection module.
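Claims 6 and 7 describe what amounts to a scoreboard-style hazard check before a control signal is allowed to issue. A minimal software analogue is sketched below; modelling each control signal as a pair of read/write address sets is an assumption made for illustration only.

    def has_dependency(pending, in_flight):
        """True if the pending signal's data overlaps with any signal still executing."""
        reads, writes = pending
        for r, w in in_flight:
            # read-after-write, write-after-read and write-after-write hazards
            if (reads & w) or (writes & r) or (writes & w):
                return True
        return False

    def try_issue(pending, in_flight):
        """Issue immediately when no dependency exists; otherwise the caller must retry later."""
        if has_dependency(pending, in_flight):
            return False
        in_flight.append(pending)
        return True

For example, try_issue((set(), {0x40}), [({0x40}, set())]) returns False, because the pending write to address 0x40 must wait for the in-flight read of the same address to finish.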
8. The apparatus for performing artificial neural network self-learning operation according to claim 1, wherein each of the slave operation modules includes an operation unit, a data dependency judgment unit, a first storage unit, a second storage unit, and a third storage unit, wherein,
the arithmetic unit is used for receiving the control signal sent by the controller unit and carrying out arithmetic logic operation;
the data dependency relationship judging unit is used for monitoring the read-write operation of the storage unit so as to ensure that consistency conflict does not exist in the read-write operation of the storage unit;
the first storage unit is used for caching input vectors and calculation results of the neurons;
the second storage unit is used for caching weight data required by the slave operation module in the calculation process;
the third storage unit is used for caching weight gradient data required by the corresponding slave operation module in the process of updating the weight.
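The three storage units of each slave operation module can be pictured as three independent scratchpad buffers. The small model below only illustrates that partitioning; the buffer sizes and the NumPy representation are assumptions, not part of the claim.

    from dataclasses import dataclass, field
    import numpy as np

    @dataclass
    class SlaveScratchpad:
        # first storage unit: input vectors and this slave's neuron results
        neuron_buf: np.ndarray = field(default_factory=lambda: np.zeros(256))
        # second storage unit: the weight data this slave needs during computation
        weight_buf: np.ndarray = field(default_factory=lambda: np.zeros((64, 256)))
        # third storage unit: weight-gradient data used when this slave updates its weights
        wgrad_buf: np.ndarray = field(default_factory=lambda: np.zeros((64, 256)))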
9. The apparatus for performing artificial neural network self-learning operations of claim 1, wherein the pre-training of the artificial neural network for each layer of neurons further comprises:
in the second stage, the slave operation modules first carry out a dot product operation between the transpose w^T of the weight vector matrix and the transpose h_1^T of the first-order hidden layer intermediate value; after the local induced field in the main operation module undergoes the nonlinear transformation of the activation function, Gibbs sampling is applied to obtain the first-order visible layer intermediate value v_1;
in the third stage, the slave operation modules carry out a dot product operation between the received first-order visible layer intermediate value v_1 and the weight vector matrix w to obtain the local induced field, output the local induced field to the main operation module, and after the nonlinear transformation of the activation function the second hidden layer intermediate value h_2 is obtained;
in the fourth stage, the slave operation modules update the weights according to the following formulas:
w = w + ε(h_1 × v_0 - h_2 × v_1)
b = b + ε(h_1 - h_2)
c = c + ε(v_0 - v_1)
where the vector b is the bias added to the dot product of the vector and the weight matrix before the activation function in the first and third stages, the vector c is the bias in the second stage, "×" denotes the cross multiplication (outer product) of the vectors, and ε is the learning rate.
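Taken together, the four stages above amount to one step of contrastive-divergence (CD-1) pre-training of a restricted Boltzmann machine. The NumPy sketch below strings the stages together; the sigmoid activation, the Bernoulli sampler, and the use of outer products for the cross multiplication in the update formulas are assumptions of the sketch, not requirements of the claim.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def gibbs_sample(p, rng):
        return (rng.random(p.shape) < p).astype(float)

    def cd1_step(v0, w, b, c, lr, rng):
        """One four-stage pre-training step on a single input vector v0."""
        h1 = gibbs_sample(sigmoid(w @ v0 + b), rng)      # stage 1: first-order hidden intermediate value
        v1 = gibbs_sample(sigmoid(w.T @ h1 + c), rng)    # stage 2: first-order visible intermediate value
        h2 = sigmoid(w @ v1 + b)                         # stage 3: second hidden intermediate value
        w  = w + lr * (np.outer(h1, v0) - np.outer(h2, v1))   # stage 4: weight update
        b  = b + lr * (h1 - h2)                               # update of the bias added in stages 1 and 3
        c  = c + lr * (v0 - v1)                               # update of the bias added in stage 2
        return w, b, c

Under these assumptions, w has shape (hidden, visible), b is the hidden-side bias and c is the visible-side bias, which matches the roles the claim assigns to the two bias vectors.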
10. A method for performing artificial neural network self-learning operation, applied to the apparatus for performing artificial neural network self-learning operation of any one of claims 1 to 9, comprising:
the controller unit reads the instruction, decodes the instruction into control signals for controlling the behaviors of the interconnection module, the main operation module and the slave operation module, and then distributes the respective control signals to the modules;
the interconnection module, which may be realized with different topologies, distributes the input vector of the master operation module to the plurality of slave operation modules, combines the calculation results of the slave operation modules, and returns the combined result to the master operation module (see the sketch following this claim);
the main operation module applies the activation function and Gibbs sampling to the intermediate value returned by the interconnection module and updates the bias of the activation function;
the slave operation modules perform a dot product operation between the input vector and the corresponding weight matrix, perform a product operation between the corresponding component scalar of the input vector and the corresponding weight matrix, and update the weight matrix;
the artificial neural network comprises two or more layers each containing a plurality of neurons, the self-learning pre-training of the artificial neural network is performed layer by layer, and for each layer of neurons the pre-training of the artificial neural network comprises:
in the first stage, in the slave operation modules, a dot product operation is performed between the input neuron vector v_0 broadcast by the interconnection module and the weight vector matrix w to obtain the local induced field; the local induced field undergoes the nonlinear transformation of the activation function, and Gibbs sampling is then applied to obtain the first-order hidden layer intermediate value h_1.
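The split of work recited in claim 10 (slaves computing dot products against their own slice of the weight matrix, the interconnection module combining the partial results, the master applying the activation function and Gibbs sampling) can be sketched as follows. Splitting the weight matrix by rows, adding the bias in the master, and the sigmoid activation are assumptions of this sketch.

    import numpy as np

    def first_stage_parallel(v0, w, b, n_slaves, rng):
        # each slave operation module holds a block of rows of the weight matrix
        blocks = np.array_split(w, n_slaves, axis=0)
        # slave modules: dot products between the broadcast input vector and their weight slice
        partials = [blk @ v0 for blk in blocks]
        # interconnection module: combine (splice) the slave results and return them to the master
        field = np.concatenate(partials) + b
        # master module: activation function followed by Gibbs sampling
        p_h1 = 1.0 / (1.0 + np.exp(-field))
        return (rng.random(p_h1.shape) < p_h1).astype(float)

For instance, with w of shape (8, 4) and n_slaves set to 4, each slave computes two of the eight dot products and the spliced result is an 8-element local induced field.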
11. The method for performing artificial neural network self-learning operations of claim 10, further comprising:
the instruction storage unit reads in the instruction through the data access unit and caches the read instruction;
and the data access unit accesses the external address space to complete the loading and the storing of the data.
12. The method for performing artificial neural network self-learning operations of claim 10, wherein the instruction comprises a COMPUTE instruction.
13. The method for performing artificial neural network self-learning operations of claim 10, wherein the instructions further comprise:
the CONFIG instruction is used for configuring various constants required by calculation of a current layer before calculation of each layer of artificial neural network starts;
a COMPUTE instruction for completing arithmetic logic calculation of each layer of artificial neural network;
the IO instruction is used for reading input data required by calculation from the external address space and storing the data back to the external space after the calculation is finished;
the NOP instruction is used for emptying the control signals currently loaded in all internal control signal cache queues, ensuring that all instructions preceding the NOP instruction have completed; the NOP instruction itself does not contain any operation;
the JUMP instruction is used by the controller to jump to the address of the next instruction to be read from the instruction storage unit, so as to implement a jump in the control flow;
the MOVE instruction is used for moving data from one address in the device's internal address space to another address in the internal address space; it is independent of the operation unit and does not occupy the operation unit's resources during execution.
14. The method for performing artificial neural network self-learning operation according to claim 10, wherein the main operation module includes an operation unit, a data dependency judgment unit, and a storage unit, wherein,
the memory unit caches input data and output data used by the main operation module in the calculation process,
the operation unit completes the operation of the main operation module;
the data dependency relationship judging unit is the port through which the operation unit reads and writes the storage unit, and is used for ensuring read-write consistency of the data in the storage unit.
15. The method for performing artificial neural network self-learning operation as claimed in claim 14, wherein the data dependency relationship determining unit is configured to determine whether a dependency exists between the data of a control signal that has not yet been executed and a control signal that is currently being executed; if no dependency exists, the control signal is allowed to issue immediately; otherwise, the control signal is allowed to issue only after all control signals it depends on have completed.
16. The method for performing artificial neural network self-learning operations of claim 15, wherein the data dependency determination unit is further configured to send the read data to the slave operation modules via the interconnection module.
17. The method for performing artificial neural network self-learning operation according to claim 10, wherein each of the slave operation modules includes an operation unit, a data dependency judgment unit, a first storage unit, a second storage unit, and a third storage unit, wherein,
the arithmetic unit is used for receiving the control signal sent by the controller unit and carrying out arithmetic logic operation;
the data dependency relationship judging unit is used for monitoring the read-write operation of the storage unit so as to ensure that consistency conflict does not exist in the read-write operation of the storage unit;
the first storage unit is used for caching input vectors and calculation results of the neurons;
the second storage unit is used for caching weight data required by the slave operation module in the calculation process;
the third storage unit is used for caching weight gradient data required by the corresponding slave operation module in the process of updating the weight.
18. The method of claim 10, wherein the artificial neural network comprises two or more layers each containing a plurality of neurons, and wherein the self-learning pre-training of the artificial neural network is performed layer by layer.
19. The method for performing artificial neural network self-learning operations of claim 18, wherein the pre-training is divided into four phases for each layer of neurons:
in the first stage, in the slave operation modules, a dot product operation is performed between the input neuron vector v_0 broadcast by the interconnection module and the weight vector matrix w to obtain the local induced field; the local induced field undergoes the nonlinear transformation of the activation function, and Gibbs sampling is then applied to obtain the first-order hidden layer intermediate value h_1;
in the second stage, the slave operation modules first carry out a dot product operation between the transpose w^T of the weight vector matrix and the transpose h_1^T of the first-order hidden layer intermediate value; after the local induced field in the main operation module undergoes the nonlinear transformation of the activation function, Gibbs sampling is applied to obtain the first-order visible layer intermediate value v_1;
in the third stage, the slave operation modules carry out a dot product operation between the received first-order visible layer intermediate value v_1 and the weight vector matrix w to obtain the local induced field, output the local induced field to the main operation module, and after the nonlinear transformation of the activation function the second hidden layer intermediate value h_2 is obtained;
in the fourth stage, the slave operation modules update the weights according to the following formulas:
w = w + ε(h_1 × v_0 - h_2 × v_1)
b = b + ε(h_1 - h_2)
c = c + ε(v_0 - v_1)
where the vector b is the bias added to the dot product of the vector and the weight matrix before the activation function in the first and third stages, the vector c is the bias in the second stage, "×" denotes the cross multiplication (outer product) of the vectors, and ε is the learning rate.
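Claims 18 and 19 describe greedy layer-by-layer pre-training: each layer is trained with the four-stage procedure, and the trained layer's sampled hidden activations become the input of the next layer. The driver below is a compact sketch of that loop under the same assumptions as the CD-1 sketch given after claim 9 (it reuses that cd1_step helper); the layer sizes, learning rate and epoch count are illustrative.

    import numpy as np

    def pretrain_layerwise(layer_sizes, data, lr=0.1, epochs=5, seed=0):
        """Greedy layer-by-layer pre-training; `data` holds one input vector per row."""
        rng = np.random.default_rng(seed)
        inputs, params = data.astype(float), []
        for v_dim, h_dim in zip(layer_sizes[:-1], layer_sizes[1:]):
            w = rng.normal(0.0, 0.01, size=(h_dim, v_dim))
            b, c = np.zeros(h_dim), np.zeros(v_dim)
            for _ in range(epochs):
                for v0 in inputs:                       # one four-stage step per input vector
                    w, b, c = cd1_step(v0, w, b, c, lr, rng)
            params.append((w, b, c))
            # sampled hidden activations of this layer become the next layer's input
            probs = 1.0 / (1.0 + np.exp(-(inputs @ w.T + b)))
            inputs = (rng.random(probs.shape) < probs).astype(float)
        return params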
20. An electronic device comprising the apparatus for performing artificial neural network self-learning operations of any one of claims 1-9.
CN201610267211.0A 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation Active CN107316078B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910402047.3A CN110188870B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation
CN201610267211.0A CN107316078B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610267211.0A CN107316078B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201910402047.3A Division CN110188870B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation

Publications (2)

Publication Number Publication Date
CN107316078A CN107316078A (en) 2017-11-03
CN107316078B true CN107316078B (en) 2021-05-07

Family

ID=60185046

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910402047.3A Active CN110188870B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation
CN201610267211.0A Active CN107316078B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910402047.3A Active CN110188870B (en) 2016-04-27 2016-04-27 Apparatus and method for performing artificial neural network self-learning operation

Country Status (1)

Country Link
CN (2) CN110188870B (en)

Families Citing this family (46)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109754062B (en) * 2017-11-07 2024-05-14 上海寒武纪信息科技有限公司 Execution method of convolution expansion instruction and related product
CN109784125A (en) * 2017-11-10 2019-05-21 福州瑞芯微电子股份有限公司 Deep learning network processing device, method and image processing unit
CN109902814B (en) 2017-12-11 2020-01-17 中科寒武纪科技股份有限公司 Neural network operation module and method
CN110826712B (en) * 2017-12-14 2024-01-09 中科寒武纪科技股份有限公司 Neural network processor board card and related products
CN108108189B (en) * 2017-12-15 2020-10-30 安徽寒武纪信息科技有限公司 Calculation method and related product
EP3624019A4 (en) 2017-12-30 2021-03-24 Cambricon Technologies Corporation Limited Integrated circuit chip device and related product
CN109993290B (en) 2017-12-30 2021-08-06 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
CN109993292B (en) 2017-12-30 2020-08-04 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
CN109993289B (en) 2017-12-30 2021-09-21 中科寒武纪科技股份有限公司 Integrated circuit chip device and related product
CN108364065B (en) * 2018-01-19 2020-09-11 上海兆芯集成电路有限公司 Microprocessor for booth multiplication
CN110147249B (en) * 2018-02-12 2021-02-09 上海寒武纪信息科技有限公司 Network model calculation method and device
CN110163349B (en) * 2018-02-12 2021-03-23 上海寒武纪信息科技有限公司 Network model calculation method and device
US11704125B2 (en) 2018-02-13 2023-07-18 Cambricon (Xi'an) Semiconductor Co., Ltd. Computing device and method
US11630666B2 (en) 2018-02-13 2023-04-18 Shanghai Cambricon Information Technology Co., Ltd Computing device and method
CN110163350B (en) * 2018-02-13 2021-06-08 上海寒武纪信息科技有限公司 Computing device and method
EP3651078B1 (en) * 2018-02-13 2021-10-27 Shanghai Cambricon Information Technology Co., Ltd Computation device and method
CN110163361B (en) * 2018-02-13 2021-06-25 上海寒武纪信息科技有限公司 Computing device and method
CN110197273B (en) * 2018-02-27 2020-08-25 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN111767996B (en) * 2018-02-27 2024-03-05 上海寒武纪信息科技有限公司 Integrated circuit chip device and related products
CN110197269B (en) * 2018-02-27 2020-12-29 安徽寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN110196734A (en) * 2018-02-27 2019-09-03 上海寒武纪信息科技有限公司 A kind of computing device and Related product
CN110197270B (en) * 2018-02-27 2020-10-30 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN111767998B (en) * 2018-02-27 2024-05-14 上海寒武纪信息科技有限公司 Integrated circuit chip device and related products
CN110197271B (en) * 2018-02-27 2020-10-27 上海寒武纪信息科技有限公司 Integrated circuit chip device and related product
CN111626413A (en) * 2018-03-14 2020-09-04 上海寒武纪信息科技有限公司 Computing device and method
CN110472734B (en) * 2018-05-11 2024-03-29 上海寒武纪信息科技有限公司 Computing device and related product
CN108763360A (en) * 2018-05-16 2018-11-06 北京旋极信息技术股份有限公司 A kind of sorting technique and device, computer readable storage medium
CN108710958B (en) * 2018-05-16 2022-04-15 北京旋极信息技术股份有限公司 Predictive health management method and device and computer readable storage medium
CN108859477A (en) * 2018-07-05 2018-11-23 吉林工程技术师范学院 A kind of children's literature book binder and its control method
CN110806903A (en) * 2018-08-01 2020-02-18 珠海格力电器股份有限公司 Configuration parameter determining method and device of electric cooker
US20220004854A1 (en) * 2018-10-08 2022-01-06 Deeper-I Co., Inc. Artificial neural network computation acceleration apparatus for distributed processing, artificial neural network acceleration system using same, and artificial neural network acceleration method therefor
CN110059809B (en) * 2018-10-10 2020-01-17 中科寒武纪科技股份有限公司 Computing device and related product
CN111047045B (en) * 2018-10-12 2021-03-19 中科寒武纪科技股份有限公司 Distribution system and method for machine learning operation
EP4009186A1 (en) 2018-10-18 2022-06-08 Shanghai Cambricon Information Technology Co., Ltd Network-on-chip data processing method and device
CN111079908B (en) * 2018-10-18 2024-02-13 上海寒武纪信息科技有限公司 Network-on-chip data processing method, storage medium, computer device and apparatus
CN111178492B (en) * 2018-11-09 2020-12-11 安徽寒武纪信息科技有限公司 Computing device, related product and computing method for executing artificial neural network model
CN111258641B (en) * 2018-11-30 2022-12-09 上海寒武纪信息科技有限公司 Operation method, device and related product
CN109542837B (en) * 2018-11-30 2023-03-24 上海寒武纪信息科技有限公司 Operation method, device and related product
CN111260046B (en) * 2018-11-30 2022-12-02 上海寒武纪信息科技有限公司 Operation method, device and related product
CN109919313B (en) * 2019-01-31 2021-06-08 华为技术有限公司 Gradient transmission method and distributed training system
CN109978160B (en) * 2019-03-25 2021-03-02 中科寒武纪科技股份有限公司 Configuration device and method of artificial intelligence processor and related products
US11934940B2 (en) 2019-04-18 2024-03-19 Cambricon Technologies Corporation Limited AI processor simulation
CN111080400B (en) * 2019-11-25 2023-04-18 中山大学 Commodity recommendation method and system based on gate control graph convolution network and storage medium
CN111461340B (en) * 2020-03-10 2023-03-31 北京百度网讯科技有限公司 Weight matrix updating method and device and electronic equipment
CN112329619B (en) * 2020-11-04 2022-06-14 济南博观智能科技有限公司 Face recognition method and device, electronic equipment and readable storage medium
CN114071781B (en) * 2021-11-16 2024-04-12 杭州电子科技大学 Wireless local area network medium access control method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732274A (en) * 2015-03-10 2015-06-24 华南理工大学 Intelligent computer
CN104757992A (en) * 2015-03-16 2015-07-08 广东工业大学 Cardiac sound diagnostic system based on depth confidence network and diagnostic method
CN105117706A (en) * 2015-08-28 2015-12-02 小米科技有限责任公司 Image processing method and apparatus and character recognition method and apparatus
CN105447569A (en) * 2015-12-18 2016-03-30 北京柏惠维康科技有限公司 Breast cancer cell characteristic analysis system based on deep learning
CN105488565A (en) * 2015-11-17 2016-04-13 中国科学院计算技术研究所 Calculation apparatus and method for accelerator chip accelerating deep neural network algorithm

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729678B (en) * 2013-12-12 2016-10-05 中国科学院信息工程研究所 A kind of based on navy detection method and the system of improving DBN model
CN104157290B (en) * 2014-08-19 2017-10-24 大连理工大学 A kind of method for distinguishing speek person based on deep learning
CN104182772B (en) * 2014-08-19 2017-10-24 大连理工大学 A kind of gesture identification method based on deep learning
CN105184366B (en) * 2015-09-15 2018-01-09 中国科学院计算技术研究所 A kind of time-multiplexed general neural network processor

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732274A (en) * 2015-03-10 2015-06-24 华南理工大学 Intelligent computer
CN104757992A (en) * 2015-03-16 2015-07-08 广东工业大学 Cardiac sound diagnostic system based on depth confidence network and diagnostic method
CN105117706A (en) * 2015-08-28 2015-12-02 小米科技有限责任公司 Image processing method and apparatus and character recognition method and apparatus
CN105488565A (en) * 2015-11-17 2016-04-13 中国科学院计算技术研究所 Calculation apparatus and method for accelerator chip accelerating deep neural network algorithm
CN105447569A (en) * 2015-12-18 2016-03-30 北京柏惠维康科技有限公司 Breast cancer cell characteristic analysis system based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
"DaDianNao: A Machine-Learning Supercomputer";Yunji Chen 等;《International Symposium on Microarchitecture》;20141231;参见第609-622页 *

Also Published As

Publication number Publication date
CN110188870B (en) 2021-10-12
CN110188870A (en) 2019-08-30
CN107316078A (en) 2017-11-03

Similar Documents

Publication Publication Date Title
CN107316078B (en) Apparatus and method for performing artificial neural network self-learning operation
CN107341547B (en) Apparatus and method for performing convolutional neural network training
CN107341542B (en) Apparatus and method for performing recurrent neural networks and LSTM operations
CN107832843B (en) Information processing method and related product
CN107315571B (en) Device and method for executing forward operation of full-connection layer neural network
CN109376861B (en) Apparatus and method for performing full connectivity layer neural network training
CN107329734B (en) Apparatus and method for performing convolutional neural network forward operation
CN107301454B (en) Artificial neural network reverse training device and method supporting discrete data representation
CN109358900B (en) Artificial neural network forward operation device and method supporting discrete data representation
CN109242094B (en) Apparatus and method for performing artificial neural network forward operations
CN107886166B (en) Device and method for executing artificial neural network operation
WO2017185347A1 (en) Apparatus and method for executing recurrent neural network and lstm computations
CN111353588A (en) Apparatus and method for performing artificial neural network reverse training
WO2017185248A1 (en) Apparatus and method for performing auto-learning operation of artificial neural network
EP3444758B1 (en) Discrete data representation-supporting apparatus and method for back-training of artificial neural network
WO2018058452A1 (en) Apparatus and method for performing artificial neural network operation
WO2017185335A1 (en) Apparatus and method for executing batch normalization operation
CN107341546B (en) Device and method for executing batch normalization operation
CN111178492B (en) Computing device, related product and computing method for executing artificial neural network model
CN109993276B (en) Apparatus and method for performing artificial neural network reverse training

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant after: Zhongke Cambrian Technology Co., Ltd

Address before: 100190 room 644, comprehensive research building, No. 6 South Road, Haidian District Academy of Sciences, Beijing

Applicant before: Beijing Zhongke Cambrian Technology Co., Ltd.

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant