CN110647983B - Self-supervised learning acceleration system and method based on a storage-computation integrated device array


Info

Publication number: CN110647983B (granted patent); application publication CN110647983A
Application number: CN201910944467.4A
Authority: China (CN)
Original language: Chinese (zh)
Inventors: 潘红兵 (Pan Hongbing), 娄胜 (Lou Sheng), 王宇宣 (Wang Yuxuan)
Assignee (original and current): Nanjing University
Priority and filing date: 2019-09-30
Publication dates: 2020-01-03 (CN110647983A), 2023-03-24 (CN110647983B, grant)
Legal status: Active (granted)

Classifications

    • G06N3/063: Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons, using electronic means
    • G06N3/088: Non-supervised learning, e.g. competitive learning
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses a self-supervised learning acceleration system and method based on a storage-computation integrated device array. The acceleration system comprises a cache module, a calculation array, a weight input module, an auxiliary circuit, a control module and a parameter updating module. The cache module, the calculation array and the parameter updating module are connected in sequence; the weight input module is connected with the calculation array and is used for updating it; the control module is connected to the cache module, the weight input module, the calculation array and the parameter updating module respectively; and the calculation array and the auxiliary circuit together complete the operations of the self-supervised neural network. By exploiting the area and power-consumption advantages of the storage-computation integrated calculation array, the invention realizes a self-supervised learning acceleration system and method that saves a large amount of energy and product volume compared with existing processing systems built from graphics cards (GPUs) and conventional digital circuits.

Description

Self-supervised learning acceleration system and method based on a storage-computation integrated device array
Technical Field
The invention relates to a system and a method for accelerating self-supervised learning with a storage-computation integrated device array, and belongs to the field of machine learning.
Background
Most conventional computers adopt the von Neumann architecture. Because the storage unit and the arithmetic unit are separated in this architecture, a great deal of energy is consumed in data transfer and the operation speed is limited.
Self-supervised learning is a form of unsupervised learning that can train a general-purpose system without labeled data. When a neural network is trained in the usual way, a graphics card (GPU) or a central processing unit performs all of the network computation as well as the parameter-update computation; as typical von Neumann processors, both the GPU and the CPU have a very low energy-efficiency ratio for this workload.
When a neural network performs inference, a conventional digital circuit generally unrolls the convolution operation into matrix-vector multiplication and completes it with multiply-accumulate units. However, a single multiplier occupies considerable resources (area) and consumes substantial power; memory accesses during operation raise power consumption further, and the memory wall limits any further increase in operation speed.
A storage-computation integrated (computing-in-memory) device can combine storage and computation with a certain precision: a single device can store a numerical value, retain it for a long time, and perform multiplication inside the device by an analog method. In computation-intensive tasks such as machine learning, such devices offer large area and power-consumption advantages. The main types of storage-computation integrated devices at present are phase-change memories (PCM), resistive random access memories (RRAM/ReRAM), floating-gate devices and the like. Industry has already released numerous products with capacities at the Gb level, such as the 8 Gb PCM introduced by Samsung in 2012 on a 20 nm process and the 128 Gb 3D XPoint technology introduced by Micron and Intel in 2015. The challenges currently facing storage-computation integrated devices include: the precision of the value stored in a device is not high, the fabrication process of some of these devices is not yet mature, and the manufacturing processes of memory and processors are not compatible.
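Because the values stored in such devices have limited precision, weights and inputs are in practice quantized to a small number of bits before being mapped onto an array (the patent itself only states that inputs are quantized bit by bit). A minimal sketch of one possible uniform quantization scheme, with the 8-bit width and min-max scaling as illustrative assumptions:

```python
import numpy as np

def quantize_uniform(x, n_bits=8):
    """Uniformly quantize an array to n_bits unsigned integer codes plus a scale.

    Returns the integer codes (suitable for bit-serial input to an array)
    together with the scale and offset needed to recover approximate values.
    """
    levels = 2 ** n_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = np.round((x - lo) / scale).astype(np.uint32)
    return codes, scale, lo

# x is approximately codes * scale + lo; the codes can then be streamed into
# the array one bit plane at a time, as described later in the document.
```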
Disclosure of Invention
In order to overcome the technical defects of traditional processing systems in self-supervised learning, the invention provides a system and a method for accelerating self-supervised learning based on a storage-computation integrated device array.
The technical scheme adopted by the system of the invention is as follows:
A self-supervised learning acceleration system based on a storage-computation integrated device array comprises a cache module, a calculation array, a weight input module, an auxiliary circuit, a control module and a parameter updating module; the cache module, the calculation array and the parameter updating module are connected in sequence; the weight input module is connected with the calculation array and is used for updating the calculation array; the control module is respectively connected with the cache module, the weight input module, the calculation array and the parameter updating module; and the calculation array and the auxiliary circuit are used for completing the operations of the self-supervised neural network.
Further, the parameter updating module is a digital circuit or a graphics card (GPU).
The acceleration method of the invention, using the above acceleration system, proceeds as follows: the control module, by controlling the weight input module, writes the initialized network parameters into the calculation array; the upper computer sends the training data to the cache module through an interface; the control module sends the training data in the cache module to the auxiliary circuit while the training data are still retained in the cache module; one part of the auxiliary circuit quantizes the training data bit by bit and then inputs them into the calculation array; the calculation array completes the convolution and fully-connected operations of the self-supervised neural network, and the other part of the auxiliary circuit completes the activation and pooling operations; the parameter updating module computes updated network parameters by a gradient-descent method from the calculation result of the neural network and the training data in the cache module, and sends the parameters to the control module; after receiving the parameters, the control module controls the calculation array to erase the original parameters and then controls the weight input module to write the updated parameters into the calculation array, completing one iteration; the iteration process is repeated to complete the self-supervised training.
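A minimal host-side sketch of one such iteration, assuming the calculation array behaves as an in-memory matrix multiplier and the network is reduced to a single linear layer with a squared-error objective; the class and function names (ComputeArray, program, erase, matvec, train) and the gradient formula are illustrative assumptions, not taken from the patent:

```python
import numpy as np

class ComputeArray:
    """Stand-in for the storage-computation integrated calculation array."""
    def __init__(self, shape):
        self.G = np.zeros(shape)          # stored (non-volatile) weights

    def program(self, W):                 # weight input module writes W
        self.G = W.copy()

    def erase(self):                      # control module erases the old weights
        self.G[:] = 0.0

    def matvec(self, a):                  # in-array matrix multiply
        return a @ self.G

def train(array, data_loader, lr=1e-2, iterations=100):
    W = 0.01 * np.random.randn(*array.G.shape)   # initialized network parameters
    array.program(W)                             # write initial weights to the array
    for _ in range(iterations):
        x, target = next(data_loader)            # training data kept in the cache
        y = array.matvec(x)                      # forward pass on the array
        grad = x.T @ (y - target) / len(x)       # parameter-update module: gradient
        W = W - lr * grad                        # gradient-descent step
        array.erase()                            # erase original parameters
        array.program(W)                         # store updated parameters
    return W
```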
Further, the parameter updating module calculates the updated network parameters from the calculation result output by the neural network and the stored training data, using a preset loss function and the back-propagation algorithm.
Further, the calculation array completes the convolution operations in the self-supervised neural network; the specific computation of each convolutional layer is as follows (a sketch of the bit-serial scheme of steps (3) and (4) is given after this list):
(1) for the m convolution kernels of the current convolutional layer, each kernel is unrolled column-wise and concatenated into a column vector, and the m column vectors corresponding to the m kernels are concatenated into a matrix; for an input image feature map with n channels, the n such matrices are stacked vertically into one new large matrix, and a calculation array of the same size as this large matrix is used, with the non-volatile value stored at each array location set to the corresponding entry of the large matrix;
(2) the input of the current convolutional layer consists of the image feature maps of the n channels; for each feature map the following operations are performed, yielding n matrices (one per channel): a window of the same size as the convolution kernel is selected and slid over the feature map p times with the specified stride; at each position the covered values of the feature map are taken out and unrolled, in column order, into a row vector; after the sliding is finished, the p row vectors are stacked top to bottom into a matrix;
(3) the n matrices are concatenated side by side to obtain the final electrical input matrix; the rows of the electrical input matrix are fed into the calculation array one after another from top to bottom, each element of a row vector corresponding to one row of the calculation array;
(4) a row vector is input into the calculation array bit by bit, i.e. one bit at a time; when the calculation of the array is complete, the result of each column is converted by an analog-to-digital converter into a digital signal, and the digital signals are shifted according to their bit positions and accumulated to obtain a result in the form of a vector of length m; this is the result produced by one row vector of the electrical input matrix entering the calculation array, i.e., for each of the m convolution kernels, the summed result of convolving the corresponding window of the n image feature maps;
(5) following the methods of steps (3) and (4), the p row vectors of the electrical input matrix are processed in turn to obtain p result vectors, which are stacked top to bottom into a matrix; each column of this matrix is reshaped into a feature map according to the order in which values were taken from the feature map in step (2), giving the m feature maps that correspond to the result of convolution with each kernel;
(6) the auxiliary circuit adds the bias to the m feature maps and applies the activation operation, giving the final result of the current convolutional layer.
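As noted above, steps (3) and (4) amount to a bit-serial multiply with shift-and-accumulate. A minimal Python sketch of this scheme, with an 8-bit input width and an ideal (noise-free, full-precision) array as illustrative assumptions:

```python
import numpy as np

def bit_serial_matvec(a_row, G, n_bits=8):
    """Bit-serial in-array multiply for one row of the electrical input matrix.

    a_row : unsigned-integer input vector (one row of the electrical input
            matrix), one element per array row.
    G     : weights stored in the array, shape (len(a_row), m).
    Each cycle drives one bit plane into the array; the per-column analog
    sums are digitized (ADC), then shifted and accumulated.
    """
    acc = np.zeros(G.shape[1])
    for b in range(n_bits):
        bit_plane = (a_row >> b) & 1          # one bit of every input element
        column_sums = bit_plane @ G           # analog column sums, after the ADC
        acc += column_sums * (1 << b)         # shift by the bit weight, accumulate
    return acc                                # length-m result vector

# The accumulated result equals the full-precision product, e.g.:
# a_row = np.array([3, 200, 17], dtype=np.uint32); G = np.random.rand(3, 4)
# np.allclose(bit_serial_matvec(a_row, G), a_row @ G)  -> True
```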
Further, the calculation array completes the fully-connected operations in the self-supervised neural network; the computation of each fully-connected layer is as follows (a corresponding sketch follows this list):
(1) assuming the previous layer has m neurons and the current layer has n neurons, there are m × n weights in total; the m × n weights are arranged in order into a matrix, and a calculation array of the same size as the matrix is used, with the optical input quantity of the array set to the corresponding value in the matrix;
(2) the m values output by the previous layer are used as the electrical input quantity of the calculation array;
(3) the electrical input quantity is fed into the storage-computation integrated device array bit by bit, i.e. one bit at a time; when the calculation of the array is complete, the result of each column is converted by an analog-to-digital converter into a digital signal, and the digital signals are shifted according to their bit positions and accumulated to obtain a result in the form of a vector of length n;
(4) the bias is added to the length-n vector and the activation is applied, giving the final result of the current fully-connected layer.
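The corresponding sketch for one fully-connected layer reuses the same bit-serial idea; the ReLU activation and 8-bit width are illustrative assumptions (the patent does not specify an activation function):

```python
import numpy as np

def fc_layer_on_array(prev_out, W, bias, n_bits=8):
    """Sketch of one fully-connected layer mapped onto the array.

    prev_out : m outputs of the previous layer, quantized to unsigned integers.
    W        : m x n weight matrix stored in the calculation array.
    bias     : length-n bias added by the auxiliary circuit.
    """
    acc = np.zeros(W.shape[1])
    for b in range(n_bits):                       # electrical input, one bit per cycle
        bit_plane = (prev_out >> b) & 1
        acc += (bit_plane @ W) * (1 << b)         # ADC result, shifted and accumulated
    return np.maximum(acc + bias, 0.0)            # bias + activation (ReLU assumed)
```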
By exploiting the area and power-consumption advantages of the storage-computation integrated calculation array, the invention realizes an acceleration system and method for self-supervised learning that saves a large amount of energy and product volume compared with existing processing systems built from graphics cards (GPUs) and conventional digital circuits.
Drawings
Fig. 1 is a structural block diagram of the self-supervised learning acceleration system of the present invention.
Fig. 2 is a schematic structural diagram of a calculation unit in embodiment 1 of the present invention.
Fig. 3 is a hardware block diagram of the calculation array and auxiliary circuit in embodiment 1 of the present invention.
Fig. 4 is a schematic structural diagram of the convolutional autoencoder in embodiment 1 of the present invention.
Fig. 5 is a schematic diagram of the principle of unrolling the convolution operation into matrix multiplication in embodiment 1 of the present invention: (a) n convolution kernels convolve one feature map, producing results on n channels; (b) the region of the feature map covered by the kernel is unrolled column-wise and used as the input, the n convolution kernels are unrolled column-wise and concatenated into a kernel matrix, and the multiplication result is an n-column matrix in which each column corresponds to the result of one channel in (a).
Fig. 6 is a schematic structural diagram of a memristor in embodiment 2 of the present invention.
Fig. 7 is a schematic structural diagram of a memristor calculation array in embodiment 2 of the present invention.
Fig. 8 is a schematic diagram of the NOR Flash array structure in embodiment 3 of the present invention.
Detailed Description
The invention aims to build an acceleration system for self-supervised learning with a storage-computation integrated device array so as to obtain a smaller area and higher energy efficiency. As shown in fig. 1, the acceleration system includes a cache module, a calculation array, a weight input module, an auxiliary circuit, a control module and a parameter updating module; the cache module, the calculation array and the parameter updating module are connected in sequence; the weight input module is connected with the calculation array and is used for updating it; the control module is connected to the weight input module, the calculation array, the cache module and the parameter updating module respectively; auxiliary circuits are arranged between the cache module and the calculation array and between the calculation array and the parameter updating module, and the calculation array and the auxiliary circuit are used to complete the operations of the self-supervised neural network. The storage-computation integrated device array can compute large-scale matrix multiplication at very low cost, and the convolution operation can be expressed as matrix multiplication by unrolling.
Embodiment 1
The storage-computation integrated calculation array of this embodiment is an optoelectronic storage-computation integrated array, which comprises a light-emitting array and a calculation array; the calculation array is formed by a plurality of calculation units arranged periodically.
As shown in fig. 2, the calculation unit of this embodiment comprises a control gate serving as the carrier control region, a charge-coupling layer serving as the coupling region, and a P-type substrate serving as the photogenerated-carrier collection region and the readout region. The P-type substrate is divided into a collection region on the left and a readout region on the right; the readout region contains a shallow-trench isolation and an N-type source terminal and an N-type drain terminal formed by ion implantation. The shallow-trench isolation lies in the middle of the semiconductor substrate between the collection region and the readout region and is formed by etching and filling with silicon dioxide, so as to isolate the electrical signals of the collection region and the readout region. The N-type source terminal is located in the readout region on the side close to the bottom dielectric layer and is formed by ion-implantation doping. The N-type drain terminal is located on the opposite side from the N-type source terminal, likewise close to the bottom dielectric layer, and is also formed by ion-implantation doping.
A pulse in a negative or positive voltage range is applied to the control gate above the collection-region substrate to create, in that substrate, a depletion layer for collecting photoelectrons; the number of collected photoelectrons is read out through the readout region on the right and serves as the optical input quantity. During readout, a positive voltage applied to the control gate forms a conductive channel between the N-type source terminal and the N-type drain terminal of the readout region, and a bias pulse voltage applied between the source and the drain accelerates the electrons in the channel, forming a source-drain current. The carriers in the source-drain channel are acted on jointly by the control-gate voltage, the source-drain voltage and the number of photoelectrons collected in the collection region, and the result is output in the form of a current; the control-gate voltage and the source-drain voltage can serve as the electrical input quantity of the device, while the number of photoelectrons serves as its optical input quantity.
The charge-coupling layer of the coupling region connects the collection region and the readout region, so that once the depletion region in the collection-region substrate begins collecting photoelectrons, the surface potential of the collection-region substrate is influenced by the number of collected photoelectrons; through the charge-coupling layer, the surface potential of the semiconductor substrate in the readout region follows that of the collection region, which in turn modulates the source-drain current of the readout region, so the number of photoelectrons collected in the collection region can be read out by sensing the source-drain current of the readout region.
The control gate of the carrier control region receives the pulse voltage that creates, in the P-type semiconductor substrate, the depletion region used to collect photoelectrons, and it can also serve as the electrical input terminal that receives one bit of an operand.
In addition, a bottom dielectric layer for isolation is arranged between the P-type semiconductor substrate and the charge coupling layer; a top dielectric layer for isolation is also present between the charge coupling layer and the control gate.
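A highly simplified behavioral model of such a calculation unit treats the read current as proportional to the product of the one-bit electrical input on the control gate and the number of stored photoelectrons (the optical input quantity); the linear response and the constant k below are illustrative assumptions, not device data from the patent:

```python
def cell_current(bit_in, photoelectrons, k=1e-9):
    """Simplified behavioral model of one optoelectronic calculation unit.

    bit_in         : one-bit electrical operand applied via the control gate (0 or 1)
    photoelectrons : number of photoelectrons collected in the collection region,
                     i.e. the stored optical input quantity (the weight)
    k              : assumed current contribution per photoelectron (linear model)

    Returns the source-drain read current, modeled as proportional to the
    product of the electrical and optical input quantities.
    """
    return k * bit_in * photoelectrons

# Column currents of many cells sum on a shared line, implementing the
# multiply-accumulate of one column of the calculation array:
# I_col = sum(cell_current(b_i, n_i) for b_i, n_i in zip(bits, weights))
```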
The hardware block diagram of the calculation array and the auxiliary circuit of the self-supervised learning acceleration system built from the optoelectronic storage-computation integrated calculation array is shown in fig. 3.
Now assume that the neural network model of the system is a convolutional autoencoder, a classic unsupervised (self-supervised) learning case that can be used for applications such as image denoising. It combines the self-supervised training scheme of the traditional autoencoder with the convolution and pooling operations of a convolutional neural network, thereby performing feature extraction and realizing a deep neural network. First the convolutional autoencoder model is built with the optoelectronic storage-computation integrated calculation array, and then the acceleration system is used to train it so that it acquires the image-denoising capability.
As shown in fig. 4, the convolutional layers, upsampling layers and so on of the convolutional autoencoder are built in turn with the optoelectronic storage-computation integrated array. Assume that for a certain convolutional layer the input feature map is 4 × 4 with 64 channels, there are 64 convolution kernels of size 3 × 3 in total, and the stride is 1, so the output is a 2 × 2 feature map with 64 channels. This convolutional layer is constructed by the following steps:
1) The convolution kernels are unrolled and the feature-map patches are extracted so that the convolution becomes a matrix multiplication, following the method of fig. 5. Writing the input feature map of the n-th channel as a 4 × 4 matrix X^(n) and the m-th convolution kernel (its slice on channel n) as a 3 × 3 matrix K^(m,n), the unrolled operation of this convolutional layer can be expressed as the matrix multiplication
Y = A · W,
where W (the unrolled convolution kernels, of size (9 · 64) × 64) is stored as the weights of the optoelectronic calculation array, and A (the unrolled output of the previous layer, with one row per output position, i.e. of size 4 × (9 · 64)) is the electrical input of the array.
2) The calculation array takes one row of electrical signals at a time (4 rows in total), and each row vector is fed in bit by bit, i.e. one bit per cycle. Assuming every element has 8 bits, the input is applied in 8 passes; when the operation in the calculation array is complete, the result of each column undergoes AD conversion to give a digital signal, and a basic digital logic circuit shifts the 8 outputs according to their corresponding bit positions and accumulates them to obtain the result.
3) Following the method of step 2), the 4 row vectors of the electrical input matrix are processed in turn, giving 4 results in vector form (each vector has 64 elements), which are stacked top to bottom into a matrix. The first column of this matrix is reshaped, in the order in which values were taken from the feature map, into the feature map corresponding to the result of the first convolution kernel; the second column into the feature map of the second kernel; and so on, giving 64 feature maps (each of size 2 × 2).
4) A bias is added to the resulting 64 feature maps using basic digital circuitry and an activation function is applied, which yields the output of this convolutional layer.
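The unrolling of steps 1)-3) can be checked numerically. The sketch below builds the kernel matrix W and the electrical input matrix A for the stated dimensions (4 × 4 input with 64 channels, 64 kernels of size 3 × 3, stride 1, no padding) and verifies that A · W reproduces a direct convolution; it is a host-side model of the arithmetic only, not of the optoelectronic circuit:

```python
import numpy as np

n_ch, m_ker, k, H = 64, 64, 3, 4               # channels, kernels, kernel size, input size
p_out = H - k + 1                               # 2 -> 2x2 output, i.e. 4 positions
X = np.random.rand(n_ch, H, H)                  # input feature maps
K = np.random.rand(m_ker, n_ch, k, k)           # convolution kernels

# Step 1): unroll each kernel column-wise per channel and stack channels vertically
W = np.zeros((n_ch * k * k, m_ker))
for m in range(m_ker):
    W[:, m] = np.concatenate([K[m, c].flatten(order='F') for c in range(n_ch)])

# Sliding-window extraction: unroll each window column-wise into one row of A
A = np.zeros((p_out * p_out, n_ch * k * k))
row = 0
for i in range(p_out):
    for j in range(p_out):
        patches = [X[c, i:i + k, j:j + k].flatten(order='F') for c in range(n_ch)]
        A[row] = np.concatenate(patches)
        row += 1

Y = A @ W                                       # what the calculation array computes

# Direct convolution for comparison
Y_ref = np.zeros_like(Y)
for m in range(m_ker):
    for idx in range(p_out * p_out):
        i, j = divmod(idx, p_out)
        Y_ref[idx, m] = np.sum(X[:, i:i + k, j:j + k] * K[m])

print(np.allclose(Y, Y_ref))                    # True: A @ W equals the convolution
```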
In this way the main body of the convolutional autoencoder network can be built with the optoelectronic storage-computation integrated calculation array. Once built, the acceleration system can be used to train the convolutional autoencoder; the training comprises the following steps:
The control module controls the light-emitting array to emit light and thereby writes the randomly initialized network parameters into the optoelectronic storage-computation integrated calculation array; the upper computer sends the image data to the cache module through the interface; the control module feeds the image data into the auxiliary circuit, where Gaussian noise is added to produce a noisy image, which is quantized bit by bit and then input into the optoelectronic calculation array; the optoelectronic calculation array completes the convolution and fully-connected operations of the convolutional autoencoder network, while the other part of the auxiliary circuit completes activation, pooling and similar operations, so that the array and the auxiliary circuit together perform all operations of the convolutional autoencoder network; the parameter updating module computes updated network parameters by a gradient-descent method from the output of the convolutional autoencoder model and the original image data in the cache module, and sends the parameters to the control module; after receiving the parameters, the control module controls the optoelectronic calculation array to erase the original parameters and then controls the light-emitting array to write the updated parameters into the array, completing one iteration. The above steps form one iteration cycle; after a number of iterations, when the loss-function value computed by the parameter updating module falls below a certain threshold, the training objective is reached and the convolutional autoencoder model has acquired the image-denoising function.
Embodiment 2
The storage-computation integrated calculation array of this embodiment uses a memristor calculation array. As shown in fig. 6, a memristor is a non-linear resistor with a memory function whose resistance changes with the current that has flowed through it. After the power is turned off, even though the current stops, the resistance value is retained, and it returns to its original state only when a reverse current is passed.
The resistance of a memristor can therefore be changed by controlling the current through it; for example, the high-resistance state can be defined as 1 and the low-resistance state as 0, realizing data storage. In addition, the memristor can combine storage and computation through methods such as memristor-CMOS logic and fused logic-in-memory operation.
A memristor calculation array can be formed from a number of memristor cells, with some peripheral functional units added around the array, as shown in fig. 7. A 4 × 4 memristor array can then store a 4 × 4 weight matrix, and once a 1 × 4 vector of input voltages is applied to its rows, a matrix-vector multiplication is completed within one read delay.
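The analog operation of the crossbar can be summarized as Ohm's law per cell and Kirchhoff's current law per column. A minimal sketch, with illustrative voltage and conductance values:

```python
import numpy as np

def crossbar_matvec(v_in, G):
    """Memristor-crossbar matrix-vector multiply.

    v_in : input voltages applied to the rows, shape (4,)
    G    : memristor conductances storing the weight matrix, shape (4, 4)

    Each cell contributes I = G * V (Ohm's law); the currents of a column
    sum on its bit line (Kirchhoff's current law), so the column currents
    equal the matrix-vector product, obtained within one read delay.
    """
    return v_in @ G          # length-4 vector of column currents

# Example: a 4 x 4 weight matrix stored as conductances
G = np.random.rand(4, 4)
v = np.array([0.1, 0.2, 0.0, 0.3])
print(crossbar_matvec(v, G))
```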
As in embodiment 1, assume that the network model of the system is a convolutional autoencoder. Since the convolution operation can be unrolled into matrix-vector multiplication, the convolutional autoencoder model can be built with the memristor array; once built, the acceleration system can be used to train the convolutional autoencoder, and the training comprises the following steps:
The control module controls the weight input module so that the weight data are written in turn into the memristor array by controlling the voltages on the device ports, and the upper computer sends the image data to the cache module through the interface; the control module feeds the image data into the auxiliary circuit, where Gaussian noise is added to produce a noisy image, which is quantized bit by bit and then input into the memristor array; the memristor calculation array completes the convolution and fully-connected operations of the convolutional autoencoder network, while the other part of the auxiliary circuit completes activation, pooling and similar operations, so that the memristor array and the auxiliary circuit together perform all operations of the convolutional autoencoder network; the parameter updating module computes updated network parameters by a gradient-descent method from the output of the convolutional autoencoder model and the original image data in the cache module, and sends the parameters to the control module; after receiving the parameters, the control module first controls the memristor array to erase the original parameters and then controls the weight input module to write the updated parameters into the memristor array, completing one iteration. The above steps form one iteration cycle; when, after a number of iterations, the loss-function value computed by the parameter updating module falls below a certain threshold, the training objective is reached and the convolutional autoencoder model has acquired the image-denoising function.
Embodiment 3
The storage-computation integrated calculation array of this embodiment uses a floating-gate device (Flash) calculation array. As shown in fig. 8, in a 3 × 8-bit NOR Flash structure the basic memory cells under each bit line are connected in parallel; when a word line is selected, the corresponding word, i.e. individual bits, can be read directly, giving a high read speed.
A Flash memory cell can store a weight parameter of the neural network and also perform the multiply-and-add operation associated with that weight, so multiplication, addition and storage are integrated in a single Flash cell. The multiplication is performed by an analog circuit similar to a current mirror: the input current is converted into a voltage and coupled to the control gate of the Flash transistor, and the output current of the transistor equals the input current multiplied by the stored weight. The addition is computed in the same way as current summation in a parallel circuit.
By exploiting the analog characteristics of NOR Flash, a storage-computation integrated array based on the NOR Flash architecture can perform full-precision matrix convolution (multiply-add) operations directly inside the memory cells. The bottleneck of moving data back and forth between the logic unit and the memory is avoided, so power consumption is greatly reduced and operation efficiency is improved.
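The multiply-and-add described above can be modeled per bit line as follows; the perfectly linear current-mirror behavior is a simplifying assumption:

```python
def flash_mac(input_currents, weights):
    """Multiply-accumulate on one NOR Flash bit line (simplified model).

    input_currents : input currents, one per Flash cell on the bit line
    weights        : weights stored in the corresponding Flash cells

    Each cell mirrors its input current scaled by the stored weight
    (I_out = I_in * w); the outputs of the parallel-connected cells sum
    on the shared bit line, giving the accumulated result.
    """
    return sum(i * w for i, w in zip(input_currents, weights))

# Example: three cells on one bit line
print(flash_mac([1e-6, 2e-6, 0.5e-6], [0.3, 0.8, 1.0]))
```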
As in embodiment 1, assume that the network model of the system is a convolutional autoencoder. Since the convolution operation can be unrolled into matrix-vector multiplication, the convolutional autoencoder model can be built with the NOR Flash memory array; once built, the acceleration system can be used to train the convolutional autoencoder, and the training comprises the following steps:
The control module controls the weight input module so that the weight data are written in turn into the NOR Flash memory array by controlling the voltages on the device ports, and the upper computer sends the image data to the cache module through the interface; the control module feeds the image data into the auxiliary circuit, where Gaussian noise is added to produce a noisy image, which is quantized bit by bit and then input into the NOR Flash memory array; the NOR Flash memory array completes the convolution and fully-connected operations of the convolutional autoencoder network, while the other part of the auxiliary circuit completes activation, pooling and similar operations, so that the NOR Flash memory array and the auxiliary circuit together perform all operations of the convolutional autoencoder network; the parameter updating module computes updated network parameters by a gradient-descent method from the output of the convolutional autoencoder model and the original image data in the cache module, and sends the parameters to the control module; after receiving the parameters, the control module controls the NOR Flash memory array to erase the original parameters and then controls the weight input module to write the updated parameters into the NOR Flash memory array, completing one iteration. The above steps form one iteration cycle; when, after a number of iterations, the loss-function value computed by the parameter updating module falls below a certain threshold, the training objective is reached and the convolutional autoencoder model has acquired the image-denoising function.

Claims (6)

1. A self-supervised learning acceleration system based on a storage-computation integrated device array, characterized by comprising a cache module, a calculation array, a weight input module, an auxiliary circuit, a control module and a parameter updating module; the cache module, the calculation array and the parameter updating module are connected in sequence; the weight input module is connected with the calculation array and is used for updating the calculation array; the control module is respectively connected with the cache module, the weight input module, the calculation array and the parameter updating module; and the calculation array and the auxiliary circuit are used for completing the operations of the self-supervised neural network.
2. The system of claim 1, wherein the parameter updating module is a digital circuit or a graphics card (GPU).
3. An acceleration method using the self-supervised learning acceleration system based on a storage-computation integrated device array, characterized in that the method proceeds as follows: the control module, by controlling the weight input module, writes the initialized network parameters into the calculation array; the upper computer sends the training data to the cache module through an interface; the control module sends the training data in the cache module to the auxiliary circuit while the training data are still retained in the cache module; one part of the auxiliary circuit quantizes the training data bit by bit and then inputs them into the calculation array; the calculation array completes the convolution and fully-connected operations of the self-supervised neural network, and the other part of the auxiliary circuit completes the activation and pooling operations; the parameter updating module computes updated network parameters by a gradient-descent method from the calculation result of the neural network and the training data in the cache module, and sends the parameters to the control module; after receiving the parameters, the control module controls the calculation array to erase the original parameters and then controls the weight input module to write the updated parameters into the calculation array, completing one iteration; the iteration process is repeated to complete the self-supervised training.
4. The acceleration method according to claim 3, characterized in that the parameter updating module calculates the updated network parameters from the calculation result output by the neural network and the stored training data, using a preset loss function and the back-propagation algorithm.
5. The acceleration method according to claim 3, characterized in that the calculation array completes the convolution operations in the self-supervised neural network, and the specific computation of each convolutional layer is as follows:
(1) for the m convolution kernels of the current convolutional layer, each kernel is unrolled column-wise and concatenated into a column vector, and the m column vectors corresponding to the m kernels are concatenated into a matrix; for an input image feature map with n channels, the n such matrices are stacked vertically into one new large matrix, and a calculation array of the same size as this large matrix is used, with the non-volatile value stored at each array location set to the corresponding entry of the large matrix;
(2) the input of the current convolutional layer consists of the image feature maps of the n channels; for each feature map the following operations are performed, yielding n matrices (one per channel): a window of the same size as the convolution kernel is selected and slid over the feature map p times with the specified stride; at each position the covered values of the feature map are taken out and unrolled, in column order, into a row vector; after the sliding is finished, the p row vectors are stacked top to bottom into a matrix;
(3) the n matrices are concatenated side by side to obtain the final electrical input matrix; the rows of the electrical input matrix are fed into the calculation array one after another from top to bottom, each element of a row vector corresponding to one row of the calculation array;
(4) a row vector is input into the calculation array bit by bit, i.e. one bit at a time; when the calculation of the array is complete, the result of each column is converted by an analog-to-digital converter into a digital signal, and the digital signals are shifted according to their bit positions and accumulated to obtain a result in the form of a vector of length m; this is the result produced by one row vector of the electrical input matrix entering the calculation array, i.e., for each of the m convolution kernels, the summed result of convolving the corresponding window of the n image feature maps;
(5) following the methods of steps (3) and (4), the p row vectors of the electrical input matrix are processed in turn to obtain p result vectors, which are stacked top to bottom into a matrix; each column of this matrix is reshaped into a feature map according to the order in which values were taken from the feature map in step (2), giving the m feature maps that correspond to the result of convolution with each kernel;
(6) the auxiliary circuit adds the bias to the m feature maps and applies the activation operation, giving the final result of the current convolutional layer.
6. The acceleration method according to claim 3, characterized in that the calculation array completes the fully-connected operations in the self-supervised neural network, and the computation of each fully-connected layer is as follows:
(1) assuming the previous layer has m neurons and the current layer has n neurons, there are m × n weights in total; the m × n weights are arranged in order into a matrix, and a calculation array of the same size as the matrix is used, with the optical input quantity of the array set to the corresponding value in the matrix;
(2) the m values output by the previous layer are used as the electrical input quantity of the calculation array;
(3) the electrical input quantity is fed into the storage-computation integrated device array bit by bit, i.e. one bit at a time; when the calculation of the array is complete, the result of each column is converted by an analog-to-digital converter into a digital signal, and the digital signals are shifted according to their bit positions and accumulated to obtain a result in the form of a vector of length n;
(4) the bias is added to the length-n vector and the activation is applied, giving the final result of the current fully-connected layer.




Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant