CN116049094B - Multi-threshold configuration device and method based on photoelectric storage and calculation integrated unit - Google Patents
- Publication number
- CN116049094B (application CN202310341643.1A)
- Authority
- CN
- China
- Prior art keywords
- module
- configuration
- array
- calculation
- weight
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7807—System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
- G06F15/7821—Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06E—OPTICAL COMPUTING DEVICES; COMPUTING DEVICES USING OTHER RADIATIONS WITH SIMILAR PROPERTIES
- G06E3/00—Devices not provided for in group G06E1/00, e.g. for processing analogue or hybrid data
- G06E3/001—Analogue devices in which mathematical operations are carried out with the aid of optical or electro-optical elements
- G06E3/005—Analogue devices in which mathematical operations are carried out with the aid of optical or electro-optical elements using electro-optical or opto-electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Hardware Design (AREA)
- Physics & Mathematics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Mathematical Physics (AREA)
- Nonlinear Science (AREA)
- Optics & Photonics (AREA)
- Transforming Light Signals Into Electric Signals (AREA)
Abstract
The invention discloses a multi-threshold configuration device and method based on a photoelectric storage and calculation integrated unit, belonging to the field of photoelectric detection and the field of digital signal processing for very-large-scale integrated circuits. The device comprises a cache module, a configuration module, a calculation array, a reading module, a sequencing module and a global control module. The cache module stores, updates and distributes data; the configuration module generates external control signals that control the writing and reading of calculation array data; the calculation array implements a high-precision calculation function; the reading module reads the data stored in the array; the sequencing module generates the data format required for array calculation; and the global control module controls data distribution and tracks the current configuration state. The device and method realize multi-threshold configuration and calculation based on the photoelectric storage and calculation integrated unit, shorten the time required for actual writing, reduce device writing errors, reduce the number of configuration steps and improve working efficiency.
Description
Technical Field
The invention relates to a multi-threshold configuration device and a method thereof based on a photoelectric storage and calculation integrated unit, belonging to the field of photoelectric detection and the field of digital signal processing of very large scale integrated circuits.
Background
Most conventional computers employ the von Neumann architecture, in which instructions and data are stored in the same memory, so the system depends heavily on memory. Because storage and computation are also separated, the speed at which the CPU accesses memory limits the speed of the whole system.
Storage and calculation integration technology addresses the data-movement power consumption caused by separating storage units from computing units in the traditional von Neumann architecture. The storage units themselves take part in logic computation, so greater computing power and higher energy efficiency can be provided in specific fields. The market drive behind storage and calculation integration is currently very strong, and besides AI computing the technology is widely applied in storage and calculation integrated chips and brain-inspired chips.
In 2009, the Sensing and Imaging Technology Center of Nanjing University exploited the accelerated injection of photo-generated electrons in the depletion region by the substrate electric field of an N-type floating gate transistor under strong inversion with substrate bias. It thereby realized a pixel structure in which a single device under a standard CMOS process performs all of the photosensing, readout, addressing and reset functions. This greatly improved the pixel fill factor and the potential for further pixel shrinking, and marked a new breakthrough in sensor technology.
The photoelectric storage and calculation integrated device described above still has shortcomings. Because the duration of each optical write cannot be predicted, only a short step can be written at a time to ensure accuracy, and the device must be read out immediately after each write to verify the stored value. Even so, non-ideal factors still introduce errors into the optical writing process, and the configuration of each unit requires many exposure-verification cycles, which takes a great deal of writing time. How to avoid writing errors and shorten writing time is therefore the first difficulty to be solved.
Disclosure of Invention
To shorten the time required to actually write the photoelectric storage and calculation integrated unit, reduce device writing errors, reduce the number of configuration steps and improve working efficiency, the invention provides a multi-threshold configuration device and method based on the photoelectric storage and calculation integrated unit. Through the arrangement of the configuration module, the calculation array and the other modules, and by means such as piecewise linear approximation, the device addresses the problems of writing errors, long configuration time and low efficiency.
The technical scheme adopted by the invention is as follows:
A multi-threshold configuration device based on a photoelectric storage and calculation integrated unit comprises a cache module, a configuration module, a calculation array, a reading module, a sequencing module and a global control module; the cache module is connected in sequence with the configuration module, the calculation array, the reading module and the global control module, and the sequencing module is connected with the cache module, the calculation array and the global control module respectively.
Further, the cache module receives the weight data, calculation data, configuration row address, illumination intensity and the illumination duration at that intensity sent by the host computer, buffers the weights and excitation values, and updates the configuration state and configuration information of the current weights.
Further, the calculation array is formed by arranging a plurality of photoelectric storage and calculation integrated units. Each unit adopts a typical NOR flash architecture, and the units differ in size: the thicknesses of the top and bottom dielectric layers are unchanged, and only the gate width or gate length of the device is varied to set the lower-limit threshold of the drain-source current in each unit, which in turn represents the minimum amount of data that the unit can store.
Further, during exposure the photoelectric storage and calculation integrated unit transfers and stores electrons through photoelectric conversion: photons are incident on the depletion region of the substrate, and the substrate absorbs them and generates electron-hole pairs. The photoelectrons move toward the channel under the gate voltage, gain energy, and finally enter the charge coupling layer under the drive of the gate-oxide electric field, where they are stored. The amount of charge in the charge coupling layer affects the threshold voltage at which the unit turns on and hence the magnitude of the drain current, so the number of photoelectrons that entered the charge coupling layer during exposure can be read out from the drain current. The nonlinear curve of drain current versus exposure time is approximated piecewise linearly and divided into intervals, each corresponding to one illumination intensity. Different illumination intensities change the rate of hot-electron injection and therefore the linearity, in each segment, of the curve relating readout-region current to exposure time.
Further, the configuration module receives the exposure time information, configuration row address information and row weight configuration state information sent by the cache module, and generates the control signals that operate the gates, sources and drains of the calculation array, together with the following actual exposure duration:
$T = t + \Delta t$
where $T$ is the actual exposure duration, $t$ is the time over which the drain-source current changes within the interval at the current illumination intensity, and $\Delta t$ is the over-exposure time, 1 ms to 10 ms;
Further, once the exposure of the calculation array reaches the actual illumination duration, the reading module reads out the weight data stored in the row, sorts it and sends it sequentially to the cache module, which remaps the configuration state of each weight.
Further, the sequencing module serves the calculation phase that follows multi-threshold configuration: it rearranges the excitations bit by bit into the data format required for array calculation, to assist the array calculation.
Further, the global control module distinguishes the working state of the device, controls the data distribution of the cache module, tracks the current configuration progress and selects the appropriate calculation array control signals.
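The exposure-duration rule used by the configuration module can be illustrated with a minimal sketch. It assumes the actual duration is the in-interval change time plus an over-exposure margin between 1 ms and 10 ms; the function name, constant names and the 5 ms default are illustrative assumptions, not taken from the patent.

```python
# Bounds of the over-exposure margin stated in the text (1 ms to 10 ms).
OVEREXPOSURE_MIN_S = 1e-3
OVEREXPOSURE_MAX_S = 10e-3


def actual_exposure_time(change_time_s: float, overexposure_s: float = 5e-3) -> float:
    """Return the actual exposure duration: change time plus over-exposure margin.

    change_time_s  -- time for the drain-source current to traverse the current
                      interval at the current illumination intensity (seconds)
    overexposure_s -- extra margin; the 5 ms default is a hypothetical choice
    """
    if change_time_s < 0:
        raise ValueError("change time must be non-negative")
    if not OVEREXPOSURE_MIN_S <= overexposure_s <= OVEREXPOSURE_MAX_S:
        raise ValueError("over-exposure margin must lie between 1 ms and 10 ms")
    return change_time_s + overexposure_s
```

The margin deliberately over-exposes the row: because each device has its own lower current threshold, exposing slightly too long does not push a weight below its intended value.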
The invention also provides a method of using the multi-threshold configuration device based on the photoelectric storage and calculation integrated unit, comprising the following steps:
(1) According to the network model to be deployed or the quantities to be calculated, m × n photoelectric storage and calculation integrated units are customized and arranged into an m × n crossbar calculation array of interleaved rows and columns, with the gate widths of the units ranging from 0.6u to 3u.
Here m is the number of rows of the array, n is the number of columns, and u denotes the unit of device size (micron). Units with different gate widths have different lower limits of stored data or weight convergence; that is, the calculation array is composed of calculation units with multiple thresholds.
(2) The nonlinear curve of drain current versus exposure time of any photoelectric storage and calculation integrated unit is approximated piecewise linearly and divided, according to the drain current, into k intervals of different sizes, each with a different exposure intensity;
where k is the number of linear intervals after piecewise linear approximation, and hence also the number of exposure intensities;
(3) The light intensity of the interval containing the maximum drain current is taken as the initial exposure intensity. The exposure time for that intensity interval, the configuration row address, the weights to be stored in the row and their weight addresses are input in sequence to the cache module. The cache module buffers the row weights and the weight configuration states, and then transmits the configuration row address, configuration state and exposure time to the configuration module.
(4) The configuration module generates array control signals so that the storage units of one row of the array are exposed simultaneously. After the specified exposure time has elapsed, the configuration module drives the array to read all weights stored in the row out to the reading module. The reading module sorts the read-out weights and distributes them in sequence to the cache module, which compares the received weights with the buffered weights and maps out the new configuration state, completing one round of configuration.
(5) The exposure intensity is switched and the new exposure time is transmitted to the cache module. The cache module transmits the current configuration state of the row weights to the configuration module, which drives the array to re-expose, at the new exposure intensity, the weights in the row that are not yet qualified. When the new exposure time is reached, the weights are read out again and distributed by the reading module to the cache module, which maps out the new configuration state;
(6) Step (5) is repeated until all weights in the row are configured.
(7) The configuration row address is switched in turn and steps (3) to (6) are repeated to complete the configuration of the whole multi-threshold device. Counting one write followed by one readout as one configuration operation, the multi-threshold device performs at most m × k configuration operations;
where m is the number of rows of the array and k is the number of intervals of the piecewise approximation.
(8) After configuration is completed, the excitation data for calculation is transmitted to the cache module and buffered. The sequencing module takes the specified number of excitations from the cache module and rearranges them bit by bit into the format required by the array. The excitation data takes part in the calculation by switching the control-gate signal, and the calculation result is finally read out by the reading module.
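The row-by-row, intensity-by-intensity configuration in steps (3) to (7) can be sketched as a small simulation. This is a simplified behavioural model under stated assumptions, not the patented control logic: weights are integers from 0 to 15, the k = 4 interval boundaries in `INTERVAL_FLOORS` are hypothetical, every cell starts at the maximum weight, and each cell's lower threshold (`device_floors`) is assumed to have been customized to equal its target weight.

```python
from typing import List, Tuple

MAX_WEIGHT = 15
# Hypothetical lower bounds (in weight units) of k = 4 linear intervals,
# ordered from the highest-current interval to the lowest.
INTERVAL_FLOORS = [12, 8, 4, 0]


def expose(value: int, device_floor: int, interval_floor: int) -> int:
    """One over-exposed write: the stored value falls to the interval's lower
    bound but never below the device's own lower threshold."""
    return max(device_floor, min(value, interval_floor))


def configure_array(
    targets: List[List[int]], device_floors: List[List[int]]
) -> Tuple[List[List[int]], int]:
    """Configure every row; return the stored weights and the number of
    write-plus-readout rounds that were needed."""
    stored = [[MAX_WEIGHT] * len(row) for row in targets]
    rounds = 0
    for r, target_row in enumerate(targets):           # step (7): next row address
        done = [s == t for s, t in zip(stored[r], target_row)]
        for interval_floor in INTERVAL_FLOORS:          # steps (3) to (6)
            if all(done):
                break
            for c in range(len(target_row)):
                if not done[c]:                         # re-expose unqualified weights only
                    stored[r][c] = expose(
                        stored[r][c], device_floors[r][c], interval_floor
                    )
            rounds += 1                                 # one write followed by one readout
            # Cache module: compare read-out weights with buffered targets.
            done = [s == t for s, t in zip(stored[r], target_row)]
    return stored, rounds


TARGETS = [[15, 9, 3, 0], [12, 7, 4, 1]]
STORED, ROUNDS = configure_array(TARGETS, TARGETS)
```

In this model the number of rounds never exceeds the number of rows times the number of intervals, because each row sees each illumination intensity at most once.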
The invention also provides applications of the multi-threshold configuration device based on the photoelectric storage and calculation integrated unit, including use in the field of photoelectric detection and in the field of digital signal processing for very-large-scale integrated circuits.
The present invention also provides a computing device comprising one or more processors and a storage device for storing one or more programs; when the one or more programs are executed by the one or more processors, they cause the one or more processors to implement the method of using the multi-threshold configuration device according to the present invention.
The invention has the beneficial effects that:
The invention comprises a calculation array formed of devices of different sizes and four intervals obtained by piecewise linear approximation, each corresponding to one light intensity. The time required to write the assigned weight in a single input is calculated, and the configuration module sets the input time for the array. The invention makes full use of the physical characteristics of the photoelectric storage and calculation integrated unit to realize a multi-threshold configuration device and method, greatly reduces the time required for actual writing, avoids device writing errors, reduces the number of configuration steps and improves working efficiency.
Drawings
FIG. 1 is a schematic view of the structure of the device of the present invention, where O_00 to O_nm denote the photoelectric storage and calculation integrated units in n rows and m columns.
Fig. 2 is a schematic diagram of the structure of the photoelectric storage and calculation integrated unit on which the present invention is based.
FIG. 3 shows the curve of drain-source current versus exposure time of the photoelectric storage and calculation integrated unit of the present invention, together with its linear approximation. Id (nA): drain current on the vertical axis; Time (s): exposure time on the horizontal axis; A, B, C, D: the intervals of the linear approximation, with corresponding illumination intensities l1, l2, l3, l4 and exposure times t1, t2, t3, t4.
FIG. 4 is a diagram showing the internal arrangement of a calculation array according to an embodiment of the present invention. DAC: digital-to-analog converter; Word Line: word line; Bit Line: bit line; ADC: analog-to-digital converter; Size: device size; O_00 to O_87: the 9 × 8 photoelectric storage and calculation integrated units; u: micron.
FIG. 5 is a flow chart of the multi-threshold configuration of the device of the present invention. Configuration start: start of configuration work; Row: the array row currently being configured; Light intensity: the current illumination intensity; the exposure time required by the current configuration; Weight and Addr: the weights to be configured in the row and their weight addresses; Change Light intensity: switch the illumination intensity; Configuration done: the overall configuration work is complete.
FIG. 6 is a schematic diagram of the sequencing operation of the device of the present invention.
Detailed Description
Example 1:
The invention aims to realize a multi-threshold configuration device using the photoelectric storage and calculation integrated unit, to shorten writing time and reduce writing errors, and further to realize matrix multiplication with the device so as to accelerate the matrix-vector multiplications in a neural network.
As shown in fig. 1, the data flow of the multi-threshold configuration device consists mainly of a configuration path and a calculation path. The device comprises a cache module, a configuration module, a calculation array, a reading module, a sequencing module and a global control module; the cache module is connected in sequence with the configuration module, the calculation array, the reading module and the global control module, and the sequencing module is connected with the cache module, the calculation array and the global control module respectively.
Of these, the cache module, calculation array, reading module and global control module are shared by the two paths. The calculation array at the core can compute large-scale matrix multiplications; the configuration module generates the control signals that drive the calculation array and, by continually switching the configuration row and the illumination intensity, realizes the storage function of units with different thresholds; and the sequencing module unfolds convolution operations into matrix-vector multiplications to assist the units in performing calculation.
The cache module stores and updates data and distributes it downstream; the configuration module is connected with the cache module and the calculation array and generates external control signals that control the writing and reading of calculation array data; the calculation array is formed by arranging a plurality of photoelectric storage and calculation integrated units and implements a high-precision calculation function; the reading module is connected with the calculation array and the cache module and reads the data stored in the array; the sequencing module generates the data format required for array calculation; and the global control module controls data distribution and tracks the current configuration state. The device and method realize multi-threshold configuration and calculation based on the photoelectric storage and calculation integrated unit, shorten the time required for actual writing, reduce device writing errors, reduce the number of configuration steps and improve working efficiency.
Further, the cache module receives the weight data, calculation data, configuration row address, illumination intensity and the illumination duration at that intensity sent by the host computer, buffers the weights and excitation values, and updates the configuration state and configuration information of the current weights.
Further, the calculation array is formed by arranging a plurality of photoelectric storage and calculation integrated units. Each unit adopts a typical NOR flash architecture, and the units differ in size: the thicknesses of the top and bottom dielectric layers are unchanged, and only the gate width or gate length of the device is varied to set the lower-limit threshold of the drain-source current in each unit, which in turn represents the minimum amount of data that the unit can store.
Further, during exposure the photoelectric storage and calculation integrated unit transfers and stores electrons through photoelectric conversion: photons are incident on the depletion region of the substrate, and the substrate absorbs them and generates electron-hole pairs. The photoelectrons move toward the channel under the gate voltage, gain energy, and finally enter the charge coupling layer under the drive of the gate-oxide electric field, where they are stored. The amount of charge in the charge coupling layer affects the threshold voltage at which the unit turns on and hence the magnitude of the drain current, so the number of photoelectrons that entered the charge coupling layer during exposure can be read out from the drain current. The nonlinear curve of drain current versus exposure time is approximated piecewise linearly and divided into intervals, each corresponding to one illumination intensity. Different illumination intensities change the rate of hot-electron injection and therefore the linearity, in each segment, of the curve relating readout-region current to exposure time.
Further, the configuration module receives the exposure time information, configuration row address information and row weight configuration state information sent by the cache module, and generates the control signals that operate the gates, sources and drains of the calculation array, together with the following actual exposure duration:
$T = t + \Delta t$
where $T$ is the actual exposure duration, $t$ is the time over which the drain-source current changes within the interval at the current illumination intensity, and $\Delta t$ is the over-exposure time, 1 ms to 10 ms;
Further, once the exposure of the calculation array reaches the actual illumination duration, the reading module reads out the weight data stored in the row, sorts it and sends it sequentially to the cache module, which remaps the configuration state of each weight.
Further, the sequencing module serves the calculation phase that follows multi-threshold configuration: it rearranges the excitations bit by bit into the data format required for array calculation, to assist the array calculation.
Further, the global control module distinguishes the working state of the device, controls the data distribution of the cache module, tracks the current configuration progress and selects the appropriate calculation array control signals.
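The bit-wise rearrangement performed by the sequencing module can be illustrated with a short sketch. It assumes 4-bit unsigned excitations, one excitation per array row, and a shift-and-add accumulation of the per-bit column sums; the accumulation scheme and the function names are assumptions for illustration, since the text only states that excitations are rearranged by bit and applied by switching the control gate.

```python
from typing import List


def bit_planes(excitations: List[int], bits: int = 4) -> List[List[int]]:
    """Split each excitation into bits; planes[b][i] is bit b (LSB first)
    of excitation i, i.e. the on/off gate pattern for one array pass."""
    for x in excitations:
        if not 0 <= x < (1 << bits):
            raise ValueError("excitation does not fit in the given bit width")
    return [[(x >> b) & 1 for x in excitations] for b in range(bits)]


def bit_serial_matvec(
    weights: List[List[int]], excitations: List[int], bits: int = 4
) -> List[int]:
    """Multiply the excitation vector by the weight array one bit plane at a time.

    weights[i][j] is the weight stored at row i, column j. In each pass, a row
    contributes to its column sums only if its excitation bit is 1.
    """
    n_cols = len(weights[0])
    result = [0] * n_cols
    for b, plane in enumerate(bit_planes(excitations, bits)):
        for j in range(n_cols):
            column_sum = sum(weights[i][j] for i, on in enumerate(plane) if on)
            result[j] += column_sum << b   # weight the pass by its bit significance
    return result
```

For example, with excitations `[3, 0, 5]` the first pass gates rows 0 and 2, the second pass gates row 0 only, and the third pass gates row 2 only.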
The invention reduces the time required to actually write the photoelectric storage and calculation integrated unit, avoids device writing errors, reduces the number of configuration steps and improves working efficiency. The main principle is as follows. The time needed to decrease the weight by one unit in each photoelectric storage and calculation integrated unit is uncertain, so to prevent excessive optical writing, every input must be followed by a readout check; for example, reducing the stored data from 100 to 1 requires 99 inputs and 99 readouts. By changing the gate length or gate width of the device and piecewise linearly approximating the irregular curve of weight value versus input duration, with each segment corresponding to one illumination intensity, the data that each device can store after configuration has a lower threshold limit, so reducing the data from 100 to 1 requires only 1 input and 1 readout.
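The cycle counts in this comparison can be checked with a small worked sketch. It models only the counting argument: the incremental scheme writes one unit and verifies after each write, while the thresholded scheme relies on a device lower threshold that is assumed to equal the target value. Both function names are hypothetical.

```python
def incremental_write_cycles(start: int, target: int) -> int:
    """Count write-plus-readout cycles when each write lowers the value by one unit."""
    cycles = 0
    value = start
    while value > target:
        value -= 1    # one short optical write
        cycles += 1   # followed by one verification readout
    return cycles


def thresholded_write_cycles(start: int, device_floor: int) -> int:
    """Count cycles when one over-exposed write drives the value straight to
    the device's lower threshold (assumed equal to the target)."""
    return 1 if start > device_floor else 0
```

Reducing a stored value from 100 to 1 therefore takes 99 cycles in the first scheme and a single cycle in the second.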
Example 2
This example implements the structure of the calculation array in the device under 4-bit fixed-point quantization, where the weights to be stored take 16 values, from 0 to 15. The calculation array consists of a plurality of photoelectric storage and calculation integrated units. Each unit adopts a floating gate structure, so its threshold voltage can be changed by storing electrons in the floating gate, and it additionally has a photoelectric conversion capability that conventional floating gate devices lack. As shown in fig. 2, the unit includes a carrier control region, a coupling region, a photogenerated carrier collection region and a readout region, and the P-type substrate serves both photosensing and readout. During photosensing, a negative voltage pulse is applied to the P-type substrate and a positive voltage pulse to the control gate; the P-type substrate forms a depletion layer that collects photoelectrons, and the electrons are accelerated by the electric field between the control gate and the P-type substrate. Once they gain enough energy, they cross the barrier of the bottom dielectric layer between the P-type substrate and the charge coupling layer, enter the coupling layer and are stored. During readout, a pulse voltage applied to the control gate forms a conducting channel between the N-type source and drain, and a pulse voltage applied across the N-type source and drain then accelerates the electrons in the channel to form the drain-source current. The magnitude of the drain-source current is as follows:
where the quantities in the formula include: the relative dielectric constant; the charge of a single electron; the capacitance between the control gate and the floating gate, which is constant; W and L, the gate width and gate length of the floating gate device; the control gate voltage; the source-drain voltage, which serves as the electrical input; the floating gate potential at threshold; and the number of photoelectrons stored in the charge coupling layer, which serves as the optical input. The remaining quantities are constants. The drain-source current is thus jointly determined by the gate length, gate width, control gate voltage, number of electrons in the coupling layer and source-drain voltage of the floating gate device, and the combined effect of the optical input and the electrical input is output as a current. The amount of charge in the charge coupling layer affects the threshold voltage at which the photoelectric storage and calculation integrated unit turns on, and thereby the magnitude of the drain-source current. The number of photoelectrons that entered the charge coupling layer during photosensing can therefore be determined from the magnitude of the drain-source current.
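As a rough illustration only, and not the patent's own expression, the qualitative dependence described above can be sketched with a generic textbook floating-gate transistor model. Everything in this sketch is an assumption: the square-law channel model, the parameter names, and the default values for `mobility_cox`, `coupling_ratio`, `c_total` and `v_t` are all hypothetical and chosen only to produce plausible numbers.

```python
Q_E = 1.602176634e-19  # elementary charge in coulombs


def drain_current(
    width_um: float,
    length_um: float,
    v_cg: float,
    v_ds: float,
    n_electrons: float,
    *,
    mobility_cox: float = 1e-4,   # hypothetical mobility x oxide capacitance, A/V^2
    coupling_ratio: float = 0.6,  # hypothetical control-gate coupling ratio
    c_total: float = 1e-15,       # hypothetical total floating-gate capacitance, F
    v_t: float = 0.5,             # hypothetical threshold referred to the floating gate, V
) -> float:
    """Generic square-law estimate of drain-source current (amperes).

    Stored electrons lower the floating-gate potential, which lowers the
    current; the current scales with the width-to-length ratio.
    """
    v_fg = coupling_ratio * v_cg - n_electrons * Q_E / c_total
    overdrive = v_fg - v_t
    if overdrive <= 0:
        return 0.0                 # channel is off
    v = min(v_ds, overdrive)       # clamp at the edge of saturation
    return mobility_cox * (width_um / length_um) * (overdrive * v - 0.5 * v * v)
```

Under this generic model, doubling the gate width at fixed length doubles the current, and adding stored photoelectrons reduces it, which matches the qualitative behaviour the text relies on.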
As shown in fig. 3 (a), in the function curve of drain-source current versus exposure time, the rate at which the drain-source current decreases gradually slows as the exposure time increases, and the function as a whole is an irregular curve. The maximum value of the drain current can be taken as the maximum weight 15, and the minimum value as the minimum weight 0. Owing to non-ideal factors such as device temperature drift, the weight value itself, and process materials, the exposure time required to change the weight by one level cannot be determined in advance. To achieve consistent correction of the memory cells, the configuration of each cell is therefore usually completed through multiple exposure-verification cycles. If overexposure during configuration makes the read-out weight smaller than the target weight, the overexposed cell must be erased, which costs a large amount of write time; moreover, frequent erasing of a Flash device degrades the accuracy of the drain-source current in the readout region and thus introduces write errors.
By contrast, if the linear approximation principle is adopted, the nonlinear function curve of drain current versus exposure time can be approximated piecewise linearly: it is divided according to the drain current into k intervals of different sizes, within each of which the curve is treated as linear, so that the exposure time required per unit weight change within a single interval can be estimated accurately and write errors are avoided. Furthermore, if the threshold at which the drain-source current stops falling can be made different for different units, and these thresholds are used to represent the different weight values, erasing after overexposure is no longer needed. Some generality is sacrificed in exchange for convenience and reliability, write time is greatly shortened, and configuration efficiency is improved. Therefore, according to the drain-source current formula, with the control gate length held unchanged, 16 photoelectric storage and calculation integrated units with different floating gate sizes and gate widths from 0.6u to 2.5u are customized; each represents a different lower threshold limit of the drain-source current and hence the weight value read out under overexposure, where 1u denotes 1 micron.
In addition, the illumination intensity affects the rate of hot electron injection. The greater the light intensity, the faster the hot electron injection, the more charge enters the charge coupling layer within the same exposure time, and the faster the drain-source current falls. Conversely, the smaller the light intensity, the slower the hot electron injection, the less charge enters the charge coupling layer within the same exposure time, and the slower the drain-source current falls. As shown in fig. 3 (b), under 4-bit fixed-point quantization the curve of drain-source current versus exposure time is approximated linearly as four sections A, B, C and D from high to low current. The exposure times of the sections are t1, t2, t3 and t4 in turn, and the corresponding illumination intensities are l1, l2, l3 and l4. The curve within the 4 intervals is thus approximated by 4 straight line segments, from which the exposure time required for each unit change of the weight in the corresponding interval can be obtained.
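The piecewise-linear estimate described above can be sketched as follows. This is a minimal illustration only: the knot values in `KNOTS`, the assumption that the 16 weight levels are equally spaced in current, and the function name `time_per_weight_step` are all hypothetical and do not come from the patent. Each segment's slope stands for the combined effect of that section's illumination intensity.

```python
# Hypothetical knots of the fitted curve:
# (cumulative exposure time in ms, drain-source current in nA).
KNOTS = [(0.0, 1000.0), (10.0, 800.0), (30.0, 600.0), (70.0, 400.0), (150.0, 200.0)]
LEVELS = 16  # 4-bit fixed-point quantization: weights 0..15


def time_per_weight_step(knots, levels=LEVELS):
    """Exposure time (ms) needed to change the weight by 1 in each linear section."""
    i_max, i_min = knots[0][1], knots[-1][1]
    # Assumption: weight levels are equally spaced in drain-source current.
    current_step = (i_max - i_min) / (levels - 1)
    times = []
    for (t0, i0), (t1, i1) in zip(knots, knots[1:]):
        slope = (i1 - i0) / (t1 - t0)  # nA per ms, negative because current falls
        times.append(current_step / -slope)
    return times


# Sections A..D from high to low current, as in fig. 3 (b).
STEP_TIMES = dict(zip("ABCD", time_per_weight_step(KNOTS)))
```

With these made-up knots the per-step time grows from section A to section D, which mirrors the flattening of the curve at long exposure times.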
Finally, 16 photoelectric storage and calculation integrated units of different sizes are customized, in which the 0.6u unit characterizes a weight lower limit of 0 and the 2.5u unit characterizes a weight lower limit of 15, and different units are exposed with different illumination intensities. In this way, even if a cell is overexposed, its drain-source current (weight) stays at its lower threshold.
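The effect of a per-unit lower threshold can be illustrated with a small sketch. The text only fixes the two endpoints (0.6u for a lower limit of 0 and 2.5u for a lower limit of 15); the even spacing of the 16 gate widths, the linear width-to-floor mapping, and all function names below are assumptions made for illustration.

```python
def gate_widths(n=16, w_min=0.6, w_max=2.5):
    """Hypothetical: n gate widths (in u) evenly spaced between the two endpoints."""
    step = (w_max - w_min) / (n - 1)
    return [round(w_min + i * step, 4) for i in range(n)]


def floor_weight(width, w_min=0.6, w_max=2.5, levels=16):
    """Assumed linear map from gate width to the lowest weight the unit can reach."""
    return round((width - w_min) / (w_max - w_min) * (levels - 1))


def expose(weight, steps, floor):
    """Lower the weight by `steps` levels, but never below the unit's floor."""
    return max(floor, weight - steps)
```

For example, `expose(15, 20, floor_weight(2.5))` stays at 15 however long the exposure, whereas a unit with a floor of 7 settles at 7 once it is overexposed.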
Take an 8-channel convolution kernel of size 3*3 as an example for constructing the calculation array. The calculation array consists of 72 photoelectric storage and calculation integrated units of different sizes arranged in the crossbar structure of fig. 4: the control gates of the units in the same row are connected by word lines, the sources and drains of the units in the same column are connected by bit lines, and the units O_00 to O_87 have sizes from 0.6u to 2.5u. The array control signals sent by the configuration module are converted to analog quantities by 9 peripheral digital-to-analog converters (DACs) to turn the units on; after a set time all units are turned off, the total current obtained by summing the currents of each column is converted to a digital quantity by the analog-to-digital converters (ADCs), and the final calculation result is obtained by shift accumulation.
Example 3
In this embodiment, the configuration work of writing the device weights is carried out on the calculation array formed by arranging the 16 differently sized photoelectric storage and calculation integrated units of embodiment 1. The configuration flow is shown in fig. 5, in which Configuration start denotes the start of configuration, Row points to the row of the array currently being configured, Light intensity is the current illumination intensity, Timing is the exposure time required by the current configuration, Weight and Addr are the weights to be configured for the row and their addresses, Change Light intensity denotes switching the illumination intensity, and Configuration done denotes that the overall configuration is complete.
First, Row is initialized to the first row, the illumination intensity to l1, and the exposure time to t1. One group of weights and weight addresses for the current row is transmitted to the buffer module at a time. The buffer module provides a register group with a bit width of 5 and a depth of 8; each weight is stored in the lower 4 bits of the register at its address, and the highest bit maps the configuration state of that weight. All state bits are initialized to 1, indicating that no weight has yet been configured successfully. Once the registers are full, the buffer module transmits the configuration row, the 8-bit configuration state of the weights, and the exposure time to the configuration module. According to the current configuration state, the configuration module generates the control signals for the gates, sources and drains of the calculation array (the weight configuration state is assigned to the gate control signals, and the source and drain control signals are all pulled high), so that the 8 units in the first row of the array are exposed simultaneously. It also generates the following actual exposure time T, so that every unit being configured is overexposed, which improves configuration accuracy:

$T = t + \Delta t$

wherein t is the current exposure time and $\Delta t$ is the overexposure duration in ms, whose relationship with the current exposure time is as follows:
The configuration module sets a counter. When the counter reaches the specified exposure time, the array control signals are held unchanged and all weights of the selected row are read out to the reading module. The reading module sorts the 8 read weights and distributes them in sequence to the buffer module. The buffer module compares each received weight with the weight in the register group and updates the state of every weight that has reached its target. At this point all weights in interval A have been configured, and the weights belonging to the other intervals have been written to the weight corresponding to a drain current of 800 nA.
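The register group described above (bit width 5, depth 8, weight in the lower 4 bits, configuration state in the highest bit) can be sketched as follows. The equality test used to decide that a weight has reached its target, and the names `load_row`, `gate_mask` and `update`, are assumptions for illustration; the patent text only says that the received weight is compared with the buffered one.

```python
PENDING = 1 << 4  # highest of the 5 bits: 1 means "not yet configured"


def load_row(weights):
    """Store 8 four-bit weights, with every state bit initialised to 1."""
    assert len(weights) == 8 and all(0 <= w <= 15 for w in weights)
    return [PENDING | w for w in weights]


def gate_mask(regs):
    """State bits of the row; these drive the gate control signals."""
    return [(r >> 4) & 1 for r in regs]


def update(regs, read_back):
    """Clear the state bit of every register whose read-back weight matches."""
    updated = []
    for reg, got in zip(regs, read_back):
        target = reg & 0xF
        # Assumed criterion: a weight is up to standard when it equals its target.
        updated.append(target if got == target else reg)
    return updated
```

A cleared state bit removes the unit from later exposures, because the mask returned by `gate_mask` is what selects the gates in the next round.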
The illumination intensity is then switched to l2 and the exposure time to t2. The buffer module transmits the current configuration state to the configuration module, which drives the array to re-expose the weights of the row that have not yet reached their targets at the new illumination intensity l2. After the new exposure time has elapsed, the weights are read out again and distributed by the reading module to the buffer module, which again maps the current configuration state. These steps are repeated with the illumination intensity switched to l3 and then l4 and the exposure time to t3 and then t4, completing the configuration of the first row of the calculation array.
The above configuration steps are repeated to complete the configuration of rows 2 to 9 in turn. Each time a row is finished, the illumination intensity is reset to l1 and the exposure time to t1. Counting one write followed by one read as a single configuration, the whole device needs 4*4 configurations, far fewer than the 4 x 16 required by conventional write-while-verify; as the quantization precision increases, the device's advantage in write time and accuracy becomes more pronounced.
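The round-by-round behaviour of one row can be simulated with a short sketch. It assumes, purely for illustration, that sections A, B, C and D end at weights 12, 8, 4 and 0, that every unit starts at weight 15, and that each unit's lower threshold equals its target weight; none of these numbers are taken from the patent.

```python
# Hypothetical lowest weight reachable at the end of sections A, B, C, D.
SECTION_BOTTOMS = [12, 8, 4, 0]


def configure_row(targets):
    """Return the final weights and the number of write/read rounds used."""
    weights = [15] * len(targets)      # assumed starting state before exposure
    pending = [True] * len(targets)
    rounds = 0
    for bottom in SECTION_BOTTOMS:
        if not any(pending):
            break
        rounds += 1
        for i, target in enumerate(targets):
            if pending[i]:
                # Overexposure drives the unit down to the section bottom,
                # but its lower threshold (here: the target) stops it.
                weights[i] = max(target, bottom)
                pending[i] = weights[i] != target
    return weights, rounds
```

Under these assumptions a row never needs more rounds than there are sections, and no erase step appears anywhere in the loop.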
Example 4
This embodiment uses the configured calculation array of embodiment 2 to implement the calculation function. Multiplication is performed according to the drain-source current formula of embodiment 1, wherein the number of photoelectrons stored in the charge coupling layer serves as the optical input, i.e. the first multiplier; the source-drain voltage serves as the electrical input, i.e. the second multiplier; the control gate voltage is kept constant; and the drain-source current is the result of the multiplication, which is equivalent to the following equation:
wherein a, b and k are constants, X is a first multiplier, Y is a second multiplier, and R is the operation result, namely drain-source current.
As shown in fig. 6, the rule of convolution calculation may be implemented by matrix multiplication. The numbers in the figure represent numbering information, converting the convolution of 8 3*3 convolution kernels with 4*4 input excitation into the result of the multiplication of 4 1*9 vectors with the 9*8 convolution kernel matrix.
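The conversion of a convolution into a matrix multiplication can be checked with a small pure-Python sketch. It assumes stride 1 and no padding, which is what yields 4 output positions for a 4*4 input and a 3*3 kernel; the helper names `im2col`, `kernels_to_matrix`, `matmul` and `conv_direct` are illustrative and not part of the patent.

```python
def im2col(image, k=3):
    """Flatten every k*k window of a square image into one row (stride 1, no padding)."""
    span = len(image) - k + 1
    return [[image[r + i][c + j] for i in range(k) for j in range(k)]
            for r in range(span) for c in range(span)]


def kernels_to_matrix(kernels):
    """Stack flattened kernels as columns: 8 kernels of 3*3 give a 9*8 matrix."""
    flat = [[v for row in kernel for v in row] for kernel in kernels]
    return [list(col) for col in zip(*flat)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def conv_direct(image, kernel):
    """Reference sliding-window result for one kernel, in row-major output order."""
    k = len(kernel)
    span = len(image) - k + 1
    return [sum(image[r + i][c + j] * kernel[i][j] for i in range(k) for j in range(k))
            for r in range(span) for c in range(span)]
```

Column n of `matmul(im2col(image), kernels_to_matrix(kernels))` then holds the four outputs of kernel n, which is the layout the column currents of the array produce.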
The buffer module buffers the input 8bit excitation data, the sequencing module reorders the excitation into a format required by the computing array, only 3*3 required excitation are taken each time, and each bit of the excitation is sequentially used as a gate control signal to be sent into the computing array.
Taking the multiplication of vector a and matrix W as an example:
W is the 9*8 weight matrix, which is written into the calculation array by exposure; A is the vector of 8-bit excitation elements, which are fed into the calculation array as electrical signals. Each element of A undergoes the following binary conversion:
the elements in the vector A are serially input from the control grid, and binary data of different bits are input from the lowest bit in a time sharing way. When the lowest-bit data is input, the weight stored in the computing array is multiplied by the corresponding bit of the vector lowest-bit data, so that the following formula operation is realized:
before current convergence, the calculation result of each photoelectric calculation unit in the calculation array of 9*8 is as follows:
after the convergence of the column currents, the outputs of each column are mutually connected, and the output result of the matrix vector multiplication is obtained as follows:
After the lowest-bit result has been converted by the analog-to-digital converter ADC, the second-lowest bit of the vector is input to the control gates by the sequencing module; after the column currents converge and accumulate, the product of the second-lowest bit and the weight matrix is obtained, shifted left by 1 bit, and added to the buffered result of the lowest-bit multiplication. The remaining bits are processed in the same way. After all bits have been shifted and accumulated, the final result of multiplying vector A by the weight matrix W can be read out by the reading module. Proceeding in this way ultimately accelerates the convolution operation.
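The bit-serial shift-and-accumulate scheme described above can be sketched in software as follows. This models only the arithmetic: the column sum stands in for the converged column current after ADC conversion, and the function name `bit_serial_mvm` and its parameters are illustrative assumptions.

```python
def bit_serial_mvm(a, w, bits=8):
    """Multiply vector a by matrix w, feeding one bit plane of a at a time."""
    acc = [0] * len(w[0])
    for b in range(bits):  # start from the lowest bit
        plane = [(x >> b) & 1 for x in a]  # bit b of every excitation element
        # Each column sums the weights of the rows whose gate bit is 1.
        column_sums = [sum(p * row[j] for p, row in zip(plane, w))
                       for j in range(len(acc))]
        # Shift the partial result left by b bits and accumulate.
        acc = [s + (c << b) for s, c in zip(acc, column_sums)]
    return acc
```

Because every bit plane is a 0/1 vector, each pass only needs the gates to be switched on or off, and the shift restores the binary weight of that bit.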
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.
Claims (9)
1. The multi-threshold configuration device based on the photoelectric storage and calculation integrated unit is characterized by comprising a cache module, a configuration module, a calculation array, a reading module, a sequencing module and a global control module; the buffer memory module is sequentially connected with the configuration module, the calculation array, the reading module and the global control module, and the sequencing module is respectively connected with the buffer memory module, the calculation array and the global control module;
the buffer memory module is used for realizing the functions of storing and updating data and backward distributing the data; the configuration module is used for generating an external control signal to control the writing and reading of the calculation array data; the calculation array is formed by arranging a plurality of photoelectric calculation integrated units, and the photoelectric calculation integrated units have different convergence lower limits of data quantity or weight values which can be stored; the reading module is used for reading data stored in the array; the ordering module is used for generating a data format required by array calculation; the global control module is used for controlling data distribution and positioning the current configuration state;
the cache module transmits the configuration row address, the configuration state and the exposure time to the configuration module; the configuration module generates an array control signal to control the memory unit of a certain row of the array to be exposed simultaneously, and after the specified exposure time is counted, the configuration module drives the array to read all the weights stored in the row to the reading module; the reading module sorts the read weights and sequentially distributes the weights to the cache module; the sequencing module takes out the appointed number of excitation from the buffer module and rearranges the excitation into a format required by the array according to bit positions, and the final calculation result is read out by the reading module; the global control module is used for distinguishing the working state of the device, controlling the cache module to distribute data, positioning the current configuration progress and selecting a proper calculation array control signal.
2. The multi-threshold configuration device according to claim 1, wherein the buffer module is configured to receive weight data, calculation data, configuration row address, illumination intensity, and illumination duration under the light intensity sent by the upper computer; the caching module is used for caching the weight and the incentive value and updating the configuration state and the configuration information of the current weight.
3. The multi-threshold configuration device according to claim 1, wherein the computing array is formed by arranging a plurality of photo-electric storage integrated units, each photo-electric storage integrated unit has different sizes, the thickness of a dielectric layer at a top layer and a dielectric layer at a bottom layer is unchanged, and the lower limit threshold value of drain-source current in different photo-electric storage integrated units is represented only by changing the gate width or gate length of a device, so that the minimum weight data which can be stored by different units is represented.
4. The multi-threshold configuration device according to claim 1, wherein the photoelectric calculation unit realizes transfer and storage of electrons through photoelectric conversion when exposing, photons are incident into a depletion region of the substrate, and the substrate absorbs the photons and excites electron-hole pairs; photoelectrons move to a channel under the gate voltage to obtain energy, and finally enter a charge coupling layer under the drive of a gate oxide electric field to realize charge storage; in the process, the charge quantity of the charge coupling layer can influence the threshold voltage when the photoelectric storage integrated unit is started, so that the magnitude of the current at the drain end is influenced, and the quantity of photoelectrons entering the charge coupling layer during sensitization can be read out by judging the magnitude of the current at the drain end; the nonlinear function curve formed by the drain current and the exposure time is approximated in a piecewise linear manner, different intervals are divided, each interval corresponds to one illumination intensity, the speed of hot electron injection is influenced by the different illumination intensities, and the linearity of the curve function between the current of the readout area and the exposure time in different sections is further influenced.
5. The multi-threshold configuration device according to claim 4, wherein the configuration module receives the exposure time information, the configuration row address information and the row weight configuration state information sent by the buffer module, and generates the control signals for the gates, sources and drains of the calculation array together with the following actual exposure duration:

$T = t + \Delta t$

wherein T is the actual exposure time, t is the time over which the drain-source current changes within the interval under the current illumination intensity, and $\Delta t$ is the overexposure time, which is 1 ms to 10 ms.
6. The multi-threshold configuration device according to claim 1, wherein, after the exposure time of the calculation array reaches the actual illumination duration, the reading module reads out the weight data stored in the exposed row of the array and, after sorting, passes them in sequence to the buffer module, which remaps the configuration state of each weight.
7. The multi-threshold configuration device of claim 1 wherein the ordering module is responsible for the computation of the device after multi-threshold configuration for splitting and rearranging stimuli into data formats required for array computation to aid array computation; the global control module is used for distinguishing the working state of the device, controlling the cache module to distribute data, positioning the current configuration progress and selecting a proper calculation array control signal.
8. A method for using a multi-threshold configuration device based on an integrated photovoltaic computing unit according to any one of claims 1 to 7, characterized in that the specific steps include:
(1) According to a network model to be deployed or the amount to be calculated, customizing m multiplied by n photoelectric memory integrated units to be arranged into an m multiplied by n row-column staggered crossbar computing array structure, wherein the gate width of each memory unit is unequal from 0.6u to 3 u;
wherein m represents the number of rows of the array, n represents the number of columns of the array, and u denotes microns as the unit of size; photoelectric storage and calculation integrated units with different gate widths have different convergence lower limits for the data quantity or weight they can store, i.e. the calculation array is formed by arranging units with a plurality of thresholds;
(2) The nonlinear function curve formed by the leakage end current and the exposure time of any photoelectric calculation integrated unit is linearly approximated in a piecewise mode, k sections with different sizes are divided according to the leakage end current, and the exposure intensity of each section is different;
wherein k is the number of linear intervals after piecewise linear approximation, and also represents the total k types of exposure intensities;
(3) Taking the light intensity of the current maximum value interval of the drain end as initial exposure intensity, sequentially inputting the exposure time, the configuration row address, the weight to be stored and the weight address of the row into a buffer module, and transmitting the configuration row address, the configuration state and the exposure time to the configuration module after the buffer module buffers the weight and the weight configuration state of the row;
(4) The configuration module generates an array control signal to control the memory unit of a certain row of the array to be exposed simultaneously, and after the specified exposure time is counted, the configuration module drives the array to read all the weights stored in the row to the reading module; the read-out module sorts the read weights and sequentially distributes the read weights to the buffer module, and the buffer module compares the sizes of the received weights and the buffered weights and maps out new configuration states to complete a round of configuration work;
(5) Switching the exposure intensity, transmitting the new exposure time to a buffer module, transmitting the current configuration state of the row weight to the configuration module by the buffer module, driving the array by the configuration module to re-expose the row unqualified weight with the new exposure intensity, re-reading the weight after the new exposure time is reached, and distributing the weight to the buffer module by a reading module to map the new configuration state;
(6) Repeating the step (5) until the weight of one row is completely configured;
(7) Sequentially switching configuration row addresses and repeating the steps (3) - (6) to complete all configuration work of the multi-threshold configuration device; taking a single write and read as one configuration, the multi-threshold configuration device performs at most m × k configurations;
wherein m is the number of rows of the array, and k is the number of intervals of the piecewise linear approximation;
(8) After the configuration work is completed, the excitation data for calculation are transmitted to the buffer module for buffer, the ordering module takes out the appointed number of excitation from the buffer module and rearranges the excitation into the format required by the array according to the bit, the excitation data participate in the calculation through the switch of the control grid signal, and the final calculation result is read out by the reading module.
9. A computing device, comprising: one or more processors, storage devices; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of use of claim 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310341643.1A CN116049094B (en) | 2023-04-03 | 2023-04-03 | Multi-threshold configuration device and method based on photoelectric storage and calculation integrated unit |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116049094A (en) | 2023-05-02 |
CN116049094B (en) | 2023-07-21 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||