CN116049094B - Multi-threshold configuration device and method based on photoelectric storage and calculation integrated unit - Google Patents


Publication number
CN116049094B
Authority
CN
China
Prior art keywords
module
configuration
array
calculation
weight
Legal status
Active
Application number
CN202310341643.1A
Other languages
Chinese (zh)
Other versions
CN116049094A (en)
Inventor
王宇宣
梅正宇
张博书
崔展豪
李楠
潘红兵
彭成磊
Current Assignee
Nanjing University
Original Assignee
Nanjing University

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807 System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7821 Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06E OPTICAL COMPUTING DEVICES; COMPUTING DEVICES USING OTHER RADIATIONS WITH SIMILAR PROPERTIES
    • G06E3/00 Devices not provided for in group G06E1/00, e.g. for processing analogue or hybrid data
    • G06E3/001 Analogue devices in which mathematical operations are carried out with the aid of optical or electro-optical elements
    • G06E3/005 Analogue devices in which mathematical operations are carried out with the aid of optical or electro-optical elements using electro-optical or opto-electronic means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Mathematical Physics (AREA)
  • Nonlinear Science (AREA)
  • Optics & Photonics (AREA)
  • Transforming Light Signals Into Electric Signals (AREA)

Abstract

The invention discloses a multi-threshold configuration device and method based on a photoelectric storage and calculation integrated unit, belonging to the fields of photoelectric detection and digital signal processing for very-large-scale integrated circuits. The device comprises a cache module, a configuration module, a calculation array, a reading module, a sequencing module and a global control module. The cache module stores, updates and distributes data; the configuration module generates the external control signals that control writing to and reading from the calculation array; the calculation array performs high-precision calculation; the reading module reads the data stored in the array; the sequencing module generates the data format required for array calculation; and the global control module controls data distribution and tracks the current configuration state. The device and method enable multi-threshold configuration and calculation based on the photoelectric storage and calculation integrated unit, shorten the time actually required for writing, reduce device write errors, reduce the number of configuration steps and improve working efficiency.

Description

Multi-threshold configuration device and method based on photoelectric storage and calculation integrated unit
Technical Field
The invention relates to a multi-threshold configuration device and method based on a photoelectric storage and calculation integrated unit, belonging to the fields of photoelectric detection and digital signal processing for very-large-scale integrated circuits.
Background
Most conventional computers employ the von Neumann architecture, in which instructions and data are stored in the same memory, so the system depends heavily on memory. Because storage and computation are separated, the speed at which the CPU accesses memory limits the operating speed of the system.
Storage-and-calculation integrated (compute-in-memory) technology addresses the data-movement power consumption caused by separating memory cells from computing cells in the traditional von Neumann architecture. Because the memory cells themselves take part in logic computation, it can provide greater computing power and higher energy efficiency in specific fields. Market demand for storage-and-calculation integration is currently strong; besides AI computing, the technology is widely applied in storage-and-calculation integrated chips and brain-inspired chips.
In 2009, the Sensing and Imaging Technology Center of Nanjing University exploited the accelerated injection of photo-generated electrons in the depletion region of an N-type floating-gate transistor by the substrate electric field under substrate-bias strong inversion. On this basis it realized, in a standard CMOS process, a pixel structure in which a single device performs all of the photosensing, readout, addressing and reset functions. This greatly improved the pixel fill factor and the potential for further pixel shrinkage, and marked a new breakthrough in sensor technology.
The photoelectric storage and calculation integrated device described above still has drawbacks. Because the time needed for each optical write cannot be predicted, only a short step can be written at a time, and the device must be read out immediately after each write to verify the stored value. Even so, optical writing still introduces errors due to non-ideal factors, and configuring each unit requires many exposure-verification cycles, which takes a great deal of writing time. How to avoid write errors and shorten the writing time is therefore the first difficulty to be solved.
Disclosure of Invention
To shorten the time actually required to write the photoelectric storage and calculation integrated unit, reduce device write errors, reduce the number of configuration steps and improve working efficiency, the invention provides a multi-threshold configuration device and method based on the photoelectric storage and calculation integrated unit. Through the arrangement of the configuration module, the calculation array and the other modules, and by means such as piecewise linear approximation, the device addresses the problems of write errors, long configuration time and low efficiency.
The technical scheme adopted by the invention is as follows:
A multi-threshold configuration device based on a photoelectric storage and calculation integrated unit comprises a cache module, a configuration module, a calculation array, a reading module, a sequencing module and a global control module. The cache module, the configuration module, the calculation array, the reading module and the global control module are connected in sequence, and the sequencing module is connected to the cache module, the calculation array and the global control module respectively.
Further, the cache module receives the weight data, calculation data, configuration row address, illumination intensity and the illumination duration at that intensity sent by the host computer, caches the weights and excitation values, and updates the configuration state and configuration information of the current weights.
Further, the calculation array is formed by arranging a plurality of photoelectric storage and calculation integrated units. Each unit adopts a typical NOR flash architecture, and the units have different sizes: the thicknesses of the top and bottom dielectric layers are unchanged, and only the gate width or gate length of the device is varied to set the lower-limit threshold of the drain-source current in each unit, which in turn represents the minimum data value that the unit can store.
Further, during exposure the photoelectric storage and calculation integrated unit transfers and stores electrons through photoelectric conversion: photons are incident on the depletion region of the substrate, and the substrate absorbs them and generates electron-hole pairs. The photoelectrons gain energy as they move toward the channel under the gate voltage and finally enter the charge coupling layer, driven by the gate-oxide electric field, where the charge is stored. The amount of charge in the charge coupling layer affects the threshold voltage of the unit when it is turned on and therefore the drain current, so the number of photoelectrons that entered the charge coupling layer during exposure can be read out from the magnitude of the drain current. The nonlinear curve of drain current versus exposure time is approximated piecewise linearly and divided into intervals, each corresponding to one illumination intensity. The illumination intensity affects the rate of hot-electron injection and hence the linearity of the readout-region current versus exposure time within each interval.
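The piecewise linear approximation described above can be illustrated with a minimal sketch. The exponential decay used for the curve, the equal-width current intervals and all function names are assumptions made for illustration only; the text gives neither the measured curve nor the rule for choosing the interval boundaries.

```python
import math

def drain_current(t):
    """Hypothetical stand-in for the measured curve: current (nA) vs exposure time (s)."""
    return 100.0 * math.exp(-t / 2.0)

def exposure_time_for(current):
    """Inverse of the stand-in curve: exposure time at which a given current is reached."""
    return -2.0 * math.log(current / 100.0)

def build_segments(i_max, i_min, k):
    """Split [i_min, i_max] into k equal current intervals (an assumption).

    Returns one (t_start, t_end, i_start, i_end) tuple per interval.
    """
    step = (i_max - i_min) / k
    segments = []
    for j in range(k):
        i_start = i_max - j * step
        i_end = i_max - (j + 1) * step
        segments.append((exposure_time_for(i_start), exposure_time_for(i_end), i_start, i_end))
    return segments

def approx_current(t, segments):
    """Evaluate the piecewise linear approximation at exposure time t."""
    for t0, t1, i0, i1 in segments:
        if t0 <= t <= t1:
            return i0 + (i1 - i0) * (t - t0) / (t1 - t0)
    raise ValueError("t lies outside the approximated range")

segments = build_segments(i_max=100.0, i_min=20.0, k=4)
```

Within each interval the approximation is a straight line, so the exposure time needed for a given change in current follows directly from that interval's slope.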
Further, the configuration module receives the exposure time information, configuration row address information and row weight configuration state information sent by the cache module, and generates the control signals that drive the gate, source and drain of the calculation array, together with the following actual exposure duration:
$T = t + \Delta t$
where T is the actual exposure duration, t is the time over which the drain-source current changes within the interval at the current illumination intensity, and $\Delta t$ is the overexposure time, 1 ms to 10 ms.
Further, once the exposure of the calculation array has reached the actual illumination duration, the reading module reads out the weight data stored in the row, sorts them and sends them in order to the cache module, which remaps the configuration state of each weight.
Further, the sequencing module serves the calculation performed by the device after multi-threshold configuration: it rearranges the excitations bit by bit into the data format required by the array to assist the array calculation.
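One possible reading of the bit-wise rearrangement is a bit-plane decomposition, sketched below. The bit-serial scheme, the LSB-first order and the names `to_bit_planes` and `bit_serial_mvm` are assumptions for illustration; the text does not specify the exact data format the array requires.

```python
def to_bit_planes(excitations, bits):
    """Return `bits` lists; plane b holds bit b (LSB first) of every excitation."""
    return [[(x >> b) & 1 for x in excitations] for b in range(bits)]

def bit_serial_mvm(weights, excitations, bits):
    """Matrix-vector product computed one bit plane at a time.

    weights[r][c]: row r is gated by excitation r, column c accumulates the result.
    Each bit plane acts as an on/off gate signal; partial sums are shifted by the
    bit position and accumulated.
    """
    n_cols = len(weights[0])
    result = [0] * n_cols
    for b, plane in enumerate(to_bit_planes(excitations, bits)):
        for c in range(n_cols):
            column_sum = sum(weights[r][c] for r, on in enumerate(plane) if on)
            result[c] += column_sum << b
    return result

weights = [[3, 1], [0, 15], [7, 2]]   # hypothetical 4-bit weights, 3 rows x 2 columns
excitations = [5, 2, 9]               # hypothetical 4-bit excitations, one per row
outputs = bit_serial_mvm(weights, excitations, bits=4)
```

Because each plane contains only 0 or 1, every row is simply switched on or off in a given pass, which is consistent with excitations entering the calculation through the switching of a gate signal.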
Further, the global control module identifies the working state of the device, controls the data distribution of the cache module, tracks the current configuration progress and selects the appropriate control signals for the calculation array.
The invention also provides a method of using the writing device based on the photoelectric storage and calculation integrated unit, with the following specific steps:
(1) According to the network model to be deployed or the amount of calculation required, customize m × n photoelectric storage and calculation integrated units and arrange them into an m × n crossbar calculation array with interleaved rows and columns, the gate widths of the units ranging from 0.6 u to 3 u.
Here m is the number of rows of the array, n is the number of columns and u denotes microns. Units with different gate widths have different lower limits of data value or weight convergence; that is, the calculation array is composed of multi-threshold calculation units.
(2) Approximate the nonlinear curve of drain current versus exposure time of any photoelectric storage and calculation integrated unit piecewise linearly, dividing it by drain current into k intervals of different sizes, each with a different exposure intensity;
where k is the number of linear intervals after piecewise linear approximation and also the number of exposure intensities.
(3) Take the light intensity of the interval containing the maximum drain current as the initial exposure intensity. Input the exposure time for that intensity interval, the configuration row address, the weights to be stored and the weight addresses of the row into the cache module in turn; the cache module caches the row weights and their configuration states, then sends the configuration row address, configuration state and exposure time to the configuration module.
(4) The configuration module generates array control signals so that the storage units of the selected row are exposed simultaneously. When the specified exposure time has elapsed, the configuration module drives the array to read all weights stored in the row out to the reading module. The reading module sorts the read-out weights and distributes them in order to the cache module, which compares the received weights with the cached weights and maps out the new configuration states, completing one round of configuration.
(5) Switch the exposure intensity and send the new exposure time to the cache module. The cache module sends the current configuration state of the row weights to the configuration module, which drives the array to re-expose the not-yet-qualified weights of the row at the new exposure intensity. When the new exposure time has elapsed, the weights are read again and distributed by the reading module to the cache module, which maps out the new configuration states;
(6) Repeat step (5) until all weights of the row are configured.
(7) Switch the configuration row address in turn and repeat steps (3)-(6) to complete the configuration of the whole multi-threshold device. Counting one write followed by one read as one configuration operation, the multi-threshold device performs at most m × k configuration operations;
where m is the number of rows of the array and k is the number of intervals of the piecewise approximation.
(8) After configuration is complete, send the excitation data for calculation to the cache module. The sequencing module takes the specified number of excitations from the cache module and rearranges them bit by bit into the format required by the array; the excitation data take part in the calculation by switching the control-gate signal, and finally the reading module reads out the calculation result.
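The row-by-row configuration loop of steps (3) to (7) can be sketched as a small behavioural simulation. This is a simplified model under stated assumptions, not the device's control logic: every cell starts at the maximum weight 15, each exposure interval drives a cell down to the interval's lower bound or to the cell's own floor (whichever is higher), and the target weight of a cell equals its floor. The interval bounds and all names are hypothetical.

```python
# Hypothetical weight reached at the end of intervals A, B, C, D (k = 4).
INTERVAL_LOWER_BOUNDS = [12, 8, 4, 0]

def configure_row(floors, targets):
    """Configure one row; return (stored weights, number of write/read cycles)."""
    stored = [15] * len(floors)  # assumed initial state: maximum weight
    done = [s == t for s, t in zip(stored, targets)]
    cycles = 0
    for bound in INTERVAL_LOWER_BOUNDS:
        if all(done):
            break
        # Expose only the cells not yet configured; a cell cannot drop below its floor.
        for i, finished in enumerate(done):
            if not finished:
                stored[i] = max(floors[i], bound)
        cycles += 1  # one exposure followed by one readout of the row
        done = [s == t for s, t in zip(stored, targets)]
    return stored, cycles

def configure_array(floor_rows, target_rows):
    """Configure every row in turn; return (stored array, total cycles)."""
    result, total = [], 0
    for floors, targets in zip(floor_rows, target_rows):
        stored, cycles = configure_row(floors, targets)
        result.append(stored)
        total += cycles
    return result, total

floor_rows = [[12, 10, 0], [15, 7, 4]]  # hypothetical 2 x 3 array of cell floors
configured, total_cycles = configure_array(floor_rows, floor_rows)
```

In this model a row needs at most one cycle per exposure intensity, so an array of m rows needs at most m × k cycles, in line with the bound stated in step (7).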
The invention also provides applications of the multi-threshold configuration device based on the photoelectric storage and calculation integrated unit, including use in the field of photoelectric detection and the field of digital signal processing for very-large-scale integrated circuits.
The invention also provides a computing device comprising one or more processors and a storage device. The storage device stores one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the method of using the writing device based on the photoelectric storage and calculation integrated unit according to the invention.
The invention has the beneficial effects that:
The invention comprises a calculation array formed of devices of different sizes and four regions obtained by piecewise linear approximation, each region corresponding to one light intensity. The time required to reach the specified weight in a single write is calculated, and the configuration module sets the exposure time applied to the array. The invention makes full use of the physical characteristics of the photoelectric storage and calculation integrated unit to realize a multi-threshold configuration device and method, which greatly reduces the time actually required to write the unit, avoids device write errors, reduces the number of configuration steps and improves working efficiency.
Drawings
FIG. 1 is a schematic diagram of the structure of the device of the invention, where O_00 to O_nm denote the photoelectric storage and calculation integrated units in n rows and m columns.
FIG. 2 is a schematic diagram of the structure of the photoelectric storage and calculation integrated unit on which the invention is based.
FIG. 3 shows the drain-source current of the photoelectric storage and calculation integrated unit as a function of exposure time, together with its linear approximation. Id (nA) on the vertical axis is the drain current; Time (s) on the horizontal axis is the exposure time; A, B, C and D are the intervals of the linear approximation, with corresponding illumination intensities l1, l2, l3, l4 and exposure times t1, t2, t3, t4.
FIG. 4 shows the internal arrangement of the calculation array in an embodiment of the invention. DAC: digital-to-analog converter; Word Line: word line; Bit Line: bit line; ADC: analog-to-digital converter; Size: device size; O_00 to O_87: the 9 × 8 photoelectric storage and calculation integrated units; u: micron.
FIG. 5 is a flow chart of the multi-threshold configuration of the device of the invention. Configuration start: start of the configuration work; Row: the array row currently being configured; Light intensity: the current illumination intensity (with the exposure time required for the current configuration); Weight and Addr: the weights to be configured in the row and their addresses; Change Light intensity: change the illumination intensity; Configuration done: the overall configuration work is complete.
FIG. 6 is a schematic diagram of the sequencing operation of the device of the present invention.
Detailed Description
Example 1:
The invention aims to realize a multi-threshold configuration device using the photoelectric storage and calculation integrated unit, shortening the writing time and reducing write errors; the device can further realize matrix multiplication and accelerate matrix-vector multiplication in neural networks.
As shown in fig. 1, the data flow of the multi-threshold configuration device consists mainly of a configuration path and a calculation path. The device comprises a cache module, a configuration module, a calculation array, a reading module, a sequencing module and a global control module. The cache module, the configuration module, the calculation array, the reading module and the global control module are connected in sequence, and the sequencing module is connected to the cache module, the calculation array and the global control module respectively.
The cache module, the calculation array, the reading module and the global control module are shared by both paths. The core calculation array can compute large-scale matrix multiplications. The configuration module generates the control signals that drive the calculation array and, by continually switching configuration rows and illumination intensities, realizes the storage function of units with different thresholds. The sequencing module unrolls convolution operations into matrix-vector multiplications to help the units perform the calculation.
The cache module stores and updates data and distributes it to the downstream modules. The configuration module is connected to the cache module and the calculation array and generates the external control signals that control writing to and reading from the calculation array. The calculation array is formed by arranging a plurality of photoelectric storage and calculation integrated units and performs high-precision calculation. The reading module is connected to the calculation array and the cache module and reads the data stored in the array. The sequencing module generates the data format required for array calculation. The global control module controls data distribution and tracks the current configuration state. The device and method enable multi-threshold configuration and calculation based on the photoelectric storage and calculation integrated unit, shorten the time actually required for writing, reduce device write errors, reduce the number of configuration steps and improve working efficiency.
The invention reduces the time actually required to write the photoelectric storage and calculation integrated unit, avoids device write errors, reduces the number of configuration steps and improves working efficiency. The main principle is as follows. The time needed to decrease the weight in a unit by one step is uncertain, so to prevent excessive optical writing a readout check is needed after every write; for example, reducing the stored value from 100 to 1 requires 99 writes and 99 readouts. By varying the gate length or gate width of the device and by piecewise linearly approximating the irregular curve of weight versus write duration, with each piece corresponding to one illumination intensity, the data that each device can store after configuration has a lower threshold limit. Reducing the value from 100 to 1 then requires only 1 write and 1 readout.
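The cycle counts in this example can be reproduced with a short sketch. Both functions are idealized models with hypothetical names: the first assumes each short exposure lowers the stored value by exactly one unit, and the second assumes a single long exposure saturates the cell at its lower threshold limit.

```python
def stepwise_cycles(start, target):
    """Conventional scheme: one short write plus one verification readout per unit step."""
    cycles = 0
    value = start
    while value > target:
        value -= 1   # short exposure lowers the stored value by one unit (idealized)
        cycles += 1  # followed by a readout to check the new value
    return cycles

def overexposure_cycles(start, floor):
    """Multi-threshold scheme: one long write drives the cell to its floor, one readout confirms.

    Returns (final stored value, number of write/read cycles).
    """
    value = floor if start > floor else start
    return value, 1

conventional = stepwise_cycles(100, 1)       # 99 write/read cycles
multi_threshold = overexposure_cycles(100, 1)  # value 1 after a single cycle
```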
Example 2
This example implements the structure of the calculation array of the device under 4-bit fixed-point quantization, where the weights to be stored take 16 values from 0 to 15. The calculation array consists of a plurality of photoelectric storage and calculation integrated units. Each unit adopts a floating-gate structure: the threshold voltage of the device can be changed by storing electrons in the floating gate, and the device also has a photoelectric conversion capability that conventional floating-gate devices lack. As shown in fig. 2, the unit comprises a carrier control region, a coupling region, a photo-generated carrier collection region and a readout region, and the P-type substrate serves both photosensing and readout. During photosensing, a negative voltage pulse is applied to the P-type substrate and a positive voltage pulse to the control gate; the P-type substrate forms a depletion layer that collects photoelectrons, and the electrons are accelerated by the electric field between the control gate and the P-type substrate. Once they have gained enough energy they cross the barrier of the bottom dielectric layer between the P-type substrate and the charge coupling layer, enter the coupling layer and are stored. During readout, a pulse voltage applied to the control gate forms a conducting channel between the N-type source and drain, and a pulse voltage applied across the N-type source and drain then accelerates the electrons in the channel to form the drain-source current. The magnitude of the drain-source current is as follows:
[Formula for the drain-source current; the expression and its symbols are not recoverable from the extracted text.]
The quantities named in the formula are, in order: the relative dielectric constant; the charge of a single electron; the control-gate-to-floating-gate capacitance; W and L, which are constants and denote the gate width and gate length of the floating-gate device; the control gate voltage; the source-drain voltage, which is the electrical input; the floating-gate potential at threshold; and the number of photoelectrons stored in the charge coupling layer, which is the optical input. The remaining quantities are constants. The drain-source current is determined jointly by the gate length and gate width of the floating-gate device, the control gate voltage, the number of electrons in the coupling layer and the source-drain voltage; the result of combining the optical input and the electrical input is output as a current. The amount of charge in the charge coupling layer affects the threshold voltage of the unit when it is turned on and hence the magnitude of the drain-source current, so the number of photoelectrons that entered the charge coupling layer during photosensing can be determined from the magnitude of the drain-source current.
As shown in fig. 3 (a), in the curve of drain-source current versus exposure time the decrease of the drain-source current gradually slows as exposure time increases, and the curve as a whole is irregular. The maximum drain current can be taken as the maximum weight 15 and the minimum as the minimum weight 0. Because of non-ideal factors such as device temperature drift, the weight value itself and process materials, the exposure time required to change the weight by one unit cannot be determined in advance. To make the storage units consistent, the configuration of each unit is therefore usually completed through many exposure-verification cycles. If overexposure during configuration makes the read-out weight smaller than the target weight, the overexposed storage unit has to be erased, which costs a large amount of writing time; moreover, frequent erasing of a Flash device degrades the accuracy of the drain-source current in the readout region and so introduces write errors.
By contrast, the nonlinear curve of drain current against exposure time can be approximated piecewise-linearly: the curve is divided by drain current into k intervals of different sizes, and within each interval the curve is treated as linear, so the exposure time required per unit weight change in that interval can be estimated accurately and the writing error avoided. In addition, if the lower threshold to which the drain-source current falls can be made different for different photoelectric storage and calculation integrated units, those thresholds can represent different weight values and the erase step caused by overexposure is avoided. This trades some generality for convenience and reliability, greatly shortens the writing time and improves configuration efficiency. Accordingly, following the drain-source current formula, the gate length is kept unchanged and 16 photoelectric storage and calculation integrated units with different floating-gate sizes are customized, with gate widths from 0.6 u to 2.5 u, where 1 u denotes 1 micron. Each unit has its own lower threshold of drain-source current, which in turn represents the weight value read out under overexposure.
In addition, the illumination intensity affects the rate of hot-electron injection. The greater the light intensity, the faster the hot-electron injection, the more charge enters the charge-coupled layer within the same exposure time, and the faster the drain-source current drops. Conversely, the smaller the light intensity, the slower the injection, the less charge enters the charge-coupled layer within the same exposure time, and the slower the drain-source current drops. As shown in fig. 3 (b), under 4-bit fixed-point quantization the curve of drain-source current against exposure time is approximated linearly in four intervals A, B, C and D, from high current to low. The exposure times of the intervals are t1, t2, t3 and t4 and the corresponding illumination intensities are l1, l2, l3 and l4. The curve is thus approximated by four straight-line segments, from which the exposure time required for each weight change within an interval can be obtained.
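As a purely illustrative sketch of the piecewise-linear idea, the Python fragment below stores a curve as a list of breakpoints and derives the exposure time per unit weight change in each segment. The breakpoint values, the arbitrary units, the even split of 16 levels over 4 segments and all function names are assumptions made for the example; none of them come from the patent, and the sketch ignores the fact that each interval uses its own illumination intensity.

```python
from bisect import bisect_right

# Hypothetical breakpoints (exposure time, drain current), both in arbitrary
# units. The values are illustrative only and are not taken from the patent.
BREAKPOINTS = [(0.0, 1600.0), (10.0, 1200.0), (30.0, 800.0), (70.0, 400.0), (150.0, 0.0)]
LEVELS_PER_SEGMENT = 4  # assumption: 16 weight levels spread evenly over 4 segments


def segment_slopes(points):
    """Slope (current per unit time) of each linear segment."""
    return [(i1 - i0) / (t1 - t0) for (t0, i0), (t1, i1) in zip(points, points[1:])]


def time_per_weight_step(points, levels_per_segment):
    """Exposure time needed to change the weight by 1 inside each segment."""
    return [(t1 - t0) / levels_per_segment for (t0, _), (t1, _) in zip(points, points[1:])]


def current_at(points, t):
    """Piecewise-linear estimate of the drain current at exposure time t."""
    times = [p[0] for p in points]
    # Index of the segment containing t, clamped to the valid range.
    k = min(max(bisect_right(times, t) - 1, 0), len(points) - 2)
    (t0, i0), (t1, i1) = points[k], points[k + 1]
    return i0 + (i1 - i0) * (t - t0) / (t1 - t0)


steps = time_per_weight_step(BREAKPOINTS, LEVELS_PER_SEGMENT)
```

With these made-up breakpoints the later, flatter segments need a longer exposure per weight step, which is the behaviour the segment-wise exposure times t1 to t4 are meant to capture.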
Finally, 16 photoelectric storage and calculation integrated units of different sizes are customized: the 0.6 u unit represents a lower weight limit of 0 and the 2.5 u unit a lower weight limit of 15, and different units are exposed with different illumination intensities. In this way, even if a unit is overexposed, its drain-source current (weight) stays at its lower threshold.
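The effect of a per-unit lower threshold can be pictured with a minimal Python model. The end points 0.6 u for weight 0 and 2.5 u for weight 15 are from the text; the evenly spaced widths in between, the one-level-per-step exposure model and the function names are assumptions introduced only for illustration.

```python
N_LEVELS = 16
W_MIN_U, W_MAX_U = 0.6, 2.5  # gate widths in microns, end points from the text


def gate_width_for_floor(floor_weight):
    """Hypothetical linear mapping from lower weight limit (0..15) to gate width."""
    step = (W_MAX_U - W_MIN_U) / (N_LEVELS - 1)
    return W_MIN_U + floor_weight * step


def read_weight(floor_weight, start_weight, steps_exposed):
    """Weight read back after exposure.

    The weight drops by one level per exposure step (an assumed, idealized
    model) but never falls below the lower limit of the unit.
    """
    return max(floor_weight, start_weight - steps_exposed)
```

In this model an overexposed unit, for example `read_weight(5, 15, 20)`, returns its floor of 5 instead of overshooting, so no erase step is needed.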
Take an 8-channel convolution kernel of size 3*3 as an example for constructing the computing array. The computing array consists of 72 photoelectric storage and calculation integrated units of different sizes arranged in the crossbar structure of FIG. 4. The control gates of the units in the same row are connected by word lines, and the source and drain terminals of the units in the same column are connected by bit lines; the computing units O00 to O87 have sizes from 0.6 u to 2.5 u. The array control signals sent by the configuration module are converted to analog values by 9 peripheral digital-to-analog converters (DACs) to turn on the photoelectric storage and calculation integrated units, and all units are turned off after a set time. The currents of each column of the array are then summed, the total column current is converted to a digital value by analog-to-digital converters (ADCs), and the final calculation result is obtained by shift accumulation.
Example 3
In this embodiment, the weight-writing configuration is carried out on the computing array formed by arranging the 16 photoelectric storage and calculation integrated units of embodiment 1. The configuration flow is shown in fig. 5, where configuration start indicates the start of configuration, Row points to the row of the array currently being configured, light intensity is the current illumination intensity, timing is the exposure time required by the current configuration, weight and Addr are the weights to be configured for the row and their addresses, change Light intensity indicates a change of illumination intensity, and configuration done indicates that the whole configuration is complete.
First, Row is initialized to the first row, the illumination intensity to l1 and the exposure time to t1. The weights and weight addresses of the first row are sent to the cache module one group at a time. The cache module provides a register group with a bit width of 5 and a depth of 8: each weight is stored in the lower 4 bits of the register at its address, and the highest bit records the configuration state of that weight. The state bits are all initialized to 1, meaning that no weight has yet been configured successfully. Once the registers are full, the cache module sends the configuration row, the 8-bit weight configuration state and the exposure time to the configuration module. From the current weight configuration state, the configuration module generates the control signals for the gates, sources and drains of the computing array (the weight configuration state is assigned to the gate control signals, and the source and drain control signals are all pulled high), so that the 8 photoelectric storage and calculation integrated units in the first row of the computing array are exposed simultaneously. It also generates the following actual exposure time T, so that every unit being configured is overexposed, which improves configuration accuracy:
Here, the quantities on the right-hand side are the current exposure time and the overexposure duration, in ms; the overexposure duration is related to the current exposure time as follows:
The configuration module contains a counter. When the counter reaches the specified exposure time, the array control signals are held unchanged and all weights of the selected row are read out to the reading module. The reading module sorts the 8 read weights and passes them to the cache module in order. The cache module compares the received weights with those in the register group and updates the state of every weight that has reached its target. At this point all weights in interval A are configured, and the weights belonging to the other intervals have been written to the weight corresponding to a drain current of 800 nA.
The illumination intensity is then switched to l2 and the exposure time to t2, and the cache module sends the current weight configuration state to the configuration module. The configuration module drives the array to re-expose the weights of the row that have not yet reached their targets at the new exposure intensity l2. Once the new exposure time t2 has elapsed, the weights are read again and passed by the reading module to the cache module, which again updates the configuration state of the current weights. These steps are repeated, switching the illumination intensity to l3 and then l4 and the exposure time to t3 and then t4, to complete the configuration of the first row of the computing array.
The configuration steps above are repeated to configure rows 2 to 9 in turn. Each time a row is finished, the illumination intensity is reset to l1 and the exposure time to t1. Counting one write followed by one read as one configuration, the whole device needs 4*4 configurations, far fewer than the 4*16 of conventional write-while-verify; as the quantization precision increases, the advantage of the device in writing time and accuracy becomes more pronounced.
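The row-by-row flow of this embodiment can be mimicked with a small behavioural model in Python. It is only a sketch under simplifying assumptions: each unit is assumed to have a lower threshold equal to its target weight, intervals A to D are assumed to cover four weight levels each, and one exposure round is assumed to drive every exposed unit either to its floor or to the bottom of the current interval. The 5-bit register layout (state bit plus 4-bit weight) follows the text; the function names are hypothetical.

```python
LEVELS_PER_INTERVAL = 4   # assumption: 4-bit weights split evenly over A, B, C, D
N_INTERVALS = 4
STATUS_BIT = 1 << 4       # highest bit of the 5-bit register: 1 = not yet configured


def configure_row(targets):
    """Configure one row; return (stored weights, number of write+read rounds)."""
    # Cache registers: state bit set, target weight in the lower 4 bits.
    registers = [STATUS_BIT | (w & 0xF) for w in targets]
    stored = [15] * len(targets)  # every unit starts at the maximum weight
    rounds = 0
    for interval in range(N_INTERVALS):
        # Lowest weight reachable in this interval (bottom of the linear segment).
        boundary = 16 - LEVELS_PER_INTERVAL * (interval + 1)
        for col, reg in enumerate(registers):
            if reg & STATUS_BIT:            # gate signal follows the state bit
                floor = reg & 0xF           # assumed: lower threshold == target
                stored[col] = max(floor, boundary)
        rounds += 1
        # Read back and clear the state bit of each weight that reached its target.
        for col, reg in enumerate(registers):
            if stored[col] == (reg & 0xF):
                registers[col] = reg & 0xF
    return stored, rounds


def configure_array(weight_rows):
    """Configure every row in turn and count the total configuration rounds."""
    result, total = [], 0
    for targets in weight_rows:
        stored, rounds = configure_row(targets)
        result.append(stored)
        total += rounds
    return result, total
```

In this model every row takes one round per interval regardless of the weights, so the total number of rounds is the number of rows times the number of intervals.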
Example 4
This embodiment uses the configured computing array of embodiment 2 to implement the calculation function. Multiplication is performed according to the drain-source current formula of embodiment 1, in which the number of photoelectrons stored in the charge-coupled layer serves as the optical input, i.e. the first multiplier; the source-drain voltage serves as the electrical input, i.e. the second multiplier; the control gate voltage is held constant; and the drain-source current is the result of the multiplication, which is equivalent to the following equation:
wherein a, b and k are constants, X is a first multiplier, Y is a second multiplier, and R is the operation result, namely drain-source current.
As shown in fig. 6, convolution can be implemented as a matrix multiplication. The numbers in the figure are index labels. The convolution of 8 convolution kernels of size 3*3 with a 4*4 input excitation is converted into the multiplication of 4 vectors of size 1*9 by the 9*8 convolution kernel matrix.
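The conversion of convolution into matrix multiplication can be checked with a short, self-contained Python sketch. The helper names and the sample data are hypothetical; stride 1, no padding and no kernel flip are assumed, since the text does not state them.

```python
def im2col(image, k):
    """Flatten every k x k window of a square image into one row vector."""
    out = len(image) - k + 1
    return [
        [image[r + dr][c + dc] for dr in range(k) for dc in range(k)]
        for r in range(out) for c in range(out)
    ]


def kernels_to_matrix(kernels):
    """Stack flattened kernels as columns: k*k rows, one column per kernel."""
    flat = [[v for row in ker for v in row] for ker in kernels]
    return [list(col) for col in zip(*flat)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def direct_conv(image, kernel):
    """Reference sliding-window result, listed window by window."""
    k = len(kernel)
    out = len(image) - k + 1
    return [
        sum(image[r + dr][c + dc] * kernel[dr][dc] for dr in range(k) for dc in range(k))
        for r in range(out) for c in range(out)
    ]


# Illustrative data: a 4*4 excitation and 8 kernels of size 3*3.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
kernels = [[[(ch + dr * 3 + dc) % 5 for dc in range(3)] for dr in range(3)] for ch in range(8)]

A = im2col(image, 3)            # 4 vectors of size 1*9
W = kernels_to_matrix(kernels)  # 9*8 kernel matrix
Y = matmul(A, W)                # 4*8 result, one column per kernel
```

Each column of `Y` matches the direct sliding-window result for the corresponding kernel, which is why the 9*8 array can hold all 8 kernels at once.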
The cache module buffers the input 8-bit excitation data. The ordering module rearranges the excitations into the format required by the computing array, taking only the 3*3 excitations needed each time, and feeds each bit of the excitations into the computing array in turn as a gate control signal.
Taking the multiplication of vector a and matrix W as an example:
W is the 9*8 weight matrix, written into the computing array by exposure; A consists of 8 excitation elements, which are fed into the computing array as electrical signals. Each element of A is converted to binary as follows:
The elements of vector A are input serially through the control gates, and the binary data of the different bit positions are input in a time-multiplexed manner, starting from the lowest bit. When the lowest-bit data is input, the weights stored in the computing array are multiplied by the corresponding lowest bits of the vector, implementing the following operation:
before current convergence, the calculation result of each photoelectric calculation unit in the calculation array of 9*8 is as follows:
After the column currents are summed, with the outputs of each column connected together, the output of the matrix-vector multiplication is:
After conversion by the analog-to-digital converter (ADC), the second-lowest bit of the vector is fed to the control gates by the ordering module. After the column currents are summed and accumulated, the product of the second-lowest bit and the weight matrix is obtained; this result is shifted left by 1 bit and added to the buffered lowest-bit product. The remaining bits are processed in the same way. After all bits have been shifted and accumulated, the final result of multiplying vector A by weight matrix W can be read out by the reading module. Proceeding in this way ultimately accelerates the convolution operation.
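The bit-serial shift-and-accumulate scheme can be illustrated in Python as follows. This is an idealized digital model, assuming unsigned 8-bit excitations, exact column sums and a hypothetical function name; it does not model the DAC, ADC or any analog non-ideality.

```python
def bit_serial_matvec(activations, weights, n_bits=8):
    """Multiply a row vector by a weight matrix one excitation bit at a time."""
    n_cols = len(weights[0])
    acc = [0] * n_cols
    for bit in range(n_bits):  # lowest bit first
        # Gate signals: the current bit of each excitation switches its row on or off.
        gates = [(a >> bit) & 1 for a in activations]
        # Column result: sum of the weights in the rows that are switched on.
        column_sums = [
            sum(g * row[c] for g, row in zip(gates, weights)) for c in range(n_cols)
        ]
        # Shift the partial result by its bit position and accumulate.
        acc = [s + (p << bit) for s, p in zip(acc, column_sums)]
    return acc
```

Because each excitation equals the sum of its bits weighted by powers of two, the accumulated value equals the ordinary product of the vector and the weight matrix.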
The present invention is not limited to the above-mentioned embodiments, and any changes or substitutions that can be easily understood by those skilled in the art within the technical scope of the present invention are intended to be included in the scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (9)

1. A multi-threshold configuration device based on a photoelectric storage and calculation integrated unit, characterized by comprising a cache module, a configuration module, a computing array, a reading module, an ordering module and a global control module; the cache module is connected in sequence to the configuration module, the computing array, the reading module and the global control module, and the ordering module is connected to the cache module, the computing array and the global control module respectively;
the cache module is used for storing and updating data and for distributing data onward; the configuration module is used for generating external control signals to control the writing and reading of data in the computing array; the computing array is formed by arranging a plurality of photoelectric storage and calculation integrated units, which have different lower convergence limits for the amount of data, or weight value, that they can store; the reading module is used for reading the data stored in the array; the ordering module is used for generating the data format required for array calculation; the global control module is used for controlling data distribution and locating the current configuration state;
the cache module transmits the configuration row address, the configuration state and the exposure time to the configuration module; the configuration module generates array control signals so that the memory units of one row of the array are exposed simultaneously and, after the specified exposure time has been counted, drives the array to read all the weights stored in that row out to the reading module; the reading module sorts the read weights and distributes them to the cache module in order; the ordering module takes a specified number of excitations from the cache module and rearranges them bit by bit into the format required by the array, and the final calculation result is read out by the reading module; the global control module is used for distinguishing the working state of the device, controlling the cache module to distribute data, locating the current configuration progress and selecting the appropriate computing array control signals.
2. The multi-threshold configuration device according to claim 1, wherein the cache module is configured to receive the weight data, calculation data, configuration row address, illumination intensity and illumination duration at that intensity sent by the upper computer; the cache module is used for caching the weights and excitation values and for updating the configuration state and configuration information of the current weights.
3. The multi-threshold configuration device according to claim 1, wherein the computing array is formed by arranging a plurality of photoelectric storage and calculation integrated units of different sizes; the thicknesses of the top dielectric layer and the bottom dielectric layer are unchanged, and the lower threshold of the drain-source current of each unit is set only by changing the gate width or gate length of the device, thereby representing the minimum weight data that each unit can store.
4. The multi-threshold configuration device according to claim 1, wherein, during exposure, the photoelectric storage and calculation integrated unit transfers and stores electrons through photoelectric conversion: photons are incident on the depletion region of the substrate, and the substrate absorbs the photons and excites electron-hole pairs; the photoelectrons move to the channel under the gate voltage, gain energy, and finally enter the charge-coupled layer under the drive of the gate-oxide electric field, where the charge is stored; in this process the amount of charge in the charge-coupled layer affects the threshold voltage at which the unit turns on and hence the magnitude of the drain current, so that the number of photoelectrons that entered the charge-coupled layer during exposure can be read out from the magnitude of the drain current; the nonlinear curve of drain current against exposure time is approximated piecewise-linearly and divided into different intervals, each interval corresponding to one illumination intensity; the different illumination intensities affect the rate of hot-electron injection and hence the linearity, in the different segments, of the curve relating the readout-region current to the exposure time.
5. The multi-threshold configuration device according to claim 4, wherein the configuration module receives the exposure time information, the configuration row address information and the row weight configuration status information sent by the buffer module, and generates a control signal for controlling the operation of the computing array gate, the source and the drain, and the following actual exposure time length:
where T is the actual exposure time, t is the time over which the drain-source current changes within the interval at the current illumination intensity, and the overexposure time is 1 ms to 10 ms.
6. The multi-threshold configuration device according to claim 1, wherein, once the exposure of the computing array has reached the actual exposure time, the reading module reads out the weight data stored in the exposed row of the array and, after sorting, passes them in order to the cache module, which remaps the configuration state of each weight.
7. The multi-threshold configuration device of claim 1, wherein the ordering module serves the calculation performed by the device after multi-threshold configuration, splitting and rearranging the excitations into the data format required for array calculation; the global control module is used for distinguishing the working state of the device, controlling the cache module to distribute data, locating the current configuration progress and selecting the appropriate computing array control signals.
8. A method for using the multi-threshold configuration device based on a photoelectric storage and calculation integrated unit according to any one of claims 1 to 7, characterized in that the specific steps include:
(1) According to the network model to be deployed or the amount of calculation required, customizing m × n photoelectric storage and calculation integrated units and arranging them into an m × n crossbar computing array of intersecting rows and columns, the gate widths of the units differing and ranging from 0.6 u to 3 u;
wherein m is the number of rows of the array, n is the number of columns of the array and u is the unit of size (1 u is 1 micron); photoelectric storage and calculation integrated units with different gate widths have different lower convergence limits for the amount of data, or weight value, that they can store, that is, the computing array is formed by arranging computing units with a plurality of thresholds;
(2) The nonlinear curve of drain current against exposure time of any photoelectric storage and calculation integrated unit is approximated piecewise-linearly and divided by drain current into k intervals of different sizes, each interval having a different exposure intensity;
wherein k is the number of linear intervals after the piecewise-linear approximation and also the total number of exposure intensities;
(3) Taking the light intensity of the interval with the largest drain current as the initial exposure intensity, inputting the exposure time, the configuration row address, the weights to be stored and the weight addresses of the row into the cache module in sequence; after caching the weights and the weight configuration state of the row, the cache module transmits the configuration row address, the configuration state and the exposure time to the configuration module;
(4) The configuration module generates array control signals so that the memory units of one row of the array are exposed simultaneously and, after the specified exposure time has been counted, drives the array to read all the weights stored in that row out to the reading module; the reading module sorts the read weights and distributes them to the cache module in order, and the cache module compares the received weights with the cached weights and maps out the new configuration state, completing one round of configuration;
(5) Switching the exposure intensity and transmitting the new exposure time to the cache module; the cache module transmits the current configuration state of the row weights to the configuration module, and the configuration module drives the array to re-expose, at the new exposure intensity, the weights of the row that have not yet reached their targets; after the new exposure time has elapsed, the weights are read again and distributed by the reading module to the cache module, which maps out the new configuration state;
(6) Repeating the step (5) until the weight of one row is completely configured;
(7) Switching the configuration row address in sequence and repeating steps (3) to (6) to complete all configuration work of the multi-threshold configuration device; counting a single write and read as one configuration, the multi-threshold configuration device performs at most m × k configurations;
wherein m is the number of rows of the array, and k is the number of intervals of the piecewise approximation;
(8) After the configuration work is completed, the excitation data for calculation are transmitted to the cache module and cached; the ordering module takes a specified number of excitations from the cache module and rearranges them bit by bit into the format required by the array; the excitation data take part in the calculation by switching the control gate signals, and the final calculation result is read out by the reading module.
9. A computing device, comprising: one or more processors, storage devices; a storage means for storing one or more programs; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of use of claim 8.
CN202310341643.1A 2023-04-03 2023-04-03 Multi-threshold configuration device and method based on photoelectric storage and calculation integrated unit Active CN116049094B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310341643.1A CN116049094B (en) 2023-04-03 2023-04-03 Multi-threshold configuration device and method based on photoelectric storage and calculation integrated unit

Publications (2)

Publication Number Publication Date
CN116049094A CN116049094A (en) 2023-05-02
CN116049094B true CN116049094B (en) 2023-07-21

Family

ID=86116843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310341643.1A Active CN116049094B (en) 2023-04-03 2023-04-03 Multi-threshold configuration device and method based on photoelectric storage and calculation integrated unit

Country Status (1)

Country Link
CN (1) CN116049094B (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110276440B (en) * 2019-05-19 2023-03-24 南京惟心光电系统有限公司 Convolution operation accelerator based on photoelectric calculation array and method thereof
CN110263297B (en) * 2019-05-25 2023-03-24 南京惟心光电系统有限公司 Control method for working state of matrix vector multiplier
CN110647983B (en) * 2019-09-30 2023-03-24 南京大学 Self-supervision learning acceleration system and method based on storage and calculation integrated device array

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant