CN111611195A - Software-definable storage and calculation integrated chip and software definition method thereof - Google Patents


Info

Publication number
CN111611195A
Authority
CN
China
Prior art keywords
arithmetic operation
flash memory
module
register file
configuration information
Prior art date
Legal status
Pending
Application number
CN201910143132.2A
Other languages
Chinese (zh)
Inventor
王绍迪
Current Assignee
Beijing Witinmem Technology Co ltd
Original Assignee
Beijing Witinmem Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Witinmem Technology Co ltd
Priority to CN201910143132.2A
Priority to PCT/CN2019/081339 (WO2020172951A1)
Publication of CN111611195A
Current legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G06F15/7882 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS for self reconfiguration

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Microcomputers (AREA)
  • Logic Circuits (AREA)

Abstract

The invention provides a software-definable storage and computation integrated chip and a software definition method thereof. The flash memory processing array of the chip comprises a plurality of flash memory processing sub-arrays that respectively perform different analog vector-matrix multiplication operations; the programmable arithmetic operation module comprises a plurality of programmable arithmetic operation units that respectively implement different arithmetic operations; and the control module configures and combines the modules of the chip according to the configuration information and finite-state machine information of the actual application. This realizes dynamic configuration of the circuit structure inside the chip, so that the chip can flexibly adjust its circuit structure to the actual task and peripheral circuits such as ADCs, DACs, registers and programmable arithmetic operation units can be multiplexed. As a result, the circuit area is reduced, the requirements of integration and miniaturization are met, and the chip cost is effectively reduced.

Description

Software-definable storage and calculation integrated chip and software definition method thereof
Technical Field
The invention relates to the field of semiconductor integrated circuits, in particular to a software-definable storage and calculation integrated chip and a software definition method thereof.
Background
Flash memory is a type of non-volatile memory that stores data by adjusting the threshold voltage of its flash transistors. According to differences in the flash transistors and array structures, flash memories are mainly classified into NOR-type and NAND-type flash memories. NAND-type flash memory is read and written in units of pages and blocks, has large capacity and low cost, and is widely used in large-scale stand-alone memories. NOR-type flash memory supports random data access but has lower density, smaller capacity and higher cost than NAND-type flash memory, and is mainly used in embedded memories.
In recent years, to overcome the bottleneck of the traditional von Neumann computing architecture, the Computing-In-Memory (CIM) chip architecture has attracted wide attention. Its basic idea is to perform logic computation directly in the memory, which reduces the amount and distance of data transfer between the memory and the processor, lowering power consumption and improving performance.
Once an existing storage and computation integrated chip architecture has been customized, its circuit structure is fixed: it cannot be adjusted flexibly to the actual task, and its circuit modules cannot be shared. As a result, the circuit area is large and the requirements of integration and miniaturization cannot be met.
Disclosure of Invention
In view of the above, the present invention provides a software-definable storage and computation integrated chip, method, apparatus and device. Using a plurality of flash memory processing sub-arrays, a plurality of programmable arithmetic operation units and a control module, the circuit structure of the chip can be dynamically configured according to actual application requirements and flexibly adjusted to the actual task, and peripheral circuits such as ADCs, DACs, registers and programmable arithmetic operation units can be multiplexed, thereby reducing the circuit area and meeting the requirements of integration and miniaturization.
In order to achieve the purpose, the invention adopts the following technical scheme:
In a first aspect, a software-definable storage and computation integrated chip is provided, comprising: a flash memory processing array, a programmable arithmetic operation module, and a control module connected to the flash memory processing array and the programmable arithmetic operation module;
the flash memory processing array comprises a plurality of flash memory processing sub-arrays for respectively performing different analog vector-matrix multiplication operations;
the programmable arithmetic operation module comprises a plurality of programmable arithmetic operation units for respectively realizing different arithmetic operations;
the control module performs combined configuration on the plurality of flash memory processing sub-arrays and the plurality of programmable arithmetic operation units according to the configuration information, and realizes dynamic configuration of a circuit structure in the chip.
Further, the software-definable storage and computation integrated chip also comprises:
the input interface module is used for receiving external input data;
the input register file is connected with the input interface module and used for storing the external input data or the data to be processed;
the input end of the digital-to-analog conversion module is connected with the input register file, the output end of the digital-to-analog conversion module is connected with the flash memory processing array and is used for converting the external input data or the data to be processed into an analog signal and outputting the analog signal to the flash memory processing array, and the flash memory processing array performs analog vector-matrix multiplication operation on the analog signal and outputs an operation result;
the input end of the analog-to-digital conversion module is connected with the flash memory processing array, the output end of the analog-to-digital conversion module is connected with the programmable arithmetic operation module and is used for converting the analog vector-matrix multiplication operation result into a digital signal and outputting the digital signal to the programmable arithmetic operation module, and the programmable arithmetic operation module performs arithmetic operation on the digital signal and outputs an arithmetic operation result;
the output register file is connected with the programmable arithmetic operation module and the input register file and is used for temporarily storing the arithmetic operation result and outputting the arithmetic operation result or outputting the arithmetic operation result to the input register file as the data to be processed;
the output interface module is connected with the output register file, receives the output data of the output register file and outputs the output data outwards;
the control module is connected with the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module, and is used for dynamically configuring these circuit modules according to actual application requirements.
Furthermore, the output end of the input register file is also connected with the programmable arithmetic operation module.
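The data path described above can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the patented circuit: the DAC and ADC are treated as ideal (no quantization), the flash memory processing array is reduced to an ideal matrix product, the bypass from the input register file to the arithmetic module is not modeled, and all function names are hypothetical. It shows how the loop-back from the output register file to the input register file lets one physical array serve several consecutive passes.

```python
from typing import Callable, List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def vector_matrix_multiply(x: Sequence[float], w: Matrix) -> Vector:
    """Ideal stand-in for the analog VMM: y[j] = sum_i x[i] * w[i][j]."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) for j in range(len(w[0]))]


def run_pipeline(external_input: Vector,
                 layers: Sequence[Matrix],
                 arithmetic: Callable[[Vector], Vector]) -> Vector:
    """Hypothetical behavioral model of one task on the chip.

    Each pass: input register file -> DAC -> flash processing array (VMM)
    -> ADC -> programmable arithmetic module -> output register file.
    Between passes the output register file hands its contents back to the
    input register file as "data to be processed".
    """
    input_register = list(external_input)   # loaded via the input interface module
    output_register: Vector = []
    for weights in layers:
        analog_in = input_register           # ideal DAC (assumption: no quantization)
        analog_out = vector_matrix_multiply(analog_in, weights)
        digital = analog_out                 # ideal ADC (assumption)
        output_register = arithmetic(digital)
        input_register = output_register     # loop-back path for the next pass
    return output_register                   # handed to the output interface module


def relu(v: Vector) -> Vector:
    """Example arithmetic operation: an activation function."""
    return [max(0.0, a) for a in v]


# Two passes through the array; the second consumes the first pass's result.
result = run_pipeline([1.0, 2.0],
                      [[[1.0, -1.0], [0.5, 1.0]], [[1.0], [1.0]]],
                      relu)
print(result)  # [3.0]
```

In this model, adding a pass costs no extra hardware; only the loop-back setting and the weights held by the array change.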
Further, the plurality of programmable arithmetic operation units are connected in series, and each programmable arithmetic operation unit includes a demultiplexer, an arithmetic operation subunit and a multiplexer;
the input end of the demultiplexer is connected to the preceding programmable arithmetic operation unit or to the analog-to-digital conversion module; one output end of the demultiplexer is connected to the arithmetic operation subunit, while its other output end and the output end of the arithmetic operation subunit are connected through the multiplexer to the next programmable arithmetic operation unit or to the output register file; and the control end of the demultiplexer is connected to the control module.
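A minimal sketch of the serial chain, assuming each unit's demultiplexer/multiplexer pair can be reduced to a single select flag set by the control module (the class and field names are hypothetical). An enabled unit routes data through its arithmetic operation subunit; a disabled unit passes data straight to the next stage or to the output register file.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Vector = List[float]


@dataclass
class ProgrammableArithmeticUnit:
    """One stage of the serial chain (hypothetical model).

    `enabled` stands in for the DEMUX/MUX select signal driven by the control
    module: True routes data through the arithmetic operation subunit, False
    routes it along the bypass path.
    """
    name: str
    operation: Callable[[Vector], Vector]
    enabled: bool = False

    def forward(self, data: Vector) -> Vector:
        # DEMUX chooses subunit path or bypass; MUX forwards the chosen path.
        return self.operation(data) if self.enabled else data


def run_chain(units: Sequence[ProgrammableArithmeticUnit], adc_output: Vector) -> Vector:
    """Data enters from the ADC and leaves toward the output register file."""
    data = adc_output
    for unit in units:
        data = unit.forward(data)
    return data


chain = [
    ProgrammableArithmeticUnit("scale", lambda v: [2.0 * a for a in v]),
    ProgrammableArithmeticUnit("bias", lambda v: [a - 1.0 for a in v]),
    ProgrammableArithmeticUnit("relu", lambda v: [max(0.0, a) for a in v]),
]

chain[0].enabled = True   # scale
chain[2].enabled = True   # relu; "bias" stays bypassed
print(run_chain(chain, [0.25, -1.0, 3.0]))  # [0.5, 0.0, 6.0]
```

Because bypassed units simply forward their input, the same physical chain can realize different composite operations by changing only the select flags.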
Further, the software-definable storage and computation integrated chip also comprises a programming circuit, which is connected to the source, gate and/or substrate of each flash memory cell in the flash memory processing array and is used for adjusting the threshold voltage of the flash memory cells;
the programming circuit includes a voltage generating circuit for generating a program voltage or an erase voltage, and a voltage control circuit for applying the program voltage to a selected flash memory cell.
Further, the software-definable storage and computation integrated chip also comprises:
a row-column decoder, connected to the flash memory processing array and the control module, for performing row and column decoding of the flash memory processing array under the control of the control module.
Further, the control module dynamically configures each circuit module connected to it according to configuration information. The configuration information includes the configuration information of the flash memory processing sub-arrays, of the programmable arithmetic operation units, of the digital-to-analog conversion module, of the analog-to-digital conversion module, of the input interface module, of the output interface module, of the input register file and of the output register file. Dynamically configuring each connected circuit module according to the configuration information comprises:
dividing the flash memory processing array into a plurality of flash memory processing sub-arrays according to the configuration information of the flash memory processing sub-arrays, and controlling the operating timing of the flash memory processing sub-arrays;
controlling, according to the configuration information of the programmable arithmetic operation units, the working states of the demultiplexer and the multiplexer corresponding to each programmable arithmetic operation unit, so that the programmable arithmetic operation units can be combined arbitrarily;
controlling the on/off state of the digital-to-analog conversion circuits participating in the actual task according to the configuration information of the digital-to-analog conversion module;
controlling the on/off state of the analog-to-digital conversion circuits participating in the actual task according to the configuration information of the analog-to-digital conversion module;
controlling the on/off state of the input interface circuits participating in the actual task according to the configuration information of the input interface module;
controlling the on/off state of the output interface circuits participating in the actual task according to the configuration information of the output interface module;
controlling, according to the configuration information of the input register file, whether the data stored in the input register file comes from the input data of the input interface module or from the data to be processed in the output register file;
and controlling, according to the configuration information of the output register file, whether the output register file outputs its data or sends it to the input register file as the data to be processed.
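One way to picture the configuration information is as a static record prepared before the task runs. The sketch below is hypothetical: the patent lists the kinds of configuration information but not their encoding, so the field names, the per-circuit on/off flags and the `(rows, cols)` sub-array shapes are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Tuple


class InputSource(Enum):
    INPUT_INTERFACE = "external input data"
    OUTPUT_REGISTER_FILE = "data to be processed (fed back)"


class OutputTarget(Enum):
    OUTPUT_INTERFACE = "send results off-chip"
    INPUT_REGISTER_FILE = "feed results back as data to be processed"


@dataclass
class ChipConfig:
    """Hypothetical container for the static configuration information.

    One field per kind of configuration information listed above; the
    encodings (shape tuples, per-circuit on/off flags) are assumptions.
    """
    subarray_shapes: List[Tuple[int, int]]   # (rows, cols) of each flash processing sub-array
    unit_enabled: List[bool]                 # per programmable arithmetic operation unit
    dac_on: List[bool]                       # per digital-to-analog conversion circuit
    adc_on: List[bool]                       # per analog-to-digital conversion circuit
    input_interface_on: List[bool]
    output_interface_on: List[bool]
    input_register_source: InputSource = InputSource.INPUT_INTERFACE
    output_register_target: OutputTarget = OutputTarget.OUTPUT_INTERFACE


def active_resources(cfg: ChipConfig) -> Dict[str, int]:
    """Count the circuits the control module would switch on for this task."""
    return {
        "flash_subarrays": len(cfg.subarray_shapes),
        "arithmetic_units": sum(cfg.unit_enabled),
        "dac_circuits": sum(cfg.dac_on),
        "adc_circuits": sum(cfg.adc_on),
        "input_interfaces": sum(cfg.input_interface_on),
        "output_interfaces": sum(cfg.output_interface_on),
    }


cfg = ChipConfig(
    subarray_shapes=[(64, 32), (64, 32)],
    unit_enabled=[True, False, True],
    dac_on=[True] * 4 + [False] * 4,
    adc_on=[True, True, False],
    input_interface_on=[True],
    output_interface_on=[True, False],
    output_register_target=OutputTarget.INPUT_REGISTER_FILE,  # multi-pass task
)
print(active_resources(cfg))
```

Circuits whose flags are off in one task remain available to another configuration, which is the sense in which peripheral circuits are multiplexed.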
In a second aspect, a software definition method for the software-definable storage and computation integrated chip is provided, the method comprising:
acquiring configuration information and finite state machine information;
configuring an input interface module, an input register file, a digital-to-analog conversion module, a flash memory processing array, an analog-to-digital conversion module, an output register file, a programmable arithmetic operation module and an output interface module according to the configuration information to realize the dynamic configuration of a circuit structure in a chip;
and controlling the working time sequence of the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module according to the information of the finite state machine.
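The finite state machine information controls when each module works. The following sketch assumes one plausible state sequence that follows the chip's data path; the state names, the transition function and the pass counter are illustrative assumptions, not states defined by the patent.

```python
from enum import Enum, auto
from typing import List


class State(Enum):
    IDLE = auto()
    LOAD_INPUT = auto()    # input interface module -> input register file
    CONVERT_DA = auto()    # DAC drives the selected flash processing sub-arrays
    COMPUTE_VMM = auto()   # analog vector-matrix multiplication
    CONVERT_AD = auto()    # ADC digitizes the sub-array outputs
    ARITHMETIC = auto()    # programmable arithmetic operation module
    STORE_OUTPUT = auto()  # result written to the output register file
    EMIT = auto()          # output interface module sends results off-chip
    DONE = auto()


_SEQUENCE = [State.IDLE, State.LOAD_INPUT, State.CONVERT_DA, State.COMPUTE_VMM,
             State.CONVERT_AD, State.ARITHMETIC, State.STORE_OUTPUT]


def next_state(state: State, passes_left: int) -> State:
    """Transition function of a hypothetical controller FSM."""
    if state is State.STORE_OUTPUT:
        # Output register file -> input register file while passes remain.
        return State.CONVERT_DA if passes_left > 0 else State.EMIT
    if state is State.EMIT:
        return State.DONE
    return _SEQUENCE[_SEQUENCE.index(state) + 1]


def run(passes: int) -> List[State]:
    """Return the states visited for a task needing `passes` passes through the array."""
    if passes < 1:
        raise ValueError("a task needs at least one pass")
    state, remaining, trace = State.IDLE, passes, [State.IDLE]
    while state is not State.DONE:
        if state is State.STORE_OUTPUT:
            remaining -= 1          # one pass has finished
        state = next_state(state, remaining)
        trace.append(state)
    return trace


print([s.name for s in run(2)])
```

The static configuration decides which circuits take part; a state machine like this one decides the order and timing in which they act during the run.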
Further, the software definition method comprises the following steps:
dividing the flash memory processing array into a plurality of flash memory processing sub-arrays according to the configuration information of the flash memory processing sub-arrays, and controlling the working time sequence of the plurality of flash memory processing sub-arrays according to the finite state machine information;
and controlling, according to the configuration information of the programmable arithmetic operation units, the working state of the selectors corresponding to each programmable arithmetic operation unit, so that the programmable arithmetic operation units can be combined arbitrarily, and controlling the working timing of the programmable arithmetic operation units according to the finite state machine information.
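Dividing the flash memory processing array into sub-arrays requires that each requested region fit inside the physical array without overlapping another. The sketch below uses a deliberately simple placement policy (column strips starting at row 0), which is an assumption; the patent does not specify how regions are placed, and the names are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class SubArray(NamedTuple):
    """A rectangular region of the physical flash processing array."""
    first_row: int
    first_col: int
    rows: int
    cols: int


def partition(total_rows: int, total_cols: int,
              shapes: Sequence[Tuple[int, int]]) -> List[SubArray]:
    """Carve sub-arrays out of one physical array (hypothetical placement policy).

    Sub-arrays are placed side by side as column strips starting at row 0, so
    they never overlap; a request that does not fit raises ValueError.
    """
    placed: List[SubArray] = []
    next_col = 0
    for rows, cols in shapes:
        if rows <= 0 or cols <= 0:
            raise ValueError("sub-array dimensions must be positive")
        if rows > total_rows or next_col + cols > total_cols:
            raise ValueError(f"{rows}x{cols} sub-array does not fit at column {next_col}")
        placed.append(SubArray(0, next_col, rows, cols))
        next_col += cols
    return placed


print(partition(128, 128, [(128, 64), (64, 32)]))
```

Sub-arrays of different sizes, as permitted by the embodiment, fall out of the same routine; a real controller could use a denser packing.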
In a third aspect, an electronic device is provided, which includes the above software-definable storage and computation integrated chip.
The invention provides a software-definable storage and computation integrated chip, a method and an electronic device. The flash memory processing array of the chip comprises a plurality of flash memory processing sub-arrays that respectively perform different analog vector-matrix multiplication operations; the programmable arithmetic operation module comprises a plurality of programmable arithmetic operation units that respectively implement different arithmetic operations; and the control module configures and combines the modules of the chip according to the configuration information and finite-state machine information of the actual application. This realizes dynamic configuration of the circuit structure in the chip, so that the chip can flexibly adjust its circuit structure to the actual task and peripheral circuits such as ADCs, DACs, registers and programmable arithmetic operation units can be multiplexed. As a result, the circuit area is reduced, the requirements of integration and miniaturization are met, and the chip cost is effectively reduced.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a first block diagram of a software-definable storage and computation integrated chip according to an embodiment of the present invention;
FIG. 2 is a second block diagram of a software-definable storage and computation integrated chip according to an embodiment of the present invention;
FIG. 3 is a block diagram of the programmable arithmetic operation module 30 in a software-definable storage and computation integrated chip according to an embodiment of the present invention;
FIG. 4 is a diagram of a programmable arithmetic operation subunit in a software-definable storage and computation integrated chip according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating how the programmable arithmetic operation module in a software-definable storage and computation integrated chip according to an embodiment of the present invention implements a compound operation;
FIG. 6 is a first block diagram of a flash memory processing sub-array in a software-definable storage and computation integrated chip according to an embodiment of the present invention;
FIG. 7 is a second block diagram of a flash memory processing sub-array in a software-definable storage and computation integrated chip according to an embodiment of the present invention;
FIG. 8 is a third block diagram of a flash memory processing sub-array in a software-definable storage and computation integrated chip according to an embodiment of the present invention;
FIG. 9 is a third block diagram of a software-definable storage and computation integrated chip according to an embodiment of the present invention;
FIG. 10 is a flow chart of a software definition method according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Once the existing storage and computation integrated chip architecture is customized, the circuit structure is fixed, flexible adjustment cannot be performed according to actual tasks, and circuit modules cannot be shared, so that the circuit area is large.
To solve the above problems in the prior art, embodiments of the present invention provide a software-definable storage and computation integrated chip, a method and an electronic device. The flash memory processing array of the chip includes a plurality of flash memory processing sub-arrays that respectively perform different analog vector-matrix multiplication operations, the programmable arithmetic operation module includes a plurality of programmable arithmetic operation units that respectively implement different arithmetic operations, and the control module configures and combines the modules of the chip according to the configuration information and finite-state machine information of the actual application, realizing dynamic configuration of the circuit structure in the chip. The chip can thus flexibly adjust its circuit structure to the actual task, and peripheral circuits such as ADCs, DACs, registers and programmable arithmetic operation units can be multiplexed, which reduces the circuit area and meets the requirements of integration and miniaturization.
Fig. 1 is a first structural diagram of a software-definable storage and computation integrated chip according to an embodiment of the present invention. As shown in Fig. 1, the chip includes: a flash memory processing array 20, a programmable arithmetic operation module 30, and a control module 10 connected to the flash memory processing array 20 and the programmable arithmetic operation module 30;
the flash processing array 20 includes a plurality of flash processing sub-arrays (not shown in fig. 1) for respectively performing different analog vector-matrix multiplication operations.
The flash memory processing sub-arrays may have the same structure, or their structures may differ according to actual application requirements; for example, the number of rows and columns of each flash memory processing sub-array may be set according to actual application requirements. The embodiments of the present invention do not limit this.
The programmable arithmetic operation module 30 includes a plurality of programmable arithmetic operation units (not shown in fig. 1) for respectively implementing different arithmetic operations.
Each programmable arithmetic operation unit is implemented in hardware and performs a specific arithmetic operation.
The arithmetic operations include one or more of multiplication, addition, subtraction, division, shifting, activation functions, maximum, minimum, average, pooling, and the like.
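The listed operations can be sketched as small functions on the digital vectors produced by the ADC, and a compound operation as a composition of them. This is an illustrative software model only; integer data, floor division, arithmetic right shift, ReLU as the activation function and non-overlapping max pooling are assumptions chosen for the example, and the function names are hypothetical.

```python
from typing import Callable, List

Vector = List[int]          # digital data coming from the ADC
Op = Callable[[Vector], Vector]


def multiply(k: int) -> Op:
    return lambda v: [a * k for a in v]


def add(k: int) -> Op:
    return lambda v: [a + k for a in v]


def subtract(k: int) -> Op:
    return lambda v: [a - k for a in v]


def divide(k: int) -> Op:
    # Floor division (assumption; a hardware divider might truncate instead).
    return lambda v: [a // k for a in v]


def shift_right(n: int) -> Op:
    # Arithmetic right shift: divides by 2**n, rounding toward minus infinity.
    return lambda v: [a >> n for a in v]


def relu() -> Op:
    # One possible activation function.
    return lambda v: [max(0, a) for a in v]


def maximum() -> Op:
    return lambda v: [max(v)]


def minimum() -> Op:
    return lambda v: [min(v)]


def average() -> Op:
    return lambda v: [sum(v) // len(v)]


def max_pool(window: int) -> Op:
    # Non-overlapping pooling windows.
    return lambda v: [max(v[i:i + window]) for i in range(0, len(v), window)]


def compose(*ops: Op) -> Op:
    """Apply operations in order, as a series of enabled units would."""
    def run(v: Vector) -> Vector:
        for op in ops:
            v = op(v)
        return v
    return run


pipeline = compose(multiply(3), shift_right(1), relu(), max_pool(2))
print(pipeline([4, -2, 1, 7]))  # [6, 10]
```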
The control module 10 performs combined configuration on an input interface module, an input register file, a digital-to-analog conversion module, a flash memory processing array, an analog-to-digital conversion module, an output register file, a programmable arithmetic operation module and an output interface module in the chip according to the configuration information and the finite-state machine information, so as to realize dynamic configuration of a circuit structure in the chip.
The configuration information and the finite-state machine information can be obtained through a compiling tool according to the actual application requirements.
The configuration information is usually static, for example the state of each module participating in the task and the configured size of each unit; it is typically stored in memory and scheduled before the task runs. The finite-state machine information is usually dynamic and controls the timing and state of the task while it runs.
Specifically, the control module 10 performs combined configuration on the plurality of flash memory processing sub-arrays and the plurality of programmable arithmetic operation units according to the configuration information, selects the flash memory processing sub-array and the programmable arithmetic operation unit which are put into operation, and controls a combined pairing manner of the flash memory processing sub-array and the programmable arithmetic operation unit to realize specific operation.
It can be understood that each programmable arithmetic operation unit in the plurality of programmable arithmetic operation units can realize one or more arithmetic operations, and the plurality of programmable arithmetic operation units can be arranged and combined to form a plurality of composite operations, and can realize a plurality of combination configurations in cooperation with the plurality of flash memory processing sub-arrays, thereby realizing complex operation functions.
As can be seen from the above description, in the software-definable storage and computation integrated chip provided by the embodiments of the present invention, the flash memory processing array includes a plurality of flash memory processing sub-arrays that respectively perform different analog vector-matrix multiplication operations, the programmable arithmetic operation module includes a plurality of programmable arithmetic operation units that respectively implement different arithmetic operations, and the control module configures and combines the sub-arrays and the units according to the configuration information, realizing dynamic configuration of the chip architecture. The chip architecture can therefore be adjusted flexibly to the actual task, a variety of complex operation functions can be implemented, and peripheral circuits such as the ADC, the DAC, registers and programmable arithmetic operation units can be multiplexed, which reduces the circuit area and meets the needs of integration and miniaturization.
In an alternative embodiment, referring to Fig. 2, the software-definable storage and computation integrated chip may further include: an input interface module 40, an input register file 50, a digital-to-analog conversion module 60, an analog-to-digital conversion module 70, an output register file 80, and an output interface module 90.
The input interface module 40 has an input end connected to an external device, for receiving input data (i.e., the data to be operated on) from the external device.
The input end of the input register file 50 is connected to the output end of the input interface module 40, and the input register file temporarily stores the input data or data to be processed.
The input end of the digital-to-analog conversion module 60 is connected to the output end of the input register file 50, the output end is connected to the input end of the flash memory processing array 20, and the digital-to-analog conversion module is configured to convert the external input data or the data to be processed output from the input register file 50 into an analog signal and output the analog signal to the flash memory processing array 20, and the flash memory processing array 20 performs an analog vector-matrix multiplication operation on the analog signal and outputs an analog vector-matrix multiplication operation result.
The analog-to-digital conversion module 70 has an input end connected to the flash memory processing array 20, an output end connected to the programmable arithmetic operation module 30, and is configured to convert the analog vector-matrix multiplication result into a digital signal and output the digital signal to the programmable arithmetic operation module 30, and the programmable arithmetic operation module 30 performs an arithmetic operation on the digital signal and outputs an arithmetic operation result.
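The conversion chain around the flash memory processing array can be modeled numerically. The sketch assumes ideal, unipolar converters and treats each flash cell as a fixed conductance set by its programmed threshold voltage, so each output column sums the currents V_i * G_ij of its cells; the reference values, bit widths and function names are hypothetical.

```python
from typing import List, Sequence


def dac(codes: Sequence[int], v_ref: float, bits: int) -> List[float]:
    """Ideal DAC: maps an unsigned code in [0, 2**bits - 1] to a voltage in [0, v_ref]."""
    full_scale = (1 << bits) - 1
    return [v_ref * c / full_scale for c in codes]


def bitline_currents(voltages: Sequence[float],
                     conductances: Sequence[Sequence[float]]) -> List[float]:
    """Analog VMM: output column j collects I_j = sum_i V_i * G[i][j]
    (ideal current summation; G[i][j] models the programmed cell)."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances)) for j in range(n_cols)]


def adc(currents: Sequence[float], i_ref: float, bits: int) -> List[int]:
    """Ideal unipolar ADC: rounds I / i_ref onto [0, 2**bits - 1], clipping out-of-range values."""
    full_scale = (1 << bits) - 1
    return [min(full_scale, max(0, round(i / i_ref * full_scale))) for i in currents]


volts = dac([255, 0], v_ref=1.0, bits=8)                  # [1.0, 0.0]
amps = bitline_currents(volts, [[1.0, 0.2], [1.0, 1.0]])  # normalized units
print(adc(amps, i_ref=1.0, bits=4))                       # [15, 3]
```

The ADC bit width and reference bound the precision of each digital result handed to the programmable arithmetic operation module.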
The output register file 80 has an input end connected to the programmable arithmetic operation module 30 and an output end connected to the input register file 50, and is used for temporarily storing the arithmetic operation result and outputting the arithmetic operation result or outputting the arithmetic operation result as the data to be processed to the input register file 50.
The input terminal of the output interface module 90 is connected to the output terminal of the output register file 80, receives the output data of the output register file 80, and outputs the output data to an external device.
The control module 10 is connected to the input interface module 40, the input register file 50, the digital-to-analog conversion module 60, the flash memory processing array 20, the analog-to-digital conversion module 70, the output register file 80, the programmable arithmetic operation module 30, and the output interface module 90, and is configured to dynamically configure the circuit modules according to configuration information.
The control module 10 dynamically configures each circuit module connected to it according to configuration information. The configuration information includes: the configuration information of the flash memory processing sub-arrays 20_1 to 20_n, the configuration information of the programmable arithmetic operation units 30_1 to 30_n, the configuration information of the digital-to-analog conversion module 60, the configuration information of the analog-to-digital conversion module 70, the configuration information of the input interface module 40, the configuration information of the output interface module 90, the configuration information of the input register file 50, the configuration information of the output register file 80, and the like. Dynamically configuring each connected circuit module according to the configuration information may include the following:
dividing the flash memory processing array 20 into a plurality of flash memory processing sub-arrays 20_1 to 20_n according to the configuration information of the flash memory processing sub-arrays 20_1 to 20_n, and controlling the operation timing of the flash memory processing sub-arrays 20_1 to 20_n;
controlling, according to the configuration information of the programmable arithmetic operation units 30_1 to 30_n, the working state of the selector corresponding to each programmable arithmetic operation unit, so that the programmable arithmetic operation units can be combined arbitrarily to participate in the work;
controlling the on-off state of the digital-to-analog conversion circuits participating in the actual task according to the configuration information of the digital-to-analog conversion module 60;
controlling the on-off state of the analog-to-digital conversion circuit participating in the actual task according to the configuration information of the analog-to-digital conversion module 70;
controlling the on-off state of the input interface circuit participating in the actual task according to the configuration information of the input interface module 40;
controlling the on-off state of the output interface circuit participating in the actual task according to the configuration information of the output interface module 90;
controlling the data to be stored in the input register to be from the input data of the input interface module or the data to be processed in the output register file according to the configuration information of the input register file 50;
the output register file 80 is controlled to output the data therein or output the data as the data to be processed to the input register file 50 according to the configuration information of the output register file 80.
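The list above amounts to a mapping from one cycle's configuration record to a set of selector and on/off settings. The Python sketch below shows one hypothetical encoding of that mapping. The `CycleConfig` fields, the dictionary keys and the string labels are illustrative assumptions, not a format defined in the patent. Only the reference numerals come from the text: multiplexer 100, demultiplexers 110 and 120, multiplexers 130 and 140, and demultiplexer 150.

```python
from dataclasses import dataclass
from typing import Dict, List, Union

@dataclass
class CycleConfig:
    """Hypothetical configuration record for one cycle (field names are illustrative)."""
    subarray: int               # index k of the flash memory processing sub-array 20_k to use
    unit_enabled: List[bool]    # True = subunit of unit 30_i participates, False = bypass
    input_from_feedback: bool   # input register file 50 takes data from output register file 80
    output_to_interface: bool   # output register file 80 drives output interface module 90
    use_vmm: bool = True        # False = arithmetic-only cycle (DAC, sub-array and ADC idle)

Setting = Union[str, int, bool, None, List[bool]]

def selector_settings(cfg: CycleConfig) -> Dict[str, Setting]:
    """Translate one configuration record into selector and on/off settings."""
    return {
        "MUX_100": "output_register_file_80" if cfg.input_from_feedback else "input_interface_40",
        "DEMUX_110": "dac_60" if cfg.use_vmm else "arithmetic_module_30",
        "DEMUX_120": cfg.subarray if cfg.use_vmm else None,   # which sub-array receives the DAC output
        "MUX_130": cfg.subarray if cfg.use_vmm else None,     # which sub-array feeds the ADC
        "MUX_140": "adc_70" if cfg.use_vmm else "demux_110",
        "DAC_60_on": cfg.use_vmm,
        "ADC_70_on": cfg.use_vmm,
        "units_30": list(cfg.unit_enabled),
        "DEMUX_150": "output_interface_90" if cfg.output_to_interface else "input_register_file_50",
    }
```

For instance, an intermediate layer that uses sub-array 2 and feeds its result back for the next round would be `CycleConfig(subarray=2, unit_enabled=[True, False], input_from_feedback=True, output_to_interface=False)`.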
Specifically, the input of the input register file 50 is connected to the output of the input interface module 40 and to the output of the output register file 80 through a multiplexer (MUX) 100. This lets it selectively receive either the external input data from the input interface module 40 or the data to be processed from the output register file 80. The control module 10 is connected to the multiplexer 100 and controls it according to the configuration information, thereby determining whether the input register file 50 receives the external input data or the data to be processed.
The digital-to-analog conversion module 60 is selectively connected to the flash memory processing sub-arrays 20_1 to 20_n through a demultiplexer (DEMUX) 120. The control module 10 is connected to the demultiplexer 120 and controls it according to the configuration information, thereby selecting which flash memory processing sub-array participates in the operation.
The outputs of the flash memory processing sub-arrays 20_1 to 20_n are connected to the analog-to-digital conversion module 70 through a multiplexer 130. The control module 10 is connected to the multiplexer 130 and controls it according to the configuration information to select which sub-array's output is connected to the input of the analog-to-digital conversion module 70. In other words, the output of the sub-array selected above to participate in the operation is connected to the input of the analog-to-digital conversion module 70.
The input of the programmable arithmetic operation module 30 is connected to the output of the demultiplexer 110 and the output of the analog-to-digital conversion module 70 through a multiplexer 140.
The programmable arithmetic operation units 30_1 to 30_n of the programmable arithmetic operation module 30 are connected in series. As shown in Fig. 3, each programmable arithmetic operation unit comprises a demultiplexer 30a, an arithmetic operation subunit 30b and a multiplexer 30c.
The input of the demultiplexer 30a is connected to the preceding programmable arithmetic operation unit or to the analog-to-digital conversion module 70. One of its outputs is connected to the arithmetic operation subunit 30b. The output of the arithmetic operation subunit 30b and the other output of the demultiplexer 30a are connected through the multiplexer 30c to the next programmable arithmetic operation unit or to the output register file 80. The control terminals of the demultiplexer 30a and the multiplexer 30c are connected to the control module 10.
Specifically, for the first programmable arithmetic operation unit 30_1, the input of its demultiplexer is connected to the output of the analog-to-digital conversion module 70. One output of the demultiplexer is connected to the input of the arithmetic operation subunit in 30_1. The other output, together with the output of the arithmetic operation subunit, is connected through a multiplexer to the second programmable arithmetic operation unit 30_2. The control terminals of the demultiplexer and the multiplexer are connected to the control module 10.
For the second programmable arithmetic operation unit 30_2, the input of its demultiplexer is connected to the output of the first unit 30_1. One output is connected to the input of the arithmetic operation subunit in 30_2, and the other output, together with the output of that subunit, is connected through a multiplexer to the third programmable arithmetic operation unit 30_3. The control terminals of the demultiplexer and the multiplexer are connected to the control module 10. The same pattern continues up to the n-th programmable arithmetic operation unit 30_n. The input of its demultiplexer is connected to the output of the (n-1)-th unit 30_(n-1), and one output is connected to the input of the arithmetic operation subunit in 30_n. The other output, together with the output of that subunit, is connected through a multiplexer to the input of the output register file 80. The control terminals of this demultiplexer and multiplexer are also connected to the control module 10.
The control module 10 is connected to the demultiplexer and the multiplexer in each programmable arithmetic operation unit. It controls them according to the configuration information to select whether the arithmetic operation subunit of that unit participates in the operation. This allows the programmable arithmetic operation units to be arranged and combined in different ways, so that different complex operations can be realized and the arithmetic function can be configured flexibly.
In an alternative embodiment, each arithmetic operation subunit may include a plurality of arithmetic operators arranged in parallel. These may be one or more of a multiplier, an adder, a subtractor, a divider, a shifter, an activation function unit, a maximum value calculator, a minimum value calculator, a mean value calculator and a pooling unit. The inputs of the arithmetic operators are connected to the outputs of the corresponding demultiplexer, and their outputs are connected to the inputs of the corresponding multiplexer, as shown in Fig. 4.
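The following minimal Python model illustrates how the series-connected units and their parallel operators combine into a compound operation. It assumes that each unit's demultiplexer either selects exactly one operator or takes the bypass path. The operator names and their constants (multiply by 2, add 1, pooling width 2) are hypothetical, and real hardware would work on fixed-point data rather than Python floats.

```python
from typing import Callable, Dict, List, Optional, Sequence

Vector = List[float]

# Hypothetical parallel operators inside one arithmetic operation subunit 30b;
# the constants (x2, +1) and the pool width of 2 are illustrative only.
OPERATORS: Dict[str, Callable[[Vector], Vector]] = {
    "multiply_by_2": lambda v: [2.0 * x for x in v],
    "add_1": lambda v: [x + 1.0 for x in v],
    "relu": lambda v: [max(0.0, x) for x in v],
    "max_pool_2": lambda v: [max(v[i:i + 2]) for i in range(0, len(v), 2)],
}

def run_chain(data: Vector, selections: Sequence[Optional[str]]) -> Vector:
    """Pass data through units 30_1..30_n in series.

    selections[i] names the operator that unit i's demultiplexer 30a routes the
    data to; None models the bypass path that goes straight to multiplexer 30c.
    """
    for name in selections:
        if name is None:
            continue                   # subunit does not participate
        data = OPERATORS[name](data)   # result forwarded via multiplexer 30c
    return data
```

With four units configured as `["multiply_by_2", None, "relu", "max_pool_2"]`, the input `[-1.0, 3.0, 0.5, -2.0]` becomes `[-2.0, 6.0, 1.0, -4.0]`, then `[0.0, 6.0, 1.0, 0.0]`, and finally `[6.0, 1.0]`.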
The process of the programmable arithmetic operation module performing the compound operation is shown in fig. 5.
The output of the output register file 80 is selectively connected to the input of the output interface module 90 or to the input of the input register file 50 through a demultiplexer 150. The control module 10 is connected to the demultiplexer 150 and controls its operating state according to the configuration information. This selects whether the output of the output register file 80 goes to the output interface module 90 or to the input register file 50. Sending the output to the input register file 50 means that the result undergoes a new round of operation processing.
In an alternative embodiment, the output of the input register file 50 can be selectively connected to the input of the digital-to-analog conversion module 60 or to the input of the programmable arithmetic operation module 30 through a demultiplexer 110. The control module 10 is connected to the demultiplexer 110 and controls its operating state according to the configuration information to make this selection. When the output of the input register file 50 is connected to the digital-to-analog conversion module 60, its data undergoes analog vector-matrix multiplication followed by arithmetic operations. When it is connected to the programmable arithmetic operation module 30, its data undergoes arithmetic operations only. This further increases the flexibility of the chip architecture.
In an alternative embodiment, each flash memory processing sub-array adopts a source-coupled, drain-summed topology, as shown in Fig. 6. It includes a plurality of programmable semiconductor devices (also called flash memory cells) arranged in an array.
The sources of all the programmable semiconductor devices in each column are connected to the same analog voltage input, so the columns correspond to a plurality of analog voltage inputs. The drains of all the devices in each row are connected to the same analog current output, so the rows correspond to a plurality of analog current outputs. The gates of all the devices in each row are connected to the same bias voltage input, so the rows correspond to a plurality of bias voltage inputs. The threshold voltage of each programmable semiconductor device is adjustable.
In another alternative embodiment, each flash memory processing sub-array includes a plurality of programmable semiconductor devices arranged in an array. The gates of all the devices in each row are connected to the same analog voltage input, so the rows correspond to a plurality of analog voltage inputs. The drains of all the devices in each row are connected to the same first terminal, so the rows correspond to a plurality of first terminals. The sources of all the devices in each row are connected to the same second terminal, so the rows correspond to a plurality of second terminals. The threshold voltage of each device is adjustable. If the first terminal is a bias voltage input and the second terminal is an analog current output, a gate-coupled, source-summed topology is obtained, as shown in Fig. 7. If the first terminal is an analog current output and the second terminal is a bias voltage input, a gate-coupled, drain-summed topology is obtained, as shown in Fig. 8.
Specifically, the flash memory processing sub-array implements matrix multiplication as follows. The threshold voltage of each programmable semiconductor device is adjusted so that the device acts as a variable, equivalent analog weight, and the array as a whole holds the analog matrix data. Analog voltages are then applied to the array, and the currents collected on the analog current outputs give the result of the multiplication.
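The following sketch is based on an idealised linear model, which is an assumption: each device multiplies its input voltage by a fixed weight, and signed weights are treated as plain numbers without modelling how they are realised physically. Under that model, the source-coupled, drain-summed sub-array of Fig. 6 computes one dot product per output line, I_i = sum over j of w_ij * V_j:

```python
from typing import List, Sequence

def analog_vmm(weights: Sequence[Sequence[float]], voltages: Sequence[float]) -> List[float]:
    """Idealised source-coupled, drain-summed sub-array.

    weights[i][j]: equivalent analog weight (set via threshold voltage) of the
    device in row i, column j. voltages[j]: analog input on column j's shared
    source line. Each row's shared drain line sums its device currents.
    """
    if any(len(row) != len(voltages) for row in weights):
        raise ValueError("each row needs one weight per input column")
    return [sum(w * v for w, v in zip(row, voltages)) for row in weights]
```

For example, weights `[[1.0, 2.0], [0.5, -1.0]]` with inputs `[3.0, 4.0]` give output currents `[11.0, -2.5]`.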
In an alternative embodiment, the software-definable storage and calculation integrated chip may further comprise a programming circuit 22.
The programming circuit 22 is coupled to the source, gate and/or substrate of each flash memory cell in the flash memory processing array for regulating the threshold voltage of the flash memory cell.
The programming circuit comprises a voltage generating circuit for generating a program voltage or an erase voltage, and a voltage control circuit for applying the program voltage to a selected flash memory cell.
Specifically, to raise the threshold voltage of a flash memory cell, the programming circuit uses the hot-electron injection effect. According to the threshold-voltage requirement data of the cell, it applies a high voltage to the source of the cell, which accelerates channel electrons to high speed and increases the threshold voltage.
To lower the threshold voltage, the programming circuit uses the tunneling effect. According to the threshold-voltage requirement data of the cell, it applies a high voltage to the gate or the substrate of the cell, which reduces the threshold voltage.
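Since raising and lowering the threshold voltage use different mechanisms, the programming circuit has to decide, cell by cell, which one applies. A minimal sketch of that decision follows. The tolerance value and the function and label names are hypothetical, and the sketch does not model pulse amplitudes or any verify step.

```python
from typing import List, Sequence, Tuple

def plan_threshold_adjustment(current_vth: Sequence[float], target_vth: Sequence[float],
                              tolerance: float = 0.01) -> List[Tuple[int, str]]:
    """Choose, per cell, the mechanism programming circuit 22 would use.

    Raising Vth -> hot-electron injection (high voltage on the source, per the text).
    Lowering Vth -> tunneling (high voltage on the gate or substrate).
    Cells already within `tolerance` (an assumed value) are left untouched.
    """
    if len(current_vth) != len(target_vth):
        raise ValueError("need one target per cell")
    plan: List[Tuple[int, str]] = []
    for idx, (now, goal) in enumerate(zip(current_vth, target_vth)):
        if goal - now > tolerance:
            plan.append((idx, "hot_electron_injection"))
        elif now - goal > tolerance:
            plan.append((idx, "tunneling"))
    return plan
```

For current values `[1.0, 2.0, 1.5]` and targets `[1.5, 1.0, 1.505]`, cell 0 is raised, cell 1 is lowered, and cell 2 is already within tolerance.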
In addition, the control module 10 is connected to the programming circuit for controlling the programming circuit according to the configuration information to adjust the weights stored in the flash memory processing array 20.
In an alternative embodiment, the software-definable storage and calculation integrated chip may further comprise a row-column decoder.
The row-column decoder is connected to the flash memory processing array 20 and the control module 10, and is configured to perform row-column decoding on the flash memory processing array 20 under the control of the control module 10.
In an alternative embodiment, the programmable semiconductor device may be implemented using floating gate transistors.
The flash memory processing array may be a NOR-type or a NAND-type flash memory processing array, although the invention is not limited thereto.
Based on the above, the following scenario illustrates the workflow of the software-definable storage and calculation integrated chip according to the embodiment of the invention, using a neural network operation as the example.
The neural network operates on data P and includes R layers of neurons. Each layer mainly performs vector-matrix multiplication, and the layers are connected through certain arithmetic operations. (Since this application focuses on the software-definable storage and calculation integrated chip and its software definition method, the neural network operation is not described in depth. Only its operation architecture is given, to illustrate the workflow of the chip by way of example and not to limit the invention.)
For this neural network operation, the software-definable storage and calculation integrated chip works as follows:
the control module 10 obtains the configuration information and the finite state machine information, which contain R cycles of configuration information and finite state machine information. The R cycles correspond to the operations (such as convolution and pooling) of the R layers of neurons, one layer per cycle. The configuration information for each cycle includes the configuration information of the flash memory processing sub-array, of the programmable arithmetic operation units, of the output register file, of the input register file, and so on. According to the configuration information, the control module 10 divides the flash memory processing array 20 into R flash memory processing sub-arrays, one per cycle, so that each sub-array performs the operation of one layer of the neural network. The control module 10 then controls the working timing of each circuit module according to the finite state machine information.
The input interface module 40 receives the data P;
the control module 10 configures the operation architecture for the first cycle according to the configuration information and finite state machine information of that cycle. It controls the multiplexer A (100) at the front end of the input register file 50 to connect the input interface module 40 to the input register file 50. It controls the demultiplexer Q (120) at the front end of the flash memory processing array 20 to connect the digital-to-analog conversion module 60 to flash memory processing sub-array 1, which corresponds to the first layer of the neural network. It controls the multiplexer B (130) at the rear end of the flash memory processing array 20 to connect sub-array 1 to the analog-to-digital conversion module 70. It controls the demultiplexer and multiplexer of each programmable arithmetic operation unit of the programmable arithmetic operation module 30 to implement arithmetic operation 1, which corresponds to the first layer. Once data P has been written into the input register file 50, it controls the demultiplexer W (150) at the output of the output register file 80 and the multiplexer A (100) at the front end of the input register file 50 so that the input of the input register file 50 is connected to the output of the output register file 80. This completes the configuration of the operation architecture for the first cycle;
data P is temporarily stored in the input register file 50 and then output to the digital-to-analog conversion module 60, which converts it into an analog signal and passes it to flash memory processing sub-array 1. Sub-array 1 performs analog vector-matrix multiplication 1 on the analog signal, and the analog-to-digital conversion module 70 converts multiplication result 1 into a digital signal. The programmable arithmetic operation module 30 then produces arithmetic operation result 1, which is sent through the output register file 80 to the input register file 50. This completes the operation of the first layer of the neural network;
at this point, the control module 10 is triggered automatically and configures the operation architecture for the second cycle according to the configuration information and finite state machine information of that cycle. It controls the demultiplexer Q (120) at the front end of the flash memory processing array 20 to connect the digital-to-analog conversion module 60 to flash memory processing sub-array 2, which corresponds to the second layer of the neural network. It controls the multiplexer B (130) at the rear end of the flash memory processing array 20 to connect sub-array 2 to the analog-to-digital conversion module 70. It controls the selectors of the programmable arithmetic operation units of the programmable arithmetic operation module 30 to implement arithmetic operation 2, which corresponds to the second layer. This completes the configuration of the operation architecture for the second cycle.
Arithmetic operation result 1 of the first layer is temporarily stored in the input register file 50 and passed to the digital-to-analog conversion module 60. From there it goes to flash memory processing sub-array 2, which performs analog vector-matrix multiplication 2 on the analog signal. The analog-to-digital conversion module 70 converts the multiplication result into a digital signal, and the programmable arithmetic operation module 30 produces arithmetic operation result 2. This result is sent through the output register file 80 to the input register file 50, completing the operation of the second layer. The process repeats up to the last layer. When the last layer is configured, the demultiplexer W (150) at the output of the output register file 80 is controlled so that the output of the output register file 80 is connected to the input of the output interface module 90. The operation result of the whole neural network is then output to an external device through the output interface module 90.
Those skilled in the art will understand that some layers of the neural network may require only arithmetic operations and no analog vector-matrix multiplication. For such a layer, the control module 10 only needs to configure the demultiplexer E (110) at the output of the input register file 50 so that this output is connected to the input of the programmable arithmetic operation module 30. The other configuration steps are not described in detail.
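The per-cycle data flow described above, including arithmetic-only layers, can be simulated in a few lines of Python. This is a behavioural sketch that relies on several assumptions: an ideal 1:1 DAC, linear sub-arrays, a uniform ADC step of 0.25, and a hypothetical `LayerCycle` record. It models data movement only and does not model finite-state-machine timing.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence

Vector = List[float]

@dataclass
class LayerCycle:
    """One cycle = one network layer (hypothetical structure)."""
    weights: Optional[List[List[float]]]  # sub-array contents; None = arithmetic-only layer
    ops: List[Callable[[Vector], Vector]] = field(default_factory=list)  # enabled units, in order

def adc(currents: Vector, step: float = 0.25) -> Vector:
    """Idealised ADC 70 with a uniform quantisation step (step size is an assumption)."""
    return [round(i / step) * step for i in currents]

def run_network(data_p: Vector, cycles: Sequence[LayerCycle]) -> Vector:
    register = list(data_p)  # cycle 1: MUX 100 selects the input interface module 40
    for cycle in cycles:
        x = register
        if cycle.weights is not None:
            # DEMUX 110 -> DAC 60 (ideal) -> sub-array via DEMUX 120 -> MUX 130 -> ADC 70
            x = adc([sum(w * v for w, v in zip(row, x)) for row in cycle.weights])
        # for an arithmetic-only layer, DEMUX 110 sends register data straight to module 30
        for op in cycle.ops:
            x = op(x)
        # output register file 80: DEMUX 150 feeds x back to input register file 50
        # via MUX 100; after the last cycle it goes to output interface module 90
        register = x
    return register
```

In this model a two-layer network whose second layer only sums its inputs is `[LayerCycle([[1.0, -1.0], [0.5, 0.5]], [relu]), LayerCycle(None, [lambda v: [sum(v)]])]`, where `relu` is any element-wise activation.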
As the above technical solution shows, the software-definable storage and calculation integrated chip provided by the embodiment of the invention combines a control module with a plurality of flash memory processing sub-arrays and a plurality of programmable arithmetic operation units. This lets it flexibly recombine its architecture according to actual application requirements and carry out complex operation tasks. It is suitable for applications such as voice processing, image processing, machine processing and artificial intelligence (AI). It also allows peripheral circuits such as the ADC, DAC, registers and programmable arithmetic operation units to be reused, which reduces circuit area, meets integration and miniaturization requirements, and effectively lowers chip cost.
Fig. 9 is a third structural diagram of the software-definable storage and calculation integrated chip according to an embodiment of the invention. As shown in Fig. 9, this builds on the chip shown in Fig. 2. The input of the input register file 50 is connected to the output of the input interface module 40 and to the output of the output register file 80 through a multiplexer (MUX) 100. This lets it selectively receive external input data from the input interface module 40 or data to be processed from the output register file 80. The control module 10 is connected to the multiplexer 100.
The digital-to-analog conversion module 60 is selectively connected to the flash memory processing sub-arrays 20_1 to 20_n through a demultiplexer (DEMUX) 120. The control module 10 is connected to the demultiplexer 120 (Q).
The outputs of the flash memory processing sub-arrays 20_1 to 20_n are connected to the analog-to-digital conversion module 70 through a multiplexer 130. The control module 10 is connected to the multiplexer 130 (B).
The input of the programmable arithmetic operation module 30 is connected to the output of the demultiplexer 110 and the output of the analog-to-digital conversion module 70 through a multiplexer 140.
The programmable arithmetic operation units 30_1 to 30_n of the programmable arithmetic operation module 30 are connected in series. Each includes a selector 30a and an arithmetic operation subunit 30b.
The input of the selector 30a is connected to the preceding programmable arithmetic operation unit or to the analog-to-digital conversion module 70. One of its outputs is connected to the arithmetic operation subunit 30b. The other output, together with the output of the arithmetic operation subunit 30b, is connected through a two-to-one selector to the next programmable arithmetic operation unit or to the output register file 80. The control terminals are connected to the control module 10.
The output of the output register file 80 is selectively connected to the input of the output interface module 90 or to the input of the input register file 50 through a demultiplexer 150. The control module 10 is connected to the demultiplexer 150 (W) and controls its operating state according to the configuration information. This selects whether the output of the output register file 80 goes to the output interface module 90 or to the input register file 50. Sending it to the input register file 50 means that the result undergoes a new round of operation processing.
The output of the input register file 50 is selectively connected to the input of the digital-to-analog conversion module 60 or to the input of the programmable arithmetic operation module 30 through a demultiplexer 110. The control module 10 is connected to the demultiplexer 110 (E) and controls its operating state according to the configuration information to make this selection. When the output of the input register file 50 is connected to the digital-to-analog conversion module 60, its data undergoes analog vector-matrix multiplication followed by arithmetic operations. When it is connected to the programmable arithmetic operation module 30, its data undergoes arithmetic operations only. This further increases the flexibility of the chip architecture.
Those skilled in the art will understand that, in this embodiment too, some layers of the neural network may require only arithmetic operations and no analog vector-matrix multiplication. For such a layer, the control module 10 only needs to configure the demultiplexer E (110) at the output of the input register file 50 so that this output is connected to the input of the programmable arithmetic operation module 30. The other configuration steps are not described in detail.
In addition, those skilled in the art will understand that the configuration information can be generated from the actual application requirements by using a preset instruction-to-architecture correspondence table.
It should be noted that, when the configuration information is generated from the actual application requirements, the number of flash memory processing sub-arrays required and the scale of each sub-array can be determined. A division instruction for the flash memory processing array can then be derived from the application requirements. According to this instruction, the array is divided into a plurality of flash memory processing sub-arrays that match the required matrix multiplication scales.
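The patent does not say how sub-array regions are placed within the flash memory processing array. The Python sketch below illustrates one hypothetical division policy: greedy packing of column bands from left to right. It assumes the source-coupled, drain-summed layout, in which inputs enter on columns and outputs leave on rows, so a matrix with `outputs` x `inputs` entries needs that many rows and columns.

```python
from typing import List, Sequence, Tuple

Region = Tuple[int, int, int, int]  # (first_row, first_col, n_rows, n_cols)

def divide_array(total_rows: int, total_cols: int,
                 scales: Sequence[Tuple[int, int]]) -> List[Region]:
    """Greedy column-band partition (an assumed policy, not the patent's).

    scales[k] = (outputs, inputs) of the k-th matrix multiplication; each
    sub-array gets `outputs` rows and `inputs` adjacent columns, packed left to right.
    """
    regions: List[Region] = []
    next_col = 0
    for outputs, inputs in scales:
        if outputs > total_rows or next_col + inputs > total_cols:
            raise ValueError(f"matrix {outputs}x{inputs} does not fit in the remaining array")
        regions.append((0, next_col, outputs, inputs))
        next_col += inputs
    return regions
```

A real division instruction would probably also pack regions vertically to use the unused rows under short sub-arrays. This sketch deliberately keeps the simplest possible layout.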
Those skilled in the art will understand that the software-definable storage and calculation integrated chip of the embodiment may perform operations over a plurality of cycles. In that case, the flash memory processing sub-array for each cycle may be programmed within that cycle. Alternatively, all the flash memory processing sub-arrays may be programmed together according to a programming instruction before the cycle operations are performed.
Fig. 10 is a flowchart of a software definition method according to an embodiment of the invention. The method is applied to the software-definable storage and calculation integrated chip described above. As shown in Fig. 10, the software definition method includes the following steps:
Step S1001: obtain the configuration information and the finite state machine information.
The configuration information and the finite state machine information can be obtained with a compilation tool according to the actual application requirements.
Step S1002: configure the flash memory processing sub-arrays, the programmable arithmetic operation units, the output register file and other circuit modules according to the configuration information, so as to configure the chip architecture dynamically.
Step S1003: control the working timing of the flash memory processing array, the programmable arithmetic operation module, the output register file and other circuit modules according to the finite state machine information.
Specifically, in step S1002 the flash memory processing sub-arrays and the programmable arithmetic operation units are configured in combination according to the configuration information. The sub-arrays and units to be put into operation are selected, and the way they are paired is controlled so as to realize a specific operation.
Because each programmable arithmetic operation unit in the plurality of programmable arithmetic operation units can realize one or more arithmetic operations, the plurality of programmable arithmetic operation units can be arranged and combined to form a plurality of composite operations, and can realize a plurality of combination configurations by matching with a plurality of flash memory processing sub-arrays, thereby realizing complex operation functions.
The arithmetic operations include one or more of multiplication, addition, subtraction, division, shifting, activation functions, maximum, minimum, average and pooling.
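The number of possible combinations can be estimated roughly. Assume that each of the n series-connected units either bypasses its subunit or selects exactly one of k parallel operators. Each unit then has k + 1 settings, and the whole module has (k+1)^n selector settings. Take the ten operator types listed above (k = 10) and a hypothetical module of n = 4 units:

```latex
N_{\text{settings}} = (k+1)^{n} = (10+1)^{4} = 14\,641
```

This is only an upper bound on the number of distinct compound operations. Different settings can realise the same overall function; for example, a single multiplication performed by unit 1 with unit 2 bypassed gives the same result as the reverse arrangement.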
The operation performed by the flash memory processing sub-arrays is mainly analog vector-matrix multiplication.
The software definition method provided by the embodiment of the invention configures the flash memory processing sub-arrays and the programmable arithmetic operation units in combination according to the actual application requirements. This makes the chip architecture dynamically configurable, so it can be adjusted flexibly to the actual task. It also allows peripheral circuits such as the ADC (analog-to-digital converter), DAC (digital-to-analog converter), registers and programmable arithmetic operation units to be reused, thereby reducing circuit area, meeting integration and miniaturization requirements, and effectively lowering chip cost.
In an alternative embodiment, step S1002 includes:
Step 1: divide the flash memory processing array into a plurality of flash memory processing sub-arrays according to the configuration information of the sub-arrays, and control the working timing of the sub-arrays according to the finite state machine information;
Step 2: control the working state of the selector corresponding to each programmable arithmetic operation unit according to the configuration information of the units, so that the units can be combined arbitrarily, and control the working timing of the units according to the finite state machine information;
Step 3: control, according to the configuration information of the output register file 80, whether the output register file 80 outputs its data or sends it to the input register file 50 as data to be processed.
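Step S1003 and steps 1 and 2 rely on finite state machine information but do not spell out the state machine itself. The minimal sketch below shows how such a machine might sequence the circuit modules. The phase names and their order are hypothetical, and arithmetic-only cycles are assumed to skip the analog phases.

```python
from enum import Enum, auto
from typing import AbstractSet, List, Tuple

class Phase(Enum):
    """Hypothetical phases of one cycle, in execution order."""
    CONFIGURE = auto()   # apply this cycle's configuration information
    LOAD = auto()        # input register file 50 latches its data
    DAC = auto()         # digital-to-analog conversion module 60
    VMM = auto()         # selected flash memory processing sub-array
    ADC = auto()         # analog-to-digital conversion module 70
    ARITHMETIC = auto()  # programmable arithmetic operation module 30
    WRITE_BACK = auto()  # output register file 80 -> input register file 50 or interface 90

ANALOG_PHASES = frozenset({Phase.DAC, Phase.VMM, Phase.ADC})

def schedule(num_cycles: int, vmm_cycles: AbstractSet[int]) -> List[Tuple[int, Phase]]:
    """Return the (cycle, phase) sequence the state machine would step through."""
    timeline: List[Tuple[int, Phase]] = []
    for cycle in range(num_cycles):
        for phase in Phase:  # Enum iteration follows definition order
            if phase in ANALOG_PHASES and cycle not in vmm_cycles:
                continue     # arithmetic-only cycle: DAC, sub-array and ADC stay idle
            timeline.append((cycle, phase))
    return timeline
```

For a two-cycle run in which only cycle 0 uses a sub-array, `schedule(2, {0})` yields seven phases for cycle 0 and four for cycle 1, ending with `(1, Phase.WRITE_BACK)`.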
Based on the above, the following scenario illustrates the workflow of the software definition method according to an embodiment of the invention. In it, the method is used to software-define the software-definable storage and calculation integrated chip so that the chip performs a neural network operation.
The neural network operates on data P and comprises R layers of neurons. Each layer mainly performs matrix multiplication, and consecutive layers are connected through certain arithmetic operations. (Because this example focuses on the software definition method, the neural network operations are not described in depth; only their operation architecture is outlined to illustrate the flow of the method, and the example does not limit the invention.)
For this neural network operation, the software definition method proceeds as follows:
(1) Obtain configuration information and finite state machine information. The configuration information contains R cycles of configuration information; the R cycles correspond to the operations (such as convolution or pooling) of the R layers of neurons, one cycle per layer. The configuration information for each cycle includes the configuration information of the flash memory processing sub-array, of the programmable arithmetic operation units, of the output register file, of the input register file, and so on. According to the configuration information, the control module 10 divides the flash memory processing array 20 into R flash memory processing sub-arrays, each corresponding to one cycle; that is, each sub-array performs the operation of one layer of the neural network.
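The R-cycle configuration information of step (1) can be pictured as a list of per-cycle records. The sketch below is a hypothetical data layout, not the patent's format: the record name `CycleConfig`, its fields and the helper `build_cycle_configs` are illustrative, sub-arrays are indexed from 0 rather than 1, and it assumes that every cycle except the last feeds its result back to the input register file.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class CycleConfig:
    """Hypothetical per-cycle configuration record (field names are illustrative)."""
    layer: int                  # neural-network layer handled in this cycle
    subarray: int               # flash memory processing sub-array used by that layer
    arithmetic_ops: List[str] = field(default_factory=list)  # enabled units, in order
    feed_back: bool = True      # True: output register file -> input register file


def build_cycle_configs(ops_per_layer: Sequence[Sequence[str]]) -> List[CycleConfig]:
    """Create R cycle configurations, with R = len(ops_per_layer).

    Assumption: cycle k uses sub-array k, and only the last cycle sends its
    result to the output interface instead of feeding it back.
    """
    r = len(ops_per_layer)
    return [
        CycleConfig(layer=k, subarray=k, arithmetic_ops=list(ops), feed_back=k < r - 1)
        for k, ops in enumerate(ops_per_layer)
    ]
```

Keeping one record per cycle mirrors the one-layer-per-cycle mapping described above; the operation names passed in are placeholders.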
(2) According to the configuration information of the first cycle and the finite state machine information: control the demultiplexer (DEMUX) A at the front end of the input register file 50 to connect the input interface module 40 to the input register file 50; control the multiplexer (MUX) Q at the front end of the flash memory processing array 20 to connect the digital-to-analog conversion module 60 to flash memory processing sub-array 1, which corresponds to the first layer of the neural network; control the multiplexer B at the rear end of the flash memory processing array 20 to connect flash memory processing sub-array 1 to the analog-to-digital conversion module 70; and control the selectors of the programmable arithmetic operation units in the programmable arithmetic operation module 30 to implement arithmetic operation 1, which corresponds to the first layer. After data P has been input to the input register file 50, control the demultiplexer W at the output end of the output register file 80 and the demultiplexer (DEMUX) A at the front end of the input register file 50 so that the input end of the input register file 50 is connected to the output end of the output register file 80. This completes the configuration of the operation architecture for the first cycle.
(3) According to the configuration information of the second cycle and the finite state machine information: control the multiplexer (MUX) Q at the front end of the flash memory processing array 20 to connect the digital-to-analog conversion module 60 to flash memory processing sub-array 2, which corresponds to the second layer of the neural network; control the multiplexer B at the rear end of the flash memory processing array 20 to connect flash memory processing sub-array 2 to the analog-to-digital conversion module 70; and control the selectors of the programmable arithmetic operation units in the programmable arithmetic operation module 30 to implement arithmetic operation 2, which corresponds to the second layer. This completes the configuration of the operation architecture for the second cycle. These steps are repeated for each subsequent layer up to the last one. When the last layer is configured, the demultiplexer W at the output end of the output register file 80 is controlled to connect the output end of the output register file 80 to the input end of the output interface module 90, so that the result of the whole neural network is output to an external device through the output interface module 90.
Those skilled in the art will understand that when a layer of the neural network needs only arithmetic operations and no analog vector-matrix multiplication, the circuit configuration only needs to control the demultiplexer E at the output of the input register file 50 so that the output end of the input register file 50 is connected to the input end of the programmable arithmetic operation module 30; the rest of the configuration process is as described above and is not repeated here.
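The layer-by-layer flow of steps (1) to (3), including the bypass through demultiplexer E, can be simulated behaviourally. This is a minimal sketch under stated assumptions: each flash sub-array is treated as an ideal vector-matrix multiplier (DAC and ADC effects are ignored), a layer with `weights=None` skips the flash array, and the names `vmm`, `run_network` and `Layer` are hypothetical, not taken from the patent.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]
# One layer = (weights held in its flash sub-array, or None to bypass the array;
#              arithmetic operation applied by the programmable module)
Layer = Tuple[Optional[Matrix], Callable[[Vector], Vector]]


def vmm(weights: Matrix, x: Sequence[float]) -> Vector:
    """Ideal stand-in for one flash sub-array: y[j] = sum_i x[i] * W[i][j]."""
    return [sum(xi * row[j] for xi, row in zip(x, weights))
            for j in range(len(weights[0]))]


def run_network(data_p: Sequence[float], layers: Sequence[Layer]) -> Vector:
    if not layers:
        raise ValueError("at least one layer is required")
    input_rf = list(data_p)              # cycle 1: input interface -> input register file (A)
    for k, (weights, arith) in enumerate(layers):
        # Q and B route the DAC and ADC to this layer's sub-array; with weights=None,
        # E sends the input register file straight to the arithmetic module instead.
        analog_result = vmm(weights, input_rf) if weights is not None else input_rf
        output_rf = arith(analog_result)  # selectors realise this layer's arithmetic operation
        if k < len(layers) - 1:
            input_rf = output_rf          # W and A feed the result back as data to be processed
    return output_rf                      # last cycle: W routes to the output interface
```

For example, a first layer with weights `[[1.0, -1.0], [2.0, 0.5]]` followed by a ReLU, then an arithmetic-only layer that doubles its input, maps `[1.0, 1.0]` to `[6.0, 0.0]`.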
An embodiment of the invention further provides an electronic device capable of executing a neural network algorithm, in which the neural network includes multiple layers of neurons and each layer performs its operations on the output of the preceding layer. The electronic device includes the software-definable computing-in-memory chip described above.
The embodiment of the present invention further provides another electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the steps of the software definition method are implemented.
The electronic device may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the software definition method described above.
The principle and implementation of the invention have been explained using specific embodiments; the description of these embodiments is intended only to help understand the method and core idea of the invention. A person skilled in the art may, based on the idea of the invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the invention.

Claims (10)

1. A software-definable computing-in-memory chip, comprising: a flash memory processing array, a programmable arithmetic operation module, and a control module connected to the flash memory processing array and the programmable arithmetic operation module, wherein
the flash memory processing array comprises a plurality of flash memory processing sub-arrays for respectively executing different analog vector-matrix multiplication operations;
the programmable arithmetic operation module comprises a plurality of programmable arithmetic operation units for respectively realizing different arithmetic operations;
and the control module performs combined configuration on the plurality of flash memory processing sub-arrays and the plurality of programmable arithmetic operation units according to the configuration information to realize the dynamic configuration of the circuit structure in the chip.
2. The software-definable computing-in-memory chip of claim 1, further comprising:
an input interface module for receiving external input data;
an input register file, connected to the input interface module, for storing the external input data or data to be processed;
a digital-to-analog conversion module, whose input end is connected to the input register file and whose output end is connected to the flash memory processing array, for converting the external input data or the data to be processed into an analog signal and outputting it to the flash memory processing array, the flash memory processing array performing an analog vector-matrix multiplication on the analog signal and outputting the result;
an analog-to-digital conversion module, whose input end is connected to the flash memory processing array and whose output end is connected to the programmable arithmetic operation module, for converting the analog vector-matrix multiplication result into a digital signal and outputting it to the programmable arithmetic operation module, the programmable arithmetic operation module performing an arithmetic operation on the digital signal and outputting the arithmetic operation result;
an output register file, connected to the programmable arithmetic operation module and the input register file, for temporarily storing the arithmetic operation result and either outputting it or passing it to the input register file as data to be processed;
an output interface module, connected to the output register file, for receiving the output data of the output register file and outputting it externally;
wherein the control module is connected to the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module, and is used for dynamically configuring these circuit modules according to actual application requirements.
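The digital-to-analog, flash array and analog-to-digital path of claim 2 can be modelled numerically. This is a minimal sketch under stated assumptions that are not in the claims: converters are ideal and linear with unsigned codes, values are normalised to a reference of 1.0, and each flash cell is idealised as a programmable conductance so that a column current is the sum of row voltage times cell conductance (Ohm's law per cell, Kirchhoff's current law per column). The function names `dac`, `crossbar` and `adc` are hypothetical.

```python
from typing import List, Sequence


def dac(codes: Sequence[int], bits: int = 8, v_ref: float = 1.0) -> List[float]:
    """Ideal DAC: map unsigned integer codes to voltages in [0, v_ref]."""
    full = (1 << bits) - 1
    return [v_ref * c / full for c in codes]


def crossbar(voltages: Sequence[float],
             conductances: Sequence[Sequence[float]]) -> List[float]:
    """Idealised flash sub-array: column current I_j = sum_i V_i * G_ij."""
    n_cols = len(conductances[0])
    return [sum(v * row[j] for v, row in zip(voltages, conductances))
            for j in range(n_cols)]


def adc(currents: Sequence[float], bits: int = 8, i_ref: float = 1.0) -> List[int]:
    """Ideal ADC: clamp each current to [0, i_ref] and quantise to an unsigned code."""
    full = (1 << bits) - 1
    return [round(min(max(i, 0.0), i_ref) / i_ref * full) for i in currents]
```

Chaining the three, `adc(crossbar(dac([255, 0]), [[0.5, 0.25], [1.0, 1.0]]))` yields `[128, 64]`; the digital codes then go to the programmable arithmetic operation module.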
3. The software-definable computing-in-memory chip of claim 2, wherein the output end of the input register file is further connected to the programmable arithmetic operation module.
4. The software-definable computing-in-memory chip of claim 2, wherein the plurality of programmable arithmetic operation units are connected in series, each programmable arithmetic operation unit comprising a demultiplexer, an arithmetic operation subunit and a multiplexer;
the input end of the demultiplexer is connected to the preceding programmable arithmetic operation unit or to the analog-to-digital conversion module; one output end of the demultiplexer is connected to the arithmetic operation subunit; the other output end of the demultiplexer and the output end of the arithmetic operation subunit are connected, through the multiplexer, to the next programmable arithmetic operation unit or to the output register file; and the control end of the demultiplexer is connected to the control module.
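The series-connected units of claim 4 behave like a pipeline in which each stage either applies its operation or passes data through unchanged. The sketch below models only that routing behaviour; the class `ArithmeticUnit`, its `enabled` flag (standing in for the demultiplexer and multiplexer settings chosen by the control module) and `run_chain` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class ArithmeticUnit:
    """Sketch of one programmable arithmetic operation unit from claim 4."""
    op: Callable[[List[int]], List[int]]   # the arithmetic operation subunit
    enabled: bool = False                  # demultiplexer/multiplexer setting

    def __call__(self, data: List[int]) -> List[int]:
        # Demultiplexer: send data to the subunit or to the bypass path.
        # Multiplexer: forward whichever path was selected to the next stage.
        return self.op(data) if self.enabled else data


def run_chain(units: Sequence[ArithmeticUnit], data: List[int]) -> List[int]:
    """Pass the ADC output through the series-connected units in order;
    the last unit's output would go to the output register file."""
    for unit in units:
        data = unit(data)
    return data
```

Re-targeting the chip to a different operation sequence then amounts to flipping `enabled` flags rather than changing the hardware.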
5. The software-definable computing-in-memory chip of claim 4, further comprising a programming circuit connected to the source, gate and/or substrate of each flash memory cell in the flash memory processing sub-arrays, for adjusting the threshold voltages of the flash memory cells;
wherein the programming circuit comprises: a voltage generating circuit for generating a program voltage or an erase voltage, and a voltage control circuit for applying the program voltage to a selected flash memory cell.
6. The software-definable computing-in-memory chip of claim 1 or 2, further comprising:
a row-column decoder, connected to the flash memory processing array and the control module, for performing row-column decoding on the flash memory processing array under the control of the control module.
7. The software-definable computing-in-memory chip of claim 4, wherein the control module dynamically configures the circuit modules connected to it according to configuration information, the configuration information comprising: configuration information of the flash memory processing sub-arrays, configuration information of the programmable arithmetic operation units, configuration information of the digital-to-analog conversion module, configuration information of the analog-to-digital conversion module, configuration information of the input interface module, configuration information of the output interface module, configuration information of the input register file, and configuration information of the output register file; and dynamically configuring the circuit modules according to the configuration information comprises:
dividing the flash memory processing array into a plurality of flash memory processing sub-arrays according to the configuration information of the flash memory processing sub-arrays, and controlling the working time sequence of the plurality of flash memory processing sub-arrays;
controlling the working states of the demultiplexer and the multiplexer corresponding to each programmable arithmetic operation unit according to the configuration information of the programmable arithmetic operation units, so that the plurality of programmable arithmetic operation units can be combined arbitrarily;
controlling the on/off states of the digital-to-analog conversion circuits participating in the actual task according to the configuration information of the digital-to-analog conversion module;
controlling the on/off states of the analog-to-digital conversion circuits participating in the actual task according to the configuration information of the analog-to-digital conversion module;
controlling the on/off states of the input interface circuits participating in the actual task according to the configuration information of the input interface module;
controlling the on/off states of the output interface circuits participating in the actual task according to the configuration information of the output interface module;
selecting, according to the configuration information of the input register file, whether the data stored in the input register file come from the input data of the input interface module or from the data to be processed in the output register file;
and controlling the output register file to output the data therein or output the data to the input register file as the data to be processed according to the configuration information of the output register file.
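The converter, interface and register-file items of claim 7 can be read as a configuration record that is decoded into control settings. The sketch below is a hypothetical encoding, not the patent's format: it covers only those items, uses one on/off flag per conversion or interface circuit, and names (`InputSource`, `OutputDest`, `ConverterAndRegisterConfig`, `control_signals`) are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class InputSource(Enum):
    # Where the input register file takes its data from
    INPUT_INTERFACE = "input interface module"
    OUTPUT_REGISTER_FILE = "output register file"


class OutputDest(Enum):
    # Where the output register file sends its data
    OUTPUT_INTERFACE = "output interface module"
    INPUT_REGISTER_FILE = "input register file"


@dataclass
class ConverterAndRegisterConfig:
    """Hypothetical subset of the claim 7 configuration information."""
    dac_on: List[bool]        # one flag per digital-to-analog conversion circuit
    adc_on: List[bool]        # one flag per analog-to-digital conversion circuit
    input_if_on: List[bool]   # one flag per input interface circuit
    output_if_on: List[bool]  # one flag per output interface circuit
    input_rf_source: InputSource
    output_rf_dest: OutputDest


def control_signals(cfg: ConverterAndRegisterConfig) -> Dict[str, object]:
    """Decode the record: list the circuits to switch on and the register routing."""
    def active(flags: List[bool]) -> List[int]:
        return [i for i, on in enumerate(flags) if on]

    return {
        "dac": active(cfg.dac_on),
        "adc": active(cfg.adc_on),
        "input_if": active(cfg.input_if_on),
        "output_if": active(cfg.output_if_on),
        "input_rf_source": cfg.input_rf_source.value,
        "output_rf_dest": cfg.output_rf_dest.value,
    }
```

Circuits whose flag is False are simply left off for the task; how the chip physically gates them is not specified here.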
8. A software definition method for a software-definable computing-in-memory chip, applied to the software-definable computing-in-memory chip of any one of claims 1 to 7, the software definition method comprising:
acquiring configuration information and finite state machine information;
configuring an input interface module, an input register file, a digital-to-analog conversion module, a flash memory processing array, an analog-to-digital conversion module, an output register file, a programmable arithmetic operation module and an output interface module according to the configuration information to realize the dynamic configuration of a circuit structure in a chip;
and controlling the working time sequence of the input interface module, the input register file, the digital-to-analog conversion module, the flash memory processing array, the analog-to-digital conversion module, the output register file, the programmable arithmetic operation module and the output interface module according to the information of the finite state machine.
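The timing control of claim 8 can be illustrated with a small finite state machine. This sketch assumes a hypothetical phase order per cycle (load input once, then DAC, analog multiplication, ADC, arithmetic, and finally either write-back to the input register file or output through the output interface on the last cycle); the phase names and the functions `next_state` and `run_fsm` are illustrative, not the chip's actual states.

```python
from enum import Enum, auto
from typing import List, Optional, Tuple


class Phase(Enum):
    LOAD = auto()       # input interface -> input register file
    DAC = auto()        # digital-to-analog conversion
    VMM = auto()        # analog vector-matrix multiplication in the flash sub-array
    ADC = auto()        # analog-to-digital conversion
    ARITH = auto()      # programmable arithmetic operation module
    WRITEBACK = auto()  # output register file -> input register file
    OUTPUT = auto()     # output register file -> output interface (terminal state)


State = Tuple[int, Phase]  # (cycle index, phase)
_PIPE = [Phase.DAC, Phase.VMM, Phase.ADC, Phase.ARITH]


def next_state(cycle: int, phase: Phase, num_cycles: int) -> Optional[State]:
    """Transition function; returns None after the terminal OUTPUT state."""
    if phase is Phase.LOAD:
        return cycle, Phase.DAC
    if phase in _PIPE[:-1]:
        return cycle, _PIPE[_PIPE.index(phase) + 1]
    if phase is Phase.ARITH:
        return (cycle, Phase.WRITEBACK) if cycle < num_cycles - 1 else (cycle, Phase.OUTPUT)
    if phase is Phase.WRITEBACK:
        return cycle + 1, Phase.DAC
    return None


def run_fsm(num_cycles: int) -> List[State]:
    """Step the machine from the initial state and record every state visited."""
    if num_cycles < 1:
        raise ValueError("need at least one cycle")
    trace: List[State] = []
    state: Optional[State] = (0, Phase.LOAD)
    while state is not None:
        trace.append(state)
        state = next_state(state[0], state[1], num_cycles)
    return trace
```

For two cycles the trace has 11 states, ending with `(1, Phase.OUTPUT)`; only the first cycle passes through `LOAD`, matching the feedback path used for later layers.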
9. The software definition method of claim 8, comprising:
dividing the flash memory processing array into a plurality of flash memory processing sub-arrays according to the configuration information of the flash memory processing sub-arrays, and controlling the working time sequence of the plurality of flash memory processing sub-arrays according to the information of the finite state machine;
and controlling the working state of the selector corresponding to each programmable arithmetic operation unit according to the configuration information of the programmable arithmetic operation units, so that the plurality of programmable arithmetic operation units can be combined arbitrarily, and controlling the working timing of the plurality of programmable arithmetic operation units according to the finite state machine information.
10. An electronic device, comprising the software-definable computing-in-memory chip of any one of claims 1 to 7.
CN201910143132.2A 2019-02-26 2019-02-26 Software-definable storage and calculation integrated chip and software definition method thereof Pending CN111611195A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910143132.2A CN111611195A (en) 2019-02-26 2019-02-26 Software-definable storage and calculation integrated chip and software definition method thereof
PCT/CN2019/081339 WO2020172951A1 (en) 2019-02-26 2019-04-03 Software-definable computing-in-memory chip and software definition method therefor

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910143132.2A CN111611195A (en) 2019-02-26 2019-02-26 Software-definable storage and calculation integrated chip and software definition method thereof

Publications (1)

Publication Number Publication Date
CN111611195A true CN111611195A (en) 2020-09-01

Family

ID=72202924

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910143132.2A Pending CN111611195A (en) 2019-02-26 2019-02-26 Software-definable storage and calculation integrated chip and software definition method thereof

Country Status (2)

Country Link
CN (1) CN111611195A (en)
WO (1) WO2020172951A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112306931B (en) * 2020-11-20 2023-07-04 广州安凯微电子股份有限公司 Method, system and storage medium for realizing usb host controller by software

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102306141B (en) * 2011-07-18 2015-04-08 清华大学 Method for describing configuration information of dynamic reconfigurable array
CN107430586B (en) * 2015-07-31 2018-08-21 吴国盛 Adaptive chip and configuration method
US11064019B2 (en) * 2016-09-14 2021-07-13 Advanced Micro Devices, Inc. Dynamic configuration of inter-chip and on-chip networks in cloud computing system
CN108777155A (en) * 2018-08-02 2018-11-09 北京知存科技有限公司 Flash chip
CN109379087B (en) * 2018-10-24 2022-03-29 江苏华存电子科技有限公司 Method for LDPC to modulate kernel coding and decoding rate according to error rate of flash memory component

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395247A (en) * 2020-11-18 2021-02-23 北京灵汐科技有限公司 Data processing method and storage and calculation integrated chip
WO2022105805A1 (en) * 2020-11-18 2022-05-27 北京灵汐科技有限公司 Data processing method and in-memory computing chip
CN112395247B (en) * 2020-11-18 2024-05-03 北京灵汐科技有限公司 Data processing method and memory and calculation integrated chip
CN112989273A (en) * 2021-02-06 2021-06-18 江南大学 Method for carrying out memory operation by using complementary code
CN112989273B (en) * 2021-02-06 2023-10-27 江南大学 Method for carrying out memory operation by utilizing complementary code coding
WO2022217575A1 (en) * 2021-04-16 2022-10-20 尼奥耐克索斯有限私人贸易公司 Low-loss computing circuit and operation method therefor
CN113918233A (en) * 2021-09-13 2022-01-11 山东产研鲲云人工智能研究院有限公司 AI chip control method, electronic equipment and AI chip
CN114242137A (en) * 2021-11-09 2022-03-25 厦门半导体工业技术研发有限公司 Configuration circuit and chip of array and configuration method of array
CN117289896A (en) * 2023-11-20 2023-12-26 之江实验室 Deposit and calculate integrative basic operation device
CN117289896B (en) * 2023-11-20 2024-02-20 之江实验室 Deposit and calculate integrative basic operation device

Also Published As

Publication number Publication date
WO2020172951A1 (en) 2020-09-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 213-175, 2nd Floor, Building 1, No. 180 Kecheng Street, Qiaosi Street, Linping District, Hangzhou City, Zhejiang Province, 311100

Applicant after: Hangzhou Zhicun Computing Technology Co.,Ltd.

Address before: 1416, shining building, No. 35, Xueyuan Road, Haidian District, Beijing 100083

Applicant before: BEIJING WITINMEM TECHNOLOGY Co.,Ltd.

Country or region before: China
