CN214846708U - Chip architecture for AI calculation based on NVM - Google Patents

Chip architecture for AI calculation based on NVM

Info

Publication number
CN214846708U
CN214846708U (application CN202121061582.6U)
Authority
CN
China
Prior art keywords: nvm, neural network, data, calculation, npu
Prior art date
Legal status: Active
Application number
CN202121061582.6U
Other languages
Chinese (zh)
Inventor
丛维
林小峰
金生
Current Assignee
Nanjing Youcun Technology Co ltd
Original Assignee
Nanjing Youcun Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Nanjing Youcun Technology Co ltd filed Critical Nanjing Youcun Technology Co ltd
Priority to CN202121061582.6U priority Critical patent/CN214846708U/en
Application granted granted Critical
Publication of CN214846708U publication Critical patent/CN214846708U/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The utility model provides a chip architecture for performing AI calculation based on NVM, comprising an NVM array, an external interface module, an NPU and an MCU in communication connection through a bus. The NPU and the NVM are combined to perform AI neural network calculation: the weight parameters of the neural network are stored digitally in the NVM array, the MCU receives external AI operation instructions and controls the NPU and the NVM array to realize the neural network calculation, and the MCU controls the NVM array to load the internally stored weight parameters of the neural network and performs the AI calculation through the program it runs and the neural network model. Compared with existing storage schemes that use NVM for analog computation, this digital storage and computation mode has a flexible computation structure, good reliability, high precision and high read accuracy. The utility model therefore breaks through the speed bottleneck of off-chip NVM storage and reduces external input power consumption while retaining high feasibility, flexibility and reliability.

Description

Chip architecture for AI calculation based on NVM
Technical Field
The utility model relates to the technical field of AI (Artificial Intelligence), and in particular to a chip architecture for performing AI calculation based on NVM (non-volatile memory).
Background
AI algorithms draw their inspiration from the structure of the human brain. The human brain is a complex network of a large number of neurons connected in an intricate way: each neuron receives information from many other neurons through its numerous dendrites, and each connection point is called a synapse. When the accumulated external stimulus reaches a certain level, the neuron generates a stimulus signal that is transmitted out through the axon. The axon has a large number of terminals, which connect through synapses to the dendrites of many other neurons. It is this network of functionally simple neurons that implements all the intelligent activities of human beings. Human memory and intelligence are generally believed to be stored in the different coupling strengths of the individual synapses.
Neural network algorithms, which emerged in the 1960s, mimic the function of a neuron with a mathematical function. The function accepts a number of inputs from other neurons, each input having its own weight, and the output is the sum of each input multiplied by the corresponding connection weight. The function output is then fed as an input to neurons in the next layer, forming a neural network.
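By way of illustration only (not part of the utility model), the neuron function described above reduces to a weighted sum; a minimal C sketch, with illustrative names, is:

```c
#include <stddef.h>

/* Minimal sketch of one artificial neuron as described above: each input
 * from the previous layer is multiplied by its connection weight and the
 * products are summed; the result is passed to neurons in the next layer. */
float neuron_output(const float *inputs, const float *weights, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += inputs[i] * weights[i];   /* input x connection weight */
    return sum;                          /* fed to the next layer     */
}
```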
Common AI chips optimize the parallel matrix computation required by these network algorithms, but because AI computation demands extremely high storage read bandwidth, architectures that separate the processor, the memory and the storage run into a read-speed bottleneck and are further limited by the power consumed by external storage reads. The industry has therefore begun to study in-memory-computing architectures extensively.
At present, schemes for in-memory computation with NVM store the weights of the neural network in the NVM in the form of analog signals and implement the network calculation by analog addition and multiplication; for a specific example, refer to the Chinese patent application with publication number CN109086249A. Many scientific results have been achieved with such schemes, but practical application has been difficult. Practical neural networks basically have many layers and very complicated connection structures, so analog signals are very inconvenient to transmit between layers and to process during the network calculation, and the analog computation array structure is rigid, which is unfavourable to supporting flexible network structures. In addition, the various noises and errors introduced when analog signals are stored, read, written and computed limit the reliability of the stored neural network model and the accuracy of the calculation.
SUMMARY OF THE UTILITY MODEL
The purpose of the utility model is to provide a chip architecture for performing AI calculation based on NVM. It overcomes the defects of existing in-memory-computation schemes that store the weights of a neural network in NVM in the form of analog signals, in which analog signals are very inconvenient to transmit between layers and to process during the neural network calculation, the analog computation array structure is rigid and unfavourable to supporting flexible network structures, and the various noises and errors in the storage, reading, writing and computation of analog signals limit the reliability of the stored neural network model and the accuracy of the calculation. The chip architecture breaks through the speed bottleneck of external storage and reduces the power consumption of external input while offering better flexibility, implementability and reliability.
In order to achieve the above object, the utility model provides a chip architecture for performing AI calculation based on NVM, which includes an NVM array, an external interface module, an NPU (embedded neural network processor) and an MCU (Microcontroller Unit) in communication connection through a bus;
the NVM array is used for storing weight parameters of a digitalized neural network, a program operated by the MCU and a neural network model in a chip;
the NPU is used for digital domain accelerated calculation of the neural network;
the external interface module is used for receiving external AI operation instructions, inputting data and outputting AI calculation results outwards;
the MCU is used for executing the program based on the AI operation instruction so as to control the NVM array and the NPU to carry out AI calculation on the input data to obtain the result of the AI calculation.
In this chip architecture the NPU and the NVM are combined to perform AI neural network calculation. The weight parameters of the neural network are stored digitally in the NVM array inside the chip, and the neural network calculation is likewise carried out in the digital domain: the MCU controls the NPU and the NVM array on the basis of an external AI operation instruction, loading from the NVM array the internally stored weight parameters of the neural network, the program run by the MCU and the neural network model to perform the AI calculation. Compared with the various existing storage schemes that use NVM for analog computation, this digital storage and computation mode has a flexible computation structure, and the information stored in the NVM offers better reliability, higher precision and higher read accuracy than multi-level analog storage. The scheme therefore breaks through the speed bottleneck of off-chip NVM storage and reduces external input power consumption while also providing high implementability, flexibility and reliability.
Further, the chip architecture further includes a high-speed data read channel through which the NPU reads the weight parameters from the NVM array.
In addition to the on-chip bus, this scheme provides a high-speed data read channel between the NPU and the NVM array, which supports the bandwidth required by the NPU to read the weight parameters (i.e. the weight data) of the neural network at high speed during digital-domain operation.
Further, the NVM array is provided with an N-way read channel, N being a positive integer; the read channel reads N bits of data in one read cycle, and the NPU is configured to read the weight parameters from the NVM array through the read channel via the high-speed data read channel.
In this scheme the read channel is N-way; preferably N is 128 to 512, so that N bits of data can be read in one read cycle (typically 30 to 40 nanoseconds). The NPU reads the weight parameters of the neural network from the NVM array through the read channel and the high-speed data read channel. This bandwidth is far higher than the read speed an off-chip NVM can support, and it can satisfy the parameter read speed required by common neural network inference calculations.
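As a rough, illustrative check of the bandwidth claim, the short C program below computes the sustained read rate for one assumed operating point (N = 256 bits per cycle, a 35 ns cycle); both values are examples within the ranges quoted above, not fixed parameters of the architecture:

```c
#include <stdio.h>

int main(void)
{
    /* Assumed operating point within the ranges given in the description. */
    const double n_bits        = 256.0;   /* N-way read channel (128-512)   */
    const double cycle_seconds = 35e-9;   /* one read cycle (~30-40 ns)     */

    double bits_per_second  = n_bits / cycle_seconds;
    double bytes_per_second = bits_per_second / 8.0;

    printf("~%.1f Gbit/s (~%.2f GB/s) sustained weight read bandwidth\n",
           bits_per_second / 1e9, bytes_per_second / 1e9);
    return 0;
}
```

Under these assumptions the internal read path delivers on the order of 7 Gbit/s, well above what a serial off-chip NVM interface typically sustains.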
Furthermore, the bit width of the high-speed data read channel is m bits, m being a positive integer. The chip architecture further comprises a data conversion unit, which includes a cache module and a sequential reading module. The cache module is used for caching, cycle by cycle, the weight parameters output by the read channel; its capacity is N*k bits, where k is the number of cycles cached. The sequential reading module is used for converting the cached data into m-bit-wide words and outputting them to the NPU through the high-speed data read channel, N*k being an integral multiple of m.
The data conversion unit handles the case where the number of read channels does not match the bit width of the high-speed data read channel and/or the two operate at different frequencies: it converts the data into a combination of words of the same bit width as the high-speed data read channel, typically words of small width (e.g. 32 bits). The NPU reads data from the data conversion unit via the high-speed data read channel at its own clock frequency (which may exceed 1 GHz).
The data conversion unit provided by this scheme comprises a cache of N*k bits and a sequential reader that outputs m bits at a time, N*k being an integral multiple of m. The read channel is connected to the NVM array and outputs N bits per cycle, and the cache can hold k cycles of data; the high-speed data read channel is m bits wide. The high-speed data read channel may include read/write command (CMD) and acknowledge (ACK) signals, which are connected to the NVM array read control circuitry. After a read operation completes, the ACK signal notifies the high-speed data read channel (and may also notify the on-chip bus at the same time), and the high-speed data read channel asynchronously feeds the cached data to the NPU in multiple transfers through the sequential reading module.
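A behavioural sketch of this buffering arrangement is given below. It is illustrative only: the names and the choice N = 256, k = 4, m = 32 are assumptions, chosen merely so that N*k is an integral multiple of m as required above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the data conversion unit: a cache of N*k bits is
 * filled N bits per read cycle from the NVM read channel, and a sequential
 * reader hands the contents to the NPU m bits at a time over the
 * high-speed data read channel. */
#define N_BITS      256                      /* bits per NVM read cycle        */
#define K_CYCLES    4                        /* read cycles held in the cache  */
#define M_BITS      32                       /* high-speed channel word width  */
#define CACHE_BYTES ((N_BITS * K_CYCLES) / 8)

typedef struct {
    uint8_t cache[CACHE_BYTES];
    size_t  rd_pos;                          /* sequential read position (bytes) */
} conv_unit_t;

/* One NVM read cycle: latch N bits into the cache at cycle index 'cycle'. */
static void conv_fill_cycle(conv_unit_t *u, int cycle, const uint8_t *n_bits)
{
    memcpy(&u->cache[cycle * (N_BITS / 8)], n_bits, N_BITS / 8);
}

/* Sequential reader: return the next m-bit word for the NPU.  Because
 * N*k is an integral multiple of m, the cache divides evenly into words. */
static uint32_t conv_read_m_bits(conv_unit_t *u)
{
    uint32_t word;
    memcpy(&word, &u->cache[u->rd_pos], sizeof word);
    u->rd_pos = (u->rd_pos + sizeof word) % CACHE_BYTES;
    return word;
}
```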
Further, the chip architecture further includes a Static Random-Access Memory (SRAM), which is communicatively connected through the bus to the NVM array, the external interface module, the NPU and the MCU; the SRAM is used for caching data during execution of the program by the MCU, data during NPU operation, and the input and output data of the neural network model operation.
The chip architecture provided by this scheme thus includes an embedded SRAM that serves as the cache needed for the operation and calculation of the chip's internal system, storing input and output data, intermediate data produced by the calculation and the like. Specifically, it caches data while the MCU executes the program, holding the executable program, system configuration parameters, network structure configuration parameters and the like when the MCU runs; and it caches data during NPU operation, holding the input and output data when the neural network model runs.
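Purely for illustration, one might picture the SRAM partitioned along the roles listed above; the region names and sizes in the sketch below are assumptions, since the description only states which kinds of data are cached:

```c
#include <stdint.h>

/* Hypothetical partition of the embedded SRAM by the roles described above.
 * All region names and sizes are illustrative assumptions. */
typedef struct {
    uint8_t mcu_code[32 * 1024];      /* executable program loaded from the NVM  */
    uint8_t sys_config[4 * 1024];     /* system configuration parameters         */
    uint8_t net_config[8 * 1024];     /* network structure configuration         */
    uint8_t npu_input[64 * 1024];     /* input data for the neural network model */
    uint8_t npu_scratch[64 * 1024];   /* intermediate data produced by the NPU   */
    uint8_t npu_output[16 * 1024];    /* AI calculation results                  */
} sram_layout_t;
```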
Further, a plurality of neural network models are stored in the NVM array, and the AI operation instruction includes an algorithm selection instruction, which is used for selecting one of the plurality of neural network models as the algorithm for the AI calculation.
The neural network models in this scheme are stored digitally in the NVM array, and a plurality of neural network models may be stored according to the number of application scenarios. When multiple application scenarios correspond to multiple neural network models, the MCU can flexibly select any one of the pre-stored models for AI calculation according to an externally input algorithm selection instruction, which overcomes the problem that existing in-memory-computation schemes based on analog calculation have a rigid array structure that is unfavourable to supporting flexible neural network structures.
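A minimal sketch of this selection step is shown below; the descriptor layout, the NVM offsets and the function names are assumptions introduced only to make the idea concrete.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model catalogue: several neural network models are stored
 * digitally in the NVM array, and the MCU picks one according to the
 * externally supplied algorithm selection instruction. */
typedef struct {
    uint32_t weights_addr;   /* NVM offset of the model's weight parameters  */
    uint32_t weights_len;    /* length of the weight parameter block         */
    uint32_t graph_addr;     /* NVM offset of the network structure/config   */
} model_desc_t;

static const model_desc_t g_models[] = {     /* illustrative entries only */
    { 0x00010000, 0x00040000, 0x00002000 },  /* model 0, e.g. keyword spotting */
    { 0x00060000, 0x00100000, 0x00004000 },  /* model 1, e.g. image classifier */
};

/* Called when the MCU decodes an algorithm selection instruction. */
static const model_desc_t *select_model(uint8_t model_index)
{
    if (model_index >= sizeof g_models / sizeof g_models[0])
        return NULL;                          /* unknown model: reject        */
    return &g_models[model_index];            /* MCU then configures the NPU  */
}
```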
Further, the NVM array may employ one of a flash memory process, an MRAM (Magnetic Random Access Memory) process, an RRAM (Resistive Random Access Memory) process, an MTP (Multiple Time Programming) process and an OTP (One Time Programming) process, and/or the interface standard of the external interface module is at least one of SPI (Serial Peripheral Interface), QPI (Quad SPI) and a parallel interface.
Further, the MCU is further configured to receive, through the external interface module, external data access instructions for operating the NVM array, and to complete the logic control of the basic operations of the NVM array based on the data access instructions.
Further, the NVM array employs one of a SONOS flash memory process, a Floating Gate flash memory process and a Split Gate flash memory process, and the interface standard of the external interface module is SPI and/or QPI;
the data access instruction is a standard flash memory operation instruction; the AI operation instruction and the data access instruction adopt the same instruction format and rule; the AI operation instruction comprises an operation code, and further comprises an address part and/or a data part, wherein the operation code of the AI operation instruction is different from the operation code of the standard flash memory operation instruction.
The chip architecture provided by this scheme is an improvement on the traditional flash memory chip architecture: specifically, the MCU and the NPU are embedded in the flash memory chip and communicate through an on-chip bus, which may be an Advanced High-performance Bus (AHB) or any other communication bus that meets the requirements, without limitation here. In this scheme the NPU and the NVM are combined, i.e. both calculation and storage are on-chip: the weight parameters of the neural network are stored digitally in the NVM array, the neural network calculation is carried out in the digital domain, and the NPU and the NVM array are controlled by the MCU on the basis of an external AI operation instruction. The scheme thereby breaks through the bottleneck of off-chip NVM storage speed, reduces external input power consumption, and achieves high implementability, flexibility and reliability.
This scheme realizes the digital operation of the NVM array through the MCU, which may specifically include basic flash memory operations such as read, write and erase; the external data access instructions and the external interface may adopt the standard flash memory chip format, so the chip is easy to apply flexibly and simply. The MCU embedded in this scheme serves as the logic control unit of the NVM and replaces the logic state machine of a standard flash memory, which simplifies the chip structure and saves chip area.
The NVM array in this scheme may further be used to store externally input data that is not limited to data related to AI calculation; that is, besides the neural network model, the weight parameters and the program run by the internal system of the chip, it may also store other externally input data related to AI calculation as well as externally input data unrelated to AI calculation, the latter specifically including information such as system parameters, configurations and/or codes of an external device or system. The basic operations include reading, writing and erasing the neural network model, the weight parameters and the program run by the internal system, as well as directly reading, writing and erasing the externally input data stored in the NVM array.
The instructions used for direct NVM operation and the instructions used for AI calculation adopt the same instruction format and rules. Taking the SPI and QPI interfaces as an example, on the basis of the traditional SPI/QPI flash memory operation instructions (op_code), op_codes that are not used by flash memory operations are chosen to express the AI instructions, more information is transmitted in the address part, and the AI data transfer takes place during the data exchange period. AI calculation can therefore be supported simply by extending the instruction decoder to multiplex the interface and adding a few status registers and configuration registers.
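The decoder extension can be pictured as a dispatch on the op_code, as in the hedged sketch below. The standard op_codes shown (read 0x03, page program 0x02, sector erase 0x20) are common SPI NOR flash commands; the values picked for the AI op_codes and both handler stubs are assumptions made only for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define OP_READ_DATA     0x03   /* standard flash read                     */
#define OP_PAGE_PROGRAM  0x02   /* standard flash page program             */
#define OP_SECTOR_ERASE  0x20   /* standard flash sector erase             */
#define OP_AI_SELECT_ALG 0xB1   /* assumed unused op_code: select a model  */
#define OP_AI_RUN        0xB2   /* assumed unused op_code: run inference   */

/* Placeholder handlers standing in for the flash path and the AI path. */
static bool handle_flash_command(uint8_t op, uint32_t addr) { (void)op; (void)addr; return true; }
static bool handle_ai_command(uint8_t op, uint32_t addr)    { (void)op; (void)addr; return true; }

/* Extended instruction decoder: standard flash op_codes keep their usual
 * meaning; op_codes unused by flash operation are routed to the AI path. */
static bool decode_op(uint8_t op_code, uint32_t addr)
{
    switch (op_code) {
    case OP_READ_DATA:
    case OP_PAGE_PROGRAM:
    case OP_SECTOR_ERASE:
        /* same instruction format and handling as a standard flash chip  */
        return handle_flash_command(op_code, addr);
    case OP_AI_SELECT_ALG:
    case OP_AI_RUN:
        /* extra information travels in the address phase; AI input and
         * output data are exchanged during the data phase                */
        return handle_ai_command(op_code, addr);
    default:
        return false;            /* unrecognised op_code: ignore           */
    }
}
```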
Further, the chip architecture further includes a Direct Memory Access (DMA) channel, and the DMA channel is used for an external device to directly read and write the SRAM.
The positive effects of the utility model are as follows:
the utility model provides a chip framework based on NVM carries out AI calculation, adopt NPU and NVM to combine together to carry out AI neural network calculation, wherein the weight parameter digitization of neural network is stored in the NVM array of chip inside, neural network calculation is also digital domain calculation, concretely realize through MCU based on the realization of outside AI operation instruction control NPU and NVM array, MCU control NVM array loads the weight parameter of its internally stored neural network, MCU operational program and neural network model carry out AI calculation, compare with the present various storage schemes that adopt NVM to carry out analog operation, digital storage and operation mode operation structure are nimble, NVM stored information compares in the analog signal's multi-energy level storage good reliability, the precision is high, read the degree of accuracy high, so the utility model discloses when breaking through the off-chip NVM storage speed bottleneck and reducing the external input power consumption, but also has high implementability, flexibility and reliability.
Drawings
In order to more clearly illustrate the embodiments of the utility model or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the utility model, and that those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a diagram of neurons in a prior art AI algorithm;
FIG. 2 is a diagram of a prior art three-layer neural network;
FIG. 3 is a prior art convolutional neural network diagram;
FIG. 4 is a schematic diagram of a prior art AI calculation using additional circuitry within a standard NVM array;
FIG. 5 is a schematic diagram of a chip architecture for performing AI calculations based on NVM according to the present application;
FIG. 6 is a schematic diagram of a data conversion unit of the chip architecture of the present application;
FIG. 7 is a flowchart illustrating the operation of the chip architecture of the present application to invoke NVM read and write operations;
fig. 8 is a flowchart for executing an AI operation instruction based on the chip architecture of the present application.
Detailed Description
The technical solutions in the embodiments of the utility model will be described clearly and completely below with reference to the accompanying drawings. It is obvious that the described embodiments are only some, and not all, of the embodiments of the utility model. Based on the embodiments of the utility model, all other embodiments obtained by a person skilled in the art without creative effort fall within the protection scope of the utility model.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
The basic architecture and relationship of neural networks and artificial intelligence, non-volatile storage and in-memory computation are explained first.
As mentioned above, Artificial Intelligence (AI) algorithms arose from mimicking the structure of the human brain, in which neurons connect through synapses to the dendrites of a large number of other neurons, forming a network of functionally simple neurons that realizes all human intelligent activity. Human memory and intelligence are generally believed to be stored in the different coupling strengths of the individual synapses.
Neural network algorithms, which emerged in the 1960s, mimic the function of a neuron with a mathematical function. The function accepts a number of inputs, each with its own weight, and the output is the sum of each input multiplied by its weight, as shown in the exemplary neuron diagram of fig. 1. Learning (training) is the process of adjusting the weights. The function output is fed to many other neurons, forming a network. The algorithm has produced abundant results and is widely applied. Practical neural networks all have a layered structure: there is no communication between neurons within the same layer, and the input of each neuron is connected to the outputs of several or all of the neurons in the previous layer. The three-layer neural network shown in fig. 2, for example, comprises an input layer, a hidden layer and an output layer, the input layer and the hidden layer containing 784 and 15 neurons respectively. Different layers of a neural network can be connected in different ways; the network of fig. 2 is a fully connected network.
More common is the convolutional neural network shown in fig. 3, in which both the input and the output have a two-dimensional structure (an image) and connections exist only between nearby points.
A practical neural network, however, often has many layers, and the network structure selectively includes one or more of convolution layers, image-size-reduction (pooling) layers and fully connected layers.
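To make the layered structure concrete, the sketch below implements one fully connected layer with the sizes of the fig. 2 example (784 inputs feeding 15 neurons); the function name is illustrative, and omitting bias and activation terms is a simplifying assumption.

```c
#include <stddef.h>

/* Minimal sketch of one fully connected layer as in the fig. 2 example:
 * each of the 15 neurons sums all 784 outputs of the previous layer,
 * each multiplied by its own connection weight. */
#define PREV_N 784
#define THIS_N  15

void fully_connected_layer(const float in[PREV_N],
                           const float w[THIS_N][PREV_N],
                           float out[THIS_N])
{
    for (size_t j = 0; j < THIS_N; j++) {
        float sum = 0.0f;
        for (size_t i = 0; i < PREV_N; i++)
            sum += w[j][i] * in[i];     /* input x connection weight       */
        out[j] = sum;                   /* fed to the next layer's neurons */
    }
}
```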
Non-volatile storage:
Non-volatile memory (NVM) is a semiconductor storage medium that retains its contents after power is turned off. Common NVMs include flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), MRAM, RRAM, FeRAM (Ferroelectric Random Access Memory), MTP, OTP and the like. The most widely used NVM at present is flash memory, in which the NOR flash structure has higher reliability and faster read speed than the NAND flash structure and is commonly used to store system code, parameters, algorithms and the like. In a particular application, the system may employ an external stand-alone NVM or an embedded NVM integrated within the system. Embedded NVM is generally compatible with CMOS (Complementary Metal-Oxide-Semiconductor) processes, can be integrated with logic computing chips, and offers a faster read speed within the system.
Compared with other NVMs, current flash memory has cost and capacity advantages. Many flash memory technologies on the market already support storing multiple bits per cell. Flash memory is slow to erase (milliseconds) but much faster to read (nanoseconds), and the read speed of flash memory and other NVMs can support the high bandwidth required by neural network computation.
In-memory computing:
Because AI computation requires extremely high memory bandwidth, architectures that separate the processor from memory/storage run into a bottleneck of insufficient read speed, and the industry has begun to study in-memory-computing architectures extensively. Fig. 4, for example, shows a prior-art scheme that adds circuitry inside a standard NVM array to perform AI calculation, i.e. an architecture for neural network calculation in which the non-volatile memory stores the weights required by the calculation and analog circuits perform the vector multiplication; large-scale multiply-and-add operations can then be executed in parallel, which increases operation speed and saves power.
In the prior art, in-memory computation with NVM is realized by storing the weights of the neural network in the non-volatile memory and carrying out the network calculation by analog means. However, because a practical neural network basically has many layers and very complicated connection structures, transmitting and processing analog signals between layers is very inconvenient and unfavourable to supporting flexible network structures, implementing and applying a complete neural network model is quite difficult, and the various noises and errors in the storage, reading, writing and computation of analog signals clearly affect the reliability of the model and the accuracy of the calculation.
Fig. 5 is a diagram of the chip architecture for AI calculation based on NVM according to the utility model. As shown in fig. 5, the chip architecture includes an NVM array 7, an external interface module 2, an SRAM 5, an NPU 6 and an MCU 1 communicatively connected via a bus 4. The MCU 1 reads from and writes to the SRAM 5 and the internal NVM array 7 via the bus 4, and communicates with the NPU 6. The NVM array 7 is used to store, on-chip, the weight parameters of the digitized neural network, the program run by the MCU 1 and the neural network model. The NPU 6 is used for the digital-domain accelerated calculation of the neural network. The external interface module 2 is used for receiving external AI operation instructions and input data and for outputting the AI calculation results. The MCU 1 is used to execute the program stored in the NVM array 7 on the basis of external AI operation instructions, so as to control the NVM array 7 and the NPU 6 to perform AI calculation on the input data and obtain the AI calculation result.
The SRAM 5 serves as the cache for system operation and calculation inside the chip and stores input and output data, intermediate data produced by the calculation and the like. Specifically, it caches data while the MCU 1 executes the program, holding the executable program, system configuration parameters, network structure configuration parameters and the like when the MCU 1 runs; and it caches data during NPU 6 operation, holding the input and output data when the neural network model runs.
In the chip architecture provided by this embodiment, the NPU 6 and the NVM are combined to perform AI neural network calculation. The weight parameters of the neural network are stored digitally in the NVM array 7 inside the chip, and the neural network calculation is likewise carried out in the digital domain: the MCU 1 controls the NPU 6 and the NVM array 7 on the basis of an external AI operation instruction, and the MCU 1 controls the NVM array 7 to load the internally stored weight parameters of the neural network, the program run by the MCU 1 and the neural network model to perform the AI calculation. Compared with the various existing storage schemes that use NVM for analog computation, the digital storage and computation mode has a flexible computation structure, and the information stored in the NVM offers good reliability, high precision and high read accuracy. The scheme provided by this embodiment therefore breaks through the speed bottleneck of off-chip NVM storage and reduces external input power consumption, while also providing high implementability, flexibility and reliability.
In one embodiment, the neural network models are stored digitally in the NVM array 7, and a plurality of neural network models may be stored in the NVM array 7. The external AI operation instruction includes an algorithm selection instruction, by which one of the neural network models is selected as the algorithm for the AI calculation.
The neural network models in this embodiment are stored digitally in the NVM array 7, and there may be a plurality of them according to the number of application scenarios. When multiple application scenarios correspond to multiple neural network models, the MCU 1 can flexibly select any one of the pre-stored models for AI calculation according to an externally input algorithm selection instruction, thereby overcoming the problem in the prior art that the analog computation array structure of an integrated storage-and-calculation scheme is rigid and unfavourable to supporting flexible neural network structures.
In one embodiment, NVM array 7 employs one of, but not limited to, flash memory, MRAM, RRAM, MTP, OTP. The interface standard of the external interface module 2 is at least one of SPI, QPI, and parallel interface.
In other embodiments, NVM array 7 employs one of, but not limited to, SONOS flash memory, Floating Gate flash memory, and Split Gate flash memory technologies. The interface standard of external interface module 2 is SPI and/or QPI.
The chip architecture provided by this embodiment is an improvement on the traditional flash memory chip architecture: specifically, the MCU 1 and the NPU 6 are embedded in the flash memory chip and communicate through the on-chip bus 4, which may be an AHB bus or any other communication bus that meets the requirements, without limitation here. In the scheme provided by this embodiment, the NPU 6 and the NVM are combined, i.e. both calculation and storage are on-chip: the weight parameters of the neural network are stored digitally in the NVM array 7, the neural network calculation is carried out in the digital domain, and the NPU 6 and the NVM array 7 are controlled by the MCU 1 on the basis of external AI operation instructions. The scheme thereby breaks through the bottleneck of off-chip NVM storage speed, reduces external input power consumption, and achieves high implementability, flexibility and reliability.
In one embodiment, in addition to communication over the on-chip bus 4, the chip architecture includes a high-speed data read channel; specifically, a high-speed data read channel is set up between the NPU 6 and the NVM array 7, and the NPU 6 also reads the weight parameters from the NVM array 7 through this channel. In this embodiment the high-speed data read channel supports the bandwidth required for high-speed reading of the weight parameters, i.e. the weight data, of the neural network when the NPU 6 performs digital-domain operation. The bit width of the high-speed data read channel is m bits, m being a positive integer.
In addition, the NVM array 7 is provided with an N-way read channel, N being a positive integer, which reads N bits of data per read cycle; the NPU 6 reads the weight parameters from the NVM array 7 through the read channel via the high-speed data read channel. Preferably N is 128 to 512, and in one read cycle (typically 30 to 40 ns) the NPU 6 reads the weight parameters of the neural network from the NVM array 7 through the read channel and the m-bit-wide high-speed data read channel. Compared with the read speed an off-chip NVM can support in the prior art, this bandwidth is far higher and can satisfy the parameter read speed required by common neural network inference calculations.
In one embodiment, the chip architecture further comprises a data conversion unit. The data conversion unit handles the case where the number of read channels does not match the bit width of the high-speed data read channel and/or the two operate at different frequencies: it converts the data into a combination of words of the same bit width as the high-speed data read channel, typically words of small width (for example 32 bits). The NPU 6 reads data from the data conversion unit via the high-speed data read channel at its own clock frequency (which may be above 1 GHz).
Fig. 6 is a schematic diagram of the data conversion unit of the chip architecture of the present application. As shown in fig. 6, the data conversion unit includes a cache module and a sequential reading module. The cache module is configured to cache, cycle by cycle, the N bits of data output from the NVM array 7 via the read channel; its capacity is N*k bits, where k is the number of cycles cached. The sequential reading module is configured to convert the cached data into m-bit-wide words and output them to the NPU 6 through the high-speed data read channel, N*k being an integral multiple of m.
The data conversion unit thus comprises a cache module of N*k bits and a sequential reader that outputs m bits at a time, i.e. the sequential reading module, N*k being an integral multiple of m. The read channel is connected to the NVM array 7 and outputs N bits per cycle, and the cache can hold k cycles of data; the high-speed data read channel is m bits wide. The high-speed data read channel may contain read/write command (CMD) and acknowledge (ACK) signals, which are connected to the read control circuitry of the NVM array 7. After a read operation completes, the ACK signal notifies the high-speed data read channel (and may also notify the on-chip bus), and the cached data is fed asynchronously to the NPU 6 in multiple transfers through the sequential reading module.
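The CMD/ACK handshake on this channel can be modelled as in the sketch below; the register names, the polling style and the helper stubs are assumptions, since the description only states that a read/write command is issued to the read control circuit of the NVM array 7 and that an ACK reports completion.

```c
#include <stdint.h>

/* Hypothetical model of the read handshake on the high-speed data read
 * channel: a CMD is issued to the NVM array read control circuit, the ACK
 * reports completion (and may also notify the on-chip bus), and the cached
 * data is then drained to the NPU m bits at a time. */
typedef struct {
    volatile uint32_t cmd;   /* write: issue a read command (e.g. row address) */
    volatile uint32_t ack;   /* read : non-zero once the NVM read has finished */
} nvm_read_ctrl_t;

/* Placeholders for the sequential reader and the NPU-side consumer. */
static uint32_t conv_next_m_bits(void)     { return 0; }
static void     npu_push_word(uint32_t w)  { (void)w;   }

static void read_weight_row(nvm_read_ctrl_t *ctrl, uint32_t row, int words)
{
    ctrl->cmd = row;                         /* CMD: start the NVM array read */
    while (!ctrl->ack)                       /* wait for the ACK signal       */
        ;
    for (int i = 0; i < words; i++)          /* asynchronously drain the cache*/
        npu_push_word(conv_next_m_bits());   /* m bits per transfer           */
}
```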
In one embodiment, the MCU1 is further configured to receive, via the external interface module 2, data access commands for operating the NVM array 7 from outside, and the MCU1 is further configured to complete logic control of basic operations of the NVM array 7 based on the data access commands, which are standard flash memory operation commands; the AI operation instruction and the data access instruction adopt the same instruction format and rule; the AI operation instruction comprises an operation code, and further comprises an address part and/or a data part, wherein the operation code of the AI operation instruction is different from the operation code of the standard flash memory operation instruction.
The instructions for direct NVM operation and the instructions for AI calculation in this embodiment use the same instruction format and rules. Taking the SPI and QPI interfaces as an example, on the basis of the traditional SPI/QPI flash memory operation instructions (op_code), op_codes that are not used by flash memory operations are chosen to express the AI instructions, more information is transmitted in the address part, and the AI data transfer takes place during the data exchange period. AI calculation can therefore be supported simply by extending the instruction decoder to multiplex the interface and adding a few status registers and configuration registers.
The MCU 1 realizes the digital operation of the NVM array 7, which may specifically include basic flash memory operations such as read, write and erase; the external data access instructions and the external interface may adopt the standard flash memory chip format, so the chip is easy to apply flexibly and simply. The MCU 1 embedded in the chip serves as the logic control unit of the NVM and replaces the logic state machine of a standard flash memory, which simplifies the chip structure and saves chip area.
The NVM array 7 in this embodiment may be used to store externally input data that is not limited to data related to AI calculation; that is, besides the neural network model, the weight parameters and the program run by the internal system of the chip, it may also store other externally input data related to AI calculation as well as externally input data unrelated to AI calculation, the latter specifically including information such as system parameters, configurations and/or codes of an external device or system. The basic operations include reading, writing and erasing the neural network model, the weight parameters and the program run by the internal system, as well as directly reading, writing and erasing the externally input data stored in the NVM array 7.
In the specific implementation, the MCU 1 receives external instructions for read and write operations on the NVM array 7 and completes the logic control of the basic NVM operations. These basic operations include storing and reading the AI operation model algorithms and parameters, and can also be used to directly store and read system parameters, configurations, codes and the like in the NVM array 7. The MCU 1 also accepts external AI operation instructions, controls the internal operation logic and the input/output, and is also used to control the AI operation logic internally.
FIG. 7 is a flowchart illustrating the operation of the instructions for invoking NVM read and write operations according to the chip architecture of the present application. As shown in fig. 7, the instruction execution flow is as follows:
step S101, the external device starts the chip where the NVM is located, and the MCU1 is powered on.
Step S102, with no external instruction pending, the MCU 1 loads the code and parameters required for its operation from the NVM array 7 into the SRAM 5, and the chip is in the standby state.
Step S103, the external device sends an NVM operation instruction, and the MCU 1 receives and processes it; the format and handling of the NVM operation instruction are the same as for a conventional standard NVM.
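Viewed from the MCU firmware, this flow amounts to the simple loop sketched below; the helper functions are placeholders standing in for the hardware steps and are assumptions, not part of the utility model.

```c
#include <stdint.h>

/* Placeholder hooks for the hardware steps of the fig. 7 flow. */
static void    load_from_nvm_to_sram(void)         { /* copy code/params to SRAM */ }
static uint8_t wait_for_external_instruction(void) { return 0x03; /* e.g. a read */ }
static void    handle_standard_nvm_op(uint8_t op)  { (void)op; /* read/write/erase */ }

void mcu_main(void)
{
    /* S101: the external device powers up the chip and the MCU starts.    */
    /* S102: with no external instruction pending, load the required code
     *       and parameters from the NVM array into the SRAM; the chip is
     *       then in its standby state.                                     */
    load_from_nvm_to_sram();

    for (;;) {
        /* S103: an external device sends an NVM operation instruction;
         *       the MCU receives and processes it in the same format as a
         *       conventional standard NVM.                                 */
        uint8_t op = wait_for_external_instruction();
        handle_standard_nvm_op(op);
    }
}
```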
In one embodiment, the chip architecture further includes a DMA channel 3, through which an external device can directly read from and write to the SRAM 5. The external interface module 2 multiplexes data and instructions, and direct read/write access by external devices to the on-chip SRAM 5 through the DMA channel 3 improves data transfer efficiency. An external device can also use the SRAM 5 as a system memory resource through the DMA channel 3, which increases the flexibility of the chip in applications.
Fig. 8 is a flowchart for executing an AI operation instruction based on the chip architecture of the present application. As shown in fig. 8, the AI operation instruction execution flow includes:
step S201, the external device starts the NVM chip, and the MCU1 is powered on.
Step S202, with no external instruction pending, the MCU 1 loads the code and parameters required for its operation from the NVM array 7 into the SRAM 5, and the chip is in the standby state.
Step S203, the external device sends an algorithm selection command to select a certain neural network model stored in the NVM array 7 of the chip.
Step S204, the MCU1 processes the instruction, and the internal corresponding storage module is powered on and addressed.
In step S205, the external device sends an AI operation command and input data, and the data is buffered in the SRAM 5.
In step S206, the MCU1 starts the NPU6, and recognizes the input data according to the AI operation command.
Step S207, NPU6 reads the weight parameter data corresponding to the neural network model from NVM array 7 for calculation.
In step S208, the external device reads the AI calculation result from the chip through the external interface module 2.
Steps S205 to S208 may be repeated to input, calculate and output data continuously.
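The whole fig. 8 flow can likewise be sketched at firmware level as follows; every helper name is a placeholder assumption for the hardware step it annotates, and the loop bound in the receive stub exists only so the demo terminates.

```c
#include <stdint.h>
#include <stdbool.h>

/* Placeholder hooks for the hardware steps of the fig. 8 flow. */
static void    mcu_boot_and_standby(void)         { /* S201-S202 */ }
static uint8_t receive_algorithm_selection(void)  { return 0;      /* S203 */ }
static void    power_and_address_model(uint8_t m) { (void)m;       /* S204 */ }
static bool    receive_ai_command_to_sram(void)                    /* S205 */
{
    static int remaining = 3;          /* demo only: pretend 3 commands arrive */
    return remaining-- > 0;
}
static void    npu_run_inference(uint8_t m)       { (void)m; /* S206-S207 */ }
static void    expose_result_on_interface(void)   { /* S208 */ }

void ai_instruction_flow(void)
{
    mcu_boot_and_standby();                        /* S201, S202                */
    uint8_t model = receive_algorithm_selection(); /* S203: pick a stored model */
    power_and_address_model(model);                /* S204: power and address   */

    /* S205-S208 may repeat so that data is input, calculated and output
     * continuously.                                                            */
    while (receive_ai_command_to_sram()) {         /* S205: input buffered in SRAM */
        npu_run_inference(model);                  /* S206-S207: NPU reads the
                                                      model's weights from the
                                                      NVM array and computes    */
        expose_result_on_interface();              /* S208: result read out via
                                                      the external interface    */
    }
}
```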
Although embodiments of the present invention have been shown and described, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art without departing from the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A chip architecture for AI calculation based on NVM comprises an NVM array, an external interface module, an NPU and an MCU which are in communication connection through a bus;
the NVM array is used for storing weight parameters of a digitalized neural network, a program operated by the MCU and a neural network model in a chip;
the NPU is used for digital domain accelerated calculation of the neural network;
the external interface module is used for receiving external AI operation instructions and input data, and for outputting the AI calculation results;
the MCU is used for executing the program based on the AI operation instruction so as to control the NVM array and the NPU to carry out AI calculation on the input data to obtain the result of the AI calculation.
2. The NVM-based chip architecture for AI computation of claim 1, further comprising a high speed data read channel; the NPU is also used to read the weight parameters from the NVM array through the high-speed data read channel.
3. The NVM-based chip architecture for AI computation of claim 2, wherein the NVM array is configured with a read channel, the read channel is N-way, N is a positive integer, the read channel reads N bits of data in one read cycle, and the NPU is configured to read the weight parameters from the NVM array via the read channel via the high-speed data read channel.
4. The NVM-based chip architecture for AI computation of claim 3, wherein the bit width of the high speed data read channel is m bits, m being a positive integer; the chip architecture further comprises a data conversion unit, the data conversion unit comprises a cache module and a sequential reading module, the cache module is used for sequentially caching the weight parameters output by the reading channel according to cycles, the capacity of the cache module is N x k bits, and k represents the number of cycles; and the sequential reading module is used for converting the cache data in the cache module into m-bit wide and outputting the m-bit wide to the NPU through the high-speed data reading channel, wherein N x k is an integral multiple of m.
5. The NVM-based chip architecture for AI computation of claim 1, further comprising an SRAM communicatively coupled to the NVM array, the external interface module, the NPU, and the MCU via the bus; the SRAM is used for caching data in the program execution process of the MCU, data in the NPU operation process and input and output data of the neural network model operation.
6. The NVM-based chip architecture for AI computation of claim 1, wherein a plurality of neural network models are stored in the NVM array, and wherein the AI computation instruction comprises an algorithm selection instruction for selecting one of the plurality of neural network models as an algorithm for AI computation.
7. The NVM-based chip architecture for AI computation of claim 1, wherein the NVM array employs one of a flash memory process, an MRAM process, an RRAM process, an MTP process, an OTP process, and/or wherein the interface standard of the external interface module is at least one of SPI, QPI, and parallel interface.
8. The NVM-based chip architecture for AI computation of claim 7, wherein the MCU is further configured to receive, through the external interface module, external data access instructions for operating the NVM array, and the MCU is further configured to complete the logic control of the basic operations of the NVM array based on the data access instructions.
9. The NVM-based chip architecture for AI computation of claim 8, wherein the NVM array employs one of a SONOS flash memory process, a Floating Gate flash memory process, and a Split Gate flash memory process, and the interface standard of the external interface module is SPI and/or QPI;
the data access instruction is a standard flash memory operation instruction; the AI operation instruction and the data access instruction adopt the same instruction format and rule; the AI operation instruction comprises an operation code, and further comprises an address part and/or a data part, wherein the operation code of the AI operation instruction is different from the operation code of the standard flash memory operation instruction.
10. The NVM-based chip architecture for AI computation of claim 5, further comprising a DMA channel for an external device to directly read from or write to the SRAM.
CN202121061582.6U 2021-05-18 2021-05-18 Chip architecture for AI calculation based on NVM Active CN214846708U (en)

Priority Applications (1)

CN202121061582.6U — priority date 2021-05-18 — filing date 2021-05-18 — Chip architecture for AI calculation based on NVM (CN214846708U)


Publications (1)

CN214846708U — published 2021-11-23

Family

ID=78774872




Legal Events

Date Code Title Description
GR01 Patent grant