CN111710356A - Coding type flash memory device and coding method - Google Patents

Coding type flash memory device and coding method

Info

Publication number
CN111710356A
Authority
CN
China
Prior art keywords
flash memory
array structure
source line
encoding
memory array
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010472550.9A
Other languages
Chinese (zh)
Other versions
CN111710356B (en)
Inventor
黄鹏 (Peng Huang)
项亚臣 (Yachen Xiang)
康晋锋 (Jinfeng Kang)
刘晓彦 (Xiaoyan Liu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Peking University
Priority to CN202010472550.9A
Publication of CN111710356A
Application granted
Publication of CN111710356B
Legal status: Active

Classifications

    • G11C 16/0483 – Erasable programmable read-only memories, electrically programmable, using variable threshold transistors (e.g. FAMOS), comprising cells having several storage transistors connected in series
    • G06F 7/501 – Half or full adders, i.e. basic adder cells for one denomination
    • G06N 3/063 – Physical realisation, i.e. hardware implementation, of neural networks, neurons or parts of neurons using electronic means
    • G06N 3/08 – Learning methods (computing arrangements based on biological models; neural networks)
    • G11C 16/08 – Address circuits; Decoders; Word-line control circuits
    • G11C 16/24 – Bit-line control circuits
    • G11C 16/26 – Sensing or reading circuits; Data output circuits
    • Y02D 10/00 – Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses an encoding type flash memory device and an encoding method. The encoding type flash memory device comprises at least one flash memory array structure unit, a plurality of comparators, and a plurality of adders. Each flash memory array structure unit is a 3D NAND FLASH array structure unit and implements an encoding operation that generates a source line voltage on each of a plurality of source lines of the flash memory array structure unit. Each comparator is connected to a corresponding source line and converts the source line voltage of that source line into a binary output result. Each adder is connected, through the corresponding source lines, to at least 2 of the comparators and sums the at least two output results corresponding to those comparators. The encoding type flash memory device and encoding method can realize efficient and accurate fully-connected layer or convolutional layer operations, thereby realizing a deep neural network.

Description

Coding type flash memory device and coding method
Technical Field
The present invention relates to the field of semiconductor devices and integrated circuit technology, and more particularly to an encoding type flash memory device and an encoding method for implementing a deep neural network.
Background
Deep neural networks are now widely applied in fields such as image processing and speech recognition, where they exhibit excellent performance. In the prior art, deep neural networks mostly run on the traditional von Neumann architecture. Because storage units and computing units are separated in that architecture, transferring data over a bus increases latency and energy consumption, so the data-processing capability and energy-efficiency ratio of deep neural networks hit a bottleneck. For this reason, an encoding type flash memory device that performs analog operations based on the 3D NAND FLASH structure has been proposed in the related art. However, that encoding type flash memory device still has the following three main problems when implementing a deep neural network:
first, when the threshold voltage fluctuation between different devices is high, the calculation accuracy drops;
second, the result of the analog operation must pass through a complex analog-to-digital conversion circuit, which reduces to some extent the energy-efficiency gain of computing inside the 3D NAND FLASH memory;
third, the operation mode of 3D NAND FLASH is limited in that only the data on a single word line WL can be operated on at a time, so the efficiency of the overall computing architecture is not maximized and the parallelism is not saturated.
Disclosure of Invention
Technical problem to be solved
To solve the technical problems of the prior-art encoding type flash memory device that performs analog operations based on the 3D NAND FLASH structure, namely low calculation accuracy, low energy-efficiency ratio, non-maximized computing-architecture efficiency, and unsaturated parallelism, the invention provides an encoding type flash memory device and an encoding method.
(II) technical scheme
One aspect of the present invention discloses an encoding type flash memory device, comprising: at least one flash memory array structure unit, a plurality of comparators, and a plurality of adders. Each flash memory array structure unit is a 3D NAND FLASH array structure unit and implements an encoding operation that generates a source line voltage on each of a plurality of source lines of the flash memory array structure unit. Each comparator is connected to a corresponding source line and converts the source line voltage of that source line into a binary output result. Each adder is connected, through the corresponding source lines, to at least 2 of the comparators and sums the at least two output results corresponding to those comparators, so as to realize a deep neural network.
According to an embodiment of the present invention, each flash memory array structure unit includes a plurality of operation units and a plurality of redundant units. Each operation unit is the transistor at the crossing of a word line of the plurality of word lines and a source line in the flash memory array structure unit, with exactly one operation unit provided per source line. The redundant units are the transistors of the non-operation units in each flash memory array structure unit and are kept in an on state while the encoding operation is performed.
According to an embodiment of the present invention, each flash memory array structure unit further comprises a string selection line and a ground selection line. The string selection line is connected to the bit line end of the flash memory array structure unit; the ground selection line is connected to the source line end of the flash memory array structure unit. A high level is applied to the string selection line and the ground selection line while the encoding operation is performed.
According to an embodiment of the present invention, the encoding operation is a fully-connected layer operation or a convolutional layer operation, and the number of the plurality of adders equals the number of summation results of the fully-connected layer operation or the convolutional layer operation.
Another aspect of the present invention discloses an encoding method implemented with the above encoding type flash memory device, wherein the encoding method includes: performing an encoding operation based on at least one flash memory array structure unit to generate a source line voltage on each of a plurality of source lines of each flash memory array structure unit; converting, by the plurality of comparators, the source line voltage of the source line corresponding to each comparator into a binary output result; and summing, by the plurality of adders, at least 2 output results corresponding to at least 2 of the comparators, so as to realize the deep neural network.
According to an embodiment of the present invention, the encoding operation is a fully-connected layer operation or a convolutional layer operation.
According to an embodiment of the present invention, when the encoding operation is a fully-connected layer operation, before the encoding operation generates a source line voltage on each of the plurality of source lines of each flash memory array structure unit, the encoding method further includes: pre-storing each non-zero element of the plurality of non-zero elements in the weight matrix vector of the fully-connected layer into a corresponding operation unit of the plurality of operation units in the encoding type flash memory device.
According to an embodiment of the present invention, when the encoding operation is a convolutional layer operation, before the encoding operation generates a source line voltage on each of the plurality of source lines of each flash memory array structure unit, the encoding method further includes: pre-storing each convolution kernel element of the plurality of convolution kernel elements in the convolution matrix of the convolutional layer into a corresponding operation unit of the plurality of operation units in the encoding type flash memory device.
According to an embodiment of the present invention, after the non-zero elements of the weight matrix vector of the fully-connected layer, or the convolution kernel elements of the convolution matrix of the convolutional layer, have been pre-stored into the corresponding operation units, the encoding method further includes: applying the input elements of the corresponding input vector to the respective word lines of the plurality of word lines to generate a word line voltage on each word line, such that the plurality of redundant units of the encoding type flash memory device are in an on state.
According to an embodiment of the present invention, after the input elements have been applied to the word lines so that the plurality of redundant units of the encoding type flash memory device are in an on state, the encoding method further includes: applying a high level to the string selection line and the ground selection line of the encoding type flash memory device to generate a source line voltage on each of the plurality of source lines of each flash memory array structure unit.
According to an embodiment of the present invention, when the encoding operation is a fully-connected layer operation, performing the encoding operation based on at least one flash memory array structure unit to generate a source line voltage on each of the plurality of source lines includes: performing the encoding operation of the corresponding input vector and weight matrix vector simultaneously on a plurality of flash memory array structure units, or performing it on a single flash memory array structure unit in a time-division multiplexed manner; wherein the input vector is multi-bit data.
According to an embodiment of the present invention, when the encoding operation is a convolutional layer operation, the encoding method further includes: shifting and summing the summation results output by the adders to realize the deep neural network; wherein the convolution matrix of the convolutional layer is multi-bit data.
(III) advantageous effects
The invention discloses an encoding type flash memory device and an encoding method. The encoding type flash memory device comprises at least one flash memory array structure unit, a plurality of comparators, and a plurality of adders. Each flash memory array structure unit is a 3D NAND FLASH array structure unit and implements an encoding operation that generates a source line voltage on each of a plurality of source lines of the flash memory array structure unit. Each comparator is connected to a corresponding source line and converts the source line voltage of that source line into a binary output result. Each adder is connected, through the corresponding source lines, to at least 2 of the comparators and sums the at least two output results corresponding to those comparators. The encoding type flash memory device and encoding method can realize efficient and accurate fully-connected layer or convolutional layer operations, thereby realizing a deep neural network.
Drawings
FIG. 1 is a block diagram of an encoding type flash memory device corresponding to a single flash memory array structure unit according to an embodiment of the present invention;
FIG. 2 is a flow chart illustrating an encoding method of an encoding type flash memory device according to an embodiment of the present invention;
FIG. 3 is a block diagram of an encoding type flash memory device for fully-connected layer operation according to another embodiment of the present invention;
FIG. 4 is a partial flowchart of an encoding method applied to an encoding type flash memory device for fully-connected layer operation according to another embodiment of the present invention;
FIG. 5 is a block diagram of an encoding type flash memory device for convolutional layer operation according to another embodiment of the present invention;
FIG. 6 is a partial flowchart of an encoding method applied to an encoding type flash memory device for convolutional layer operation according to another embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
To solve the technical problems of the prior-art encoding type flash memory device that performs analog operations based on the 3D NAND FLASH structure, namely low calculation accuracy, low energy-efficiency ratio, non-maximized computing-architecture efficiency, and unsaturated parallelism, the invention provides an encoding type flash memory device and an encoding method.
The encoding type flash memory device and encoding method of the invention realize a parallel in-memory-computing encoding operation based on the 3D NAND FLASH structure according to the following principle: on a given word line WL, when all memory cells (i.e., transistors) of a string are in an on state, a source line current can be sensed on the corresponding source line SL. Based on this principle, and unlike the traditional encoding method that computes in 3D NAND FLASH by operating the word lines WL serially, the technical idea of the encoding method of the present invention, as shown in fig. 1, is as follows:
only one memory cell on each source line SL of the 3D NAND FLASH structure is preset to be responsible for data operations (the operation unit);
the operation unit can store a value of 0 or 1, where the threshold voltage of the operation unit is Vth = Vth_Low (the operation unit stores 0) or Vth = Vth_High (the operation unit stores 1);
the remaining memory cells (non-operation units) of the 3D NAND FLASH structure are set as redundant units; the values stored in the redundant units are all 0, so the threshold voltage of each redundant unit is Vth = Vth_Low;
when a value of 0 or 1 is input to the operation unit, a word line voltage Vg = Vg_High or Vg = Vg_Low, respectively, is applied to the word line WL where the operation unit is located.
It follows that, whether the input value of the operation unit is 0 or 1, the redundant units are always in an on state; the source line current of the source line SL therefore corresponds to the operation result as follows:
when the source line current read from the source line SL is 0 (no current), the operation unit is in a cut-off state, i.e., Vth = Vth_High (1) and Vg = Vg_Low (1), representing an operation result of 1;
when the source line current read from the source line SL is 1 (current present), the operation unit is in an on state, covering one of the following three cases: Vth = Vth_High (1), Vg = Vg_High (0); Vth = Vth_Low (0), Vg = Vg_High (0); Vth = Vth_Low (0), Vg = Vg_Low (1); each represents an operation result of 0.
Therefore, with the encoding type flash memory device of the present invention, an encoding method for parallel in-memory computing can be realized within the 3D NAND FLASH structure. Specifically, using the computing-in-memory approach, the encoding type flash memory device of the present invention completes the functions of each computation layer in parallel and obtains accurate computation results through the adder-assisted deep neural network based on 3D NAND FLASH.
One aspect of the present invention discloses an encoding type flash memory device, as shown in figs. 1, 3 and 5, which includes: at least one flash memory array structure unit, a plurality of comparators, and a plurality of adders, for implementing the 3D NAND FLASH-based deep neural network described above.
The encoding type flash memory device includes at least one flash memory array structure unit 100, for example the N flash memory array structure units 100-1 to 100-N connected in parallel in fig. 3 and fig. 5, where N is greater than or equal to 1. Each flash memory array structure unit 100 of the N units is a 3D NAND FLASH array structure unit, which has a plurality of memory cells, a plurality of bit lines BL, a plurality of word lines WL, and a plurality of source lines SL.
As shown in fig. 1, there may be M × M memory cells: M memory cells are connected in series according to a given arrangement rule to form each of M memory cell strings. One end of each memory cell string is connected to a bit line BL and the other end to a source line SL; each word line WL runs perpendicular to the strings and connects the corresponding memory cells of the plurality of strings. The numbers of bit lines BL and source lines SL are therefore both M, and the number of word lines WL corresponds to the number of memory cells connected in series in a string. Each flash memory array structure unit 100 is configured to implement the encoding operation so as to generate a source line voltage on each of its plurality of source lines.
Specifically, the operation unit preset on each source line SL stores a value of 0 or 1, the other redundant units store 0 and are therefore in an on state, and the word line voltage applied to the word line WL of the operation unit is Vg = Vg_High or Vg = Vg_Low; a source line voltage is then generated on the source line SL where the operation unit is located.
The comparator may be a standard device in the integrated-circuit field: a circuit that compares an analog voltage signal with a reference voltage. Both inputs of the comparator may be analog signals, its output is the binary value 0 or 1, and the output value remains constant when the input voltage fluctuates. In another embodiment of the present invention, the comparator may be replaced with a sampling circuit.
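As an illustration only, the comparator's role can be reduced to a fixed-threshold decision; the reference value in the sketch below is a hypothetical placeholder:
```python
def comparator(source_line_voltage: float, v_ref: float = 1.5) -> int:
    """Illustrative binarization of an analog source line voltage.
    Any v_ref placed between the two well-separated voltage levels
    (current present / no current) yields the same output bit, which
    is why moderate input-voltage fluctuation does not change it."""
    return 1 if source_line_voltage > v_ref else 0
```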
Each adder of the plurality of adders is connected, through the corresponding source lines, to at least 2 of the comparators and sums the at least two output results corresponding to those comparators, so as to realize the deep neural network. An adder is a logic device in the computer field used to perform digital addition. The deep neural network involves the processing of data of a plurality of convolutional layers or fully-connected layers.
Therefore, because the encoding type flash memory device of the invention outputs binary results through the comparators, it not only maintains high calculation accuracy when the threshold voltage fluctuation between different devices is high, but also avoids the complex analog-to-digital conversion circuit required by traditional analog operation, improving the energy-efficiency ratio of in-memory computing on the 3D NAND FLASH structure. Finally, with the aid of the adders, the efficiency and parallelism of the whole computing architecture are maximized, and the parallel encoding operation can be completed within one clock cycle.
According to an embodiment of the present invention, as shown in figs. 1, 3 and 5, each flash memory array structure unit includes a plurality of operation units C and a plurality of redundant units. Each operation unit C is the transistor at the crossing of a word line and a source line in the flash memory array structure unit, with one operation unit C per source line. The operation unit C is the memory cell preset in the memory cell string on each source line SL of the 3D NAND FLASH structure; only this memory cell in the string is responsible for data operations. The operation unit C can store a value of 0 or 1, its threshold voltage being Vth = Vth_Low (the operation unit C stores 0) or Vth = Vth_High (the operation unit C stores 1). The operation units serve the fully-connected layer operation or the convolutional layer operation: for a fully-connected layer operation, each non-zero element of the plurality of non-zero elements in the weight matrix vector of the fully-connected layer is stored in the corresponding operation unit C; for a convolutional layer operation, each convolution kernel element of the plurality of convolution kernel elements in the convolution matrix of the convolutional layer is stored in the corresponding operation unit C.
The plurality of redundant units are the transistors of the non-operation units in each flash memory array structure unit 100, and they are kept in an on state while the encoding operation is performed. A redundant unit is a memory cell other than the operation unit C in the memory cell string on each source line, i.e., the transistor at the crossing of that source line with each of the other word lines. When the encoding type flash memory device of the present invention performs the encoding operation, the redundant units are in an on state regardless of whether the input value of the operation unit C is 0 or 1.
According to an embodiment of the present invention, as shown in figs. 1, 3 and 5, each flash memory array structure unit 100 further includes a string selection line SSL (String Select Line) and a ground selection line GSL (Ground Select Line). The string selection line SSL is connected to the bit line end of the flash memory array structure unit 100. Specifically, the string selection line SSL is provided with a plurality of select transistors; one end of each select transistor is connected to a bit line BL and the other end to the bit-line end of a memory cell string, and the string selection line SSL connects these select transistors in series along the direction perpendicular to the source lines SL.
The ground selection line GSL is connected to the source line end of the flash memory array structure unit 100. Specifically, the ground selection line GSL is provided with a plurality of select transistors; one end of each select transistor is connected to a source line SL and the other end to the source end of a memory cell string, and the ground selection line GSL connects these select transistors in series along the direction perpendicular to the source lines SL.
A high level is applied to the string selection line SSL and the ground selection line GSL while the encoding operation is performed.
According to an embodiment of the present invention, the encoding operation is a fully-connected layer operation or a convolutional layer operation, and the number of the plurality of adders equals the number of summation results of that operation. Specifically, the number of adders l in each flash memory array structure unit 100 equals the number of summation results of the fully-connected or convolutional layer operation performed in that unit, so the total number of adders in the encoding type flash memory device is L = l × N, where N is the number of flash memory array structure units 100 connected in parallel in the device.
For the fully-connected layer operation, as shown in fig. 3, the input vector X (1 × M) is multiplied by the weight matrix vector K (M × N) to obtain the output vector Y (1 × N), expressed mathematically by formula (1):
Yi = X1·K1,i + X2·K2,i + … + XM·KM,i   (1)
where 1 ≤ i ≤ N.
Therefore, the fully-connected operation produces N output results Y, and the number of adders required for the fully-connected operation equals N.
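The correspondence between formula (1) and the adder count can be checked with a short numpy sketch (illustrative shapes and data only):
```python
import numpy as np

M, N = 4, 3
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=M)       # binary input vector (1 x M)
K = rng.integers(0, 2, size=(M, N))  # binary weight matrix (M x N)

# Formula (1): Y_i = X_1*K_{1,i} + ... + X_M*K_{M,i}; one adder
# accumulates the M comparator outputs for each Y_i, so the
# fully-connected operation needs N adders in total.
Y = np.array([sum(X[j] * K[j, i] for j in range(M)) for i in range(N)])
assert np.array_equal(Y, X @ K)
```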
For the convolutional layer operation, as shown in fig. 5, the convolution is a fully-connected operation over a plurality of local regions: each local part of the input matrix X (M × N) undergoes matrix-vector multiplication with the convolution matrix K (k × k), completing the convolution and yielding the output matrix Y of size (M − k + 1) × (N − k + 1), expressed mathematically by formula (2):
Yi,j = Xi,j·Kk,k + Xi,j+1·Kk,k-1 + Xi+1,j·Kk-1,k + … + Xi+k-1,j+k-1·K1,1   (2)
where 1 ≤ i ≤ M − k + 1 and 1 ≤ j ≤ N − k + 1.
Thus, the convolutional layer operation produces (M − k + 1) × (N − k + 1) output results Y, which together form the complete output matrix, and the number of adders required for the convolutional layer operation equals (M − k + 1) × (N − k + 1).
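For reference, a direct sliding-window rendering of formula (2) as a sketch (note that the kernel indices in formula (2) run in reverse, so the kernel is flipped relative to plain cross-correlation):
```python
import numpy as np

def conv2d_valid(X: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Formula (2): Y_{i,j} pairs X_{i,j} with K_{k,k} and
    X_{i+k-1,j+k-1} with K_{1,1}, i.e. the kernel is traversed in
    reverse over each k x k local region of X."""
    M, N = X.shape
    k = K.shape[0]
    Kr = K[::-1, ::-1]  # reversed kernel per the index order of (2)
    Y = np.empty((M - k + 1, N - k + 1), dtype=X.dtype)
    for i in range(M - k + 1):
        for j in range(N - k + 1):
            Y[i, j] = np.sum(X[i:i + k, j:j + k] * Kr)
    return Y  # (M-k+1) x (N-k+1) outputs -> as many adders
```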
Both the convolution operation and the fully-connected operation are equivalent to vector-matrix multiplication. By comparison, the convolutional layer adds the shift operation of the convolution kernel over the input. When the two kinds of vector-matrix multiplication are implemented with the flash memory array structure unit 100 of the 3D NAND FLASH structure, the main difference is that for the convolutional layer the inputs are updated after each cycle to the next local region of the input image to be convolved.
The above has described in detail an embodiment of the encoding type flash memory device of the present invention with reference to fig. 1, 3 and 5.
How 3D NAND FLASH is applied to vector-matrix multiplication to achieve parallelized data processing is now described in detail.
Another aspect of the present invention discloses an encoding method, as shown in fig. 2, implemented based on the above encoding type flash memory device, wherein the encoding method includes:
step S410: performing encoding operation based on at least one flash memory array structure unit to generate source line voltage on each source line in a plurality of source lines of each flash memory array structure unit; specifically, the preset operation unit stores a value 0 or 1 on each source line SL, the other redundancy units store a value 0, the redundancy units are in an on state, and the word line voltage applied to the word line WL corresponding to the operation unit is Vg=Vg_HighOr Vg=Vg_LowAt this time, a source line voltage corresponding to the source line SL in which the arithmetic unit is located can be generated.
Step S420: the comparators convert the source line voltage of each source line corresponding to each comparator into output results in a binary form; the comparator can output a binary value of 0 or 1 by using analog signals as two paths of input, and the output value is kept constant when the input voltage fluctuates.
Step S430: and the adders add at least 2 output results corresponding to at least 2 comparators in the comparators to realize the deep neural network. An adder is a logic device in the field of computer technology, and is used for performing digital addition operation. The deep neural network comprises processing of a plurality of convolution layer data or full connection layer data.
The encoding method of the encoding type flash memory device of the present invention can realize the encoding method of the parallel memory calculation in the 3D NAND FLASH structure. Specifically, the coding flash memory device of the present invention can utilize the storage integration technology to complete the functions of each computation layer in parallel and obtain accurate computation results through the adder-assisted 3D NANDFLASH-based deep neural network.
Therefore, the coding method of the invention outputs the binary output result through the comparator, not only can maintain higher calculation accuracy when the threshold voltage fluctuation between different devices is higher, but also can avoid the complexity similar to the traditional output result value, avoid a complex analog-to-digital conversion circuit required by the traditional analog operation, improve the energy efficiency ratio caused by the memory operation of the 3D NANDFLASH structure, finally realize the maximization of the efficiency and the parallelism of the whole calculation framework by the aid of the adder, and can complete the parallel coding operation within one clock period.
The encoding method of the encoding type flash memory device having a single flash memory array structure unit according to the present invention is further explained as follows.
According to an embodiment of the present invention, as shown in fig. 1, for the above encoding type flash memory device, before the encoding operation starts a given flash memory array structure unit 100 stores the non-zero elements among the elements (K1–KM) of the weight matrix vector K (M × N) into the corresponding operation units C on the M source lines SL of the 3D NAND FLASH structure unit: non-zero element K1 is stored at the crossing of word line WL1 and source line SL1, non-zero element K2 at the crossing of word line WL2 and source line SL2, and so on, with non-zero element KM stored at the crossing of word line WLM and source line SLM.
As stated above, the value 0 or 1 stored in an operation unit C corresponds to its threshold voltage Vth = Vth_Low or Vth = Vth_High. Only one device on each source line SL is an operation unit C; the other cells are set as redundant units, and the value stored in each redundant unit is 0, i.e., the threshold voltage of each redundant unit is Vth = Vth_Low, so the redundant units remain on.
Each element (X1–XM) of the input vector (1 × M) is converted into a corresponding voltage value and applied to word lines WL1–WLM; the voltage value corresponds to the input 0 or 1 on the word line WL, so the word line voltage applied to the word line WL is Vg = Vg_High or Vg = Vg_Low. This ensures that the input data on the word lines WL of the redundant units do not affect their state, i.e., the redundant units are always in an on state.
Finally, when the string selection line SSL and the ground selection line GSL of the flash memory array structure unit 100 are raised simultaneously, the output on each source line SL depends only on the state of its operation unit C. Specifically, when both the weight value stored in the operation unit C and the input data applied through the word line WL are 1 (that is, the threshold voltage of the operation unit C is Vth = Vth_High and the word line voltage of its word line WL is Vg = Vg_Low), the operation unit C is in a cut-off state and no source line current flows on the corresponding source line SL, representing an operation result of 1. In the remaining cases, i.e., Vth = Vth_High (1), Vg = Vg_High (0); Vth = Vth_Low (0), Vg = Vg_High (0); Vth = Vth_Low (0), Vg = Vg_Low (1), the operation unit is in an on state and a source line current flows on the corresponding source line SL, representing an operation result of 0.
The comparator 200 connected to the end of each source line SL reads out the presence or absence of the source line current and converts it into the calculation result 0 or 1, i.e., the binary output result of the comparator 200. The output results obtained from the operation units C of the different source lines SL are then summed by the adder, yielding the summation of the input data with the weight data and thereby realizing the deep neural network.
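The single-array flow just described can be summarized behaviourally as follows; the function name and data are hypothetical, and the sketch only models the logical effect of the comparators and the adder:
```python
import numpy as np

def encoded_flash_dot(weights: np.ndarray, inputs: np.ndarray) -> int:
    """Behavioural model of one flash memory array structure unit 100:
    the operation unit C on source line SL_j stores weight K_j, input
    X_j sets the word line WL_j voltage, the comparator on SL_j reads
    out K_j AND X_j, and one adder sums the M comparator outputs."""
    comparator_outputs = weights & inputs  # per-source-line results
    return int(comparator_outputs.sum())   # adder output

X = np.array([1, 0, 1, 1])  # input bits on WL1..WL4
K = np.array([1, 1, 0, 1])  # weight bits in the operation units C
assert encoded_flash_dot(K, X) == int(X @ K)  # = 2
```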
The encoding method of the encoding type flash memory device having a plurality of flash array structure units according to the present invention will be further explained below.
According to an embodiment of the present invention, the encoding operation is a full concatenation operation or a convolutional layer operation.
For the fully-connected layer operation, as shown in fig. 3, the input vector X (1 × M) is multiplied by the weight matrix vector K (M × N) to obtain the output vector Y (1 × N), expressed mathematically by formula (1):
Yi = X1·K1,i + X2·K2,i + … + XM·KM,i   (1)
where 1 ≤ i ≤ N.
For the convolutional layer operation, as shown in fig. 5, the convolution is a fully-connected operation over a plurality of local regions: each local part of the input matrix X (M × N) undergoes matrix-vector multiplication with the convolution matrix K (k × k), completing the convolution and yielding the output matrix, expressed mathematically by formula (2):
Yi,j = Xi,j·Kk,k + Xi,j+1·Kk,k-1 + Xi+1,j·Kk-1,k + … + Xi+k-1,j+k-1·K1,1   (2)
where 1 ≤ i ≤ M − k + 1 and 1 ≤ j ≤ N − k + 1.
According to another embodiment of the present invention, as shown in fig. 3 and fig. 4, the encoding method of this embodiment is implemented with the encoding type flash memory device shown in fig. 3. When the encoding operation is a fully-connected layer operation, before step S410 the encoding method further includes:
Step S510: pre-storing each non-zero element of the plurality of non-zero elements in the weight matrix vector of the fully-connected layer into a corresponding operation unit C of the plurality of operation units C in the encoding type flash memory device.
According to still another embodiment of the present invention, after step S510, the encoding method further includes:
step S520: correspondingly inputting the input elements of the corresponding input vector into each word line in the plurality of word lines to generate word line voltages on each word line, so that the plurality of redundant units of the coding type flash memory device are in an on state.
According to another embodiment of the present invention, after step S520, the method further includes:
step S530: a high level is applied to a string selection line and a ground selection line of an encoding type flash memory device to generate a source line voltage on each of a plurality of source lines of each flash memory array structural unit.
Specifically, when the encoding operation is a fully-connected layer operation, as shown in figs. 3 and 4, one flash memory array structure unit 100 (i.e., one 3D NAND FLASH structure unit) in the encoding type flash memory device can act as a single string responsible for the fully-connected operation of a single-bit input vector X (1 × M) with a single-bit weight matrix K (M × N).
The same string has N groups of storage modules, corresponding to the outputs Y-1 to Y-N respectively. Each group of storage modules comprises M source lines SL, corresponding to the input word lines WL-1 to WL-M. Before the operation starts, the computing resources of the 3D NAND FLASH structure unit must be configured: the non-zero elements of the weight matrix are written into the corresponding operation units C of the 3D NAND FLASH structure unit according to the correspondence between the input vector and the elements of the weight matrix vector in the vector-matrix multiplication. This corresponds to step S510.
The input vector (X1–XM) of each string is applied to the corresponding word lines WL-1 to WL-M, i.e., converted into the input word line voltages on those word lines, leaving the redundant units of the 3D NAND FLASH structure unit in an on state. This corresponds to step S520.
When the string selection line SSL and the ground selection line GSL are at a high level, the encoding operation starts: since each source line SL carries only one FLASH cell responsible for computation (the operation unit C), that cell both stores the weight value (a non-zero element of the weight matrix) and performs the operation, and its operation result (0 or 1) is reflected by the source line voltage. This corresponds to step S530.
The comparators read the source line current corresponding to each source line voltage and convert it into a binary output result. The N adders sum the output results of the N corresponding groups of comparators (each group of comparators corresponding to M source lines SL) to obtain the fully-connected layer outputs Y-1 to Y-N.
According to another embodiment of the invention, different strings (SSL-1 to SSL-N) cooperatively process the fully-connected operation of a multi-bit input with a multi-bit weight matrix. When the encoding operation is a fully-connected operation, the input vector is multi-bit data; the multi-bit data may be the data of a multi-bit input vector and a multi-bit weight matrix vector. Step S410 then includes:
because the plurality of flash memory array structure units in the encoding type flash memory device are connected in parallel, the fully-connected operation of the multi-bit input with the multi-bit weight matrix can be carried out simultaneously, i.e., each of the plurality of flash memory array structure units performs the fully-connected operation of single-bit data;
or, based on one flash memory array structure unit, the encoding operation of the corresponding input vector and weight matrix vector is carried out in a time-division multiplexed manner. When the number of flash memory array structure units is limited, the fully-connected operation of the multi-bit input with the multi-bit weight matrix can be split across multiple passes of one flash memory array structure unit at different times, i.e., the unit performs the fully-connected operation of single-bit data in each pass, as sketched below.
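A sketch of this bit-sliced decomposition, under the assumption that each single-bit pass corresponds to one flash memory array structure unit (or one time slot of a reused unit) and that the partial results are combined by shift-and-add:
```python
import numpy as np

def multibit_fc(X: np.ndarray, K: np.ndarray, in_bits: int, w_bits: int) -> np.ndarray:
    """Multi-bit fully-connected operation decomposed into single-bit
    passes, one per (input bit, weight bit) pair; each pass is a binary
    vector-matrix multiplication of the kind computed by one unit."""
    Y = np.zeros(K.shape[1], dtype=np.int64)
    for a in range(in_bits):
        x_bit = (X >> a) & 1                 # single-bit input slice
        for b in range(w_bits):
            k_bit = (K >> b) & 1             # single-bit weight slice
            Y += (x_bit @ k_bit) << (a + b)  # binary pass + shift-add
    return Y

X = np.array([3, 1, 2])                 # 2-bit input vector
K = np.array([[1, 2], [3, 0], [2, 1]])  # 2-bit weight matrix
assert np.array_equal(multibit_fc(X, K, 2, 2), X @ K)  # [10, 8]
```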
Therefore, with the encoding method provided by this embodiment of the invention, the in-memory computing architecture based on the 3D NAND FLASH structure unit can complete the parallel fully-connected operation within one clock cycle.
According to another embodiment of the present invention, as shown in fig. 5 and fig. 6, the encoding method of this embodiment is implemented with the encoding type flash memory device shown in fig. 5. When the encoding operation is a convolutional layer operation, before step S410 the encoding method further includes:
step S610: each convolution kernel element in the plurality of convolution kernel elements in the convolution matrix of the convolution layer is pre-stored in a corresponding operation unit in the plurality of operation units in the encoding type flash memory device.
According to another embodiment of the present invention, after step S610, the encoding method further includes:
step S620: correspondingly inputting the input elements of the corresponding input vector into each word line in the plurality of word lines to generate word line voltages on each word line, so that the plurality of redundant units of the coding type flash memory device are in an on state.
According to another embodiment of the present invention, after step S620, the encoding method further includes:
step S630: a high level is applied to a string selection line and a ground selection line of an encoding type flash memory device to generate a source line voltage on each of a plurality of source lines of each flash memory array structural unit.
Specifically, when the encoding operation is a convolutional layer operation, as shown in figs. 5 and 6, each string of the 3D NAND FLASH structure unit is responsible for the convolution of a single-bit input matrix X (M × N) with a convolution kernel K (k × k). In the example of fig. 5, a 3 × 3 input image (X1,1, X1,2, X1,3, X2,1, X2,2, X2,3, X3,1, X3,2, X3,3) is convolved with a 2 × 2 kernel (K1,1, K1,0, K0,1, K0,0), producing the 2 × 2 output image (Y1,1, Y1,2, Y2,1, Y2,2). To implement this convolution operation, the input image (i.e., the corresponding input matrix) is split into four sub-vectors (X1,1, X1,2, X2,1, X2,2), (X1,2, X1,3, X2,2, X2,3), (X2,1, X2,2, X3,1, X3,2), (X2,2, X2,3, X3,2, X3,3), each of which undergoes vector-matrix multiplication with the convolution kernel (K1,1, K1,0, K0,1, K0,0) to obtain Y1,1, Y1,2, Y2,1 and Y2,2 respectively.
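The splitting just described is the im2col view of the mapping; a brief numpy sketch with illustrative data reproduces the example (here K holds the kernel elements in the stored order (K1,1, K1,0, K0,1, K0,0), matching the pairing above):
```python
import numpy as np

X = np.arange(1, 10).reshape(3, 3)  # example 3 x 3 input image
K = np.array([[1, 0],               # kernel elements in the order
              [1, 1]])              # (K11, K10, K01, K00) of fig. 5

# The four 2 x 2 local regions, flattened: each row is one sub-vector
# (X11,X12,X21,X22), (X12,X13,X22,X23), ... applied to the word lines.
patches = np.array([X[i:i + 2, j:j + 2].ravel()
                    for i in range(2) for j in range(2)])
Y = (patches @ K.ravel()).reshape(2, 2)  # four vector-kernel products
# Y == [[10, 13], [19, 22]] for this example
```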
In the encoding operation, each convolution kernel element of the convolution matrix of the convolutional layer is mapped into the operation units C at the crossings of the source lines SL with the word lines WL1, WL2, WL4 and WL5 that carry the input sub-vector (X1,1, X1,2, X2,1, X2,2); this corresponds to step S610. Considering that each convolutional layer of a deep neural network has several convolution kernels for multi-dimensional feature extraction of the input image, the other convolution kernels can likewise be written into the corresponding operation units C of WL1–WL9.
The input image is split row-wise into the vector (X1,1, X1,2, …, X1,N, …, XM,1, XM,2, …, XM,N) and applied as inputs to the word lines WL of the 3D NAND FLASH structure unit, being converted into the input word line voltages on the corresponding word lines WL; this corresponds to step S620.
When the string selection line SSL and the ground selection line GSL are at a high level, the operation starts: since each source line SL carries only one FLASH cell responsible for computation (the operation unit C), that cell stores the convolution kernel element value and performs the operation, its operation result (0 or 1) being reflected by the source line voltage; this corresponds to step S630. Finally, as shown in fig. 5, the 4 adders 300 sum the output results of the corresponding 4 groups of comparators 200 (4 source lines SL per group) to obtain the convolutional layer outputs (Y1,1, Y1,2, Y2,1, Y2,2), thereby realizing the deep neural network.
According to another embodiment of the present invention, when the encoding operation is a convolutional layer operation, the convolution matrix of the convolutional layer is multi-bit data. The encoding method of the present invention then further includes:
Step S440: shifting and summing the summation results output by the adders so as to realize the deep neural network. Specifically, when the convolution kernel is multi-bit data, shift-and-sum is applied to the adder summation results obtained from the encoding operation, yielding the output image of the input image convolved with the multi-bit convolution kernel, as sketched below.
Therefore, the encoding method of the present invention, based on the above encoding type flash memory device, can realize the parallel convolution operation within one clock cycle on the in-memory computing architecture based on the 3D NAND FLASH structure.
In summary, in the encoding method of the present invention, before the encoding operation starts, the non-zero elements of the weight matrix vector, or the convolution kernel elements of the convolution matrix, are written into the corresponding positions (operation units) of the 3D NAND FLASH structure unit according to a fixed mapping rule. The values (0 or 1) of the elements of the input vector determine the input voltages on the word lines WL of the flash memory array structure unit (i.e., the word line voltage Vg = Vg_High or Vg = Vg_Low), and the output source line current of each source line SL then reflects the calculation result of its operation unit. The calculation results are converted into binary form (0 or 1) and read out in parallel by the comparators provided at the ends of the source lines SL of the 3D NAND FLASH structure unit. On this basis, the adders connected to the comparators sum the comparator outputs from multiple source lines to obtain the final summation result, realizing efficient and accurate fully-connected or convolutional layer operations and thereby the deep neural network.
So far, the embodiments of the present invention have been described in detail with reference to the accompanying drawings.
It should be noted that implementations not shown or described in the drawings or the text are forms known to a person of ordinary skill in the art and are not described in detail. Further, the above definitions of the elements and methods are not limited to the specific structures, shapes or arrangements mentioned in the embodiments, which a person of ordinary skill in the art may simply modify or replace.
It should also be noted that directional terms, such as "upper", "lower", "front", "rear", "left", "right", etc., used in the embodiments are only directions referring to the drawings, and are not intended to limit the scope of the present invention. Throughout the drawings, like elements are represented by like or similar reference numerals. Conventional structures or constructions will be omitted when they may obscure the understanding of the present invention.
Moreover, the shapes and sizes of the components in the drawings do not reflect actual sizes and proportions; they merely illustrate the content of the embodiments of the present invention. Furthermore, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim.
Furthermore, the word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
The use of ordinal numbers such as "first," "second," "third," etc., in the specification and in the claims to modify a corresponding element does not by itself connote any ordinal number of the element or any ordering of one element from another or the order of manufacture, and the use of the ordinal numbers is only used to distinguish one element having a certain name from another element having a same name.
Those skilled in the art will appreciate that the modules in the device of an embodiment may be adaptively changed and placed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Also in the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various disclosed aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. An encoding type flash memory device, comprising:
at least one flash memory array structure unit, wherein each flash memory array structure unit is a 3D NAND FLASH array structure unit configured to implement an encoding operation to generate a source line voltage on each source line in a plurality of source lines of the flash memory array structure unit;
a plurality of comparators, wherein each comparator is connected to a corresponding source line and is configured to convert the source line voltage of the correspondingly connected source line into a binary output result; and
a plurality of adders, wherein each adder is connected through the corresponding source lines to at least 2 of the comparators and is configured to sum the at least two output results corresponding to the at least 2 comparators, so as to realize a deep neural network.
2. The encoding-type flash memory device according to claim 1, wherein each flash memory array structure unit comprises:
a plurality of operation units, wherein each operation unit is the transistor at the crossing of one of a plurality of word lines and one of the source lines of the flash memory array structure unit, one operation unit being arranged on each source line; and
a plurality of redundant units, wherein the redundant units are the transistors of each flash memory array structure unit other than the operation units and are configured to be in an on state while the encoding operation is performed.
3. The encoding-type flash memory device according to claim 2, wherein each flash memory array structure unit further comprises:
a string select line connected to the bit line end of the flash memory array structure unit; and
a ground select line connected to the source line end of the flash memory array structure unit;
wherein a high level is applied to the string select line and the ground select line while the encoding operation is performed.
4. The encoding-type flash memory device according to claim 1, wherein the encoding operation is a fully-connected layer operation or a convolutional layer operation, and
the number of adders is equal to the number of summation results of the fully-connected layer operation or the convolutional layer operation.
5. An encoding method implemented with the encoding-type flash memory device of any one of claims 1 to 4, the encoding method comprising:
performing an encoding operation based on at least one flash memory array structure unit to generate a source line voltage on each of a plurality of source lines of each flash memory array structure unit;
converting, by a plurality of comparators, the source line voltage of the source line corresponding to each comparator into an output result in binary form; and
summing, by a plurality of adders, the at least two output results corresponding to at least two of the plurality of comparators so as to implement a deep neural network.
6. The encoding method of claim 5, wherein the encoding operation is a fully-connected layer operation or a convolutional layer operation.
7. The encoding method of claim 6, wherein, when the encoding operation is a fully-connected layer operation, before the encoding operation performed on the at least one flash memory array structure unit generates a source line voltage on each of the plurality of source lines of each flash memory array structure unit, the method further comprises:
pre-storing each of a plurality of non-zero elements of the weight matrix vector of the fully-connected layer into a corresponding one of a plurality of operation units of the encoding-type flash memory device.
8. The encoding method of claim 6, wherein, when the encoding operation is a convolutional layer operation, before the encoding operation performed by the at least one flash memory array structure unit generates a source line voltage on each of the plurality of source lines of each flash memory array structure unit, the method further comprises:
pre-storing each of a plurality of convolution kernel elements of the convolution matrix of the convolutional layer into a corresponding one of a plurality of operation units of the encoding-type flash memory device.
9. The encoding method according to claim 7 or 8, wherein,
after pre-storing each non-zero element of the weight matrix vector of the fully-connected layer, or each convolution kernel element of the convolution matrix of the convolutional layer, into the corresponding operation unit of the encoding-type flash memory device, the method further comprises:
inputting each input element of the corresponding input vector into a corresponding one of the plurality of word lines to generate a word line voltage on that word line, such that the plurality of redundant units of the encoding-type flash memory device are in an on state.
10. The encoding method of claim 9, further comprising, after the input elements of the corresponding input vector are input into the plurality of word lines to generate the word line voltages such that the plurality of redundant units of the encoding-type flash memory device are in an on state:
applying a high level to the string select line and the ground select line of the encoding-type flash memory device to generate a source line voltage on each of the plurality of source lines of each flash memory array structure unit.
11. The encoding method of claim 5, wherein, when the encoding operation is a fully-connected layer operation, performing the encoding operation based on at least one flash memory array structure unit to generate a source line voltage on each of the plurality of source lines of each flash memory array structure unit comprises:
performing the encoding operations corresponding to the input vector and the weight matrix vector simultaneously on a plurality of flash memory array structure units; or
performing the encoding operations corresponding to the input vector and the weight matrix vector on a single flash memory array structure unit in a time-division-multiplexed manner;
wherein the input vector is multi-bit data.
12. The encoding method of claim 5, wherein, when the encoding operation is a convolutional layer operation, the encoding method further comprises:
shifting and summing the summation results output by the adders so as to implement the deep neural network;
wherein the convolution matrix of the convolutional layer is multi-bit data.
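
The sketches below are editorial additions, not part of the original disclosure; they illustrate one plausible reading of the claims in executable form. This first sketch, in Python, follows the data flow of claims 1 to 5 for a binary fully-connected layer: each NAND string carries exactly one operation unit; with all redundant units biased on (claims 2 and 9), a string pulls its source line high only when the word line input bit and the stored weight bit are both 1; each comparator binarizes one source line voltage (claim 1); and each adder digitally sums the comparator outputs belonging to one output. All function and variable names are illustrative assumptions rather than terms from the patent.

    import numpy as np

    def source_line_voltage(input_bit, weight_bit, v_high=1.0):
        # Assumed string behavior: the source line is pulled toward v_high
        # only when the word line input AND the stored weight are both 1.
        return v_high if (input_bit & weight_bit) else 0.0

    def comparator(v_sl, v_ref=0.5):
        # Claim 1: convert the analog source line voltage into binary form.
        return 1 if v_sl > v_ref else 0

    def fc_layer_binary(x, W):
        # x: (n,) binary input vector applied to the word lines.
        # W: (m, n) binary weight matrix, one element per operation unit
        #    (one operation unit per source line, claim 2).
        # Returns the (m,) vector of digital adder outputs.
        m, n = W.shape
        out = np.zeros(m, dtype=int)
        for i in range(m):
            bits = [comparator(source_line_voltage(x[j], W[i, j]))
                    for j in range(n)]
            out[i] = sum(bits)  # one adder per output of the layer
        return out

    x = np.array([1, 0, 1, 1])
    W = np.array([[1, 1, 0, 1],
                  [0, 1, 1, 1]])
    print(fc_layer_binary(x, W))  # [2 2], equal to W @ x

Because every analog quantity is binarized at the level of a single product before any summation, the only analog margin the scheme must tolerate is the on/off window of one string; the summation itself is fully digital.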
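Claim 11 allows two schedules for a multi-bit input vector: processing its bit planes simultaneously on several flash memory array structure units, or bit-serially on a single unit by time-division multiplexing. The following sketch of the time-multiplexed variant reuses fc_layer_binary and W from the sketch above; the MSB-first shift-and-add loop is an editorial assumption about how the partial results would be combined.

    def fc_layer_multibit(x_int, W, n_bits):
        # x_int: (n,) vector of unsigned n_bits-wide inputs.
        # Each cycle applies one input bit plane to the word lines of the
        # same array structure unit; partial sums are shifted and added.
        acc = np.zeros(W.shape[0], dtype=int)
        for b in range(n_bits - 1, -1, -1):   # MSB first
            x_plane = (x_int >> b) & 1        # one binary bit plane
            acc = (acc << 1) + fc_layer_binary(x_plane, W)
        return acc

    x_int = np.array([3, 1, 2, 0])                # 2-bit inputs
    print(fc_layer_multibit(x_int, W, n_bits=2))  # [4 3], equal to W @ x_int

The parallel variant of claim 11 would instead assign each bit plane to its own array structure unit and combine the per-unit adder outputs with the same shifts in one step, trading array area for latency.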
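For a convolutional layer, claims 8 and 12 recite pre-storing the convolution kernel elements in operation units and, when the kernel is multi-bit, shifting and summing the adder outputs. Continuing the sketches above (binary feature-map inputs, the kernel split into bit planes, and convolution taken in the usual deep-learning, i.e. cross-correlation, sense; all names editorial):

    def conv2d_binary(img, kernel_plane):
        # One binary kernel bit plane: each kernel element occupies one
        # operation unit / source line, and each window sum is produced
        # by one adder (claims 2 and 8).
        H, W_ = img.shape
        k = kernel_plane.shape[0]
        out = np.zeros((H - k + 1, W_ - k + 1), dtype=int)
        for r in range(out.shape[0]):
            for c in range(out.shape[1]):
                window = img[r:r + k, c:c + k]
                out[r, c] = int(np.sum(window & kernel_plane))
        return out

    def conv2d_multibit_kernel(img, kernel_int, n_bits):
        # Claim 12: convolve each kernel bit plane, then shift and sum
        # the adder outputs to recover the full-precision result.
        acc = None
        for b in range(n_bits - 1, -1, -1):   # MSB first
            plane = (kernel_int >> b) & 1
            partial = conv2d_binary(img, plane)
            acc = partial if acc is None else (acc << 1) + partial
        return acc

    img = np.array([[1, 0, 1],
                    [0, 1, 1],
                    [1, 1, 0]])
    K = np.array([[2, 1],
                  [0, 3]])  # 2-bit kernel elements
    print(conv2d_multibit_kernel(img, K, n_bits=2))
    # [[5 4]
    #  [4 3]] -- the direct 2-D cross-correlation of img with K

The shift-and-sum here acts on the digital adder outputs only, consistent with claim 12's recitation that the summation results, not analog quantities, are shifted and summed.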
CN202010472550.9A 2020-05-29 2020-05-29 Coding type flash memory device and coding method Active CN111710356B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010472550.9A CN111710356B (en) 2020-05-29 2020-05-29 Coding type flash memory device and coding method

Publications (2)

Publication Number Publication Date
CN111710356A (en) 2020-09-25
CN111710356B (en) 2022-07-05

Family

ID=72538483

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010472550.9A Active CN111710356B (en) 2020-05-29 2020-05-29 Coding type flash memory device and coding method

Country Status (1)

Country Link
CN (1) CN111710356B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190206508A1 (en) * 2018-01-04 2019-07-04 Samsung Electronics Co., Ltd. Storage device including nonvolatile memory device, nonvolatile memory device, operating method of storage device
CN209766043U * 2019-06-26 2019-12-10 Beijing Witmem Technology Co., Ltd. Storage and calculation integrated chip and storage unit array structure
US20200020393A1 (en) * 2018-07-11 2020-01-16 Sandisk Technologies Llc Neural network matrix multiplication in memory cells
US20200097807A1 (en) * 2019-11-27 2020-03-26 Intel Corporation Energy efficient compute near memory binary neural network circuits

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112181354A * 2020-10-12 2021-01-05 Shanghai ChipON Microelectronics Technology Co., Ltd. Method for synchronous shift saturation processing and application thereof
CN112181354B * 2020-10-12 2021-08-10 Shanghai ChipON Microelectronics Technology Co., Ltd. Method for synchronous shift saturation processing and application thereof
CN113704139A * 2021-08-24 2021-11-26 Fudan University Data coding method for memory calculation and memory calculation method

Also Published As

Publication number Publication date
CN111710356B (en) 2022-07-05

Similar Documents

Publication Publication Date Title
Sim et al. Scalable stochastic-computing accelerator for convolutional neural networks
CN109063825B (en) Convolutional neural network accelerator
Haj-Ali et al. Efficient algorithms for in-memory fixed point multiplication using MAGIC
US10241971B2 (en) Hierarchical computations on sparse matrix rows via a memristor array
Li et al. ReRAM-based accelerator for deep learning
CN111710356B (en) Coding type flash memory device and coding method
CN110597484B (en) Multi-bit full adder based on memory calculation and multi-bit full addition operation control method
CN111627479B (en) Coding type flash memory device, system and coding method
CN114937470B (en) Fixed point full-precision memory computing circuit based on multi-bit SRAM unit
US20170168775A1 (en) Methods and Apparatuses for Performing Multiplication
US20210287745A1 (en) Convolution operation method based on nor flash array
US11907380B2 (en) In-memory computation in homomorphic encryption systems
US9268629B2 (en) Dual mapping between program states and data patterns
Wang et al. Digital-assisted analog in-memory computing with rram devices
CN112951290B (en) Memory computing circuit and device based on nonvolatile random access memory
US20220019407A1 (en) In-memory computation circuit and method
US20220108203A1 (en) Machine learning hardware accelerator
Lee et al. 3D-FPIM: An Extreme Energy-Efficient DNN Acceleration System Using 3D NAND Flash-Based In-Situ PIM Unit
Haghi et al. O⁴-DNN: A Hybrid DSP-LUT-Based Processing Unit With Operation Packing and Out-of-Order Execution for Efficient Realization of Convolutional Neural Networks on FPGA Devices
TWI788128B (en) Memory device and operation method thereof
TWI821746B (en) Memory device and operation method thereof
KR102459985B1 Error Reduction Techniques for Content Addressable Memory based Binary Neural Network Accelerator
US20230161556A1 (en) Memory device and operation method thereof
US11960856B1 Multiplier-accumulator processing pipeline using filter weights having Gaussian floating point data format
Wang et al. DSP-RAM: A logic-enhanced memory architecture for communication signal processing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant