WO2021027238A1 - Image compression system and method based on FLASH in-memory computing array - Google Patents

Image compression system and method based on FLASH in-memory computing array

Info

Publication number
WO2021027238A1
WO2021027238A1 PCT/CN2019/130472 CN2019130472W
Authority
WO
WIPO (PCT)
Prior art keywords
flash
neural network
convolutional neural
storage
image
Prior art date
Application number
PCT/CN2019/130472
Other languages
English (en)
French (fr)
Inventor
康晋锋
项亚臣
黄鹏
刘晓彦
韩润泽
Original Assignee
北京大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京大学
Priority to US17/634,442 (published as US20220321900A1)
Publication of WO2021027238A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124 Quantisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/127 Prioritisation of hardware or computational resources
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H04N19/423 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation characterised by memory arrangements

Definitions

  • The invention belongs to the field of semiconductor devices and integrated circuits, and in particular relates to an image compression system and method based on a FLASH in-memory computing (storage-and-computation) array.
  • Image compression is an image processing technique that reduces temporal, spatial and spectral redundancy in an image and represents the original image, lossily or losslessly, with fewer bits, so that image data can be stored and transmitted efficiently.
  • Image compression is divided into three parts: encoding, quantization and decoding. The encoding and decoding operations account for the bulk of the computation in image compression.
  • Image compression reduces the irrelevance and redundancy of images, so that images can be stored or transmitted at low bit rates.
  • With traditional image coding standards such as JPEG and JPEG2000, increasing the compression ratio increases the quantization step size, which reduces the bits per pixel (BPP) and introduces blocking artifacts or noise into the decoded image.
  • The present disclosure proposes an image compression system and method based on a FLASH in-memory computing array, which mainly addresses the following technical problems: (1) a FLASH-based architecture integrating storage and computation, and its hardware implementation; (2) image compression realized with a FLASH in-memory computing array; (3) acceleration of image compression with a FLASH in-memory computing array.
  • An image compression system based on a FLASH in-memory computing array includes: an encoding convolutional neural network based on the FLASH in-memory computing array, a decoding convolutional neural network based on the FLASH in-memory computing array, and a quantization module.
  • The encoding convolutional neural network based on the FLASH in-memory computing array encodes an original image to obtain a feature image.
  • The quantization module quantizes the feature image to obtain a quantized image.
  • The decoding convolutional neural network based on the FLASH in-memory computing array decodes the quantized image to obtain a compressed image.
  • An image compression method based on a FLASH in-memory computing array includes: writing the weight matrices of the encoding and decoding convolutional neural networks into FLASH-based in-memory computing arrays, inputting an original image, encoding it to obtain a feature image, quantizing the feature image to obtain a quantized image, and decoding the quantized image with the decoding convolutional neural network based on the FLASH in-memory computing array to obtain a compressed image.
  • The image compression system and method of the present disclosure are implemented in hardware, which greatly reduces data exchange between the processor and the memory unit, significantly improves the energy efficiency of the encoding and decoding processes, and reduces system hardware overhead and energy consumption.
  • FIG. 1 is a schematic diagram of an image compression system based on a FLASH in-memory computing array according to the first embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a FLASH in-memory computing array.
  • FIG. 3 is a flowchart of an image compression method based on a FLASH in-memory computing array according to the second embodiment of the present invention.
  • After training, a convolutional neural network can extract feature images from an image.
  • The compressed image obtained by further processing the extracted feature images with a convolutional neural network reflects the features of the original image to the greatest possible extent and effectively solves the problems of blocking artifacts and noise.
  • The image compression system and method of the present invention, based on a FLASH array that integrates storage and computation (an in-memory computing array), can execute in parallel the large number of matrix-vector multiplications required by the convolutional neural networks during image encoding and decoding, so that image compression is accelerated at the hardware level while energy and hardware resource consumption are greatly reduced, which is of great significance for image compression.
  • In the image compression system and method based on a FLASH in-memory computing array of the present invention, the convolutional neural networks used for encoding and decoding are constructed and trained on a CPU/GPU to obtain their weight distributions; the trained weights are then programmed into the FLASH in-memory computing arrays, realizing the encoding and decoding convolutional neural networks at the hardware level.
  • The input image is compressed according to a preset compression ratio.
  • The image compression system and method of the present invention can significantly improve the energy efficiency of the encoding and decoding processes, reduce system hardware overhead and lower energy consumption.
  • The first embodiment of the present disclosure provides an image compression system based on a FLASH in-memory computing array. As shown in FIG. 1, it includes a control module, a signal generation module, an encoding convolutional neural network based on a FLASH in-memory computing array, a decoding convolutional neural network based on a FLASH in-memory computing array, and a processor.
  • The control module is connected to the signal generation module, the encoding convolutional neural network, the decoding convolutional neural network and the processor. According to control instructions from the processor, it outputs control signals to the signal generation module, the encoding convolutional neural network and the decoding convolutional neural network to control the operating sequence of the image compression system.
  • The encoding convolutional neural network and the decoding convolutional neural network based on the FLASH in-memory computing arrays are responsible for the encoding and decoding operations in image compression, respectively.
  • The encoding convolutional neural network based on the FLASH in-memory computing array is a multi-layer neural network including an input layer, multiple hidden layers and an output layer; the output of each layer serves as the input of the next layer.
  • Each layer of the encoding convolutional neural network includes a FLASH-based in-memory computing array.
  • The FLASH-based in-memory computing array includes multiple FLASH cells, multiple word lines, multiple source lines, multiple bit lines and multiple subtractors.
  • In the array formed by the FLASH cells, the gates of the cells in each column are connected to the same word line (WL), their sources are connected to the same source line, and the drains of the cells in each row are connected to the same bit line (BL).
  • The number of word lines corresponds to the number of columns of the array; input data are applied to the FLASH cells through the word lines.
  • The number of source lines corresponds to the number of columns of the array; all source lines are connected to a fixed drive voltage Vds, which is applied to the sources of the FLASH cells.
  • The number of bit lines corresponds to the number of rows of the array; the bit lines output the drain signals of the FLASH cells.
  • Each bit line sums the drain signals of the FLASH cells in every column of its row and outputs the summed signal as the output signal. That is, the drains of the FLASH cells in each row are connected to the same bit line, and the total current on the bit line is the sum of the output values of the cells in all columns of that row.
  • The threshold voltage of a FLASH cell can be set by programming and erasing.
  • When a FLASH cell is programmed, hot electrons are injected and its threshold voltage rises; its storage state is regarded as "0", i.e. the cell stores the data "0".
  • When a FLASH cell is erased, electrons tunnel out and its threshold voltage falls; its storage state is regarded as "1", i.e. the cell stores the data "1". Thus, by programming and erasing, a FLASH cell can store either "0" or "1".
  • A FLASH cell in the "0" state represents a "0" in a binary weight, and a cell in the "1" state represents a "1", so an array of FLASH cells can represent a weight matrix.
  • The source lines of the FLASH cells are all connected to a fixed drive voltage Vds.
  • The input data are converted into binary numbers and applied to the FLASH cells through the word lines.
  • For a "0" in the input data, zero voltage is applied to the gate of the FLASH cell through the word line, and the drain output current is the product of the input "0" and the data ("0" or "1") stored in the cell.
  • For a "1" in the input data, the voltage Vg is applied to the gate through the word line, and the drain output current is the product of the input "1" and the stored data.
  • Each bit line sums the drain signals of the FLASH cells in its row and outputs the summed "sum current" as the output signal; that is, the total current on the bit line is the sum of the output signals of the cells in that row, reflecting the product of the input vector and the weight matrix stored in the FLASH array.
  • The number of subtractors corresponds to half the number of rows of the array, and the positive and negative terminals of each subtractor are connected to two adjacent bit lines.
  • Because a FLASH cell cannot store a negative weight value, every two adjacent bit lines are connected to one subtractor: the cells on the bit line connected to the positive terminal store the positive weight values and the cells on the bit line connected to the negative terminal store the negative weight values, thereby realizing signed matrix-vector multiplication (a software sketch of this weight mapping is given after this list).
  • Each layer of the encoding convolutional neural network also includes an activation unit; the output of the subtractor is connected to the activation unit, which applies an activation operation to the output signal, and the activation result is passed to the next layer as the output data of the current layer.
  • The decoding convolutional neural network based on the FLASH in-memory computing array has the same structure as the encoding convolutional neural network described above and is not repeated here.
  • The signal generation module has two functions: first, it programs the FLASH in-memory computing arrays according to the output signals of the control module, writing the trained weights into the corresponding FLASH cells in turn; second, during the encoding and decoding stages of image compression it converts the input image and the quantized image, respectively, into voltage signals applied to the word lines of the FLASH arrays.
  • That is, the signal generation module converts the weights in the weight matrices of each layer of the convolutional neural networks into binary numbers and programs or erases the corresponding FLASH cells according to the binary weights, so as to store the weight matrices in the FLASH arrays.
  • It also converts the input image and the quantized image into binary signals and feeds them to the input layers of the encoding and decoding convolutional neural networks.
  • The processor includes a quantization module that quantizes the output data of the encoding convolutional neural network using standards such as JPEG and JPEG2000.
  • In the image compression system of this embodiment, the encoding convolutional neural network based on the FLASH in-memory computing array encodes the original image to obtain a feature image, the quantization module quantizes the feature image to obtain a quantized image, and the decoding convolutional neural network based on the FLASH in-memory computing array decodes the quantized image to obtain a compressed image.
  • In this hardware implementation the weights are stored in the FLASH in-memory computing arrays and the arrays themselves perform the computation, eliminating random accesses to the weights during computation and thereby integrating storage and computation.
  • Before image compression, the encoding and decoding convolutional neural network models must be built in software, and parameters of the network models such as the number of layers, the dimensions, the number of channels and the convolution kernel sizes are determined from the requirements of image compression on speed, accuracy and energy consumption. The constructed encoding convolutional neural network model, decoding convolutional neural network model and quantization module are trained jointly to obtain encoding and decoding networks that meet the requirements of image compression.
  • The second embodiment of the present disclosure provides an image compression method based on a FLASH in-memory computing array which, as shown in FIG. 3, includes the following steps: writing the weight matrices of the encoding and decoding convolutional neural networks into FLASH-based in-memory computing arrays and inputting an original image; encoding the original image to obtain a feature image; quantizing the feature image to obtain a quantized image; and decoding the quantized image with the decoding convolutional neural network based on the FLASH in-memory computing array to obtain a compressed image.
  • Before compression, the networks are initialized and the encoding convolutional neural network model and the decoding convolutional neural network model are constructed.
  • Back propagation is then performed on the encoding and decoding convolutional neural network models, and their weights are updated.
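The bullets above describe how signed weights are binarized and split across adjacent bit lines (positive weights on the line wired to the subtractor's positive terminal, negative weights on the line wired to its negative terminal). The Python sketch below is one minimal way to emulate that weight-mapping step in software before the cells are programmed; the sign-only (1-bit) quantization and the function name are illustrative assumptions, not taken from the patent text.

```python
import numpy as np

def map_weights_to_flash_states(weights: np.ndarray):
    """Map a signed weight matrix onto two binary FLASH-state matrices.

    A stored '1' (erased cell, low threshold voltage) conducts when its
    word line is driven; a stored '0' (programmed cell, high threshold
    voltage) does not.  Cells on the bit line tied to the subtractor's
    positive terminal hold the positive part of each weight; cells on the
    adjacent bit line hold the negative part.  The sign-only quantization
    used here is an assumption; the patent only states that weights are
    converted to binary numbers.
    """
    pos_states = (weights > 0).astype(np.uint8)  # '1' where the weight is positive
    neg_states = (weights < 0).astype(np.uint8)  # '1' where the weight is negative
    return pos_states, neg_states

# Example: a 2x3 signed weight matrix becomes two 2x3 binary state matrices.
w = np.array([[0.7, -0.2, 0.0],
              [-0.9, 0.4, 0.1]])
pos, neg = map_weights_to_flash_states(w)
print(pos)  # [[1 0 0], [0 1 1]]
print(neg)  # [[0 1 0], [1 0 0]]
```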

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image compression system and method based on a FLASH in-memory computing array. The image compression system includes: an encoding convolutional neural network based on a FLASH in-memory computing array, a decoding convolutional neural network based on a FLASH in-memory computing array, and a quantization module; the encoding convolutional neural network based on the FLASH in-memory computing array encodes an original image to obtain a feature image; the quantization module quantizes the feature image to obtain a quantized image; and the decoding convolutional neural network based on the FLASH in-memory computing array decodes the quantized image to obtain a compressed image.

Description

Image compression system and method based on a FLASH in-memory computing array
Technical Field
The present invention belongs to the field of semiconductor devices and integrated circuits, and specifically relates to an image compression system and method based on a FLASH in-memory computing (storage-and-computation) array.
Background Art
Image compression is an image processing technique that reduces temporal, spatial and spectral redundancy in an image and represents the original image, either lossily or losslessly, with fewer bits, so that image data can be stored and transmitted efficiently. Image compression consists of three parts: encoding, quantization and decoding; the encoding and decoding operations account for the great majority of the computation in image compression.
The development of deep learning and big-data technologies has led to a dramatic increase in unstructured data such as images and video. Image compression reduces the irrelevance and redundancy of images, so that images can be stored or transmitted at low bit rates. With traditional image coding standards such as JPEG and JPEG2000, increasing the compression ratio increases the quantization step size, which reduces the bits per pixel (BPP) and introduces blocking artifacts or noise into the decoded image.
Summary of the Disclosure
The present disclosure proposes an image compression system and method based on a FLASH in-memory computing array, which mainly addresses the following technical problems: (1) a FLASH-based architecture that integrates storage and computation, together with its hardware implementation; (2) image compression realized with a FLASH in-memory computing array; and (3) acceleration of image compression with a FLASH in-memory computing array.
According to one aspect of the present disclosure, an image compression system based on a FLASH in-memory computing array is provided, comprising: an encoding convolutional neural network based on the FLASH in-memory computing array, a decoding convolutional neural network based on the FLASH in-memory computing array, and a quantization module;
the encoding convolutional neural network based on the FLASH in-memory computing array encodes an original image to obtain a feature image;
the quantization module quantizes the feature image to obtain a quantized image;
the decoding convolutional neural network based on the FLASH in-memory computing array decodes the quantized image to obtain a compressed image.
According to another aspect of the present disclosure, an image compression method based on a FLASH in-memory computing array is provided, comprising:
writing the weight matrices of an encoding convolutional neural network and a decoding convolutional neural network into FLASH-based in-memory computing arrays, respectively, and inputting an original image;
encoding the original image with the encoding convolutional neural network based on the FLASH in-memory computing array to obtain a feature image;
quantizing the feature image with a quantization module to obtain a quantized image;
decoding the quantized image with the decoding convolutional neural network based on the FLASH in-memory computing array to obtain a compressed image.
The image compression system and method of the present disclosure are implemented in hardware, which can greatly reduce data exchange between the processor and the memory unit, significantly improve the energy efficiency of the encoding and decoding processes, and reduce system hardware overhead and energy consumption.
To make the above objects, features and advantages of the present disclosure more apparent and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present disclosure more clearly, the drawings needed for the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope. A person of ordinary skill in the art can derive other related drawings from these drawings without inventive effort.
FIG. 1 is a schematic diagram of an image compression system based on a FLASH in-memory computing array according to the first embodiment of the present invention.
FIG. 2 is a schematic diagram of a FLASH in-memory computing array.
FIG. 3 is a flowchart of an image compression method based on a FLASH in-memory computing array according to the second embodiment of the present invention.
Detailed Description of the Embodiments
After multiple rounds of training, a convolutional neural network can extract feature images from an image; the compressed image obtained by further processing the extracted feature images with a convolutional neural network reflects the features of the original image to the greatest possible extent and effectively solves problems such as blocking artifacts and noise. The image compression system and method of the present invention, based on a FLASH array that integrates storage and computation (an in-memory computing array), can execute in parallel the large number of matrix-vector multiplications required by the convolutional neural networks during image encoding and decoding, so that image compression is accelerated at the hardware level while energy and hardware resource consumption are greatly reduced, which is of great significance for image compression.
In the image compression system and method based on a FLASH in-memory computing array of the present invention, the convolutional neural networks used for encoding and decoding are constructed and trained on a CPU/GPU to obtain their weight distributions. The trained weights are programmed into the FLASH in-memory computing arrays, realizing the encoding and decoding convolutional neural networks at the hardware level. The input image is compressed according to a preset compression ratio. The image compression system and method of the present invention can significantly improve the energy efficiency of the encoding and decoding processes, reduce system hardware overhead and lower energy consumption.
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the embodiments and the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present disclosure without inventive effort fall within the scope of protection of the present disclosure.
The first embodiment of the present disclosure provides an image compression system based on a FLASH in-memory computing array. As shown in FIG. 1, it comprises a control module, a signal generation module, an encoding convolutional neural network based on a FLASH in-memory computing array, a decoding convolutional neural network based on a FLASH in-memory computing array, and a processor.
The control module is connected to the signal generation module, the encoding convolutional neural network based on the FLASH in-memory computing array, the decoding convolutional neural network based on the FLASH in-memory computing array, and the processor. According to control instructions from the processor, it outputs control signals to the signal generation module, the encoding convolutional neural network and the decoding convolutional neural network to control the operating sequence of the image compression system.
The encoding convolutional neural network and the decoding convolutional neural network based on the FLASH in-memory computing arrays are responsible for the encoding and decoding operations in image compression, respectively. The encoding convolutional neural network based on the FLASH in-memory computing array is a multi-layer neural network comprising an input layer, multiple hidden layers and an output layer; the output of each layer serves as the input of the next layer. Each layer of the encoding convolutional neural network includes a FLASH-based in-memory computing array.
As shown in FIG. 2, the FLASH-based in-memory computing array comprises multiple FLASH cells, multiple word lines, multiple source lines, multiple bit lines and multiple subtractors.
In the array formed by the multiple FLASH cells, the gates of the FLASH cells in each column are connected to the same word line (WL), their sources are connected to the same source line, and the drains of the FLASH cells in each row are connected to the same bit line (BL).
The number of word lines corresponds to the number of columns of the array; input data are applied to the FLASH cells through the word lines.
The number of source lines corresponds to the number of columns of the array; the source lines are all connected to a fixed drive voltage Vds, which is applied to the sources of the FLASH cells.
The number of bit lines corresponds to the number of rows of the array; the bit lines output the drain signals of the FLASH cells. Each bit line sums the drain signals of the FLASH cells in every column of its row and outputs the summed drain signal as the output signal. That is, the drains of the FLASH cells in each row are connected to the same bit line, and the total current on the bit line is the sum of the output values of the FLASH cells in all columns of that row.
The threshold voltage of a FLASH cell can be set by programming and erasing. When a FLASH cell is programmed, hot electrons are injected and its threshold voltage rises; its storage state is regarded as "0", i.e. the cell stores the data "0". When a FLASH cell is erased, electrons tunnel out and its threshold voltage falls; its storage state is regarded as "1", i.e. the cell stores the data "1". Thus, by programming and erasing, a FLASH cell can store either "0" or "1". By converting the weights in the weight matrices of the convolutional neural network into binary numbers, representing a "0" in a binary weight with a cell whose storage state is "0" and a "1" with a cell whose storage state is "1", an array of FLASH cells can represent a weight matrix.
In the FLASH-based in-memory computing array of this embodiment, the source lines of the FLASH cells are all connected to the fixed drive voltage Vds. The input data are converted into binary numbers and applied to the FLASH cells through the word lines. For a "0" in the input data, zero voltage is applied to the gate of the FLASH cell through the word line, and the drain output current is the product of the input "0" and the data ("0" or "1") stored in the cell; for a "1" in the input data, the voltage Vg is applied to the gate through the word line, and the drain output current is the product of the input "1" and the stored data. The drains of multiple FLASH cells are connected together, and the resulting "sum current" reflects the product of the input vector and the matrix stored in the FLASH array, thereby implementing the matrix-vector multiplication.
Each bit line sums the drain signals of the FLASH cells in all columns of its row and outputs the summed "sum current" as the output signal; that is, the total current on the bit line is the sum of the output signals of the FLASH cells in that row, reflecting the result of multiplying the input vector by the weight matrix stored in the FLASH in-memory computing array.
The number of subtractors corresponds to half the number of rows of the array, and the positive and negative terminals of each subtractor are connected to two adjacent bit lines. Since a FLASH cell cannot store a negative weight value, every two adjacent bit lines are connected to one subtractor: the FLASH cells on the bit line connected to the positive terminal store the positive weight values, and the FLASH cells on the bit line connected to the negative terminal store the negative weight values, thereby realizing the signed matrix-vector multiplication.
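A minimal software model of the array operation just described may help make it concrete: binary inputs drive the word lines (one per column), a cell contributes a unit drain current only when both its input bit and its stored bit are "1", each bit line (one per row) sums the currents of its row, and each subtractor outputs the difference of an adjacent bit-line pair. The unit current, the even/odd pairing of rows and the function name are illustrative assumptions; this is a behavioural sketch, not a circuit-level model.

```python
import numpy as np

def flash_array_mvm(cell_states: np.ndarray, input_bits: np.ndarray,
                    i_unit: float = 1.0) -> np.ndarray:
    """Emulate the matrix-vector multiplication of the FLASH array in FIG. 2.

    cell_states : (n_rows, n_cols) binary matrix of stored states
                  ('1' = erased/conducting, '0' = programmed/non-conducting).
    input_bits  : (n_cols,) binary input vector on the word lines
                  ('1' -> gate voltage Vg applied, '0' -> 0 V applied).
    A cell contributes i_unit of drain current only when both its input bit
    and its stored bit are '1'.  Each bit line (one per row) sums the
    currents of its row; adjacent bit lines feed one subtractor, with even
    rows holding positive weights and odd rows negative ones (an assumed
    pairing order).
    """
    bitline_currents = i_unit * (cell_states @ input_bits)  # per-row "sum current"
    positive = bitline_currents[0::2]   # bit lines on the '+' terminals
    negative = bitline_currents[1::2]   # bit lines on the '-' terminals
    return positive - negative          # subtractor outputs

# Usage: a 4x3 array (two subtractor outputs) times a binary input vector.
states = np.array([[1, 0, 1],   # row 0: positive weights of output 0
                   [0, 1, 0],   # row 1: negative weights of output 0
                   [1, 1, 0],   # row 2: positive weights of output 1
                   [0, 0, 1]])  # row 3: negative weights of output 1
x = np.array([1, 1, 0])
print(flash_array_mvm(states, x))   # [0. 2.]
```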
Each layer of the encoding convolutional neural network further includes an activation unit. The output of the subtractor is connected to the activation unit, which applies an activation operation to the output signal; the activation result is passed to the next layer as the output data of the current layer.
The decoding convolutional neural network based on the FLASH in-memory computing array has the same structure as the encoding convolutional neural network described above and is not repeated here.
The signal generation module has two functions. First, it programs the FLASH in-memory computing arrays according to the output signals of the control module, writing the trained weights into the corresponding FLASH cells in turn. Second, during the encoding and decoding stages of image compression, it converts the input image and the quantized image, respectively, into voltage signals applied to the word lines of the FLASH arrays.
That is, the signal generation module converts the weights in the weight matrices of each layer of the convolutional neural networks into binary numbers and programs or erases the corresponding FLASH cells according to the binary weights, so as to store the weight matrices in the FLASH in-memory computing arrays. It also converts the input image and the quantized image into binary signals and feeds these binary signals to the input layers of the encoding and decoding convolutional neural networks.
The processor includes a quantization module, which quantizes the output data of the encoding convolutional neural network using standards such as JPEG and JPEG2000.
In the image compression system based on the FLASH in-memory computing array of this embodiment, the encoding convolutional neural network based on the FLASH in-memory computing array encodes the original image to obtain a feature image, the quantization module quantizes the feature image to obtain a quantized image, and the decoding convolutional neural network based on the FLASH in-memory computing array decodes the quantized image to obtain a compressed image. This hardware implementation stores the weights in the FLASH in-memory computing arrays and uses the arrays themselves to perform the computation, eliminating random accesses to the weights during computation and thereby integrating storage and computation.
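At the system level, the flow of this embodiment is simply encode, quantize, decode. The sketch below summarizes that flow in software under stated assumptions: the `encode` and `decode` callables stand in for the FLASH-based convolutional neural networks, and a plain uniform scalar quantizer replaces the JPEG/JPEG2000-style quantization performed by the processor.

```python
import numpy as np

def uniform_quantize(feature: np.ndarray, step: float = 8.0) -> np.ndarray:
    """Uniform scalar quantizer, used here only as a stand-in for the
    JPEG/JPEG2000-style quantization performed by the processor."""
    return np.round(feature / step) * step

def compress(image: np.ndarray, encode, decode, step: float = 8.0) -> np.ndarray:
    """End-to-end flow of the first embodiment: encode -> quantize -> decode."""
    feature = encode(image)                      # encoding CNN on the FLASH array
    quantized = uniform_quantize(feature, step)  # quantization module in the processor
    return decode(quantized)                     # decoding CNN on the FLASH array

# Usage with identity placeholders standing in for the two FLASH-based networks.
img = np.random.randint(0, 256, size=(32, 32)).astype(np.float32)
reconstructed = compress(img, encode=lambda x: x, decode=lambda x: x)
print(reconstructed.shape)  # (32, 32)
```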
In this embodiment, before image compression is performed, the encoding and decoding convolutional neural network models need to be constructed in software, and parameters of the network models such as the number of layers, the dimensions, the number of channels and the convolution kernel sizes are determined according to the requirements of image compression on speed, accuracy and energy consumption. The constructed encoding convolutional neural network model, decoding convolutional neural network model and quantization module are trained jointly to obtain encoding and decoding convolutional neural networks that meet the requirements of image compression.
The second embodiment of the present disclosure provides an image compression method based on a FLASH in-memory computing array. As shown in FIG. 3, it comprises the following steps:
writing the weight matrices of the encoding convolutional neural network and the decoding convolutional neural network into FLASH-based in-memory computing arrays, respectively, and inputting an original image;
encoding the original image with the encoding convolutional neural network based on the FLASH in-memory computing array to obtain a feature image;
quantizing the feature image with the quantization module to obtain a quantized image;
decoding the quantized image with the decoding convolutional neural network based on the FLASH in-memory computing array to obtain a compressed image.
Before image compression is performed, the method further includes a step of training the encoding and decoding convolutional neural networks:
First, the networks are initialized and the encoding convolutional neural network model and the decoding convolutional neural network model are constructed.
Then, forward propagation is performed on the encoding and decoding convolutional neural network models with the training data, and the network error is computed.
Next, back propagation is performed on the encoding and decoding convolutional neural network models, and their weights are updated.
Finally, it is determined whether training is complete. If the trained models meet the image compression requirements, training is considered complete and the training step ends; if the trained models do not yet meet the requirements, the method returns to the forward-propagation step and training continues.
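The training procedure enumerated above (initialize and construct the models, forward-propagate the training data and compute the error, back-propagate and update the weights, stop once the compression requirement is met) is sketched below in PyTorch. The tiny encoder/decoder architecture, the straight-through rounding used as a differentiable stand-in for the quantization module, the MSE loss, the optimizer and the stopping threshold are all illustrative assumptions; the patent does not specify these choices, only that they are chosen from the speed, accuracy and energy requirements.

```python
import torch
import torch.nn as nn

# Illustrative encoder/decoder; layer count, channels and kernel sizes would
# in practice be chosen from the speed/accuracy/energy requirements.
encoder = nn.Sequential(nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
                        nn.Conv2d(16, 8, 3, stride=2, padding=1))
decoder = nn.Sequential(nn.ConvTranspose2d(8, 16, 4, stride=2, padding=1), nn.ReLU(),
                        nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1))

optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)
loss_fn = nn.MSELoss()
target_loss = 1e-3                     # illustrative "image compression requirement"
images = torch.rand(16, 1, 32, 32)     # placeholder training data

for epoch in range(1000):
    feature = encoder(images)                                          # forward propagation (encoding)
    quantized = feature + (torch.round(feature) - feature).detach()    # straight-through quantizer surrogate
    recon = decoder(quantized)                                         # forward propagation (decoding)
    loss = loss_fn(recon, images)                                      # network error
    optimizer.zero_grad()
    loss.backward()                                                    # back propagation
    optimizer.step()                                                   # weight update
    if loss.item() < target_loss:                                      # requirement met -> training complete
        break
```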
The foregoing detailed description has set forth numerous embodiments of the systems and methods described above by means of schematic diagrams, flowcharts and/or examples. Where such diagrams, flowcharts or examples contain one or more functions and/or operations, those skilled in the art will understand that each function and/or operation therein can be implemented, individually and/or collectively, by a wide variety of structures, hardware, software, firmware or virtually any combination thereof.
Unless there is a technical obstacle or contradiction, the various embodiments of the present disclosure described above may be freely combined to form further embodiments, all of which fall within the scope of protection of the present disclosure.
Although the present disclosure has been described with reference to the accompanying drawings, the embodiments disclosed in the drawings are intended to illustrate preferred implementations of the present disclosure by way of example and should not be construed as limiting the disclosure. The dimensional proportions in the drawings are merely schematic and should not be construed as limiting the disclosure either.
Although some embodiments of the general concept of the present disclosure have been shown and described, a person of ordinary skill in the art will understand that changes may be made to these embodiments without departing from the principles and spirit of the disclosed concept; the scope of the present disclosure is defined by the claims and their equivalents.

Claims (10)

  1. An image compression system based on a FLASH in-memory computing array, characterized by comprising: an encoding convolutional neural network based on a FLASH in-memory computing array, a decoding convolutional neural network based on a FLASH in-memory computing array, and a quantization module;
    the encoding convolutional neural network based on the FLASH in-memory computing array is configured to encode an original image to obtain a feature image;
    the quantization module is configured to quantize the feature image to obtain a quantized image;
    the decoding convolutional neural network based on the FLASH in-memory computing array is configured to decode the quantized image to obtain a compressed image.
  2. The image compression system based on a FLASH in-memory computing array according to claim 1, wherein each layer of the encoding convolutional neural network and of the decoding convolutional neural network comprises a FLASH-based in-memory computing array; the FLASH-based in-memory computing array comprises: multiple FLASH cells, multiple word lines, multiple source lines, multiple bit lines and multiple subtractors;
    in the array formed by the multiple FLASH cells, the gates of the FLASH cells in each column are connected to the same word line, their sources are connected to the same source line, and the drains of the FLASH cells in each row are connected to the same bit line; the positive terminal and the negative terminal of each subtractor are connected to two adjacent bit lines, respectively.
  3. The image compression system based on a FLASH in-memory computing array according to claim 2, wherein
    the number of word lines corresponds to the number of columns of the array, and input data are applied to the FLASH cells through the word lines;
    the number of source lines corresponds to the number of columns of the array, and the source lines are all connected to a fixed drive voltage;
    the number of bit lines corresponds to the number of rows of the array, and each bit line sums the drain signals of the FLASH cells in all columns of its row and outputs the summed drain signal as the output signal.
  4. The image compression system based on a FLASH in-memory computing array according to claim 2, wherein the FLASH cells store weight values of the convolutional neural networks, and the FLASH-based in-memory computing array stores a weight matrix of a convolutional neural network.
  5. The image compression system based on a FLASH in-memory computing array according to claim 4, wherein when a FLASH cell is programmed its storage state is regarded as "0", and when a FLASH cell is erased its storage state is regarded as "1".
  6. The image compression system based on a FLASH in-memory computing array according to claim 2, wherein the FLASH cells on the bit line connected to the positive terminal of the subtractor store positive weight values, and the FLASH cells on the bit line connected to its negative terminal store negative weight values.
  7. The image compression system based on a FLASH in-memory computing array according to claim 2, wherein each layer of the encoding convolutional neural network and of the decoding convolutional neural network further comprises an activation unit; the output of the subtractor is connected to the activation unit, the activation unit applies an activation operation to the output signal, and the activation result is passed to the next layer as output data.
  8. The image compression system based on a FLASH in-memory computing array according to claim 2, wherein the quantization module is a central processing unit or a microprocessor and quantizes the feature image using the JPEG or JPEG2000 standard.
  9. An image compression method based on a FLASH in-memory computing array, characterized by comprising:
    writing the weight matrices of an encoding convolutional neural network and a decoding convolutional neural network into FLASH-based in-memory computing arrays, respectively, and inputting an original image;
    encoding the original image with the encoding convolutional neural network based on the FLASH in-memory computing array to obtain a feature image;
    quantizing the feature image with a quantization module to obtain a quantized image;
    decoding the quantized image with the decoding convolutional neural network based on the FLASH in-memory computing array to obtain a compressed image.
  10. The image compression method based on a FLASH in-memory computing array according to claim 9, further comprising:
    initializing the networks and constructing an encoding convolutional neural network model and a decoding convolutional neural network model;
    performing forward propagation on the encoding and decoding convolutional neural network models with training data and computing the network error;
    performing back propagation on the encoding and decoding convolutional neural network models and updating their weights;
    when the trained models meet the image compression requirements, training is complete and the training step ends; if the trained models do not yet meet the image compression requirements, returning to the forward-propagation step and continuing training.
PCT/CN2019/130472 2019-08-12 2019-12-31 Image compression system and method based on FLASH in-memory computing array WO2021027238A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/634,442 US20220321900A1 (en) 2019-08-12 2019-12-31 System and method for compressing image based on flash in-memory computing array

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910738965.3 2019-08-12
CN201910738965.3A CN110475119A (zh) 2019-08-12 2019-08-12 基于flash存算阵列的图像压缩系统和方法

Publications (1)

Publication Number Publication Date
WO2021027238A1 true WO2021027238A1 (zh) 2021-02-18

Family

ID=68510503

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/130472 WO2021027238A1 (zh) 2019-08-12 2019-12-31 基于flash存算阵列的图像压缩系统和方法

Country Status (3)

Country Link
US (1) US20220321900A1 (zh)
CN (1) CN110475119A (zh)
WO (1) WO2021027238A1 (zh)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110475119A (zh) * 2019-08-12 2019-11-19 北京大学 Image compression system and method based on FLASH in-memory computing array
CN110991608B (zh) * 2019-11-25 2021-08-13 恒烁半导体(合肥)股份有限公司 Convolutional neural network quantization calculation method and system
CN110990060B (zh) * 2019-12-06 2022-03-22 北京瀚诺半导体科技有限公司 Embedded processor, instruction set and data processing method for a compute-in-memory chip
CN112992232B (zh) * 2021-04-28 2021-08-17 中科院微电子研究所南京智能技术研究院 Multi-bit positive/negative single-bit in-memory computing cell, array and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201706917A (zh) * 2015-05-21 2017-02-16 咕果公司 Rotating data for neural network computations
CN106843809A (zh) * 2017-01-25 2017-06-13 北京大学 Convolution operation method based on a NOR FLASH array
CN106971372A (zh) * 2017-02-24 2017-07-21 北京大学 Coded flash memory system and method for implementing image convolution
CN108701236A (zh) * 2016-01-29 2018-10-23 快图有限公司 Convolutional neural network
CN108805270A (zh) * 2018-05-08 2018-11-13 华中科技大学 Memory-based convolutional neural network system
CN110062231A (zh) * 2019-05-05 2019-07-26 济南浪潮高新科技投资发展有限公司 Image compression method based on a multi-layer convolutional neural network
CN110475119A (zh) * 2019-08-12 2019-11-19 北京大学 Image compression system and method based on FLASH in-memory computing array

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9916531B1 (en) * 2017-06-22 2018-03-13 Intel Corporation Accumulator constrained quantization of convolutional neural networks
US10559093B2 (en) * 2018-01-13 2020-02-11 Arm Limited Selecting encoding options
US10692570B2 (en) * 2018-07-11 2020-06-23 Sandisk Technologies Llc Neural network matrix multiplication in memory cells
US11074318B2 (en) * 2018-12-14 2021-07-27 Western Digital Technologies, Inc. Hardware accelerated discretized neural network

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW201706917A (zh) * 2015-05-21 2017-02-16 咕果公司 Rotating data for neural network computations
CN108701236A (zh) * 2016-01-29 2018-10-23 快图有限公司 Convolutional neural network
CN106843809A (zh) * 2017-01-25 2017-06-13 北京大学 Convolution operation method based on a NOR FLASH array
CN106971372A (zh) * 2017-02-24 2017-07-21 北京大学 Coded flash memory system and method for implementing image convolution
CN108805270A (zh) * 2018-05-08 2018-11-13 华中科技大学 Memory-based convolutional neural network system
CN110062231A (zh) * 2019-05-05 2019-07-26 济南浪潮高新科技投资发展有限公司 Image compression method based on a multi-layer convolutional neural network
CN110475119A (zh) * 2019-08-12 2019-11-19 北京大学 Image compression system and method based on FLASH in-memory computing array

Also Published As

Publication number Publication date
CN110475119A (zh) 2019-11-19
US20220321900A1 (en) 2022-10-06

Similar Documents

Publication Publication Date Title
WO2021027238A1 (zh) Image compression system and method based on FLASH in-memory computing array
US10834415B2 (en) Devices for compression/decompression, system, chip, and electronic device
US20200160565A1 (en) Methods And Apparatuses For Learned Image Compression
CN107046646B (zh) Video encoding and decoding apparatus and method based on a deep auto-encoder
CN109379598B (zh) Lossless image compression method implemented on an FPGA
CN102138282B (zh) Reduced-complexity LDPC decoder
CN103957015B (zh) Non-uniform quantization coding method for LDPC code decoding and its application in a decoder
WO2020139976A1 (en) Neural networks and systems for decoding encoded data
CN103546161A (zh) Lossless compression method based on binary bit processing
CN112235583A (zh) Image encoding and decoding method and apparatus based on wavelet transform
CN103929642A (zh) Fast calculation method for the entropy-coding context model offset of HEVC transform coefficients
WO2023207836A1 (zh) Image encoding method, image decompression method, and apparatus
CN115664899A (zh) Channel decoding method and system based on a graph neural network
US9858994B2 (en) Memory system with MLC memory cells and partial page compression or reduction
CN104158549A (zh) Polar code decoding method and decoding apparatus
CN118056355A (zh) System for estimating the bit error rate (BER) of encoded data using neural networks
Deng et al. Reduced-complexity deep neural network-aided channel code decoder: A case study for BCH decoder
Yuan et al. A sot-mram-based processing-in-memory engine for highly compressed dnn implementation
US10559093B2 (en) Selecting encoding options
Matsuda et al. Lossless coding using predictors and arithmetic code optimized for each image
US20210027168A1 (en) Electronic apparatus and controlling method thereof
CN102164023A (zh) Adaptive dynamic quantization LDPC code decoding method
US20230298603A1 (en) Method for encoding and decoding audio signal using normalizing flow, and training method thereof
WO2023040745A1 (zh) Feature map encoding and decoding method and apparatus
CN108449092A (zh) Turbo code decoding method and apparatus based on cyclic compression

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19941322

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19941322

Country of ref document: EP

Kind code of ref document: A1