CN111696025B - Image processing device and method based on reconfigurable memory computing technology - Google Patents

Image processing device and method based on reconfigurable memory computing technology

Info

Publication number
CN111696025B
Authority
CN
China
Prior art keywords
pim
unit
reram
image data
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010526627.6A
Other languages
Chinese (zh)
Other versions
CN111696025A (en
Inventor
刘锦辉
赵晨
杜方舟
刘续文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University
Priority to CN202010526627.6A
Publication of CN111696025A
Application granted
Publication of CN111696025B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/60Memory management
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an image processing device and method based on reconfigurable in-memory computing technology. The device uses a novel ReRAM device and processing-in-memory (PIM) technology to integrate data processing and storage in the same device, and introduces reconfigurable computing technology. The method comprises the steps of designing a PIM instruction set, compiling a PIM program, writing program code with a high-level synthesis (HLS) tool, loading the program code at the corresponding locations, and handling two different kinds of image processing task in two different modes. The invention effectively addresses the "memory wall" problem, and the device has the advantages of large on-chip bus bandwidth, high transmission speed, low power consumption and high integration level. The method improves the parallelism of image processing and the flexibility of the system without reducing the image resolution, thereby achieving rapid image processing.

Description

Image processing device and method based on reconfigurable memory computing technology
Technical Field
The invention belongs to the technical field of physics, and further relates to an image processing device and method based on reconfigurable in-memory computing technology in the technical field of image data processing. The invention can be used in image processing to acquire image information, preprocess it, extract feature values and output results.
Background
At present, image processing is widely applied in fields such as remote sensing imagery, radar signals, scene recognition and target tracking. However, as image processing algorithms become more complex, the demand on the computing power of the platform keeps rising, and many application scenarios also have strict real-time requirements. Image processing acceleration platforms based on CPU + GPU, CPU + FPGA or DSP architectures are the common solution at present. Porting an existing image processing algorithm to a GPU, FPGA or DSP platform exploits the massive parallelism and high-performance arithmetic capability of these processors, which gives a large advantage over CPU-only processing. However, these architectures are generally based on the von Neumann architecture: data is stored in memory, operations are performed in the processor, and data must be read frequently, which wastes a great deal of power as image resolution increases. In addition, the growing gap between memory speed and processor speed causes a serious "memory wall" problem, which severely restricts the performance of image processing platforms. In-memory computing (processing in memory, PIM) is regarded as a novel computing architecture that effectively addresses the "memory wall" bottleneck. PIM tightly couples the computing unit and the storage unit, thereby removing the memory-access bandwidth bottleneck and the overhead of moving data between the computing unit and the storage unit, and it offers extremely high storage density and bandwidth.
However, conventional PIM is inflexible and has high design complexity and manufacturing cost. Earlier PIM designs could only perform specific processing functions, and their lack of programmability and of a flexible memory interface makes it difficult for them to adapt to workload changes driven by future markets and technologies, as new data-intensive applications emerge over time. Therefore, PIM has not been widely used.
Resistive random access memory (ReRAM) is a type of random access memory that implements non-volatile storage by changing the resistance of a cell. Its structure is very simple: a metal oxide layer is sandwiched between two electrodes, which simplifies the manufacturing process. ReRAM has many excellent characteristics, such as retaining data when power is lost, low power consumption and high speed, and is therefore of great significance in the fields of data storage and logic computation.
An image processing system, method and apparatus are disclosed in the patent document "An image processing system, method and apparatus" filed by the company zui technologies (filing date: 2014.02.25, application number 201410064914.4, publication number CN 104869381B). The method samples the image data acquired by an image sensor, stores it directly in a memory, and performs image processing through an ISP module. Its advantage is that, when the picture resolution is low, several pictures can be read at once for processing, and images of various resolutions can be handled well through sampling. However, the method still has a disadvantage: when the line resolution of the image to be processed is greater than the preset maximum line resolution, the video image is sampled at a ratio of 1:N, where N is the quotient of the line resolution of the image to be processed and the preset maximum line resolution. This amounts to image compression, which reduces the image resolution and inevitably increases the error of the image processing result. The device comprises an image sensor, a sampling module, an image signal processor (ISP) module and a memory.
The image sensor is connected with the sampling module and the memory, and sends acquired video images to the sampling module and acquired static images directly to the memory. The sampling module is connected with the ISP module, samples the received video image and sends the sampled video image to the ISP module. The ISP module is connected with the memory, receives the video image, processes it and sends the processed video image to the memory; it is further configured to read a static image currently stored in the memory according to the maximum line resolution the ISP module can process, perform image processing on the static image, and send the processed static image to the memory. The advantage of the device is that it can flexibly process images whose resolution is lower than the maximum line resolution of the ISP module, which solves the poor size compatibility of images processed by the ISP module. However, the device still has disadvantages: a large amount of image data is transferred and processed among the sensor, the sampling module, the ISP module and the memory, so the device faces the "memory wall" problem; its image processing function is limited; and the ISP module is fixed logic, which is inflexible and cannot be functionally reconfigured.
South china's intelligent science and technology limited company discloses an extensible intelligent image processing acceleration apparatus and acceleration method in its patent application (application date: 2017.12.18, application number 201711360637.1, publication number CN 108881709A). The acceleration device comprises a lens, an intelligent image acquisition unit, an intelligent image processing driving unit and a network communication unit. Its advantage is that it exploits a heterogeneous system, using the parallelism of hardware such as an FPGA (field programmable gate array) and a DSP (digital signal processor) to improve processing performance and reduce power consumption. However, the FPGA, the DSP and the other components of the image processing module are connected by an off-chip bus, and the bandwidth limitation of the off-chip bus affects performance to some extent. The method comprises the following steps: generating specific configuration information according to the original data, the image processing task to be executed and the image processing algorithm adopted; loading the image processing algorithm and setting up the accelerated processing according to the configuration information; inputting the original data and executing the accelerated image processing; and outputting the result of the accelerated processing. The advantage of the method is that compressing the image data reduces the bandwidth requirement and the memory access latency during operation. However, the programming languages used for the FPGA and the DSP differ considerably, so programming is complex and the workload is large.
Disclosure of Invention
The present invention aims to provide an image processing device and method based on reconfigurable in-memory computing technology that solve the "memory wall" problem encountered by existing image processing devices, as well as the low parallelism and poor flexibility of their data processing.
The idea behind the invention is as follows. When an image processing algorithm is implemented in practice, data transmission occupies a large amount of bandwidth and time, and the inherent structure of the image processing device leads to low parallelism and poor flexibility. The device and method of the invention use a novel ReRAM device and PIM technology to integrate data processing and storage in the same device, and introduce reconfigurable computing technology. This effectively addresses the "memory wall" problem and improves the parallelism of image processing and the flexibility of the system, thereby achieving rapid image processing.
The image processing device comprises an image data acquisition module and an image data processing module, wherein the image data processing module comprises a resistive random access memory processing-in-memory (ReRAM-PIM) unit, a reconfigurable operation unit, a PIM control unit, a PIM high-speed interface unit and a ReRAM-PIM high-speed interface; the ReRAM-PIM unit is connected with the reconfigurable operation unit through an on-chip high-speed bus, the reconfigurable operation unit is connected with the PIM control unit through an on-chip high-speed bus, the PIM control unit is connected with the ReRAM-PIM unit through an on-chip high-speed bus, the PIM high-speed interface unit is connected with the PIM control unit through an on-chip high-speed bus, the PIM high-speed interface unit is connected with the image data acquisition unit through an off-chip high-speed bus, and the ReRAM-PIM high-speed interface is connected with the ReRAM-PIM unit through an on-chip bus; the ReRAM-PIM unit consists of 4G resistive random access memories (ReRAM), and the reconfigurable operation unit consists of 125K configurable logic blocks (CLB) and 400 digital signal processors (DSP); wherein:
the image data acquisition unit is used for acquiring natural images;
the PIM high-speed interface unit is based on the PCI-E 4.0 protocol, supports up to 16 lanes, is compatible with the PCI-E 4.0 and PCI-E 3.0 protocols in 16-, 8-, 4- and 1-lane configurations, and provides a USB 3.1 Gen 2 interface; it is used for transmitting image data between the image data acquisition unit and the PIM control unit;
the PIM control unit comprises 64 vector registers of 512 bits each and is used for receiving the natural images sent by the image data acquisition unit and, according to the compiled executable file in the PIM control unit, sending the images to the reconfigurable operation unit through the on-chip high-speed bus to complete the corresponding image processing task;
the reconfigurable operation unit consists of 125K configurable logic blocks (CLB) and 400 digital signal processors (DSP) and is used for receiving, through the on-chip high-speed bus, image data sent by the PIM control unit or preprocessed image data sent by the ReRAM-PIM unit, and for carrying out the corresponding processing on the image data;
the ReRAM-PIM high-speed interface is used for transmitting image data between the image data acquisition unit and the ReRAM-PIM unit;
the ReRAM-PIM unit comprises 64K resistive random access memories (ReRAM) with an 8 × 8 crossbar structure and 16K routing units, each routing unit has a unique 16-bit address, and each routing unit manages 4 of the 8 × 8 crossbar ReRAMs; the routing units are operated by accessing their routing addresses, so that the corresponding parts of the ReRAM-PIM unit are configured either as a storage area or as an operation area.
The image processing method of the present invention comprises the steps of:
step 1, designing the processing-in-memory (PIM) instruction set:
setting a PIM instruction set to a fixed-length instruction set, wherein the length of each instruction is set to 32 bits, setting bits 0 to 6 of the instruction to an instruction opcode field, setting bits 7 to 12 to a first source operand register address field, setting bits 13 to 18 to a second source operand register address field, setting bits 19 to 24 to a destination register address field, and setting bits 25 to 31 to a reserved field;
step 2, compiling the PIM program:
performing lexical analysis processing on the source program code according to a preset lexical analysis rule to generate a lexical analysis result;
performing syntactic and semantic analysis on the lexical analysis result according to a preset syntactic and semantic analysis rule to generate a syntactic and semantic analysis result;
according to the PIM instruction set designed in step 1, performing target code generation on the syntactic and semantic analysis result to produce a binary executable file;
step 3, writing program code with the high-level synthesis (HLS) tool:
analyzing the image processing task, writing the corresponding C or C++ code with the HLS tool to implement the operation functions that need acceleration, calling the IP core of the ReRAM-PIM unit, configuring the IP core, and dividing the ReRAM-PIM unit into a storage area and an operation area of suitable sizes according to the storage and operation requirements;
automatically generating hardware information and a bitstream file with the HLS tool, wherein the hardware information comprises the interface of the reconfigurable operation unit and the configuration file of the ReRAM-PIM unit;
generating a binary executable file for the image processing task according to the step 2;
step 4, loading the program code at the corresponding locations:
loading the bitstream file generated in step 3 into the reconfigurable operation unit to configure the function of the reconfigurable operation area, and loading the executable file into the PIM control unit to configure the function of the PIM control unit;
initializing the ReRAM-PIM unit through the PIM control unit: for the storage area, configuring the corresponding routing addresses and uniformly addressing the ReRAM blocks; for the operation area, writing initial values into the ReRAM;
step 5, handling two different kinds of image processing task in two modes:
the image data acquisition module acquires natural images at 30 frames/second, where each image carries its resolution, bit depth, image size and the RGB information of each pixel; the image size equals the product of the resolution and the bit depth, the resolution is at least 1 × 1, and the bit depth is at least 8 bits;
if each frame is processed while images are being acquired, the image data acquisition module transmits the image data to the PIM high-speed interface through the off-chip high-speed bus, the PIM control unit buffers the image data in its internal cache and, according to the functions configured in step 4, sends the image data in real time to the corresponding reconfigurable operation area and ReRAM-PIM operation area to complete the corresponding image processing task, and finally returns the processing result to the ReRAM-PIM unit;
if all images are processed together after acquisition has finished, the image data acquisition module transmits the image data to the ReRAM-PIM high-speed interface through the off-chip high-speed bus and stores it in the storage area of the ReRAM-PIM unit; when the stored image data reaches the data volume required by the operation area, the PIM control unit, by controlling the ReRAM-PIM routing, transmits the image data through the on-chip high-speed bus to the corresponding reconfigurable operation area and ReRAM-PIM operation area to complete the corresponding image processing task, and finally returns the processing result to the storage area of the ReRAM-PIM unit.
Compared with the prior art, the invention has the following advantages:
Firstly, in the device of the invention the ReRAM-PIM unit is connected with the reconfigurable operation unit, the reconfigurable operation unit with the PIM control unit, the PIM control unit with the ReRAM-PIM unit, and the PIM high-speed interface unit with the PIM control unit, in each case through an on-chip high-speed bus. This overcomes the prior-art problem that modules such as the FPGA and the DSP in an image processing module are connected through off-chip buses whose limited bandwidth degrades performance. As a result, the device has large on-chip bus bandwidth, high transmission speed, low power consumption and a high integration level.
Secondly, the device of the invention adopts a reconfigurable operation unit, which receives through the on-chip high-speed bus either real-time image data sent by the PIM control unit or preprocessed static image data sent by the ReRAM-PIM unit, and carries out the corresponding processing on the image data. This overcomes the prior-art problem that the ISP module is fixed logic, inflexible and not functionally configurable. The device can be configured flexibly according to the image processing requirements, which effectively improves performance and reduces power consumption.
Thirdly, the method of the invention designs a PIM instruction set and handles two different kinds of image processing task in two different modes. This overcomes the prior-art problem that image compression reduces the image resolution and inevitably increases the error of the image processing result, so the method can obtain the image processing result rapidly without reducing the image resolution.
Fourthly, the method of the invention writes program code with a high-level synthesis (HLS) tool. This overcomes the prior-art problems that the programming languages for the FPGA and the DSP differ considerably, that programming is complex and that the workload is large, so the method effectively reduces programming complexity and speeds up program development.
Drawings
FIG. 1 is an electrical schematic of the apparatus of the present invention;
FIGS. 2 (a), (b), and (c) are, respectively, the logic signals, the truth table, and the expressions of the ReRAM-based in-memory computing unit of the present invention;
FIG. 3 is a flow chart of the method of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the attached drawing figures.
The apparatus of the present invention is described in further detail with reference to fig. 1.
The image processing device comprises an image data acquisition module and an image data processing module.
The image data processing module comprises a resistive random access memory processing-in-memory (ReRAM-PIM) unit, a reconfigurable operation unit, a PIM control unit, a PIM high-speed interface unit and a ReRAM-PIM high-speed interface.
The ReRAM-PIM unit is connected with the reconfigurable operation unit through an on-chip high-speed bus, the reconfigurable operation unit is connected with the PIM control unit through an on-chip high-speed bus, the PIM control unit is connected with the ReRAM-PIM unit through an on-chip high-speed bus, the PIM high-speed interface unit is connected with the PIM control unit through an on-chip high-speed bus, the PIM high-speed interface unit is connected with the image data acquisition unit through an off-chip high-speed bus, and the ReRAM-PIM high-speed interface is connected with the ReRAM-PIM unit through an on-chip bus. The ReRAM-PIM unit consists of 4G resistive random access memories (ReRAM), and the reconfigurable operation unit consists of 125K configurable logic blocks (CLB) and 400 digital signal processors (DSP); wherein:
the image data acquisition unit is used for acquiring natural images;
The PIM high-speed interface unit connects the image data acquisition unit and the PIM control unit. The interface is based on the PCI-E 4.0 protocol and supports up to 16 lanes, providing a maximum communication bandwidth of about 31 GB/s between the external processor and the PIM control unit. For maximum compatibility, it supports the PCI-E 4.0 and PCI-E 3.0 protocols in 16-, 8-, 4- and 1-lane configurations. The interface also provides a USB 3.1 Gen 2 interface with a theoretical maximum transfer rate of 10 Gb/s.
The PIM control unit comprises 64 vector registers of 512 bits each. When the data address in an instruction is 8 bits wide, up to 64 data addresses can be specified at one time; when it is 16 bits wide, up to 32; when it is 32 bits wide, up to 16; and when it is 64 bits wide, up to 8. One instruction can therefore complete several data operations. The PIM control unit receives the natural images sent by the image data acquisition unit and, according to the compiled executable file in the PIM control unit, sends the images to the reconfigurable operation unit through the on-chip high-speed bus to complete the corresponding image processing task.
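The relationship between register width and the number of items handled per instruction can be sketched as follows. This is a minimal illustration, not the patented register file: the function names `lanes`, `pack` and `unpack`, the lane ordering (lane 0 in the least significant bits) and the treatment of each item as an unsigned element are all assumptions made for the example.

```python
REGISTER_BITS = 512   # width of one vector register (from the text)
NUM_REGISTERS = 64    # number of vector registers (from the text)


def lanes(element_bits):
    """Number of elements of the given width that fit in one register."""
    if element_bits not in (8, 16, 32, 64):
        raise ValueError("element width must be 8, 16, 32 or 64 bits")
    return REGISTER_BITS // element_bits


def pack(values, element_bits):
    """Pack unsigned values into one register value (assumed: lane 0 is lowest)."""
    if len(values) > lanes(element_bits):
        raise ValueError("too many values for one register")
    mask = (1 << element_bits) - 1
    register = 0
    for index, value in enumerate(values):
        register |= (value & mask) << (index * element_bits)
    return register


def unpack(register, element_bits):
    """Split a register value back into its lanes."""
    mask = (1 << element_bits) - 1
    return [(register >> (i * element_bits)) & mask
            for i in range(lanes(element_bits))]


for width in (8, 16, 32, 64):
    print(width, "bit elements ->", lanes(width), "per instruction")
```

A bitwise operation on two packed registers (for example `pack(a, 8) ^ pack(b, 8)`) acts on all lanes at once, which is the sense in which one instruction completes several data operations.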
The reconfigurable operation unit consists of 125K configurable logic blocks (CLB) and 400 digital signal processors (DSP). It receives, through the on-chip high-speed bus, image data sent by the PIM control unit or preprocessed image data sent by the ReRAM-PIM unit, and carries out the corresponding processing on the image data.
The ReRAM-PIM high-speed interface is based on NVMe (the Non-Volatile Memory Host Controller Interface Specification), which is in essence implemented on PCI-E technology. The interface carries image data between the image data acquisition unit and the ReRAM-PIM unit. NVMe makes full use of the low latency and parallelism of the PCI-E lanes and can reach a data transfer rate of 3.938 GB/s.
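The bandwidth figures quoted for the two interfaces can be checked with a short calculation. The sketch below uses the PCI-E 3.0 and 4.0 per-lane signalling rates (8 GT/s and 16 GT/s) and their 128b/130b line encoding. That the 3.938 GB/s NVMe figure corresponds to a PCI-E 3.0 link with 4 lanes is an assumption; the text does not state the link configuration. The function name is hypothetical.

```python
def pcie_bandwidth_gb_per_s(gigatransfers_per_s, lane_count):
    """Usable one-direction bandwidth in GB/s for PCI-E 3.0/4.0.

    Both generations use 128b/130b encoding, so 128 of every 130
    transferred bits carry payload; dividing by 8 converts bits to bytes.
    """
    return gigatransfers_per_s * lane_count * (128 / 130) / 8


# PIM high-speed interface: PCI-E 4.0 (16 GT/s per lane), 16 lanes.
pcie4_x16 = pcie_bandwidth_gb_per_s(16, 16)

# Assumed NVMe link: PCI-E 3.0 (8 GT/s per lane), 4 lanes.
pcie3_x4 = pcie_bandwidth_gb_per_s(8, 4)

print(f"PCI-E 4.0 x16: {pcie4_x16:.3f} GB/s")
print(f"PCI-E 3.0 x4:  {pcie3_x4:.3f} GB/s")
```

Under these assumptions the results agree with the figures of about 31 GB/s and 3.938 GB/s given in the description.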
The ReRAM-PIM unit comprises 64K ReRAMs with an 8 × 8 crossbar structure and 16K routing units. Each routing unit has a unique 16-bit address and manages 4 of the 8 × 8 crossbar ReRAMs. The PIM controller controls and accesses the ReRAM-PIM array by accessing the corresponding routing addresses, and the PIM control unit operates the routing units to configure the corresponding ReRAM-PIM as a storage area or an operation area. The storage area can store static images received through the ReRAM-PIM high-speed interface, or receive picture data from the reconfigurable operation unit through the on-chip bus and store the processed real-time picture data. The operation area performs basic logic operations such as AND, OR, NOT, XOR and shift by writing high and low levels into the memristors of the ReRAM-PIM, and it can fetch static image data directly from the storage area through the routing units to carry out simple preprocessing of the images.
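The routing-based configuration described above can be modelled in a few lines. This is a sketch under stated assumptions, not the device's actual control interface: the class `ReRamPimConfig`, its method names, the contiguous numbering of the ReRAM blocks under each routing unit, and the choice to configure whole routing units at a time are all hypothetical.

```python
NUM_ROUTERS = 16 * 1024      # 16K routing units (from the text)
BLOCKS_PER_ROUTER = 4        # each routing unit manages 4 ReRAM blocks
ADDRESS_BITS = 16            # each routing unit has a 16-bit address

STORAGE, OPERATION = "storage", "operation"


class ReRamPimConfig:
    """Tracks which routing units are set up as storage or operation area."""

    def __init__(self):
        # Assumption: everything starts out as storage.
        self.mode = [STORAGE] * NUM_ROUTERS

    def configure(self, first_router, count, mode):
        """Set `count` routing units starting at `first_router` to `mode`."""
        if mode not in (STORAGE, OPERATION):
            raise ValueError("unknown mode")
        if first_router < 0 or count < 0 or first_router + count > NUM_ROUTERS:
            raise ValueError("routing address out of range")
        for address in range(first_router, first_router + count):
            self.mode[address] = mode

    def blocks_of(self, router_address):
        """ReRAM block indices managed by one routing unit (assumed contiguous)."""
        if not 0 <= router_address < NUM_ROUTERS:
            raise ValueError("routing address out of range")
        base = router_address * BLOCKS_PER_ROUTER
        return list(range(base, base + BLOCKS_PER_ROUTER))

    def block_count(self, mode):
        """Number of ReRAM blocks currently configured in the given mode."""
        return self.mode.count(mode) * BLOCKS_PER_ROUTER


config = ReRamPimConfig()
config.configure(first_router=0, count=4096, mode=OPERATION)
print(config.block_count(OPERATION), config.block_count(STORAGE))
```

The constants are consistent with the description: 16K routing units times 4 blocks gives 64K ReRAM blocks, and a 16-bit address is wide enough to give every routing unit a unique address.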
Referring to fig. 2, the present invention develops a reconfigurable memory architecture based on a resistive random access memory (ReRAM) crossbar network to replace the conventional memory structure. The architecture can be configured as an arithmetic unit or as ordinary memory as required, thereby truly realizing the concept of in-memory computing.
The resistive random access memory used in the present invention generally has three signals when performing logic calculations: (1) an access control signal (denoted $A$), which performs access control on the memory cell; (2) a write signal (denoted $B$); and (3) the currently stored data (denoted $C_i$). Based on these three signals, the next stored data (denoted $C_{i+1}$) can be expressed as

$$C_{i+1} = A \cdot B + \overline{A} \cdot C_i$$

Fig. 2 (a), (b), and (c) show the corresponding logic signals, truth table, and expressions. The write signal $B$ can be viewed as a logic function select signal that determines the function of the logic calculation. When $B$ equals '0', '1', or $\overline{C_i}$, the next data $C_{i+1}$ stored in the memory cell equals $\overline{A} \cdot C_i$ ("AND" logic), $A + C_i$ ("OR" logic), or $A \oplus C_i$ ("exclusive OR" logic), respectively. The result of the logic calculation (i.e., $C_{i+1}$) is stored directly in the memory cell. The logic calculation operation is consistent with the normal read and write operations of the memory.
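The way the write signal selects the logic function can be checked exhaustively. The sketch below assumes the cell update rule "if access is enabled (A = 1) the cell takes the write value B, otherwise it keeps its stored value C_i", that is, C_{i+1} = A·B + (NOT A)·C_i. This rule and the use of B = NOT C_i for the exclusive-OR case are assumptions, chosen because they reproduce the OR result A + C_i for B = 1 given in the text; the function name is hypothetical.

```python
def next_state(a, b, c):
    """Assumed cell update: write b when access a is 1, else keep c."""
    return (a & b) | ((1 - a) & c)


print(" A  C | B=0  B=1  B=~C")
for a in (0, 1):
    for c in (0, 1):
        with_zero = next_state(a, 0, c)       # select B = '0'
        with_one = next_state(a, 1, c)        # select B = '1'
        with_not_c = next_state(a, 1 - c, c)  # select B = NOT C_i

        # B = 0 gives (NOT A) AND C_i, an AND-type function.
        assert with_zero == (1 - a) & c
        # B = 1 gives A OR C_i.
        assert with_one == a | c
        # B = NOT C_i gives A XOR C_i.
        assert with_not_c == a ^ c

        print(f" {a}  {c} |  {with_zero}    {with_one}    {with_not_c}")
```

Because the result lands in the same cell that held C_i, each logic operation has the same shape as an ordinary conditional write, which matches the statement that logic calculation is consistent with normal read and write operations.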
The method of the present invention is further described below in conjunction with fig. 3.
The image processing method based on reconfigurable in-memory computing technology designs a PIM instruction set, writes program code with a high-level synthesis (HLS) tool, and handles two different kinds of image processing task in two modes.
Step 1, designing the PIM instruction set.
The PIM instruction set is a fixed-length instruction set in which each instruction is 32 bits long: bits 0 to 6 are the instruction opcode field, bits 7 to 12 are the first source operand register address field, bits 13 to 18 are the second source operand register address field, bits 19 to 24 are the destination register address field, and bits 25 to 31 are a reserved field.
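The field layout above can be illustrated with a small encoder and decoder. This is a sketch only: it assumes that bit 0 is the least significant bit of the instruction word and that the reserved field is written as zero, neither of which is stated in the text, and the opcode value used in the example is hypothetical. The 6-bit register fields can address exactly 64 registers, which matches the 64 vector registers of the PIM control unit.

```python
# Field layout from the text: (name, lowest bit, width in bits).
FIELDS = (
    ("opcode", 0, 7),     # bits 0-6
    ("src1", 7, 6),       # bits 7-12, first source register address
    ("src2", 13, 6),      # bits 13-18, second source register address
    ("dst", 19, 6),       # bits 19-24, destination register address
    ("reserved", 25, 7),  # bits 25-31
)


def encode(opcode, src1, src2, dst):
    """Build one 32-bit instruction word (assumed: reserved field is 0)."""
    values = {"opcode": opcode, "src1": src1, "src2": src2,
              "dst": dst, "reserved": 0}
    word = 0
    for name, low, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << low
    return word


def decode(word):
    """Split a 32-bit instruction word back into its fields."""
    return {name: (word >> low) & ((1 << width) - 1)
            for name, low, width in FIELDS}


# Hypothetical opcode 0x05 operating on registers 1 and 2, result in 63.
word = encode(opcode=0x05, src1=1, src2=2, dst=63)
print(f"{word:#010x}", decode(word))
```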
Step 2, compiling the PIM program.
Lexical analysis is performed on the source program code according to preset lexical analysis rules to generate a lexical analysis result.
Syntactic and semantic analysis is performed on the lexical analysis result according to preset syntactic and semantic analysis rules to generate a syntactic and semantic analysis result.
According to the PIM instruction set designed in step 1, target code generation is performed on the syntactic and semantic analysis result to produce a binary executable file.
The source program code comprises at least one of the following: ladder diagram language, functional flow diagram language, instruction list language, structured text language, and BASIC language.
The preset lexical analysis rules comprise at least one of the following: an identifier category rule, a comment category rule, a constant category rule, an operator category rule, a delimiter category rule, or a keyword category rule.
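The six rule categories can be illustrated with a small regular-expression tokenizer. This is a minimal sketch of category-based lexical analysis, not the compiler described in the patent: the keyword set, the comment syntax `(* ... *)` and the operator list are assumptions loosely modelled on a structured-text-like language.

```python
import re

# Hypothetical keyword set for a structured-text-like source language.
KEYWORDS = {"IF", "THEN", "ELSE", "END_IF", "VAR", "END_VAR"}

# One pattern per rule category; earlier entries take priority.
TOKEN_SPEC = [
    ("COMMENT", r"\(\*.*?\*\)"),
    ("CONSTANT", r"\d+(?:\.\d+)?"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("OPERATOR", r":=|<=|>=|<>|[+\-*/<>=]"),
    ("DELIMITER", r"[();,:]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source):
    """Return a list of (category, text) pairs for the source string."""
    tokens = []
    position = 0
    while position < len(source):
        match = TOKEN_RE.match(source, position)
        if match is None:
            raise SyntaxError(
                f"unexpected character {source[position]!r} at {position}")
        kind, text = match.lastgroup, match.group()
        position = match.end()
        if kind == "SKIP":
            continue  # whitespace carries no token
        if kind == "IDENTIFIER" and text.upper() in KEYWORDS:
            kind = "KEYWORD"  # keyword rule overrides identifier rule
        tokens.append((kind, text))
    return tokens


for token in tokenize("IF a >= 10 THEN b := a + 1; END_IF (* done *)"):
    print(token)
```

The resulting token list is what the syntactic and semantic analysis stage would consume.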
The preset syntactic and semantic analysis rule is as follows: a program organization unit model is adopted, and syntactic and semantic analysis for the program organization unit model is performed on the syntactic and semantic analysis result of the software model to generate the syntactic and semantic analysis result of the program organization unit model, wherein the definition of the program organization unit model comprises the definitions of functions, function blocks, and programs.
Step 3, writing program code with the high-level synthesis (HLS) tool.
The image processing task is analyzed, and the corresponding C or C++ code is written with the HLS tool to implement the operation functions that need acceleration. The IP core of the ReRAM-PIM unit is called and configured, and the ReRAM-PIM unit is divided into a storage area and an operation area of suitable sizes according to the storage and operation requirements.
Hardware information and a bitstream file are generated automatically with the HLS tool, wherein the hardware information comprises the interface of the reconfigurable operation unit and the configuration file of the ReRAM-PIM unit.
A binary executable file for the image processing task is generated according to step 2.
And 4, putting the program code at a corresponding position.
And (3) burning the bit stream file generated in the step (3) into a reconfigurable operation unit, configuring a reconfigurable operation area function, burning an executable file into a PIM control unit, and configuring the PIM control unit function.
The ReRAM-PIM unit is initialized through the PIM control unit: for the storage area, corresponding routing addresses are configured and the ReRAM-PIM units are uniformly addressed; for the operation area, initial values are written into the ReRAM-PIM units.
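The division into a storage area and an operation area can be sketched as a partition of the routing address space. The sketch uses the figures recited in claim 1 (16K routing units, each with a unique 16-bit address and each managing 4 ReRAM arrays); the contiguous split, the function name, and the example sizes are assumptions made only for illustration.

```python
NUM_ROUTERS = 16 * 1024   # 16K routing units (claim 1)
ARRAYS_PER_ROUTER = 4     # each routing unit manages 4 ReRAM arrays (claim 1)
ADDRESS_BITS = 16         # each routing unit has a unique 16-bit address (claim 1)


def partition(storage_routers):
    """Split the routing addresses into a storage range and an operation range.

    Assumption: both areas are contiguous, with storage at the low addresses.
    """
    if not 0 <= storage_routers <= NUM_ROUTERS:
        raise ValueError("storage_routers out of range")
    storage = range(0, storage_routers)
    operation = range(storage_routers, NUM_ROUTERS)
    return storage, operation


# Hypothetical split: three quarters of the routing units are used for storage.
storage, operation = partition(12 * 1024)
total_arrays = NUM_ROUTERS * ARRAYS_PER_ROUTER
```

With these figures, 16K routing units times 4 arrays gives the 64K ReRAM arrays recited in claim 1, and all 16K routing addresses fit in 16 bits.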
Step 5, processing the two different image processing tasks in two different modes.
The image data acquisition module acquires natural images at an acquisition speed of 30 frames per second. Each image comprises the resolution, bit depth, image size, and RGB information of each pixel; the image size equals the product of the resolution and the bit depth, the resolution is at least 1 × 1, and the bit depth is at least 8 bits.
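The relation stated above, image size equal to resolution multiplied by bit depth, can be checked with a short calculation. The 224 × 224 resolution is the one used in the embodiment; the 24-bit depth (8 bits for each of the R, G, and B channels) is an assumption for illustration.

```python
FRAMES_PER_SECOND = 30  # acquisition speed given in the text


def image_size_bits(width, height, bit_depth):
    """Image size in bits: resolution (width x height) times bit depth."""
    if width < 1 or height < 1:
        raise ValueError("resolution must be at least 1 x 1")
    if bit_depth < 8:
        raise ValueError("bit depth must be at least 8 bits")
    return width * height * bit_depth


frame_bits = image_size_bits(224, 224, 24)   # assumed 24-bit RGB
frame_bytes = frame_bits // 8
stream_bytes_per_second = frame_bytes * FRAMES_PER_SECOND
```

Under these assumptions one frame occupies 150,528 bytes, so the acquisition stream amounts to roughly 4.5 MB per second.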
If each frame is processed while images are being acquired, the image data acquisition module transmits the image data to the PIM high-speed interface through the off-chip high-speed interface, the PIM control unit caches the image data in a cache inside the controller and, according to the functions configured in step 4, sends the image data in real time to the corresponding reconfigurable operation area and ReRAM-PIM operation area to complete the corresponding image processing task, and the processing result is finally returned to the ReRAM-PIM unit.
If all images are processed together after acquisition is complete, the image data acquisition module transmits the image data to the ReRAM-PIM high-speed interface through the off-chip high-speed bus and stores it in the storage area of the ReRAM-PIM unit; when the stored image data reaches the data volume required by the operation area, the PIM control unit, by controlling the ReRAM-PIM routing, transmits the image data through the on-chip high-speed bus to the corresponding reconfigurable operation area and ReRAM-PIM operation area to complete the corresponding image processing task, and the processing result is finally returned to the storage area of the ReRAM-PIM unit.
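The difference between the two modes can be sketched as the order in which frames are stored and processed. The function names, the event log, and the handling of leftover frames at the end of acquisition are assumptions for illustration; the `process` callable stands in for whatever image processing task has been configured.

```python
def run_streaming(frames, process):
    """Mode 1: cache each frame and process it immediately after acquisition."""
    log = []
    for frame in frames:
        log.append(("cache", frame))
        log.append(("process", process(frame)))
    return log


def run_batched(frames, process, required):
    """Mode 2: store frames, then process once `required` frames are stored."""
    log, stored = [], []
    for frame in frames:
        stored.append(frame)
        log.append(("store", frame))
        if len(stored) == required:  # data volume required by the operation area
            log.extend(("process", process(f)) for f in stored)
            stored.clear()
    # Assumption: frames left over when acquisition ends are processed as well.
    log.extend(("process", process(f)) for f in stored)
    return log


def double(value):
    """Stand-in for an image processing task."""
    return 2 * value


streaming_log = run_streaming([1, 2], double)
batched_log = run_batched([1, 2, 3], double, required=2)
```

In the streaming log every cache event is followed directly by its processing event, whereas in the batched log processing starts only after the required number of frames has been stored.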
The process of the present invention is further described below with reference to examples.
The embodiment of the invention uses VGG-16 for handwritten digit recognition. The Visual Geometry Group (VGG) convolutional neural network used in the embodiment is a model proposed by Oxford University in 2014 (Very Deep Convolutional Networks for Large-Scale Image Recognition). The model performs very well in image classification and object detection tasks; VGG-16 is the most popular of the VGG models and consists of 16 layers, namely 13 convolutional layers and 3 fully connected layers. The input data dimension is 224 × 224 × 3. The handwritten digit images have a resolution of 224 × 224.
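The layer count and input dimensions given above can be checked with a short shape trace. The channel configuration below is the standard VGG-16 configuration from the cited paper (3 × 3 convolutions with padding 1 and 2 × 2 max pooling with stride 2); it is not listed in this text and is included here only as an illustrative sketch.

```python
# Standard VGG-16 feature configuration: numbers are the output channels of
# 3x3 convolutions, "M" marks a 2x2 max-pooling layer with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]
NUM_FC_LAYERS = 3  # fully connected layers stated in the text


def trace_shape(height, width, channels, cfg=VGG16_CFG):
    """Follow the feature-map shape through the convolution and pooling layers."""
    for layer in cfg:
        if layer == "M":
            height, width = height // 2, width // 2  # pooling halves each side
        else:
            channels = layer  # padded 3x3 convolution keeps the spatial size
    return height, width, channels


num_conv = sum(1 for layer in VGG16_CFG if layer != "M")
final_shape = trace_shape(224, 224, 3)
fc_inputs = final_shape[0] * final_shape[1] * final_shape[2]
```

The trace gives 13 convolutional layers, which together with the 3 fully connected layers make up the 16 layers, and a 7 × 7 × 512 feature map (25,088 values) at the input of the first fully connected layer.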
Step 1, training a network model.
The VGG-16 model is implemented with the deep learning framework Caffe (Convolutional Architecture for Fast Feature Embedding). Depending on the accuracy requirement, the parameter quantization bit width used in model training is at most 64 bits and may also be 32, 16, 8, 4, or 1 bit. The model is trained on a handwritten digit image dataset until it reaches the required accuracy.
Step 2, porting the network model.
Because high-level synthesis (HLS) supports C++, the network model trained in Caffe can be conveniently ported to the HLS platform, and the ReRAM-PIM IP is imported. If the parameter quantization bit width of the model is 64 or 32 bits, the ReRAM-PIM unit is configured in a 64-bit or 32-bit memory mode and serves as storage for the image data and the model parameters; if the parameter quantization bit width is 16, 8, 4, or 1 bit, the ReRAM-PIM unit is configured in a mixed mode, in which one part performs data storage and the other part serves as an operation unit that performs addition and parallel multiplication at the corresponding bit width. The bitstream file of the reconfigurable operation area and the configuration file of the ReRAM-PIM unit are generated by the HLS tool.
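The choice between the memory mode and the mixed mode depends only on the parameter quantization bit width, which can be sketched as a small lookup. The function name and the mode strings are hypothetical; only the bit widths and their grouping come from the text.

```python
MEMORY_MODE_BITS = {64, 32}      # ReRAM-PIM unit used purely as memory
MIXED_MODE_BITS = {16, 8, 4, 1}  # part storage, part operation unit


def select_reram_mode(quant_bits):
    """Return the ReRAM-PIM configuration mode for a quantization bit width."""
    if quant_bits in MEMORY_MODE_BITS:
        return "memory"
    if quant_bits in MIXED_MODE_BITS:
        return "mixed"
    raise ValueError(f"unsupported quantization bit width: {quant_bits}")


modes = {bits: select_reram_mode(bits) for bits in (64, 32, 16, 8, 4, 1)}
```

In the memory mode the classification is carried out by the reconfigurable operation area, whereas in the mixed mode part of the ReRAM-PIM unit itself performs the additions and parallel multiplications.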
Step 3, importing the PIM configuration file and writing the controller code.
The PIM configuration file is imported into the PIM control unit, which uses the configuration file information to automatically generate the high-speed communication protocol of the reconfigurable operation area and the ReRAM-PIM unit, the routing configuration of the ReRAM-PIM unit, and the initialization of the memristors in the ReRAM-PIM unit. If the parameter quantization bit width of the model is 64 or 32 bits, the ReRAM-PIM unit is configured through the router as a memory capable of storing 64-bit or 32-bit data; if the parameter quantization bit width is 16, 8, 4, or 1 bit, part of the ReRAM-PIM units are configured through the router as a memory capable of storing 16-, 8-, 4-, or 1-bit data, and initial values are written through the router into the memristors of the remaining ReRAM-PIM units, which serve as operation units, to complete the configuration of the operation function. PIM control unit code is written to perform the acquisition, storage, and processing of image data, and is compiled into a binary file executable by the PIM control unit.
Step 4, programming the code.
The bitstream file generated in step 3 is burned into the reconfigurable operation area through a JTAG interface, and the binary file is programmed into the PIM control unit.
Step 5, acquiring and processing images.
First, handwritten digit image data is acquired by the image data acquisition unit and stored in the ReRAM-PIM unit through the ReRAM-PIM high-speed interface. Meanwhile, if the parameter quantization bit width of the model is 64 or 32 bits, the PIM control unit, according to the written program, controls the reconfigurable operation area to read the image data stored at the corresponding addresses of the ReRAM-PIM unit through the on-chip high-speed bus; the reconfigurable operation area classifies the image data with the trained VGG-16 model to recognize the handwritten digits and stores the recognition result in the ReRAM-PIM unit. If the parameter quantization bit width is 16, 8, 4, or 1 bit, the PIM control unit, by controlling the ReRAM-PIM routing, reads the data in the data storage area into the operation area, performs the corresponding image classification calculation, and writes the result back to the storage area.

Claims (5)

1. An image processing device based on reconfigurable memory computing technology comprises an image data acquisition module and an image data processing module, and is characterized in that the image data processing module comprises a resistive random access memory-memory computing ReRAM-PIM unit, a reconfigurable operation unit, a memory computing PIM control unit, a memory computing PIM high-speed interface unit and a resistive random access memory-memory computing ReRAM-PIM high-speed interface; the system comprises a ReRAM-PIM unit, a reconfigurable operation unit, a PIM control unit, a PIM high-speed interface unit, an image data acquisition unit, a ReRAM-PIM high-speed interface and a ReRAM-PIM unit, wherein the ReRAM-PIM unit is connected with the reconfigurable operation unit through an on-chip high-speed bus; the ReRAM-PIM unit consists of 4G resistive random access memories ReRAM, and the reconfigurable operation unit consists of 125K configurable logic blocks CLB and 400 digital signal processors DSP; wherein:
the image data acquisition unit is used for acquiring natural images;
the PIM high-speed interface unit is based on the PCI-E 4.0 protocol, supports up to 16 lanes, is compatible with the PCI-E 4.0 and PCI-E 3.0 protocols with 16, 8, 4 and 1 lanes, and provides a USB 3.1 Gen 2 interface for transmitting image data between the image data acquisition unit and the PIM control unit;
the PIM control unit comprises 64 vector registers with 512 bits and is used for receiving the natural images sent by the image data acquisition unit and sending the images to the reconfigurable operation unit through the on-chip high-speed bus according to the executable file compiled and generated in the PIM control unit so as to complete the corresponding image processing task;
the reconfigurable operation unit consists of 125K configurable logic blocks CLB and 400 digital signal processors DSP and is used for receiving, through the on-chip high-speed bus, image data sent by the PIM control unit or preprocessed image data sent by the ReRAM-PIM unit, and for carrying out the corresponding processing on the image data;
the ReRAM-PIM high-speed interface is used for transmitting image data of the image data acquisition unit and the ReRAM-PIM unit;
the ReRAM-PIM unit comprises 64K resistive random access memories ReRAM in an 8 x 8 cross interconnection structure and 16K routing units, each routing unit has a unique 16-bit address, each routing unit manages 4 resistive random access memories ReRAM in the 8 x 8 cross interconnection structure, and the ReRAM-PIM unit is used for operating routing by accessing corresponding routing addresses and configuring the corresponding ReRAM-PIM unit into two forms of a storage area and an operation area.
2. An image processing method based on reconfigurable memory computing technology, characterized in that a memory computing PIM instruction set is designed, program code is written using a high-level synthesis HLS tool, and two different image processing tasks are processed in two modes, the method comprising the following steps:
step 1, designing a memory computing PIM instruction set:
setting a PIM instruction set to a fixed-length instruction set, setting the length of each instruction to be 32 bits, setting 0 to 6 bits of the instruction to be an instruction operation code field, setting 7 to 12 bits to be a first source operand register address field, setting 13 to 18 bits to be a second source operand register address field, setting 19 to 24 bits to be a destination register address field, and setting 25 to 31 bits to be a reserved field;
step 2, compiling the PIM program:
performing lexical analysis processing on the source program code according to a preset lexical analysis rule to generate a lexical analysis result;
performing syntactic and semantic analysis on the lexical analysis result according to a preset syntactic and semantic analysis rule to generate a syntactic and semantic analysis result;
according to the PIM instruction set designed in step 1, carrying out target code analysis on the syntactic and semantic analysis result to generate a binary executable file;
step 3, writing program code using a high-level synthesis HLS tool:
analyzing the image processing task, writing corresponding C or C++ code using the high-level synthesis HLS tool to implement the operation functions needing acceleration, calling the IP core of the ReRAM-PIM unit, configuring the IP core, and dividing the ReRAM-PIM unit into a storage area and an operation area of suitable sizes according to storage and operation requirements;
automatically generating hardware information and a bitstream file using the high-level synthesis HLS tool, wherein the hardware information comprises an interface of the reconfigurable operation unit and a configuration file of the ReRAM-PIM unit;
generating a binary executable file for the image processing task according to the step 2;
step 4, placing the program code at the corresponding locations:
burning the bit stream file generated in the step 3 into a reconfigurable operation unit, configuring a reconfigurable operation area function, burning an executable file into a PIM control unit, and configuring a PIM control unit function;
initializing a ReRAM-PIM unit through a PIM control unit, configuring a corresponding routing address for a storage area, uniformly addressing ReRAM blocks, and writing an initial value into the ReRAM for an operation area;
step 5, processing the two different image processing tasks in two different modes:
the image data acquisition module acquires natural images at the acquisition speed of 30 frames per second, wherein each image comprises resolution, bit depth, image size and RGB information of each pixel point, the image size is equal to the product of the resolution and the bit depth, the resolution is at least 1 multiplied by 1, and the bit depth is at least 8 bits;
if each frame of image is processed while the image is collected, the image data collection module transmits image data to the PIM high-speed interface through the off-chip high-speed interface, the PIM controller caches the image data in a cache in the controller, and the PIM control unit sends the image data to the corresponding reconfigurable operation area and the ReRAM-PIM operation area in real time according to the functions configured in the step 4 to complete the corresponding image processing task, and finally returns the processing result to the ReRAM-PIM unit;
if all images are processed together after acquisition is complete, the image data acquisition module transmits the image data to the ReRAM-PIM high-speed interface through the off-chip high-speed bus and stores the image data in the storage area of the ReRAM-PIM unit; when the stored image data reaches the data volume required by the operation area, the PIM control unit, by controlling the ReRAM-PIM routing, transmits the image data through the on-chip high-speed bus to the corresponding reconfigurable operation area and ReRAM-PIM operation area to complete the corresponding image processing task, and finally returns the processing result to the storage area of the ReRAM-PIM unit.
3. The image processing method based on reconfigurable memory computing technology of claim 2, wherein the source program code in step 2 includes at least one of the following: ladder diagram language, functional flow diagram language, instruction list language, structured text language, and BASIC language.
4. The image processing method based on the reconfigurable memory computing technology according to claim 2, wherein the preset lexical analysis rule in step 2 includes at least one of: an identifier category rule, an annotation category rule, a constant category rule, an operator category rule, a delimiter category rule, or a keyword category rule.
5. The image processing method based on the reconfigurable memory computing technology according to claim 2, wherein the preset syntactic and semantic analysis rule in step 2 is: a program organization unit model is adopted, syntactic and semantic analysis of the program organization unit model is performed on the syntactic and semantic analysis result of the software model, and a syntactic and semantic analysis result of the program organization unit model is generated, wherein the definition of the program organization unit model comprises definitions of functions, function blocks, and programs.
CN202010526627.6A 2020-06-11 2020-06-11 Image processing device and method based on reconfigurable memory computing technology Active CN111696025B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010526627.6A CN111696025B (en) 2020-06-11 2020-06-11 Image processing device and method based on reconfigurable memory computing technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010526627.6A CN111696025B (en) 2020-06-11 2020-06-11 Image processing device and method based on reconfigurable memory computing technology

Publications (2)

Publication Number Publication Date
CN111696025A CN111696025A (en) 2020-09-22
CN111696025B true CN111696025B (en) 2023-03-24

Family

ID=72480322

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010526627.6A Active CN111696025B (en) 2020-06-11 2020-06-11 Image processing device and method based on reconfigurable memory computing technology

Country Status (1)

Country Link
CN (1) CN111696025B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112184536B (en) * 2020-09-24 2022-09-30 成都海光集成电路设计有限公司 Method, apparatus, device and medium for processing image data based on GEMM
CN114945916A (en) * 2020-10-27 2022-08-26 北京苹芯科技有限公司 Apparatus and method for matrix multiplication using in-memory processing
CN112700810B (en) * 2020-12-22 2023-06-30 电子科技大学 CMOS sense-memory integrated circuit structure integrating memristors
CN112990444B (en) * 2021-05-13 2021-09-24 电子科技大学 Hybrid neural network training method, system, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102665049A (en) * 2012-03-29 2012-09-12 中国科学院半导体研究所 Programmable visual chip-based visual image processing system
CN104112053A (en) * 2014-07-29 2014-10-22 中国航天科工集团第三研究院第八三五七研究所 Design method of reconfigurable architecture platform oriented image processing
CN110418061A (en) * 2019-08-26 2019-11-05 Oppo广东移动通信有限公司 Image processing method, image processor, camera arrangement and electronic equipment
CN110751676A (en) * 2019-10-21 2020-02-04 中国科学院空间应用工程与技术中心 Heterogeneous computing system and method based on target detection and readable storage medium
CN111028134A (en) * 2019-11-29 2020-04-17 杭州依图医疗技术有限公司 Image processing method, apparatus, system and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001069919A1 (en) * 2000-03-10 2001-09-20 Datacube, Inc. Image processing system using an array processor
US7073158B2 (en) * 2002-05-17 2006-07-04 Pixel Velocity, Inc. Automated system for designing and developing field programmable gate arrays
US10375196B2 (en) * 2016-07-29 2019-08-06 Microsoft Technology Licensing, Llc Image transformation in hybrid sourcing architecture
CN110058883B (en) * 2019-03-14 2023-06-16 梁磊 CNN acceleration method and system based on OPU

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
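Prototype design of a reconfigurable system and implementation of dynamic reconfiguration technology; Gao Xin et al.; Technology Innovation and Application (《科技创新与应用》); 20160528 (No. 15); full text *
Research on the Sobel algorithm based on an OpenCL and FPGA heterogeneous mode; Bao Yunfeng et al.; Computer Measurement & Control (《计算机测量与控制》); 20180125 (No. 01); full text *
Design of a television image processing simulation system based on the PCI bus; Dong Xuefeng et al.; Modern Electronics Technique (《现代电子技术》); 20101015 (No. 20); full text *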

Also Published As

Publication number Publication date
CN111696025A (en) 2020-09-22

Similar Documents

Publication Publication Date Title
CN111696025B (en) Image processing device and method based on reconfigurable memory computing technology
US11836081B2 (en) Methods and systems for handling data received by a state machine engine
US9535861B2 (en) Methods and systems for routing in a state machine
US9817678B2 (en) Methods and systems for detection in a state machine
US9866218B2 (en) Boolean logic in a state machine lattice
US10909452B2 (en) Methods and systems for power management in a pattern recognition processing system
US10671295B2 (en) Methods and systems for using state vector data in a state machine engine
US20180137416A1 (en) Methods and systems for data analysis in a state machine
US9454322B2 (en) Results generation for state machine engines
JP6109186B2 (en) Counter operation in a state machine grid
US10789182B2 (en) System and method for individual addressing
US10430210B2 (en) Systems and devices for accessing a state machine
CN113435570A (en) Programmable convolutional neural network processor, method, device, medium, and terminal
US20220261257A1 (en) Systems and devices for accessing a state machine
CN117764123A (en) Neural network acceleration system, testing device and electronic equipment thereof
Gokhale Nn-X-a hardware accelerator for convolutional neural networks
CN114897128A (en) Chisel-based parameterized hardware neural network and construction method thereof
CN116361229A (en) AI reasoning method and system based on RISC-V and in-memory calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant