WO2021232843A1 - Image data storage method, image data processing method, system, and related apparatus - Google Patents

Image data storage method, image data processing method, system, and related apparatus

Info

Publication number
WO2021232843A1
Authority
WO
WIPO (PCT)
Prior art keywords
image data
channel
storage
dynamic random
random access
Prior art date
Application number
PCT/CN2021/073790
Other languages
English (en)
French (fr)
Inventor
蒋东东 (Jiang Dongdong)
赵雅倩 (Zhao Yaqian)
董刚 (Dong Gang)
李仁刚 (Li Rengang)
刘海威 (Liu Haiwei)
杨宏斌 (Yang Hongbin)
Original Assignee
浪潮电子信息产业股份有限公司 (Inspur Electronic Information Industry Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浪潮电子信息产业股份有限公司 (Inspur Electronic Information Industry Co., Ltd.)
Priority to US 17/926,966 (published as US20230196500A1)
Priority to EP 21808636.1 (published as EP4156079A4)
Publication of WO2021232843A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F5/00 Methods or arrangements for data conversion without changing the order or content of the data handled
    • G06F5/06 Methods or arrangements for data conversion without changing the order or content of the data handled for changing the speed of data flow, i.e. speed regularising or timing, e.g. delay lines, FIFO buffers; over- or underrun control therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/13 Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00 Details of television systems
    • H04N5/76 Television signal recording
    • H04N5/907 Television signal recording using static stores, e.g. storage tubes or semiconductor memories
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]

Definitions

  • This application relates to the field of deep learning technology, and in particular to an image data storage method, an image data processing method, an image data processing system, an electronic device, and a storage medium.
  • A convolutional neural network (CNN) is a type of feedforward neural network that includes convolutional computations and has a deep structure. Convolutional neural networks are widely used in computer vision, image processing, natural language processing, and other fields.
  • Convolutional neural networks usually use a 3*3 convolution kernel to extract image features.
  • In the related art, the main implementation is as follows: the image data is cached in the FPGA's off-chip DDR (Double Data Rate) memory, only a small 3*3 block of data is read each time for convolution, and small ranges of the DDR are read many times.
  • As a result, the above image feature extraction method needs to perform multiple address jumps and to read and write small pieces of data at non-contiguous addresses.
  • Consequently, the DDR read/write rate is low and the read/write capability of the DDR cannot be fully exploited, making it a bottleneck for image processing speed.
  • The purpose of this application is to provide an image data storage method, an image data processing method, an image data processing system, an electronic device, and a storage medium, which can increase the processing rate of image data.
  • the image data storage method includes:
  • the image data is sequentially stored in the dynamic random access memory according to a preset storage format, so that the adjacent image data in the dynamic random access memory have continuous storage addresses.
  • storing the image data in the dynamic random access memory in sequence according to a preset storage format includes:
  • the method further includes:
  • the target data is determined according to the data read instruction; wherein, the target data is multi-channel parallel image data;
  • the target data is transferred to the first-in first-out memory of the FPGA.
  • This application also provides an image data processing method, which includes:
  • the step of storing image data in a dynamic random access memory according to a preset storage format includes:
  • the storage starting position includes channel height coordinates and channel width coordinates
  • reading a preset number of multi-channel parallel image data from the dynamic random access memory includes:
  • after the first-in first-out memory of the FPGA is ready, a preset number of multi-channel parallel new image data is read according to the next-round memory read address, and the multi-channel parallel new image data is stored in the first-in first-out memory of the FPGA.
  • reading a preset number of multi-channel parallel image data according to the current-round memory read address includes:
  • a preset number of multi-channel parallel third image data is read.
  • the multi-channel parallel image data is specifically 3*11 multi-channel image data
  • the performing a convolution operation on the target image data in the first-in first-out memory to obtain image feature data includes:
  • a 3*3 convolution kernel is used to perform a convolution operation on the 9*9 multi-channel image data to obtain the image feature data.
  • the process of converting the 3*11 multi-channel image data in the first-in first-out memory into 9*9 multi-channel image data further includes:
  • the control state machine performs a simultaneous read of parity (even/odd) data, so as to remove the invalid interval generated when the 3*11 multi-channel image data is converted into the 9*9 multi-channel image data.
  • the process of reading a preset number of multi-channel parallel image data from the dynamic random access memory further includes:
  • This application also provides an image data processing system, which includes:
  • a storage module configured to sequentially store image data in a dynamic random access memory according to a preset storage format, so that the adjacent image data in the dynamic random access memory have continuous storage addresses;
  • a reading module configured to read a preset number of multi-channel parallel image data from the dynamic random access memory, and store the multi-channel parallel image data in the first-in first-out memory of the FPGA;
  • the convolution module is configured to perform a convolution operation on the target image data in the first-in first-out memory to obtain image feature data.
  • the present application also provides a storage medium on which a computer program is stored, and when the computer program is executed, the steps performed by the above-mentioned image data processing method and image data storage method are implemented.
  • the present application also provides an electronic device, including a memory and a processor; the memory stores a computer program, and when the processor invokes the computer program in the memory, the steps of the foregoing image data processing method and image data storage method are executed.
  • This application provides an image data processing method, which includes: sequentially storing image data in a dynamic random access memory according to a preset storage format, so that adjacent image data in the dynamic random access memory have continuous storage addresses;
  • reading a preset number of multi-channel parallel image data from the dynamic random access memory, and storing the multi-channel parallel image data in the first-in first-out memory of the FPGA;
  • and performing a convolution operation on the target image data in the first-in first-out memory to obtain image feature data.
  • This application first stores the image data in a dynamic random access memory in sequence according to a preset storage format, so that adjacent image data in the dynamic random access memory have continuous storage addresses.
  • When the data in the dynamic random access memory is read, the required data can be read sequentially through commands. Because continuous storage of the image data avoids storage-address jump operations, the read/write rate of the dynamic random access memory is improved.
  • The read image data is stored in the first-in first-out memory of the FPGA.
  • A convolution operation is then performed on the image data in the first-in first-out memory, which reduces the delay of read and write operations and improves the efficiency of data storage.
  • This application exploits the large capacity and fast continuous read/write speed of dynamic random access memory, together with the small read/write delay of first-in first-out memory.
  • Multi-channel parallel image data is sent to the first-in first-out memory, which reduces the read/write delay of the image data processing flow and improves the processing rate of the image data.
  • This application also provides an image data storage method, an image data processing system, an electronic device, and a storage medium, which have the above-mentioned beneficial effects, and will not be repeated here.
  • FIG. 1 is a flowchart of an image data processing method provided by an embodiment of the application
  • FIG. 2 is a schematic diagram of a three-dimensional convolution calculation process performed in a convolutional neural network provided by an embodiment of the application;
  • FIG. 3 is a schematic diagram of the principle of storing image data in a dynamic random access memory according to an embodiment of the application
  • FIG. 4 is a schematic diagram of a principle of reading multi-channel parallel image data provided by an embodiment of the application.
  • FIG. 5 is a schematic diagram of the principle of a calculation management method for reading the start address of a dynamic random access memory provided by an embodiment of the application;
  • FIG. 6 is a schematic flowchart of a control state machine when DDR data is read according to an embodiment of the application
  • FIG. 7 is a schematic diagram of data reading provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of a data conversion provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of the principle of intermediate-gap elimination provided by an embodiment of the application.
  • FIG. 10 is a flowchart of an image data storage method provided by an embodiment of the application.
  • FIG. 11 is a schematic structural diagram of an image data processing system provided by an embodiment of this application.
  • FIG. 12 is a schematic structural diagram of a storage medium provided by an embodiment of this application.
  • FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of this application.
  • FIG. 1 is a flowchart of an image data processing method provided by an embodiment of the application.
  • S101 sequentially store image data in a dynamic random access memory according to a preset storage format, so that adjacent image data in the dynamic random access memory have continuous storage addresses;
  • FIG. 2 is a schematic diagram of a three-dimensional convolution calculation process performed in a convolutional neural network provided by an embodiment of the application.
  • The three-dimensional convolution calculation process performed in a convolutional neural network is as follows: a color image is 6×6×3, where 3 refers to the three color channels, which can be imagined as a stack of three 6×6 images. In order to detect edges or other features of the image, the color image is convolved with a three-dimensional filter.
  • The dimension of the filter is 3×3×3, that is, the filter also has three layers, corresponding to the red, green, and blue channels.
  • The first 6 of the 6×6×3 color image represents the image height,
  • the second 6 represents the width,
  • and the 3 represents the number of channels.
  • The filter also has a height, a width, and a number of channels; the number of channels of the image is equal to the number of channels of the filter, and a 4×4×1 image can be obtained by the convolution operation.
  • Multiple output channels can also be produced. For example, after the original image is convolved with two convolution kernels, feature data of two output channels is obtained.
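The 6×6×3 example above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the patent's hardware implementation; the function name `conv3d` and the all-ones test data are assumptions made for the example.

```python
# Minimal sketch: "valid" 3-D convolution of a 6x6x3 image with a
# 3x3x3 filter, producing a 4x4 feature map as described above.
def conv3d(image, kernel):
    """image: H x W x C nested lists; kernel: kh x kw x C nested lists."""
    H, W, C = len(image), len(image[0]), len(image[0][0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(H - kh + 1):          # 6 - 3 + 1 = 4 rows
        row = []
        for j in range(W - kw + 1):      # 6 - 3 + 1 = 4 columns
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    for c in range(C):   # sum over all three channels
                        acc += image[i + di][j + dj][c] * kernel[di][dj][c]
            row.append(acc)
        out.append(row)
    return out

image = [[[1.0] * 3 for _ in range(6)] for _ in range(6)]   # 6x6x3, all ones
kernel = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]  # 3x3x3, all ones
feat = conv3d(image, kernel)  # 4x4; each element sums 3*3*3 = 27 ones
```

With two such kernels, running `conv3d` twice would yield the two output channels mentioned above.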
  • This application first stores the image data to be processed (for example, the 6×6×3 color image undergoing convolution as shown in FIG. 2) into a dynamic random access memory in sequence according to a preset storage format.
  • the dynamic random access memory is the off-chip DDR of the FPGA.
  • S102 Read a preset number of multi-channel parallel image data from the dynamic random access memory, and store the multi-channel parallel image data in the first-in first-out memory of the FPGA;
  • This embodiment can read a preset number of multi-channel parallel image data from the dynamic random access memory according to a preset period. Because the data stored in the dynamic random access memory in S101 is continuous image data, multi-channel parallel image data can be obtained in this step through a single data read operation.
  • the convolution operation usually performs a convolution operation on multiple lines of image data.
  • a preset number of data reading operations may be performed to obtain a preset number of multi-channel parallel image data. After a preset number of multi-channel parallel image data is obtained, it can be stored in the first-in first-out memory of the FPGA.
  • The first-in first-out memory of the FPGA is a FIFO (First In, First Out) memory built from the RAM (Random Access Memory) resources inside the FPGA.
  • The above process of reading a preset number of multi-channel parallel image data from the dynamic random access memory may include: determining the current-round memory read address, and reading a preset number of multi-channel parallel image data according to the current-round memory read address.
  • This embodiment can also calculate the next-round memory read address according to the current-round memory read address; after the FPGA's first-in first-out memory is ready, a preset number of multi-channel parallel new image data is read according to the next-round memory read address and stored in the first-in first-out memory of the FPGA.
  • S103 Perform a convolution operation on the target image data in the first-in first-out memory to obtain image feature data.
  • Within one cycle, the FPGA can read out the N*N data at any position and use it for the back-end convolution calculation to obtain the image feature data.
  • storing the multi-channel parallel image data in the first-in first-out memory of the FPGA in S102 is equivalent to the input data to the FPGA.
  • the convolution operation on the target image data in S103 is equivalent to the output data of the FPGA.
  • This embodiment may appropriately adjust the rate of data reading in S102 and the rate of the convolution operation in S103, so that the amount of data inside the FPGA remains in a relatively stable state.
  • the image data is sequentially stored in the dynamic random access memory according to the preset storage format, so that adjacent image data in the dynamic random access memory have continuous storage addresses.
  • When the data in the dynamic random access memory is read, the required data can be read sequentially through commands. Because continuous storage of the image data avoids storage-address jump operations, the read/write rate of the dynamic random access memory is improved.
  • the read image data is stored in the first-in-first-out memory of the FPGA.
  • A convolution operation is then performed on the image data in the first-in first-out memory, which reduces the delay of read and write operations and improves the efficiency of data storage.
  • This embodiment is based on the large capacity and fast continuous read/write speed of dynamic random access memory and the small read/write delay of first-in first-out memory.
  • FIG. 3 is a schematic diagram of the principle of storing image data in a dynamic random access memory provided by an embodiment of the application.
  • CH is the number of channels
  • W is the channel width
  • H is the channel height.
  • the number of image channels shown is 512
  • the channel width is 12, and the channel height is 6.
  • The image data is sequentially stored in the dynamic random access memory according to the preset storage format through the following process: determining the storage starting position of the dynamic random access memory, and sequentially storing the image data from the storage starting position into the dynamic random access memory along the channel direction;
  • the storage starting position includes a channel height coordinate and a channel width coordinate. It is judged whether the channel width coordinate of the storage starting position is greater than the maximum width; if so, when all the channel directions corresponding to the storage starting position have been stored, the channel height coordinate is increased by 1 and the channel width coordinate is set to 0 to obtain a new storage starting position, and the remaining image data is sequentially stored from the new storage starting position into the dynamic random access memory along the channel direction;
  • if not, when all the channel directions corresponding to the storage starting position have been stored, the channel width coordinate of the storage starting position is increased by 1 to obtain a new storage starting position, and the remaining image data is sequentially stored from the new storage starting position into the dynamic random access memory along the channel direction.
  • this embodiment writes the input channel data into the DDR according to the preset storage format, and the numbers in the squares in the figure represent the address value of the image data in the DDR.
  • the channel (CH) direction is fixed at 512.
  • DDR storage first proceeds along the channel-number direction, with corresponding addresses 0-511; if the real number of input channels is less than 512, the corresponding address positions are filled with the value 0.
  • After the channel direction is filled, storage proceeds along the W (width) direction; when the W direction is also finished, storage proceeds along the H (height) direction.
  • The lengths of W and H can be customized (for example, no greater than 512).
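The channel-first layout above maps each (height, width, channel) triple to a single linear DDR address. The following sketch illustrates that mapping under the stated assumptions (channel direction padded to a fixed 512); the function name `ddr_address` is introduced here for illustration only.

```python
# Sketch of the channel-first DDR layout described above: addresses run
# along the channel direction first (0..511), then along W, then along H.
CH = 512  # channel direction is fixed at 512; unused channels are zero-filled

def ddr_address(h, w, ch, width):
    """Linear DDR address of pixel (h, w), channel ch, for a row of `width` pixels."""
    return (h * width + w) * CH + ch

# First pixel occupies addresses 0..511 along the channel direction,
# the next pixel in the W direction starts at 512, and so on.
addrs = [ddr_address(0, 0, 0, width=12),
         ddr_address(0, 0, 511, width=12),
         ddr_address(0, 1, 0, width=12),
         ddr_address(1, 0, 0, width=12)]
```

For the 512-channel, width-12 example of FIG. 3, consecutive pixels are thus 512 addresses apart, which is what makes long sequential bursts possible.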
  • The process of reading a preset number of multi-channel parallel image data from the dynamic random access memory may include: determining the current-round memory read address and reading a preset number of multi-channel parallel image data according to it; calculating the next-round memory read address according to the current-round memory read address; and, after the FPGA's first-in first-out memory is ready, reading a preset number of multi-channel parallel new image data according to the next-round memory read address and storing the new image data in the first-in first-out memory of the FPGA.
  • This embodiment may also use the current-round memory read address as a first starting address, and calculate a second starting address and a third starting address from the first starting address and the data read length; a preset number of multi-channel parallel first image data is read according to the first starting address, a preset number of multi-channel parallel second image data according to the second starting address, and a preset number of multi-channel parallel third image data according to the third starting address.
  • The DDR can read 11 data of all channels with one command, and this burst length is sufficient to keep the DDR read efficiency above 50%. Please refer to FIG. 4.
  • The process of reading multi-channel parallel image data from the DDR can include: reading the first group of 11 data of all channels by sending a command to the DDR, for example with starting address h(0)*w(0)*512 and read length 512*11; after completion, reading the second group of 11 data by sending a command with starting address h(1)*w(0)*512 and read length 512*11; after completion, reading the third group of 11 data by sending a command with starting address h(2)*w(0)*512
  • and read length 512*11. The read data are stored in 3 groups of FIFOs in the FPGA, each group containing 512 FIFOs. After reading, if the stride is 1 (settable), the next new starting address is the previous starting address + 512*9, and the next starting address is updated by calculation.
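The three bursts and the stride-1 address update described above can be sketched as follows. This is an assumption-laden illustration, not RTL: the text's h(i)*w(0)*512 expressions are interpreted here as the linear offset of pixel (h+i, w) in the channel-first layout, and the names `burst_commands` and `next_start` are invented for the example.

```python
# Sketch of the three-group read: three bursts of 11 pixels x 512 channels,
# one per image row; for stride 1 the next window starts 9 pixels later,
# i.e. the start address advances by 512*9 words.
CH, BURST_PIXELS = 512, 11

def burst_commands(h0, w0, width):
    """(start_address, read_length) for the three row bursts at window (h0, w0)."""
    return [(((h0 + i) * width + w0) * CH, CH * BURST_PIXELS) for i in range(3)]

def next_start(start, stride=1):
    """Stride-1 windows of 11 pixels overlap by 2, so advance by 9 pixels."""
    return start + CH * 9 * stride

cmds = burst_commands(0, 0, width=64)  # three commands, read length 512*11 each
nxt = next_start(cmds[0][0])           # previous start + 512*9
```

Each command thus reads 512*11 = 5632 contiguous words, matching the "read length is 512*11" figure in the text.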
  • FIG. 5 is a schematic diagram of the principle of a calculation management method for reading the start address of a dynamic random access memory provided by an embodiment of the application. Assume that the coordinates of the three sets of data in the W and H directions are the values in Table 1:
  • Table 1 - group / coordinates: group1 (x, y1); group2 (x, y2); group3 (x, y3)
  • The coordinates in the W direction in Table 1 are consistent. Taking the stride (step length) as an example, the new-address calculation management method is shown in Table 2. By using 3 multipliers plus shifting and zero-padding, the 500 MHz high-speed clock requirement can be met.
  • FIG. 6 is a schematic flowchart of the control state machine used when DDR data is read, according to an embodiment of the application. Since data is written in the order group1, group2, group3, the read port consists of three groups of FIFOs in parallel. In order to reduce the fan-out of the RTL (Register Transfer Level) circuit, the ready status of the group3 FIFO is judged only once, when group1 is read. This embodiment can also provide an address-update multiplier; under a 500 MHz clock, the address-update multiplier needs at least 3 cycles to compute safely.
  • Otherwise the timing requirements will not be met, or it will be necessary to wait 3 additional calculation cycles, increasing the delay of the entire state machine by 3 clocks. Therefore, a separate calculation unit is designed to compute in advance all the parameters required for the next cycle, such as the DDR start address and burst length. All values needed after the start are calculated before starting and latched in backup registers for the state machine to use at the beginning.
  • The execution time of the entire state machine is thus used to independently prepare all values required for the next cycle, so that the timing requirements of the multiplier under 500 MHz can be met, and the LUT (look-up table) levels of the state-machine decisions can be kept at 4 or fewer without causing additional system delay.
  • FIG. 7 is a schematic diagram of data reading provided by an embodiment of the application.
  • The process of calculating the image feature data may include: converting the 3*11 multi-channel image data stored in the first-in first-out memory into 9*9 multi-channel image data, and using a 3*3 convolution kernel to perform a convolution operation on the 9*9 multi-channel image data to obtain the image feature data.
  • The process of converting the 3*11 multi-channel image data in the first-in first-out memory into 9*9 multi-channel image data further includes: controlling the state machine to perform a simultaneous read of parity (even/odd) data, in order to remove the invalid interval generated when the 3*11 multi-channel image data is converted into the 9*9 multi-channel image data.
  • FIG. 8 is a schematic diagram of a data conversion provided by an embodiment of the application.
  • The FPGA back-end algorithm can convert the 11 data into 3 groups of 9*9 data to facilitate 3*3 convolution.
  • Even though the 11 data read here are continuous, they become 3 groups of 9*9 data.
  • An invalid interval of 2 cycles is therefore generated; the invalid interval is a cycle in which no convolution operation is required.
  • A state-machine design that reads parity (even/odd) data simultaneously can eliminate this intermediate gap and realize continuous output of data.
  • FIG. 9 is a schematic diagram of the intermediate-gap elimination principle provided by an embodiment of the application.
  • The 11 data of two consecutive channels can be read at the same time; after waiting 7 clock cycles for the 11->9 data conversion, the data of the second channel is delayed by 9 cycles and then spliced with the data of the first channel to complete the elimination of the intermediate gap.
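Why the 2-cycle gap arises can be shown with a small sketch: a 3-row buffer of 11 pixels yields exactly 9 horizontal positions for a 3*3 window (11 - 3 + 1 = 9), so each 11-pixel read produces 9 valid output cycles and 2 idle ones, which the parity interleave above is designed to hide. The function name `windows_3x3` is an illustrative invention, not the patent's implementation.

```python
# Sketch of the 3*11 -> 9-column conversion: each 11-wide, 3-row line
# buffer supports 9 positions of a sliding 3x3 window.
def windows_3x3(rows):
    """rows: 3 lists of 11 values -> list of 9 3x3 windows (nested lists)."""
    assert len(rows) == 3 and all(len(r) == 11 for r in rows)
    return [[r[j:j + 3] for r in rows] for j in range(11 - 3 + 1)]

rows = [list(range(11)) for _ in range(3)]  # three identical 11-pixel rows
wins = windows_3x3(rows)                    # 9 window positions
idle_cycles = 11 - len(wins)                # the 2-cycle invalid interval
```

Interleaving the even and odd channel streams, with the second stream delayed as described above, lets those 2 idle cycles of one stream be filled by valid windows of the other.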
  • The maximum RAM utilization rate is only 15% of the VU7 (Xilinx UltraScale+ Virtex FPGA board), which puts no pressure on the routing of the back-end DSP (Digital Signal Processing) convolution array.
  • This application may also provide a method for storing multi-dimensional convolution feature data in DDR.
  • Multi-channel feature data can be read at the same time, which is suitable for back-end extraction processing, and the DDR read efficiency is not less than 50%.
  • This application can realize minimum-resource calculation of the feature-data starting address by changing configuration parameters. Using three multipliers, it can work safely under a 500 MHz clock without introducing additional system delay.
  • The control process for fast reading of image data in the above embodiment includes coordinated, parallel operation of the address-parameter calculation and the control state machine, which avoids setup/hold-time violations during state transitions; the LUT cascade depth is at most 4, the 500 MHz clock operating conditions are met, and the required RAM resources do not exceed 15% of the VU7.
  • The present invention makes full use of the large capacity, low price, and fast continuous read/write speed of DDR, together with the low read/write delay of FPGA RAM, and combines these advantages to design a method for continuously reading feature data with a 500 MHz clock (LUT cascade depth at most 4); the feature width and height are arbitrarily settable (up to 512), the RAM resource utilization rate is less than 15%, and the method is implemented on an FPGA using RTL.
  • LUT stands for Look-Up Table.
  • This embodiment designs a high-speed, multi-channel, low-resource hardware architecture by combining the fast continuous read/write of DDR with the small FPGA RAM resource cost, which can realize continuous readout of image data under different configuration parameters at a 500 MHz clock, with a resource utilization rate not exceeding 15%. It can be applied to neural network calculations.
  • This embodiment proposes a multi-dimensional, multi-channel, high-speed, low-capacity data reading method for convolution, which can fully meet the feature-extraction requirements of common convolution models such as ResNet50; when hardware resources are sufficient, multiple modules can be arbitrarily expanded to improve the parallelism of data processing and speed up the calculation.
  • the embodiment of the present application also provides an image data storage method, as shown in FIG. 10, which specifically includes the following steps:
  • Step S1 receiving an image storage instruction
  • Step S2 Determine image data and dynamic random access memory according to the image storage instruction
  • Step S3 The image data is sequentially stored in the dynamic random access memory according to a preset storage format, so that the adjacent image data in the dynamic random access memory have continuous storage addresses.
  • the image storage instruction in this embodiment may be an instruction issued by a user, or an instruction generated during image data processing.
  • the image data is sequentially stored in the dynamic random access memory according to the preset storage format, so that adjacent image data in the dynamic random access memory have continuous storage addresses.
  • the required data can be read by sequential commands. Since the continuous storage of image data can avoid the storage address jump operation, the read and write rate of the dynamic random access memory is improved.
  • the processing rate of the image data can be increased.
  • The process of sequentially storing image data in the dynamic random access memory in accordance with the preset storage format in step S3 may be: determining the storage starting position of the dynamic random access memory, and sequentially storing the image data from the storage starting position into the dynamic random access memory along the channel direction, wherein the storage starting position includes a channel height coordinate and a channel width coordinate; judging whether the channel width coordinate of the storage starting position is greater than the maximum width; if so, when all the channel directions corresponding to the storage starting position have been stored, the channel height coordinate of the storage starting position is increased by 1 and the channel width coordinate of the storage starting position is set to 0 to obtain a new storage starting position, and the remaining image data is sequentially stored from the new storage starting position into the dynamic random access memory along the channel direction; if not, when all the channel directions corresponding to the storage starting position have been stored, the channel width coordinate of the storage starting position is increased by 1 to obtain a new storage starting position, and the remaining image data is sequentially stored from the new storage starting position into the dynamic random access memory along the channel direction.
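The storage-pointer update in the step above reduces to a small coordinate-advance rule. The sketch below illustrates it under the assumption that "greater than the maximum width" means the incremented width coordinate would exceed the last valid width index; the function name `next_position` is invented for this example.

```python
# Sketch of the storage starting-position update: after all channels at
# (h, w) are written, step along W; when W would pass the maximum width,
# reset W to 0 and step along H.
def next_position(h, w, max_width):
    """Return the next (h, w) storage starting position."""
    if w + 1 > max_width:   # width coordinate would exceed the maximum
        return h + 1, 0     # height + 1, width reset to 0
    return h, w + 1         # otherwise just advance along the W direction

step1 = next_position(0, 0, max_width=11)   # normal step along W
wrap = next_position(0, 11, max_width=11)   # wrap to the next H row
```

This matches the FIG. 3 example (width 12, valid width indices 0-11): positions advance along W and wrap to the next height row when the row is full.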
  • The target data is determined according to the data read instruction, wherein the target data is multi-channel parallel image data; the target data is then transmitted to the first-in first-out memory of the FPGA.
  • An image data processing system 400 is also provided in an embodiment of the present application. As shown in FIG. 11, the system 400 may include:
  • the storage module 401 is configured to sequentially store image data in a dynamic random access memory according to a preset storage format, so that the adjacent image data in the dynamic random access memory have continuous storage addresses;
  • the reading module 402 is configured to read a preset number of multi-channel parallel image data from the dynamic random access memory, and store the multi-channel parallel image data in the first-in first-out memory of the FPGA;
  • the convolution module 403 is configured to perform a convolution operation on the target image data in the first-in first-out memory to obtain image feature data.
  • the image data is stored sequentially in the dynamic random access memory in the preset storage format, so that adjacent image data in the dynamic random access memory have continuous storage addresses.
  • when the data in the dynamic random access memory is read, the required data can be read with sequential commands; because the contiguous storage of image data avoids storage-address jump operations, the read/write rate of the dynamic random access memory is improved.
  • the image data that has been read is stored in the first-in first-out memory of the FPGA.
  • performing the convolution operation on the image data in the first-in first-out memory reduces the latency of read/write operations and improves data-storage efficiency.
  • this embodiment exploits the large capacity and fast sequential read/write speed of dynamic random access memory together with the small read/write latency of first-in first-out memory.
  • the storage module is configured to determine the storage starting position of the dynamic random access memory and to store image data into the dynamic random access memory sequentially along the channel direction from that starting position, where the storage starting position consists of a channel height coordinate and a channel width coordinate; it is further configured to judge whether the channel width coordinate of the storage starting position is greater than the maximum width; if so, once all channel directions corresponding to the storage starting position have been stored, it adds 1 to the channel height coordinate and sets the channel width coordinate to 0 to obtain a new storage starting position, then stores the remaining image data into the dynamic random access memory sequentially along the channel direction from the new starting position; if not, once all channel directions corresponding to the storage starting position have been stored, it adds 1 to the channel width coordinate to obtain a new storage starting position, then stores the remaining image data into the dynamic random access memory sequentially along the channel direction from the new starting position.
  • the reading module is configured to determine the current-round memory read address and read a preset number of multi-channel parallel image data according to it; to calculate the next-round memory read address from the current-round memory read address; and, after the first-in first-out memory of the FPGA is ready, to read a preset number of new multi-channel parallel image data according to the next-round memory read address and store the new multi-channel parallel image data into the first-in first-out memory of the FPGA.
  • the reading module is configured to take the current-round memory read address as the first start address and to calculate the second start address and the third start address from the first start address and the data read length; it then reads a preset number of multi-channel parallel first image data according to the first start address, a preset number of multi-channel parallel second image data according to the second start address, and a preset number of multi-channel parallel third image data according to the third start address.
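A minimal sketch of this address derivation, assuming the data read length is CH*11 words (one group of 11 pixels across all channels, matching the 3*11 case below); the names and the default read length are assumptions, not taken from the RTL:

```python
CH = 512
READ_LEN = CH * 11  # assumed data read length: 11 pixels, all channels

def group_start_addresses(first_start, read_len=READ_LEN):
    """Derive the second and third group start addresses from the first
    start address plus the data read length, as the claims describe."""
    second_start = first_start + read_len
    third_start = second_start + read_len
    return first_start, second_start, third_start
```

With this convention the three DDR read commands of one round can be issued back to back, each covering a full group of 11 pixels for every channel.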
  • the multi-channel parallel image data is specifically 3*11 multi-channel image data;
  • correspondingly, the convolution module is configured to convert the 3*11 multi-channel image data in the first-in first-out memory into 9*9 multi-channel image data, and to perform a convolution operation on the 9*9 multi-channel image data with a 3*3 convolution kernel to obtain the image feature data.
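One plausible reading of the 3*11 to 9*9 conversion is that each 11-sample row supplies the three overlapping 9-sample windows needed by the three column taps of a 3-wide kernel (11 positions minus a width-3 window leaves 9 valid offsets). The sketch below shows that slicing; it is an interpretation for illustration, not the patent's RTL data path.

```python
def windows_from_row(row11):
    """Split an 11-sample row into the three overlapping 9-sample windows
    used by the three column taps of a 3*3 convolution kernel."""
    if len(row11) != 11:
        raise ValueError("expected 11 samples per row")
    return [row11[k:k + 9] for k in range(3)]
```

Applied to all three rows of a 3*11 tile, this yields the nine 9-wide operand streams a 3*3 kernel consumes in parallel.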
  • the interval-elimination module is configured to control the state machine to read odd and even data simultaneously while the 3*11 multi-channel image data in the first-in first-out memory is converted into 9*9 multi-channel image data, so as to remove the invalid interval generated during that conversion.
  • the bit-compensation module is configured to judge, while a preset number of multi-channel parallel image data is read from the dynamic random access memory, whether the amount of data read equals a preset value; if not, it appends zeros to the read multi-channel parallel image data so that the amount of data equals the preset value.
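The bit-compensation step can be sketched as below; the preset value of 512 matches the channel count used elsewhere in the text, but treat both the value and the names as assumptions for illustration.

```python
def pad_to_preset(data, preset=512):
    """Append zeros to a read data vector until it reaches the preset
    length, mirroring the bit-compensation module's behavior."""
    if len(data) < preset:
        data = list(data) + [0] * (preset - len(data))
    return data
```

This keeps every channel vector written into the FIFOs a fixed width, so downstream logic never has to handle a short read (e.g. when the real input channel count is below 512).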
  • the present application also provides a storage medium 601 on which a computer program 610 is stored.
  • the storage medium may include media capable of storing program code, such as a USB flash drive, removable hard disk, read-only memory (ROM), random access memory (RAM), magnetic disk, or optical disc.
  • the present application also provides an electronic device 501, which may include a memory 510 and a processor 520.
  • the memory 510 stores a computer program 511, and when the processor 520 calls the computer program 511 in the memory 510, the steps provided in the above-mentioned embodiments can be implemented.
  • the electronic device may also include various network interfaces, power supplies and other components.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Signal Processing (AREA)
  • Neurology (AREA)
  • Image Processing (AREA)

Abstract

Provided are an image data storage method, an image data processing method, a system, and a related apparatus. The image data processing method includes the following steps: storing image data sequentially in a dynamic random access memory in a preset storage format, so that adjacent image data in the dynamic random access memory have continuous storage addresses; reading a preset number of multi-channel parallel image data from the dynamic random access memory and storing the multi-channel parallel image data in a first-in first-out memory of an FPGA; and performing a convolution operation on the target image data in the first-in first-out memory to obtain image feature data. The method can increase the processing rate of image data.

Description

图像数据存储方法、图像数据处理方法、系统及相关装置
本申请要求于2020年05月22日提交中国国家知识产权局,申请号为202010442519.0,发明名称为“图像数据存储方法、图像数据处理方法、系统及相关装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及深度学习技术领域,特别涉及一种图像数据存储方法、一种图像数据处理方法、一种图像数据处理系统、一种电子设备及一种存储介质。
背景技术
卷积神经网络(Convolutional Neural Networks,CNN)是一类包含卷积计算且具有深度结构的前馈神经网络,卷积神经网络被广泛应用于计算机视觉、图像处理、自然语言处理等领域。
卷积神经网络通常使用3*3大小的卷积核实现对于图像特征的提取,在现有的FPGA CNN卷积数据提取方案中,主要实现方式为:将图像数据缓存到FPGA的片外DDR(Double Data Rate,双倍速率)存储器中,每次只读取小3*3数据进行卷积,利用多次读取小范围DDR。但是,上述图像特征提取方法需要执行多次地址跳转以及非连续地址的小段数据读写,DDR读写速率较低,无法完全释放DDR的读写能力,使其成为图像处理速度的瓶颈。
因此,如何提高图像数据的处理速率是本领域技术人员目前需要解决的技术问题。
发明内容
本申请的目的是提供一种图像数据存储方法、一种图像数据处理方法、系统、一种电子设备及一种存储介质,能够提高图像数据的处理速率。
为解决上述技术问题,本申请提供一种图像数据存储方法,该图像数据存储方法包括:
接收图像存储指令;
根据所述图像存储指令确定图像数据和动态随机存储器;
将所述图像数据按照预设存储格式依次存储至所述动态随机存储器,以使所述动态随机存储器中相邻的所述图像数据具有连续的存储地址。
可选的,将所述图像数据按照预设存储格式依次存储至所述动态随机存储器,包括:
确定动态随机存储器的存储起始位置,将图像数据从所述存储起始位置沿通道方向依次存储至所述动态随机存储器;其中,所述存储起始位置包括通道高度坐标和通道宽度坐标;
判断所述存储起始位置的通道宽度坐标是否大于宽度最大值;
若是,在当所述存储起始位置对应的所有通道方向均存储完毕时,将所述存储起始位置的通道高度坐标加1,并将所述存储起始位置的通道宽度坐标置0得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器;
若否,在当所述存储起始位置对应的所有通道方向均存储完毕时,则将所述存储起始位置的通道宽度坐标加1得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器。
可选的,在将所述图像数据按照预设存储格式依次存储至所述动态随机存储器之后,还包括:
若接收到数据读取指令,则根据所述数据读取指令确定目标数据;其中,所述目标数据为多通道并行的图像数据;
将所述目标数据传输至FPGA的先进先出存储器。
本申请还提供一种图像数据处理方法,该图像数据处理方法包括:
将图像数据按照预设存储格式依次存储至动态随机存储器,以使所述动态随机存储器中相邻的所述图像数据具有连续的存储地址;
从所述动态随机存储器中读取预设数量的多通道并行的图像数据,并将所述多通道并行的图像数据存储至FPGA的先进先出存储器;
对所述先进先出存储器中的所述目标图像数据执行卷积操作,得到图像特征数据。
可选的,所述将图像数据按照预设存储格式依次存储至动态随机存储器,包括:
确定所述动态随机存储器的存储起始位置,将图像数据从所述存储起始位置沿通道方向依次存储至所述动态随机存储器;所述存储起始位置包括通道高度坐标和通道宽度坐标;
判断所述存储起始位置的通道宽度坐标是否大于宽度最大值;
若是,在当所述存储起始位置对应的所有通道方向均存储完毕时,将所述存储起始位置的通道高度坐标加1,并将所述存储起始位置的通道宽度坐标置0得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器;
若否,在当所述存储起始位置对应的所有通道方向均存储完毕时,则将所述存储起始位置的通道宽度坐标加1得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器。
可选的,从所述动态随机存储器中读取预设数量的多通道并行的图像数据包括:
确定本轮存储器读取地址,根据所述本轮存储器读取地址读取预设数量的多通道并行的图像数据;
相应的,还包括:
根据所述本轮存储器读取地址计算下一轮存储器读取地址;
在所述FPGA的先进先出存储器准备就绪后,根据所述下一轮存储器读取地址读取预设数量的多通道并行的新图像数据,并将所述多通道并行的新图像数据存储至所述FPGA的先进先出存储器。
可选的,根据所述本轮存储器读取地址读取预设数量的多通道并行的图像数据包括:
将所述本轮存储器读取地址作为第一起始地址,并根据所述第一起始地址与数据读取长度计算第二起始地址和第三起始地址;
根据所述第一起始地址读取预设数量的多通道并行的第一图像数据;
根据所述第二起始地址读取预设数量的多通道并行的第二图像数据;
根据所述第三起始地址读取预设数量的多通道并行的第三图像数据。
可选的,所述多通道并行的图像数据具体为3*11的多通道图像数据;
相应的,所述对所述先进先出存储器中的所述目标图像数据执行卷积操作,得到图像特征数据,包括:
将所述先进先出存储器中的所述3*11的多通道图像数据转化为9*9的多通道图像数据;
利用3*3的卷积核对所述9*9的多通道图像数据执行卷积操作,得到所述图像特征数据。
可选的,在将所述先进先出存储器中的所述3*11的多通道图像数据转化为9*9的多通道图像数据过程中,还包括:
控制状态机执行奇偶数据同时读取操作,以便去除所述3*11的多通道图像数据转化为所述9*9的多通道图像数据时产生的无效间隔。
可选的,在从所述动态随机存储器中读取预设数量的多通道并行的图像数据的过程中,还包括:
判断读取的所述多通道并行的图像数据的数据量是否为预设值;
若否,则在读取的所述多通道并行的图像数据后补零以使数据量等于所述预设值。
本申请还提供了一种图像数据处理系统,该图像数据处理系统包括:
存储模块,用于将图像数据按照预设存储格式依次存储至动态随机存储器,以使所述动态随机存储器中相邻的所述图像数据具有连续的存储地址;
读取模块,用于从所述动态随机存储器中读取预设数量的多通道并行的图像数据,并将所述多通道并行的图像数据存储至FPGA的先进先出存储器;
卷积模块,用于对所述先进先出存储器中的所述目标图像数据执行卷积操作,得到图像特征数据。
本申请还提供了一种存储介质,其上存储有计算机程序,所述计算机程序执行时实现上述图像数据处理方法和图像数据存储方法执行的步骤。
本申请还提供了一种电子设备,包括存储器和处理器,所述存储器中存储有计算机程序,所述处理器调用所述存储器中的计算机程序时实现上述图像数据处理方法和图像数据存储方法执行的步骤。
本申请提供了一种图像数据处理方法,包括将图像数据按照预设存储格式依次存储至动态随机存储器,以使所述动态随机存储器中相邻的所述图像数据具有连续的存储地址;从所述动态随机存储器中读取预设数量的多通道并行的图像数据,并将所述多通道并行的图像数据存储至FPGA的先进先出存储器;对所述先进先出存储器中的所述目标图像数据执行卷积操作,得到图像特征数据。
本申请首先将图像数据按照预设存储格式依次存储至动态随机存储器中,使得动态随机存储器中相邻的图像数据具有连续的存储地址。在对动态随机存储器中的数据进行数据读取时,可以通过命令依次读取所需的数据,由于图像数据连续存储能够避免存储地址跳转操作,提高了对动态随机存储器的读写速率。在从动态随机存储器读取到多通道并行的图像数据后,将读取得到的图像数据存储至FPGA的先进先出存储器,先进先出存储器具有读写延迟小的特点,因此对先进先出存储器中的图像数据执行卷积操作降低读写操作延迟,提高数据存储效率。本申请基于动态随机存储器容量大、连续读写速度快的特点,以及先进先出存储器读写延迟小的特点,先将全部的图像数据顺序存储至动态随机存储器,再从动态随机存储器中读取多通道并行的图像数据至先进先出存储器,降低了图像数据处理流程的读写延时,提高了图像数据的处理速率。本申请同时还提供了一种图像数据存储方法、一种图像数据处理系统、一种电子设备和一种存储介质,具有上述有益效果,在此不再赘述。
附图说明
为了更清楚地说明本申请实施例,下面将对实施例中所需要使用的附图做简单的介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1为本申请实施例所提供的一种图像数据处理方法的流程图;
图2为本申请实施例所提供的一种卷积神经网络中所进行的三维卷积计算过程示意图;
图3为本申请实施例所提供的一种图像数据存储至动态随机存储器的原理示意图;
图4为本申请实施例所提供的一种读取多通道并行的图像数据的原理示意图;
图5为本申请实施例所提供的一种读动态随机存储器的起始地址进行计算管理方式原理示意图;
图6为本申请实施例所提供的一种实现DDR数据读取时控制状态机的流程示意图;
图7为本申请实施例所提供的一种数据读取示意图;
图8为本申请实施例所提供的一种数据转换示意图;
图9为本申请实施例所提供的一种中间空挡消除原理示意图;
图10为本申请实施例所提供的一种图像数据存储方法的流程图;
图11为本申请实施例所提供的一种图像数据处理系统的结构示意图;
图12为本申请实施例所提供的一种存储介质的结构示意图;
图13为本申请实施例所提供的一种电子设备的结构示意图。
具体实施方式
为使本申请实施例的目的、技术方案和优点更加清楚,下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
下面请参见图1,图1为本申请实施例所提供的一种图像数据处理方法的流程图。
具体步骤可以包括:
S101:将图像数据按照预设存储格式依次存储至动态随机存储器,以使动态随机存储器中相邻的所述图像数据具有连续的存储地址;
其中,本实施例可以应用于包括卷积神经网络的图像处理设备,该存储设备中可以由FPGA(Field Programmable Gate Array,现场可编程逻辑门阵列)执行相关的图像处理操作。请参见图2,图2为本申请实施例所提供的一种卷积神经网络中所进行的三维卷积计算过程示意图,卷积神经网络中所进行的三维卷积计算过程如下所述:假设一个彩色图像是6×6×3,这里的3指的是三个颜色通道,可以想象成三个6×6图像的堆叠。为了检测图像的边缘或者其他的特征,把该彩色图像与一个三维的过滤器相卷积,过滤器的维度是3×3×3,即该过滤器也有三层,分别对应红、绿、蓝三个通道。6×6×3彩色图像的第一个6代表图像高度,第二个6代表宽度,这个3代表通道的数目。同样过滤器也有高,宽和通道数,并且图像的通道数和过滤器的通道数相等,通过卷积操作可以得到一个4×4×1的图像。当然,为了提取多种特征,输出通道可以包括多个,例如可以将原图像和2个卷积核相卷后,可以得到2个输出通道的特征数据。
本申请首先将需要处理的图像数据(如图2中为执行卷积处理的6×6×3的彩色图像)按照预设存储格式依次存储至动态随机存储器中。动态随机存储器即FPGA的片外DDR,通过将图像数据按照预设格式存储能够使动态随机存储器中相邻存储地址中存储的图像数据是连续的。由于整体较大,需要多个存储地址存储图像数据,在本步骤之前可以存在将原始图像转化为连续的图像数据的操作,本实施例中所提到的图像数据连续的相邻存储地址对应的图像数据在图像数据对应的原图中也是连续的。通过按照预设格式进行存储,能够使相邻的所述图像数据具有连续的存储地址。
S102:从动态随机存储器中读取预设数量的多通道并行的图像数据,并将多通道并行的图像数据存储至FPGA的先进先出存储器;
其中,在将图像数据存储至动态随机存储器后,本实施例可以按照预设周期从所述动态随机存储器中读取预设数量的多通道并行的图像数据,由于在S101中动态随机存储器中存储的为连续的图像数据,因此在本步骤 中可以通过一次数据读取操作得到多通道并行的图像数据。卷积操作通常对多行图像数据执行卷积操作,在本实施例中可以执行预设数量次数据读取操作得到预设数量的多通道并行的图像数据。在得到预设数量的多通道并行的图像数据后,可以将其存储至FPGA的先进先出存储器。FPGA的先进先出存储器即FPGA内部RAM(Random Access Memory,随机存取存储器)资源中的FIFO(First Input First Output,先入先出)存储器。在从所述动态随机存储器中读取预设数量的多通道并行的图像数据的过程中,还可以判断读取的所述多通道并行的图像数据的数据量是否为预设值;若否,则在读取的所述多通道并行的图像数据后补零以使数据量等于所述预设值。
由于需要多次从动态随机存储器中读取数据,因此作为一种可行的实施方式,上述从动态随机存储器中读取预设数量的多通道并行的图像数据的过程可以包括:确定本轮存储器读取地址,根据所述本轮存储器读取地址读取预设数量的多通道并行的图像数据。相应的,本实施例还可以根据所述本轮存储器读取地址计算下一轮存储器读取地址;在所述FPGA的先进先出存储器准备就绪后,根据所述下一轮存储器读取地址读取预设数量的多通道并行的新图像数据,并将所述多通道并行的新图像数据存储至所述FPGA的先进先出存储器。
S103:对先进先出存储器中的所述目标图像数据执行卷积操作,得到图像特征数据。
其中,在将图像数据读取到FPGA的先入先出存储器后,FPGA可以在1个周期内读出任意位置的N*N数据,用于后端的卷积计算得到图像特征数据。本实施例中S102中将多通道并行的图像数据存储至FPGA的先进先出存储器相当于对FPGA的输入数据,S103中对目标图像数据执行卷积操作相当于FPGA输出数据,本实施例可以适当调节S102中数据读取的速率和S103中卷积操作的速率,使得FPGA内部的数据量处于相对稳定状态。
本实施例首先将图像数据按照预设存储格式依次存储至动态随机存储器中,使得动态随机存储器中相邻的图像数据具有连续的存储地址。在对动态随机存储器中的数据进行数据读取时,可以通过命令依次读取所需的数据,由于图像数据连续存储能够避免存储地址跳转操作,提高了对动态随机存储器的读写速率。在从动态随机存储器读取到多通道并行的图像数据后,将读取得到的图像数据存储至FPGA的先进先出存储器,先进先出存储器具有读写延迟小的特点,因此对先进先出存储器中的图像数据执行卷积操作降低读写操作延迟,提高数据存储效率。本实施例基于动态随机存储器容量大、连续读写速度快的特点,以及先进先出存储器读写延迟小的特点,先将全部的图像数据顺序存储至动态随机存储器,再从动态随机存储器中读取多通道并行的图像数据至先进先出存储器,降低了图像数据处理流程的读写延时,提高了图像数据的处理速率。
请参见图3,图3为本申请实施例所提供的一种图像数据存储至动态随机存储器的原理示意图,图3中的CH为通道数,W为通道宽度,H为通道高度,图中所示的图像通道数为512,通道宽度为12,通道高度为6。将图像数据按照预设存储格式依次存储至动态随机存储器,包括以下过程:确定所述动态随机存储器的存储起始位置,将图像数据从所述存储起始位置沿通道方向依次存储至所述动态随机存储器;所述存储起始位置包括通道高度坐标和通道宽度坐标;判断所述存储起始位置的通道宽度坐标是否大于宽度最大值;若是,在当所述存储起始位置对应的所有通道方向均存储完毕时,将所述存储起始位置的通道高度坐标加1,并将所述存储起始位置的通道宽度坐标置0得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器;若否,在当所述存储起始位置对应的所有通道方向均存储完毕时,则将所述存储起 始位置的通道宽度坐标加1得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器。
如图3所示,本实施例将输入通道数据按照预设存储格式写入DDR中,图中方格内的数字代表了图像数据在DDR中的地址值。通道(CH)方向固定为512,DDR中先按照通道数方向进行存储,对应地址为0-511,如果真实输入通道小于512,则对应的地址位置给0值。通道(CH)方向完成后,按照宽度(W)方向进行数据存储。当W方向也完毕后,再按照H方向进行存储。W和H的长度可以为自定义长度(如7~512)。
作为对于上述实施例的进一步介绍,从所述动态随机存储器中读取预设数量的多通道并行的图像数据的过程可以包括:确定本轮存储器读取地址,根据所述本轮存储器读取地址读取预设数量的多通道并行的图像数据;根据所述本轮存储器读取地址计算下一轮存储器读取地址;在所述FPGA的先进先出存储器准备就绪后,根据所述下一轮存储器读取地址读取预设数量的多通道并行的新图像数据,并将所述多通道并行的新图像数据存储至所述FPGA的先进先出存储器。具体的,当多通道并行的图像数据具体为3*11的多通道图像数据时,本实施例还可以将所述本轮存储器读取地址作为第一起始地址,并根据所述第一起始地址与数据读取长度计算第二起始地址和第三起始地址;根据所述第一起始地址读取预设数量的多通道并行的第一图像数据;根据所述第二起始地址读取预设数量的多通道并行的第二图像数据;根据所述第三起始地址读取预设数量的多通道并行的第三图像数据。在图3对应实施方式的数据存储方式的基础上,DDR一次命令可以读取出所有通道的11个数据,burst长度足够使DDR的读取效率维持在50%以上。请参见图4,图4为本申请实施例所提供的一种读取多通道并行的图像数据的原理示意图。从DDR中读取多通道并行的图像数据的过程可以包括:读取所有通道的第一组11个数据,给DDR发送一次命令,例:起始地址为h(0)*w(0)*512,读取长度为512*11;完成后读取第二组 11个数据,给DDR发送一次命令,起始地址为h(1)*w(0)*512,读取长度为512*11;完成后读取第三组11个数据,给DDR发送一次命令,起始地址为h(2)*w(0)*512。读取长度为512*11;读出的数据分别存储到3组FPGA内的FIFO中,每组有512个FIFO。读完后,假如stride为1(可设),则下一组新的起始地址为上一组起始地址+512*9,下一组起始地址会跟随计算更新。
请参见图5,图5为本申请实施例所提供的一种读动态随机存储器的起始地址进行计算管理方式原理示意图,设3组数据在W和H方向的坐标为表1中的值:
表1 起始地址坐标表
        坐标
group1  (x, y1)
group2  (x, y2)
group3  (x, y3)
表1中W方向的坐标一致。以Stride(步长)为1举例,则新的地址计算管理方式如表2所示,通过使用3个乘法器+移位补0可以满足500M高速时钟要求。
表2 地址计算关系表
        x=0                x!=0
group1  add1=(y1×w)<<9     add1=add1+(9<<9)
group2  add2=(y2×w)<<9     add2=add2+(9<<9)
group3  add3=(y3×w)<<9     add3=add3+(9<<9)
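A hedged sketch of the Table 2 update rule: the left shift by 9 multiplies by the 512-word channel stride, and each stride-1 step within a row advances the address by 9 pixels, i.e. 9<<9 words. Function names are illustrative, not from the RTL.

```python
def initial_group_address(y, w):
    """Base address for a group whose row coordinate is y, with image
    width w: (y * w) << 9, i.e. (y * w) * 512 channel words."""
    return (y * w) << 9

def next_group_address(addr):
    """Stride-1 update within a row: advance by 9 pixels * 512 channels."""
    return addr + (9 << 9)
```

Because the shift replaces a multiplication by 512, only the three y*w products need real multipliers, which is how the design meets the 500 MHz timing with three multipliers plus shift-and-zero-fill.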
请参见图6,图6为本申请实施例所提供的一种实现DDR数据读取时控制状态机的流程示意图,由于每次写入数据都是按照group1、group2,group3的顺序写入,但是读出端口是三组FIFO并行读出,为了降低RTL(Register Transfer Level,寄存器转换级电路)的扇出,只在开始读取group1时判断一次group3 FIFO的ready状态即可。本实施例还可以设置地 址更新乘法器。在500MHz时钟下,地址更新乘法器的安全计算周期为≥3个,如果只在换行判断时临时计算需要更新的参数,则要么无法满足时序要求,要么需要额外等待3个计算周期,造成整个状态机延迟增加3个时钟,因此,此处需要在单独设计提前计算单元,用于计算下一循环所需的所有参数,如DDR起始地址、burst(突发)长度等。开始前先将开始后需要的所有数值计算出来,开始时锁定寄存器备份给状态机使用。同时利用整个状态机执行的时间,独立进行下一个循环状态所需要的所有数值,这样可以满足500MHz下乘法器的时序要求。也可以降低状态机判断的lut(查找表)层≤4,同时也不会造成额外的系统延迟。
当3组FIFO中数据准备好后,可以按照通道读出3*11个数据,因为输入通道数可设,所以只需要读取需要的输入通道数,如当输入通道为3时,只需要读取3个通道的3*11,读取方式如下图7所示,图7为本申请实施例所提供的一种数据读取示意图。
作为一种可行的实施方式,当多通道并行的图像数据具体为3*11的多通道图像数据时,计算图像特征数据的过程可以包括:将所述先进先出存储器中的所述3*11的多通道图像数据转化为9*9的多通道图像数据;利用3*3的卷积核对所述9*9的多通道图像数据执行卷积操作,得到所述图像特征数据。进一步的,在将所述先进先出存储器中的所述3*11的多通道图像数据转化为9*9的多通道图像数据过程中,还包括:控制状态机执行奇偶数据同时读取操作,以便去除所述3*11的多通道图像数据转化为所述9*9的多通道图像数据时产生的无效间隔。
具体的,请参见图8,图8为本申请实施例所提供的一种数据转换示意图。FPGA的后端算法可以会将11个数据变为3个9*9,用来方便3*3的卷积,此处读出的11个数据,即使连续,也会在变为3个9*9时,产生一个2周期的无效间隔,无效间隔为无需进行卷积操作的周期。本实施例可 以通过奇偶同时读取的状态机设计,来消除中间的空档实现数据的连续输出。请参见图9,图9为本申请实施例所提供的一种中间空挡消除原理示意图,本实施例可以先同时读出连续2个通道的11个数据,然后等待7个时钟周期,进行11->9数据变化后,延迟第二个通道的数据9个周期,再与第一个通道的数据进行拼接,即完成了中间间隔的消除。
设真实的输入通道数为Cin,DDR时钟F_ddr为250MHz,后端卷积时钟F_dsp为500MHz,每组FIFO的个数/64为N(上述例子为N=512/64=8)则两端数据带宽平衡公式为:
(两端数据带宽平衡公式见原公开文本中的公式附图 PCTCN2021073790-appb-000001)
当N=1时(FIFO个数为64),则Cin≥12。只要真实输入通道数Cin足够大,数据可以按照500MHz的时钟无效率损失地进行数据传输和运行,当数据通道Cin≥12,可适当修改DDR的存储格式即可(如CH=64),无需更改RTL设计。如果更小,则不属于多输入通道的条件,本实施例也可在一定损失效率情况下使用。当每组FIFO为512个时,为了实现流水操作,FIFO深度需要可实现乒乓功能,深度为11*2=22个,可以实现当读取feature(特征)数据时,不会造成FIFO后端没有数据而读取暂停的情况。此时的最大RAM利用率也只为VU7(xilinx Ultrascale plus Virtex 7 FPGA,一种FPGA板卡)的15%。不会给后端的DSP(Digital Signal Processing,数字信号处理)卷积阵列布线造成任何压力。
本申请还可以提供一种多维卷积feature数据在DDR中的存储方法。可以同时读出多通道feature数据,适合后端提取处理,并且DDR读取效率不小于50%。本申请可以通过配置参数的变化,实现对feature数据起始地址的最小资源计算,使用3个乘法器,可以安全工作在500M时钟下,不会给系统造成额外的系统延迟。上述实施例速读取图像数据的控制过程包括:地址参数计算和控制状态机双线配合运行,避免状态转换时判断计算建立 保持时间不满足,Lut级联≤4,满足500M时钟的运行条件,需要的RAM资源不超过VU7的15%。本实施例本发明充分利用DDR容量大、价格低、连续读写速度快,FPGA-RAM读写延迟小的优点,将两个优点合并,设计了一种500MHz时钟连续读取feature数据的方法(lut级数≤4),feature宽、高、任意可设(≤512),RAM资源利用率小于15%,并采用RTL在FPGA上进行了实现。LUT为Look Up Table(查找表)
本实施例通过结合DDR连续读写快、FPGA RAM资源小的特点,设计出了一种高速、多通道、低资源的硬件架构,可以在不同的配置参数控制下,在500MHz时钟下实现图像数据的连续读出,并且资源利用率不超过15%。可以应用于神经网络计算。本实施例提出多维卷积多通道高速低容量数据读取方法,可完全满足常见的ResNet50常见的卷积模型提取需求,在硬件资源充足的情况下可以任意扩展多模块,提高数据处理的并行度,加快计算的速度。
本申请实施例还提供一种图像数据存储方法,如图10所示,具体包括以下步骤:
步骤S1:接收图像存储指令;
步骤S2:根据所述图像存储指令确定图像数据和动态随机存储器;
步骤S3:将所述图像数据按照预设存储格式依次存储至所述动态随机存储器,以使所述动态随机存储器中相邻的所述图像数据具有连续的存储地址。
其中,本实施例中的图像存储指令可以为用户下发的指令,也可以为图像数据处理过程中产生的指令。将图像数据按照预设存储格式依次存储至动态随机存储器中,使得动态随机存储器中相邻的图像数据具有连续的存储地址。在对动态随机存储器中的数据进行数据读取时,可以通过依次命令读取所需的数据,由于图像数据连续存储能够避免存储地址跳转操作, 提高了对动态随机存储器的读写速率。在对通过上述方法存储的图像数据执行图像处理操作时,能够提高图像数据的处理速率。
作为对于上述实施例的进一步介绍,步骤S3中将图像数据按照预设存储格式依次存储至所述动态随机存储器的过程可以为:确定动态随机存储器的存储起始位置,将图像数据从所述存储起始位置沿通道方向依次存储至所述动态随机存储器;其中,所述存储起始位置包括通道高度坐标和通道宽度坐标;判断所述存储起始位置的通道宽度坐标是否大于宽度最大值;若是,在当所述存储起始位置对应的所有通道方向均存储完毕时,将所述存储起始位置的通道高度坐标加1,并将所述存储起始位置的通道宽度坐标置0得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器;若否,在当所述存储起始位置对应的所有通道方向均存储完毕时,则将所述存储起始位置的通道宽度坐标加1得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器。
进一步的,在将所述图像数据按照预设存储格式依次存储至所述动态随机存储器之后,若接收到数据读取指令,则根据所述数据读取指令确定目标数据;其中,所述目标数据为多通道并行的图像数据;将所述目标数据传输至FPGA的先进先出存储器。
本申请实施例还提供的一种图像数据处理系统400,如图11所示,该系统400可以包括:
存储模块401,用于将图像数据按照预设存储格式依次存储至动态随机存储器,以使所述动态随机存储器中相邻的所述图像数据具有连续的存储地址;
读取模块402,用于从所述动态随机存储器中读取预设数量的多通道并行的图像数据,并将所述多通道并行的图像数据存储至FPGA的先进先出存储器;
卷积模块403,用于对所述先进先出存储器中的所述目标图像数据执行卷积操作,得到图像特征数据。
本实施例首先将图像数据按照预设存储格式依次存储至动态随机存储器中,使得动态随机存储器中相邻的图像数据具有连续的存储地址。在对动态随机存储器中的数据进行数据读取时,可以通过命令依次读取所需的数据,由于图像数据连续存储能够避免存储地址跳转操作,提高了对动态随机存储器的读写速率。在从动态随机存储器读取到多通道并行的图像数据后,将读取得到的图像数据存储至FPGA的先进先出存储器,先进先出存储器具有读写延迟小的特点,因此对先进先出存储器中的图像数据执行卷积操作降低读写操作延迟,提高数据存储效率。本实施例基于动态随机存储器容量大、连续读写速度快的特点,以及先进先出存储器读写延迟小的特点,先将全部的图像数据顺序存储至动态随机存储器,再从动态随机存储器中读取多通道并行的图像数据至先进先出存储器,降低了图像数据处理流程的读写延时,提高了图像数据的处理速率。
进一步的,存储模块用于确定所述动态随机存储器的存储起始位置,将图像数据从所述存储起始位置沿通道方向依次存储至所述动态随机存储器;所述存储起始位置包括通道高度坐标和通道宽度坐标;还用于判断所述存储起始位置的通道宽度坐标是否大于宽度最大值;若是,在当所述存储起始位置对应的所有通道方向均存储完毕时,将所述存储起始位置的通道高度坐标加1,并将所述存储起始位置的通道宽度坐标置0得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器;若否,在当所述存储起始位置对应的所有通道方向均存储完毕时,则将所述存储起始位置的通道宽度坐标加1得到新的 存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器。
进一步的,读取模块用于确定本轮存储器读取地址,根据所述本轮存储器读取地址读取预设数量的多通道并行的图像数据;还用于根据所述本轮存储器读取地址计算下一轮存储器读取地址;还用于在所述FPGA的先进先出存储器准备就绪后,根据所述下一轮存储器读取地址读取预设数量的多通道并行的新图像数据,并将所述多通道并行的新图像数据存储至所述FPGA的先进先出存储器。
进一步的,读取模块用于将所述本轮存储器读取地址作为第一起始地址,并根据所述第一起始地址与数据读取长度计算第二起始地址和第三起始地址;还用于根据所述第一起始地址读取预设数量的多通道并行的第一图像数据;还用于根据所述第二起始地址读取预设数量的多通道并行的第二图像数据;还用于根据所述第三起始地址读取预设数量的多通道并行的第三图像数据;
进一步的,所述多通道并行的图像数据具体为3*11的多通道图像数据;
相应的卷积模块用于将所述先进先出存储器中的所述3*11的多通道图像数据转化为9*9的多通道图像数据;还用于利用3*3的卷积核对所述9*9的多通道图像数据执行卷积操作,得到所述图像特征数据。
进一步的,还包括:
间隔消除模块,用于在将所述先进先出存储器中的所述3*11的多通道图像数据转化为9*9的多通道图像数据过程中,控制状态机执行奇偶数据同时读取操作,以便去除所述3*11的多通道图像数据转化为所述9*9的多通道图像数据时产生的无效间隔。
进一步的,还包括:
补位模块,用于在从所述动态随机存储器中读取预设数量的多通道并行的图像数据的过程中,判断读取的所述多通道并行的图像数据的数据量 是否为预设值;若否,则在读取的所述多通道并行的图像数据后补零以使数据量等于所述预设值。
由于系统部分的实施例与方法部分的实施例相互对应,因此系统部分的实施例请参见方法部分的实施例的描述,这里暂不赘述。
如图12所示,本申请还提供了一种存储介质601,其上存有计算机程序610,该计算机程序610被执行时可以实现上述实施例所提供的步骤。该存储介质可以包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
如图13所示,本申请还提供了一种电子设备501,可以包括存储器510和处理器520,所述存储器510中存有计算机程序511,所述处理器520调用所述存储器510中的计算机程序511时,可以实现上述实施例所提供的步骤。当然所述电子设备还可以包括各种网络接口,电源等组件。
说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。对于实施例公开的系统而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。应当指出,对于本技术领域的普通技术人员来说,在不脱离本申请原理的前提下,还可以对本申请进行若干改进和修饰,这些改进和修饰也落入本申请权利要求的保护范围内。
还需要说明的是,在本说明书中,诸如第一和第二等之类的关系术语仅仅用来将一个实体或者操作与另一个实体或操作区分开来,而不一定要求或者暗示这些实体或操作之间存在任何这种实际的关系或者顺序。而且,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、物品或者设备不仅包括那些要素,而且 还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、物品或者设备所固有的要素。在没有更多限制的状况下,由语句“包括一个……”限定的要素,并不排除在包括所述要素的过程、方法、物品或者设备中还存在另外的相同要素。

Claims (13)

  1. 一种图像数据存储方法,其特征在于,包括:
    接收图像存储指令;
    根据所述图像存储指令确定图像数据和动态随机存储器;
    将所述图像数据按照预设存储格式依次存储至所述动态随机存储器,以使所述动态随机存储器中相邻的所述图像数据具有连续的存储地址。
  2. 根据权利要求1所述图像数据存储方法,其特征在于,将所述图像数据按照预设存储格式依次存储至所述动态随机存储器,包括:
    确定动态随机存储器的存储起始位置,将图像数据从所述存储起始位置沿通道方向依次存储至所述动态随机存储器;其中,所述存储起始位置包括通道高度坐标和通道宽度坐标;
    判断所述存储起始位置的通道宽度坐标是否大于宽度最大值;
    若是,在当所述存储起始位置对应的所有通道方向均存储完毕时,将所述存储起始位置的通道高度坐标加1,并将所述存储起始位置的通道宽度坐标置0得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器;
    若否,在当所述存储起始位置对应的所有通道方向均存储完毕时,则将所述存储起始位置的通道宽度坐标加1得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器。
  3. 根据权利要求1所述图像数据存储方法,其特征在于,在将所述图像数据按照预设存储格式依次存储至所述动态随机存储器之后,还包括:
    若接收到数据读取指令,则根据所述数据读取指令确定目标数据;其中,所述目标数据为多通道并行的图像数据;
    将所述目标数据传输至现场可编程逻辑门阵列的先进先出存储器。
  4. 一种图像数据处理方法,其特征在于,包括:
    将图像数据按照预设存储格式依次存储至动态随机存储器,以使所述动态随机存储器中相邻的所述图像数据具有连续的存储地址;
    从所述动态随机存储器中读取预设数量的多通道并行的图像数据,并将所述多通道并行的图像数据存储至现场可编程逻辑门阵列的先进先出存储器;
    对所述先进先出存储器中的所述目标图像数据执行卷积操作,得到图像特征数据。
  5. 根据权利要求4所述图像数据处理方法,其特征在于,所述将图像数据按照预设存储格式依次存储至动态随机存储器,包括:
    确定所述动态随机存储器的存储起始位置,将图像数据从所述存储起始位置沿通道方向依次存储至所述动态随机存储器;其中,所述存储起始位置包括通道高度坐标和通道宽度坐标;
    判断所述存储起始位置的通道宽度坐标是否大于宽度最大值;
    若是,在当所述存储起始位置对应的所有通道方向均存储完毕时,将所述存储起始位置的通道高度坐标加1,并将所述存储起始位置的通道宽度坐标置0得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器;
    若否,在当所述存储起始位置对应的所有通道方向均存储完毕时,则将所述存储起始位置的通道宽度坐标加1得到新的存储起始位置,将剩余的图像数据从所述新的存储起始位置沿通道方向依次存储至所述动态随机存储器。
  6. 根据权利要求4所述图像数据处理方法,其特征在于,从所述动态随机存储器中读取预设数量的多通道并行的图像数据包括:
    确定本轮存储器读取地址,根据所述本轮存储器读取地址读取预设数量的多通道并行的图像数据;
    相应的,还包括:
    根据所述本轮存储器读取地址计算下一轮存储器读取地址;
    在所述现场可编程逻辑门阵列的先进先出存储器准备就绪后,根据所述下一轮存储器读取地址读取预设数量的多通道并行的新图像数据,并将所述多通道并行的新图像数据存储至所述现场可编程逻辑门阵列的先进先出存储器。
  7. 根据权利要求6所述图像数据处理方法,其特征在于,根据所述本轮存储器读取地址读取预设数量的多通道并行的图像数据包括:
    将所述本轮存储器读取地址作为第一起始地址,并根据所述第一起始地址与数据读取长度计算第二起始地址和第三起始地址;
    根据所述第一起始地址读取预设数量的多通道并行的第一图像数据;
    根据所述第二起始地址读取预设数量的多通道并行的第二图像数据;
    根据所述第三起始地址读取预设数量的多通道并行的第三图像数据。
  8. 根据权利要求7所述图像数据处理方法,其特征在于,所述多通道并行的图像数据具体为3*11的多通道图像数据;
    相应的,所述对所述先进先出存储器中的所述目标图像数据执行卷积操作,得到图像特征数据,包括:
    将所述先进先出存储器中的所述3*11的多通道图像数据转化为9*9的多通道图像数据;
    利用3*3的卷积核对所述9*9的多通道图像数据执行卷积操作,得到所述图像特征数据。
  9. 根据权利要求8所述图像数据处理方法,其特征在于,在将所述先进先出存储器中的所述3*11的多通道图像数据转化为9*9的多通道图像数据过程中,还包括:
    控制状态机执行奇偶数据同时读取操作,以便去除所述3*11的多通道图像数据转化为所述9*9的多通道图像数据时产生的无效间隔。
  10. 根据权利要求4至9任一项所述图像数据处理方法,其特征在于,在从所述动态随机存储器中读取预设数量的多通道并行的图像数据的过程中,还包括:
    判断读取的所述多通道并行的图像数据的数据量是否为预设值;
    若否,则在读取的所述多通道并行的图像数据后补零以使数据量等于所述预设值。
  11. 一种图像数据处理系统,其特征在于,包括:
    存储模块,用于将图像数据按照预设存储格式依次存储至动态随机存储器,以使所述动态随机存储器中相邻的所述图像数据具有连续的存储地址;
    读取模块,用于从所述动态随机存储器中读取预设数量的多通道并行的图像数据,并将所述多通道并行的图像数据存储至现场可编程逻辑门阵列的先进先出存储器;
    卷积模块,用于对所述先进先出存储器中的所述目标图像数据执行卷积操作,得到图像特征数据。
  12. 一种电子设备,其特征在于,包括存储器和处理器,所述存储器中存储有计算机程序,所述处理器调用所述存储器中的计算机程序时实现如权利要求1至10任一项方法的步骤。
  13. 一种存储介质,其特征在于,所述存储介质中存储有计算机程序,所述计算机程序被处理器加载并执行时,实现如上权利要求1至10任一项方法的步骤。
PCT/CN2021/073790 2020-05-22 2021-01-26 图像数据存储方法、图像数据处理方法、系统及相关装置 WO2021232843A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/926,966 US20230196500A1 (en) 2020-05-22 2021-01-26 Image data storage method, image data processing method and system, and related apparatus
EP21808636.1A EP4156079A4 (en) 2020-05-22 2021-01-26 IMAGE DATA STORAGE METHOD, IMAGE DATA PROCESSING METHOD AND SYSTEM AND ASSOCIATED APPARATUS

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010442519.0A CN111583095B (zh) 2020-05-22 2020-05-22 图像数据存储方法、图像数据处理方法、系统及相关装置
CN202010442519.0 2020-05-22

Publications (1)

Publication Number Publication Date
WO2021232843A1 true WO2021232843A1 (zh) 2021-11-25

Family

ID=72110954

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/073790 WO2021232843A1 (zh) 2020-05-22 2021-01-26 图像数据存储方法、图像数据处理方法、系统及相关装置

Country Status (4)

Country Link
US (1) US20230196500A1 (zh)
EP (1) EP4156079A4 (zh)
CN (1) CN111583095B (zh)
WO (1) WO2021232843A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115460350A (zh) * 2022-09-02 2022-12-09 白犀牛智达(北京)科技有限公司 一种基于fpga的图像处理方法和系统
CN117196931A (zh) * 2023-11-08 2023-12-08 苏州元脑智能科技有限公司 面向传感器阵列的数据处理方法、fpga及电子设备

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111583095B (zh) * 2020-05-22 2022-03-22 浪潮电子信息产业股份有限公司 图像数据存储方法、图像数据处理方法、系统及相关装置
CN113706366B (zh) * 2021-07-30 2024-02-27 浪潮电子信息产业股份有限公司 一种图像特征数据的提取方法、系统及相关装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160342888A1 (en) * 2015-05-20 2016-11-24 Nec Laboratories America, Inc. Memory efficiency for convolutional neural networks operating on graphics processing units
CN109800867A (zh) * 2018-12-17 2019-05-24 北京理工大学 一种基于fpga片外存储器的数据调用方法
CN110674927A (zh) * 2019-09-09 2020-01-10 之江实验室 一种用于脉动阵列结构的数据重组方法
CN110826707A (zh) * 2018-08-10 2020-02-21 北京百度网讯科技有限公司 应用于卷积神经网络的加速方法和硬件加速器
CN111583095A (zh) * 2020-05-22 2020-08-25 浪潮电子信息产业股份有限公司 图像数据存储方法、图像数据处理方法、系统及相关装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101340580A (zh) * 2008-08-15 2009-01-07 上海龙晶微电子有限公司 视频硬件解码器的片外动态存储器的地址映射方法
CN102279802A (zh) * 2010-06-13 2011-12-14 中兴通讯股份有限公司 提高同步动态随机存储控制器的读操作效率的方法和装置
CN104077233B (zh) * 2014-06-18 2017-04-05 百度在线网络技术(北京)有限公司 多通道卷积层处理方法和装置
US10664405B2 (en) * 2017-11-03 2020-05-26 Google Llc In-memory distributed cache
CN109992542B (zh) * 2017-12-29 2021-11-30 深圳云天励飞技术有限公司 一种数据搬运方法、相关产品及计算机存储介质
CN109992541B (zh) * 2017-12-29 2021-09-14 深圳云天励飞技术有限公司 一种数据搬运方法、计算装置及计算机存储介质
CN109086867B (zh) * 2018-07-02 2021-06-08 武汉魅瞳科技有限公司 一种基于fpga的卷积神经网络加速系统

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160342888A1 (en) * 2015-05-20 2016-11-24 Nec Laboratories America, Inc. Memory efficiency for convolutional neural networks operating on graphics processing units
CN110826707A (zh) * 2018-08-10 2020-02-21 北京百度网讯科技有限公司 应用于卷积神经网络的加速方法和硬件加速器
CN109800867A (zh) * 2018-12-17 2019-05-24 北京理工大学 一种基于fpga片外存储器的数据调用方法
CN110674927A (zh) * 2019-09-09 2020-01-10 之江实验室 一种用于脉动阵列结构的数据重组方法
CN111583095A (zh) * 2020-05-22 2020-08-25 浪潮电子信息产业股份有限公司 图像数据存储方法、图像数据处理方法、系统及相关装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4156079A4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115460350A (zh) * 2022-09-02 2022-12-09 白犀牛智达(北京)科技有限公司 一种基于fpga的图像处理方法和系统
CN115460350B (zh) * 2022-09-02 2024-01-12 白犀牛智达(北京)科技有限公司 一种基于fpga的图像处理方法和系统
CN117196931A (zh) * 2023-11-08 2023-12-08 苏州元脑智能科技有限公司 面向传感器阵列的数据处理方法、fpga及电子设备
CN117196931B (zh) * 2023-11-08 2024-02-09 苏州元脑智能科技有限公司 面向传感器阵列的数据处理方法、fpga及电子设备

Also Published As

Publication number Publication date
US20230196500A1 (en) 2023-06-22
CN111583095B (zh) 2022-03-22
EP4156079A1 (en) 2023-03-29
EP4156079A4 (en) 2024-03-27
CN111583095A (zh) 2020-08-25

Similar Documents

Publication Publication Date Title
WO2021232843A1 (zh) 图像数据存储方法、图像数据处理方法、系统及相关装置
CN108805266B (zh) 一种可重构cnn高并发卷积加速器
WO2018196863A1 (zh) 卷积加速和计算处理方法、装置、电子设备及存储介质
JP6767660B2 (ja) プロセッサ、情報処理装置及びプロセッサの動作方法
CN109284475B (zh) 一种矩阵卷积计算装置及矩阵卷积计算方法
WO2019084788A1 (zh) 用于神经网络的运算装置、电路及相关方法
JP6340481B2 (ja) データキャッシング方法、装置及び記憶媒体
CN107680028B (zh) 用于缩放图像的处理器和方法
WO2016070668A1 (zh) 一种实现数据格式转换的方法、装置及计算机存储介质
WO2023065983A1 (zh) 计算装置、神经网络处理设备、芯片及处理数据的方法
JP2020042774A (ja) 人工知能推論演算装置
JP2009507423A5 (zh)
CN111626405A (zh) 一种cnn加速方法、加速装置及计算机可读存储介质
CN112836813A (zh) 一种用于混合精度神经网络计算的可重构脉动阵列系统
CN108701102A (zh) 直接存储器访问控制器、数据读取方法和数据写入方法
WO2024114505A1 (zh) 一种通用、可配置的图像滤波计算多行输出系统和方法
CN111814972B (zh) 一种基于fpga的神经网络卷积运算加速方法
CN103020014A (zh) 一种大点数fft的实现方法
CN109800867B (zh) 一种基于fpga片外存储器的数据调用方法
JP2022518640A (ja) データ処理方法、装置、機器、記憶媒体及びプログラム製品
WO2019114044A1 (zh) 图像处理方法、装置、电子设备及计算机可读存储介质
CN115860080A (zh) 计算核、加速器、计算方法、装置、设备、介质及系统
CN113138748B (zh) 一种基于FPGA的支持8bit和16bit数据的可配置的CNN乘法累加器
WO2021082723A1 (zh) 运算装置
US20130328903A1 (en) Efficient cache preloading

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21808636

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2021808636

Country of ref document: EP

Effective date: 20221222

NENP Non-entry into the national phase

Ref country code: DE