CN113655956B - Method and system for high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4

Info

Publication number: CN113655956B
Authority: CN (China)
Prior art keywords: data, ddr4, channel, fpga, buffer zone
Legal status: Active
Application number: CN202110842225.1A
Other languages: Chinese (zh)
Other versions: CN113655956A (en)
Inventors: 梁文豪, 陈岚, 许端, 王述良, 程建伟
Current assignee: Wuhan Jimu Intelligent Technology Co ltd
Original assignee: Wuhan Jimu Intelligent Technology Co ltd
Application filed by Wuhan Jimu Intelligent Technology Co ltd
Priority to CN202110842225.1A
Publication of CN113655956A
Application granted
Publication of CN113655956B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G06F3/0613 Improving I/O performance in relation to throughput
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0629 Configuration or reconfiguration of storage systems
    • G06F3/0635 Configuration or reconfiguration of storage systems by changing the path, e.g. traffic rerouting, path reconfiguration
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0679 Non-volatile semiconductor memory device, e.g. flash memory, one time programmable memory [OTP]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The embodiments of the application provide a method and a system for storing and reading high-bandwidth multi-channel data based on an FPGA and DDR4. Video input and output display logic, which traditionally has to satisfy many complex and varied requirements, is simplified and abstracted into three units: an FPGA multi-channel data access unit, an FPGA multi-channel data extraction unit and an FPGA multi-channel data matrix configuration unit.

Description

Method and system for high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4
Technical Field
The application relates to the field of storing, reading and writing video data, and in particular to a method and a system for a high-bandwidth multi-channel data storage and reading unit based on an FPGA and DDR4.
Background
In vision applications such as autonomous driving, security surveillance, live video streaming, video conferencing and machine vision, a common requirement is that multiple camera sensors capture several channels of high-bandwidth real-time video which, after algorithmic processing such as image enhancement, scaling, stitching and compositing, and deep learning, is displayed on multiple display devices. Future applications will place high demands on the bandwidth and scalability of flexible matrix storage, reading and forwarding of data streams with very many high-bandwidth channels.
However, existing solutions generally use a dedicated chip or an FPGA to store and read 2 to 4 channels of video; storing and reading 8 to 16 or more high-bandwidth video streams is largely unaddressed. In addition, the prior art handles each set of video access and output display requirements with a purpose-built design, so a single change in requirements often forces substantial changes to the design. Such designs generalize poorly and are not suited to fast, efficient customization.
CN201811643205, a multimedia control system supporting a multi-display function, only covers reading 16 channels of video from DDR and combining any four of them into 4 composite videos shown on 4 displays over HDMI; it does not cover storing multiple channels of video in DDR simultaneously or buffered spot-check frame processing. CN201911183242, a comprehensive display method for multi-channel 4K high-definition video based on a domestic processor and a domestic FPGA, addresses the storage, reading and composite display of 4 video channels, but the number of video channels it can read and write and the range of video formats it supports are insufficient.
As the above shows, some existing designs exploit the custom circuitry of FPGAs and the high bandwidth of DDR4 and achieve, to some extent, the storage or reading of multi-channel video. However, they have not deeply studied or optimized the upper limit on the number of configurable channels, support for arbitrary video formats, frame-rate adaptation between input and output video streams, DDR control efficiency when storing and reading a multi-channel data matrix, or reusable, expandable IP.
Disclosure of Invention
In view of this, the embodiments of the present application provide a method and a system for a high-bandwidth multi-channel data storage and reading unit based on an FPGA and DDR4. By combining the flexible circuit customization of an FPGA with the high throughput of DDR4, the scheme supports up to 16 write (storage) channels and 16 read channels and at least partially solves the problems of the prior art. Any video matrix stream can be processed as long as the combined bandwidth of all channels stays within the available capacity.
In a first aspect, an embodiment of the present application provides an apparatus for a high-bandwidth multi-channel data storage and reading unit based on an FPGA and DDR4. The apparatus comprises an FPGA and a set of DDR4 memory chips (granules) connected to it.
The FPGA contains a configurable circuit unit with write channels and read channels, each configurable by parameter from 1 to 16 channels. Each write channel is connected to an input buffer and each read channel to an output buffer. The input buffers connect to a DDR4 page data write buffer through a two-stage input data selection pipeline, and the output buffers connect to a DDR4 page data read buffer through a two-stage output data selection pipeline. The DDR4 page data write buffer connects to an arbitration controller through the DDR4 user-layer write data control logic, and the DDR4 page data read buffer connects to the arbitration controller through the DDR4 user-layer read data control logic. The arbitration controller is connected to the DDR4 user-layer command/address control logic. Finally, the DDR4 page data write buffer, the DDR4 page data read buffer and the DDR4 user-layer command/address control logic connect to the DDR4 chips through a DDR4 physical-layer control core.
According to a specific implementation of the embodiment of the application, the clock rate of the variable-rate input video stream of each write channel can be configured independently, and the bit width of each write channel is set by a parameter;
likewise, the clock rate of the variable-rate output video stream of each read channel can be configured independently, and the bit width of each read channel is set by a parameter.
According to a specific implementation of the embodiment of the application, the depth of each input buffer can be configured as required. The input buffer isolates and converts variable-rate input video, re-formatting video data of variable bit width and rate arriving from other clock domains into a uniform data stream in the DDR4 user clock domain.
According to a specific implementation of the embodiment of the application, each input buffer can raise a low-priority or a high-priority page-move request to the DDR4 page data write buffer, depending on its current fill level (watermark), and then waits for the arbitration controller to answer the request with a channel-occupy signal. When the channel-occupy signal is valid, the DMA control logic inside the input buffer moves the data to be stored into the DDR4 page data write buffer through the two-stage input data selection pipeline.
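The watermark rule can be pictured with a small behavioural model. This is a minimal Python sketch, not the patent's RTL: the names `InputBufferState` and `input_page_requests`, the choice to make the two requests mutually exclusive, and the `high_watermark` threshold are all hypothetical. The page size of 128 words follows from the embodiment's 64 kbit minimum transfer divided by its 512-bit user-layer width.

```python
from dataclasses import dataclass


@dataclass
class InputBufferState:
    """Hypothetical model of one input buffer, counted in 512-bit words."""
    fill_words: int      # words currently waiting in the buffer
    page_words: int      # words in one DDR4 page move (64 kbit / 512 bit = 128)
    high_watermark: int  # assumed threshold above which overflow is imminent


def input_page_requests(state: InputBufferState) -> tuple[bool, bool]:
    """Return (low_req, high_req) for one input buffer.

    Assumed policy: request the DDR at high priority when the buffer is
    close to overflowing; otherwise request at low priority as soon as at
    least one full page can be moved; raise nothing while less than a
    page is waiting.
    """
    high_req = state.fill_words >= state.high_watermark
    low_req = state.fill_words >= state.page_words and not high_req
    return low_req, high_req


# A buffer holding 200 words with a 384-word high watermark only asks at low priority.
print(input_page_requests(InputBufferState(fill_words=200, page_words=128, high_watermark=384)))
# (True, False)
```

Raising only the low-priority request until the buffer is nearly full lets lightly loaded channels yield the DDR to channels that are about to overflow.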
According to a specific implementation of the embodiment of the application, the number of rows and columns of the written video frames, the number of video buffers and the DDR base address of the write video buffer region are all configurable parameters of each input buffer, and each input buffer performs the corresponding operations according to its input control signal flags.
According to a specific implementation of the embodiment of the application, the depth of each output buffer can be configured as required. The output buffer isolates and converts variable-rate output video, converting the data stream in the DDR4 user clock domain into video data of variable bit width and rate in another clock domain.
According to a specific implementation of the embodiment of the application, each output buffer can raise a low-priority or a high-priority page-move request to the DDR4 page data read buffer, depending on its current fill level (watermark), and then waits for the DDR4 user arbitration controller to answer the request with a channel-occupy signal. When the channel-occupy signal is valid, the video output buffer waits until new data arrives in the DDR4 page data read buffer; the DMA control logic inside the video output buffer then moves the data to be read from the DDR4 page data read buffer into that channel's video output buffer through the two-stage output data selection pipeline.
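The read side can be sketched the same way, mirrored: an output buffer drains towards the display, so the urgent case is running empty rather than overflowing. The function `output_page_requests`, its parameters and the exact thresholds are hypothetical illustrations of the watermark idea, not values from the patent.

```python
def output_page_requests(fill_words: int, depth_words: int,
                         page_words: int, low_watermark: int) -> tuple[bool, bool]:
    """Return (low_req, high_req) for one output buffer.

    Hypothetical policy: a page may only be requested when the buffer has
    room for all of it; the request becomes urgent when the buffer is
    about to run dry, which would cause a display underflow.
    """
    free_words = depth_words - fill_words
    if free_words < page_words:
        return False, False       # no room for another page yet
    if fill_words <= low_watermark:
        return False, True        # close to underflow: refill urgently
    return True, False            # room available: refill opportunistically
```

With a 512-word buffer, 128-word pages and a 128-word low watermark, a buffer holding 300 words raises only the low-priority request, while one holding 100 words escalates to high priority.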
According to a specific implementation of the embodiment of the application, the number of rows and columns of the read video frames and the DDR base address of the read video buffer region are configurable parameters of each output buffer, and each output buffer performs the corresponding operations according to its output control signal flags.
According to a specific implementation of the embodiment of the application, the arbitration controller sorts the incoming request signals by channel and priority level into 4 groups, in descending priority: high-priority write requests, high-priority read requests, low-priority write requests and low-priority read requests. Requests within a group have equal priority, and the arbitration controller serves them in round-robin order.
According to a specific implementation of the embodiment of the present application, the DDR4 user-layer write data control logic, while active, interfaces directly with the service data of one write channel at a time, which decouples it from the multi-channel user data write control layer.
According to a specific implementation of the embodiment of the application, the DDR4 user-layer read data control logic, while active, interfaces directly with the service data of one read channel at a time, which decouples it from the multi-channel user data read control layer.
According to a specific implementation of the embodiment of the present application, the DDR4 user-layer command/address control logic, while active, interfaces directly with one write channel or one read channel at a time, which decouples it from the multi-channel user data write or read control layer.
According to a specific implementation of the embodiment of the application, the DDR4 physical-layer control core converts the data and control signal buses of the DDR4 user layer into the physical-layer signal buses required by the DDR4 chips.
According to a specific implementation of the embodiment of the application, the apparatus further includes a second controller. For the write channels, it maps each channel sequentially onto memory regions in the DDR4 chips and resolves the channel's write pointer and write base address; for the read channels, it maps each channel onto those memory regions in an arbitrary, possibly asymmetric way and resolves the channel's read pointer and read base address.
According to a specific implementation of the embodiment of the application, the apparatus further comprises a DDR4 efficiency monitoring unit which, over a measurement period set by a parameter, counts the percentage of time the DDR4 physical-layer control core is available and the percentage of time it is actually used.
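One plausible way to read "available" versus "actually used" is a ready/enable handshake sampled once per user-layer clock cycle: the core is available when it can accept a command, and a cycle is used when a command is actually accepted. That interpretation, and the trace-based function `efficiency_window` below, are assumptions for illustration; the patent does not name the signals the monitor counts.

```python
def efficiency_window(ready: list[int], enable: list[int]) -> tuple[float, float]:
    """Return (available %, used %) over one measurement window.

    ready[i]:  1 if the DDR4 core could accept a command in cycle i (assumed signal)
    enable[i]: 1 if the user layer presented a command in cycle i (assumed signal)
    A cycle counts as used only when both are 1, i.e. the command was accepted,
    so the used percentage can never exceed the available percentage.
    """
    cycles = len(ready)
    if cycles == 0 or len(enable) != cycles:
        raise ValueError("need two equal-length, non-empty traces")
    available = sum(ready)
    used = sum(r & e for r, e in zip(ready, enable))
    return 100.0 * available / cycles, 100.0 * used / cycles
```

In hardware the same idea reduces to two counters and a cycle counter that are latched and cleared at the end of each configurable window.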
In a second aspect, an embodiment of the present application provides a method for a high-bandwidth multi-channel data storage and reading unit based on an FPGA and DDR4, the method being based on the apparatus for a high-bandwidth multi-channel data storage and reading unit based on an FPGA and DDR4 described above.
In a third aspect, an embodiment of the present application provides a system for a high-bandwidth multi-channel data storage and reading unit based on an FPGA and DDR4. The system comprises an FPGA multi-channel data access unit, an FPGA multi-channel data extraction unit, and an FPGA multi-channel data matrix configuration unit connected to both of them.
The apparatus for the high-bandwidth multi-channel data storage and reading unit based on the FPGA and DDR4 is configured to store the multi-channel input data and cross-read it to produce the multi-channel video output data; internally, it configures which input channel each output channel is connected to.
The FPGA multi-channel data access unit is configured to decouple the upstream input video data.
The FPGA multi-channel data extraction unit is configured to decouple the downstream output video data.
With the data matrix scheme of the application, up to 16 channels of data can be stored and 16 channels read. The system abstracts the input and output display requirements of complex multi-channel video into three units: the FPGA multi-channel data access unit, the FPGA multi-channel data extraction unit and the FPGA multi-channel data matrix configuration unit. Using the FPGA as general-purpose glue logic, it merges and ingests different kinds of input video and converts and composites different kinds of output video, confining the complex, requirement-specific problems to the data access unit and the data extraction unit.
The FPGA multi-channel data matrix configuration unit carries the large multi-channel data volume, yet its logic is largely insulated from changes in external requirements. It usually needs to be designed only once and can then be reused efficiently by re-customizing its configuration parameters and modifying its internal matrix cross-connections. This greatly reduces the difficulty of designing for complex video input and output requirements and speeds up customization for new requirements.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of an apparatus of a high-bandwidth multi-channel data storage and reading unit based on an FPGA and DDR4 according to an embodiment of the present application;
FIG. 2 is a map of the memory-region layout within the DDR chips according to an embodiment of the present application;
FIG. 3 is an exemplary diagram of a system application scenario in an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present application will become apparent to those skilled in the art from the present disclosure, when the following description of the embodiments is taken in conjunction with the accompanying drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. The present application may be embodied or carried out in other specific embodiments, and the details of the present application may be modified or changed from various points of view and applications without departing from the spirit of the present application. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present application, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concepts of the application by way of illustration, and only the components related to the application are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
This embodiment combines the flexible circuit customization of the FPGA with the high throughput of DDR4 to provide a data matrix scheme with up to 16 write (storage) channels and 16 read channels, which can handle any video matrix stream (CVBS, 720p, 1080p, 4K, etc.) as long as the combined bandwidth of all channels stays within the available capacity.
Next, with reference to the accompanying drawings, the apparatus and system for a high-bandwidth multi-channel data storage and reading unit based on an FPGA and DDR4 according to the embodiments of the present application are described in detail.
Referring to FIG. 1, an apparatus for a high-bandwidth multi-channel data storage and reading unit based on an FPGA and DDR4 according to an embodiment of the present application includes an FPGA and a set of DDR4 chips connected to it. This embodiment is described using a Xilinx Zynq UltraScale+ MPSoC XCZU5EV FPGA combined with four Micron MT40A512M16HA DDR4 chips; in practice, the FPGA and DDR chip combination can be chosen by weighing requirements against cost.
The FPGA contains a configurable circuit unit with write channels and read channels, each configurable by parameter from 1 to 16 channels. Each write channel is connected to an input buffer and each read channel to an output buffer. The input buffers connect to a DDR4 page data write buffer through a two-stage input data selection pipeline, and the output buffers connect to a DDR4 page data read buffer through a two-stage output data selection pipeline. The DDR4 page data write buffer connects to an arbitration controller through the DDR4 user-layer write data control logic, and the DDR4 page data read buffer connects to the arbitration controller through the DDR4 user-layer read data control logic. The arbitration controller is connected to the DDR4 user-layer command/address control logic. Finally, the DDR4 page data write buffer, the DDR4 page data read buffer and the DDR4 user-layer command/address control logic connect to the DDR4 chips through a DDR4 physical-layer control core.
As shown in FIG. 1, a configurable circuit unit supporting up to 16 DDR4 write channels and up to 16 read channels is built in the FPGA. Parameters set the number of write channels (1 to 16) and read channels (1 to 16). The clock rate of each variable-rate input video stream can be configured independently, and its channel bit width can be set by parameter to 8, 16 or 32 bits, among others. Likewise, the clock rate of each variable-rate output video stream can be configured independently, and its channel bit width can be set by parameter to 8, 16 or 32 bits, among others.
As shown in FIG. 1, the depth of input buffers 1 to 16 in the FPGA can be configured freely as required. These buffers isolate and convert variable-rate input video, re-formatting video data of variable bit width and rate from other clock domains into a uniform 512-bit-wide data stream in the DDR4 user clock domain.
As shown in FIG. 1, the depth of output buffers 1 to 16 in the FPGA can be configured freely as required. These buffers isolate and convert variable-rate output video, converting the 512-bit-wide data stream of the DDR4 user clock domain into output video data of variable bit width and rate in other clock domains.
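The width conversion performed by the input and output buffers amounts to packing narrow samples into 512-bit words and unpacking them again. The sketch below models only that data reshaping, not the clock-domain crossing; the function names and the lane order (first sample in the least significant bits, zero padding in the last word) are assumptions, since the patent fixes only the 512-bit width.

```python
WORD_BITS = 512  # DDR4 user-layer data width in this embodiment


def pack_samples(samples: list[int], sample_bits: int) -> list[int]:
    """Pack fixed-width samples into 512-bit words, first sample in the LSBs."""
    if WORD_BITS % sample_bits:
        raise ValueError("sample width must divide 512")
    per_word = WORD_BITS // sample_bits
    mask = (1 << sample_bits) - 1
    words = []
    for start in range(0, len(samples), per_word):
        word = 0
        for lane, sample in enumerate(samples[start:start + per_word]):
            word |= (sample & mask) << (lane * sample_bits)
        words.append(word)  # a partial last word is zero-padded
    return words


def unpack_samples(words: list[int], sample_bits: int, count: int) -> list[int]:
    """Inverse of pack_samples: recover the first `count` samples."""
    if WORD_BITS % sample_bits:
        raise ValueError("sample width must divide 512")
    per_word = WORD_BITS // sample_bits
    mask = (1 << sample_bits) - 1
    samples = []
    for word in words:
        for lane in range(per_word):
            samples.append((word >> (lane * sample_bits)) & mask)
    return samples[:count]
```

At 16 bits per sample, one 512-bit word carries 32 pixels, so 100 pixels occupy 4 words, the last one partly padded; at 8 bits the same word carries 64 samples.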
As shown in FIG. 1, each input buffer in the FPGA can raise a low-priority or a high-priority page-move request (req) to the DDR4 page data write buffer depending on its current fill level, and then waits for the DDR4 user arbitration controller to answer the request with a channel occupy signal.
As shown in FIG. 1, each output buffer in the FPGA can raise a low-priority or a high-priority page-move request (req) to the DDR4 page data read buffer depending on its current fill level, and then waits for the DDR4 user arbitration controller to answer the request with a channel occupy signal. When the occupy signal is active, the video output buffer waits for new data to arrive in the DDR4 page data read buffer; the internal DMA control logic of the video output buffer then moves the data to be read from the DDR4 page data read buffer into that channel's video output buffer through the two-stage (512 mux to 2048) data selection pipeline.
As shown in FIG. 1, for the input buffer of each variable-rate input video in the FPGA, parameters configure the number of rows and columns of the video frame, the number of video buffers (by default, 4 rotating buffers per channel) and the DDR base address of the video buffers. Control flags such as frame valid and line valid are used to reset and control the input buffer and to update the rotating-buffer write pointer flag, set the write-data-valid flag, switch the buffer write address, and so on.
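The rotating-buffer bookkeeping of one write channel can be sketched as follows. The class `WriteChannel`, its fields and the contiguous buffer layout (buffer i starts at `base_addr + i * frame_bytes`, with no page alignment) are hypothetical simplifications; only the configurable rows, columns, buffer count (default 4) and base address come from the text.

```python
from dataclasses import dataclass


@dataclass
class WriteChannel:
    """Hypothetical model of one write channel's rotating frame buffers."""
    base_addr: int              # DDR base address of this channel (parameter)
    rows: int                   # frame height (parameter)
    cols: int                   # frame width (parameter)
    bytes_per_pixel: int
    num_buffers: int = 4        # default: 4 rotating buffers per channel
    write_idx: int = 0          # buffer currently being written
    last_complete: int = -1     # newest fully written buffer, -1 if none yet

    @property
    def frame_bytes(self) -> int:
        return self.rows * self.cols * self.bytes_per_pixel

    def row_address(self, row: int) -> int:
        """DDR byte address of `row` in the buffer currently being written."""
        return (self.base_addr + self.write_idx * self.frame_bytes
                + row * self.cols * self.bytes_per_pixel)

    def end_of_frame(self) -> None:
        """At the end of a valid frame: mark it readable and rotate the write pointer."""
        self.last_complete = self.write_idx
        self.write_idx = (self.write_idx + 1) % self.num_buffers
```

For a 1080p channel at 2 bytes per pixel, each frame occupies 1920 × 1080 × 2 = 4,147,200 bytes, so after one `end_of_frame()` the next frame starts 4,147,200 bytes above the channel base address, and the write pointer wraps back to buffer 0 after four frames.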
As shown in FIG. 1, for the output buffer of each variable-rate output video in the FPGA, parameters configure the number of rows and columns of the video frame and the DDR base address of the video buffer to be read. Control flags such as frame request and line request are used to reset and control the output buffer and to update the rotating-buffer read pointer flag, set the read-data-valid flag, switch the buffer read address, and so on.
As shown in FIG. 1, the 16-channel write / 16-channel read arbitration controller in the FPGA receives a high-priority and a low-priority request from each of the 16 write channels and each of the 16 read channels, 16×2 + 16×2 = 64 request signals in total. The arbitration logic divides these 64 requests into 4 groups of 16: the high-priority write requests of the 16 channels form group 1, the high-priority read requests form group 2, the low-priority write requests form group 3, and the low-priority read requests form group 4. Requests within a group have equal priority and are served in round-robin order, while the groups themselves are strictly ordered: group 1 > group 2 > group 3 > group 4. After granting one of the pending requests, the arbitration controller sends an occupy signal to the write or read channel that has won control of the DDR, and notifies the DDR4 user-layer write data control logic or read data control logic and the two-stage input or output pipeline selection logic, which switch the data and control paths to interface with that channel's buffer circuit.
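The grouping and round-robin rule described above can be modelled in software as a strict-priority scan over four groups with a rotating start pointer per group. This is a behavioural sketch under stated assumptions, not the patent's logic: the class name, the list-of-lists request format and the "winner moves to the back" pointer update are illustrative choices.

```python
from typing import List, Optional, Tuple

GROUP_NAMES = ("high write", "high read", "low write", "low read")  # groups 1..4


class GroupedRoundRobinArbiter:
    """Strict priority between 4 groups, round robin inside each group."""

    def __init__(self, channels: int = 16) -> None:
        self.channels = channels
        # per group: the channel to examine first at the next arbitration
        self.next_start = [0] * len(GROUP_NAMES)

    def grant(self, requests: List[List[bool]]) -> Optional[Tuple[int, int]]:
        """requests[g][ch] is channel ch's request in group g (g = 0 is group 1).

        Returns (group, channel) of the winner, i.e. the channel that would
        receive the occupy signal, or None when nothing is requested.
        """
        for g, group_reqs in enumerate(requests):
            start = self.next_start[g]
            for offset in range(self.channels):
                ch = (start + offset) % self.channels
                if group_reqs[ch]:
                    # the winner moves to the back of its group's queue
                    self.next_start[g] = (ch + 1) % self.channels
                    return g, ch
        return None
```

Because the groups are strictly ordered, a low-priority request is served only when no high-priority request is pending. In this model a starved channel keeps filling (or draining) its buffer until the watermark logic escalates it to high priority, which is presumably why each channel has two request levels.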
As shown in FIG. 1, while active, the DDR4 user-layer write data control logic in the FPGA interfaces directly with the service data of one of the 16 write channels at a time, decoupling it from the multi-channel user data write control layer; at any moment it behaves as a single, independent and complete DDR4 page write data path.
Likewise, while active, the DDR4 user-layer read data control logic in the FPGA interfaces directly with the service data of one of the 16 read channels at a time, decoupling it from the multi-channel user data read control layer; at any moment it behaves as a single, independent and complete DDR4 page read data path.
As shown in FIG. 1, while active, the DDR4 user-layer command/address control logic in the FPGA interfaces directly with the service control signals of one of the 16 write channels or 16 read channels at a time, decoupling it from the multi-channel user write/read control layer; it behaves as a single, independent and complete DDR4 page write/read control path.
As shown in FIG. 1, the DDR4 MIG PHY is Xilinx's DDR4 physical-layer control IP core (MIG: Memory Interface Generator); it converts the data and control signal buses of the DDR4 user layer into the physical-layer signal buses required by the DDR4 chips.
In this embodiment, the DDR4 IP core supports a maximum chip interface clock of 1066 MHz, and the DDR4 user-layer clock is fixed at 1/4 of that rate, i.e. about 266.7 MHz. The controller uses ROW-COLUMN-BANK address mapping, and every write or read is an integer multiple of one page across the 4 DDR4 chips, so the minimum write or read is 4 × 16 kbit = 64 kbit. With double data rate (2 transfers per clock) on the 64-bit bus formed by the four x16 chips, the theoretical maximum throughput of the Xilinx DDR4 MIG PHY and DDR4 chip combination is 1066 M × 2 × 64 bit/s × 91% = 124.16768 Gbit/s, where 91% is the efficiency Xilinx specifies for continuous single-channel writing or reading with the DDR4 MIG PHY. Sharing one DDR IP core among 16 write and 16 read channels, with the associated multiplexing and arbitration, lowers the efficiency somewhat: in practice the DDR utilization reaches about 90% of Xilinx's theoretical 91%, i.e. an overall efficiency of about 81%, giving a total data bandwidth of 1066 M × 2 × 64 bit/s × 81% = 110.52288 Gbit/s.
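The bandwidth figures above can be reproduced with a few lines of arithmetic. This sketch only restates the embodiment's own numbers (1066 MHz, double data rate, 64-bit bus, 16 kbit page per chip, 512-bit user words); the constant and function names are illustrative. It uses the rounded 1066 MHz exactly as the text does, so it is not meant to derive the 266.7 MHz user clock.

```python
DDR_CLOCK_MHZ = 1066            # maximum chip interface clock in this embodiment
TRANSFERS_PER_CLOCK = 2         # double data rate
BUS_BITS = 64                   # four x16 DDR4 chips side by side
CHIPS = 4
PAGE_BITS_PER_CHIP = 16 * 1024  # 16 kbit page per chip
USER_WORD_BITS = 512            # DDR4 user-layer data width


def ddr_bandwidth_gbps(efficiency: float) -> float:
    """Usable DDR4 bandwidth in Gbit/s for a given controller efficiency."""
    return DDR_CLOCK_MHZ * 1e6 * TRANSFERS_PER_CLOCK * BUS_BITS * efficiency / 1e9


# Smallest write/read: one page on each of the 4 chips.
min_transfer_bits = CHIPS * PAGE_BITS_PER_CHIP             # 65536 bit = 64 kbit
words_per_transfer = min_transfer_bits // USER_WORD_BITS   # 128 user-layer words

print(ddr_bandwidth_gbps(0.91))  # ~124.17 Gbit/s: theoretical, single channel
print(ddr_bandwidth_gbps(0.81))  # ~110.52 Gbit/s: estimate for 16 write + 16 read channels
```

The 128-word figure is also a natural unit for the page-move requests between the channel buffers and the DDR4 page data buffers.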
Typically, 4Kp30 video in 16-bit YUV422 occupies 3840 × 2160 × 16 bit × 30/s = 3.981312 Gbit/s, and 1080p60 video in 16-bit YUV422 occupies 1920 × 1080 × 16 bit × 60/s = 1.990656 Gbit/s. If 16 input video channels (11 × 4Kp30 and 5 × 1080p60) and 16 output video channels (11 × 4Kp30 and 5 × 1080p60) are configured, the total bandwidth used is (11 × 3.981312 + 5 × 1.990656) × 2 = 107.495424 Gbit/s, which is below the design's maximum available bandwidth of 110.52288 Gbit/s. The design of this embodiment can therefore sustain the storage and reading of a large, high-bandwidth multi-channel data matrix.
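The budget check above can be written as a small helper. This is an illustrative sketch; the names `stream_gbps` and `total_load_gbps` are hypothetical, and the factor of 2 reflects the text's configuration, in which the 16 outputs carry the same stream mix as the 16 inputs.

```python
def stream_gbps(width: int, height: int, bits_per_pixel: int, fps: int) -> float:
    """Raw bandwidth of one uncompressed video stream in Gbit/s."""
    return width * height * bits_per_pixel * fps / 1e9


UHD_P30 = stream_gbps(3840, 2160, 16, 30)  # 4Kp30, 16-bit YUV422
FHD_P60 = stream_gbps(1920, 1080, 16, 60)  # 1080p60, 16-bit YUV422
DESIGN_CAPACITY_GBPS = 110.52288           # effective DDR4 bandwidth from the text


def total_load_gbps(n_uhd: int, n_fhd: int) -> float:
    """DDR4 load when the outputs mirror the inputs: the write load and the
    read load are equal, so the per-direction sum is doubled."""
    return (n_uhd * UHD_P30 + n_fhd * FHD_P60) * 2


if __name__ == "__main__":
    load = total_load_gbps(11, 5)
    print(f"load {load:.6f} Gbit/s, capacity {DESIGN_CAPACITY_GBPS} Gbit/s, "
          f"headroom {DESIGN_CAPACITY_GBPS - load:.6f} Gbit/s")
```

With these figures the headroom is only about 3 Gbit/s, so adding even one more 1080p60 input/output pair (about 4 Gbit/s) would exceed the estimated capacity.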
According to a specific implementation of this embodiment, the device further includes a second controller, namely the "M-channel write / N-channel read round-robin memory region base-address arbitrary cross-mapping controller & spot-check frame pointer flow controller" shown in fig. 1. The numbers of write channels and read channels are each configurable from 1 to 16; suppose M write channels and N read channels are actually configured. This controller in the FPGA maps the M write channels onto the M × 4 frame storage regions in the DDR4 chips shown in fig. 2 and resolves each channel's write pointer and write base address. It also maps the round-robin read pointers of the N read channels onto the M × 4 storage regions of fig. 2 in any (including asymmetric) arrangement that the user configures by parameters, and resolves each channel's read pointer and read base address. The memory actually occupied in the DDR4 chips therefore depends only on the number of write channels: by default, M × 4 frame buffers are used (each channel cycles through 4 frame buffers by default).
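The address mapping described above can be sketched as follows. This is a hypothetical model, not the controller's actual logic: the frame slots are assumed to be packed back to back from an assumed base address, `FrameMap`, `write_addr` and `read_addr` are illustrative names, channels are numbered from 0, and the example routing table mirrors the fig. 3 configuration (output channel 1 reads input channel 2, and so on) shifted to 0-based numbering.

```python
from dataclasses import dataclass
from typing import Dict

FRAMES_PER_CHANNEL = 4  # default ring depth per write channel, per the text


@dataclass
class FrameMap:
    """Hypothetical layout: M write channels x 4 frame slots, packed back to back."""
    region_base: int         # DDR4 byte address of the first frame slot (assumed)
    frame_bytes: int         # size of one frame slot in bytes (assumed)
    num_write_channels: int  # M, 1..16

    def write_addr(self, write_ch: int, slot: int) -> int:
        """Write channels map sequentially: channel c owns slots c*4 .. c*4+3 (0-based)."""
        if not 0 <= write_ch < self.num_write_channels:
            raise ValueError(f"write channel {write_ch} not configured")
        if not 0 <= slot < FRAMES_PER_CHANNEL:
            raise ValueError(f"slot {slot} outside ring of {FRAMES_PER_CHANNEL}")
        return self.region_base + (write_ch * FRAMES_PER_CHANNEL + slot) * self.frame_bytes

    def total_bytes(self) -> int:
        """Occupied memory depends only on M, not on the number of read channels."""
        return self.num_write_channels * FRAMES_PER_CHANNEL * self.frame_bytes


def read_addr(fmap: FrameMap, read_to_write: Dict[int, int], read_ch: int, slot: int) -> int:
    """A read channel owns no memory; a user-configured table names the write ring it
    follows. Several read channels may follow the same write channel (asymmetric mapping)."""
    return fmap.write_addr(read_to_write[read_ch], slot)


if __name__ == "__main__":
    frame = 1920 * 1080 * 2                  # one 16-bit 1080p frame, in bytes
    fmap = FrameMap(region_base=0, frame_bytes=frame, num_write_channels=4)
    routes = {0: 1, 1: 0, 2: 2, 3: 3, 4: 3}  # N = 5 read channels over M = 4 write rings
    for r, w in routes.items():
        print(f"read {r} -> write ring {w}, slot 0 at 0x{read_addr(fmap, routes, r, 0):08X}")
    print(f"total frame storage: {fmap.total_bytes()} bytes")
```

Keeping read channels as pure lookups into write rings is what makes the memory footprint independent of N: adding a read channel adds a table entry, not a frame buffer.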
When writing and reading are asynchronous and their frame rates differ, for example 1080p60 frames on a write channel and 1080p30 frames on a read channel, the mismatch is handled by advancing the frame-store (write) pointer and the read pointer at different rates. How the read and write pointers adapt, e.g. stopping when full, stopping when empty, or adjusting the pointer step, must be controlled according to the specific requirements. Moreover, M and N are independent of each other: each can be set to any value from 1 to 16, and the cross-connection between them is configurable.
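The rate adaptation above admits several policies. The sketch below shows just one, chosen here as an assumption: at each read-frame start the reader takes the newest fully written frame. With a 60 fps writer and a 30 fps reader this skips every other frame; with the rates reversed it repeats each frame. It does not model the full-stop or empty-stop behaviour the text also mentions, and all names are hypothetical.

```python
import math
from fractions import Fraction
from typing import List, Optional

RING = 4  # frame slots per write channel (default from the text)


def newest_complete_frame(t: Fraction, write_fps: int) -> Optional[int]:
    """Index of the newest fully written frame at time t (frame k completes at (k+1)/write_fps)."""
    completed = math.floor(t * write_fps)
    return completed - 1 if completed > 0 else None


def frames_read(write_fps: int, read_fps: int, n_reads: int) -> List[Optional[int]]:
    """Frame index fetched at each read-frame start, under a 'latest complete frame' policy.

    Read frame j starts at j/read_fps (j = 1, 2, ...). A faster writer leads to skipped
    frames, a faster reader to repeated frames; None means nothing is ready yet.
    """
    return [newest_complete_frame(Fraction(j, read_fps), write_fps)
            for j in range(1, n_reads + 1)]


def slot_of(frame: int) -> int:
    """Ring slot that holds a given frame index."""
    return frame % RING


if __name__ == "__main__":
    print("60 -> 30:", frames_read(60, 30, 4))  # [1, 3, 5, 7]: every other frame skipped
    print("30 -> 60:", frames_read(30, 60, 5))  # [None, 0, 0, 1, 1]: each frame shown twice
```

Exact `Fraction` timestamps avoid floating-point drift when deciding whether a frame has completed exactly at a read boundary.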
As shown in fig. 1, the efficiency monitoring unit in the FPGA counts over a parameter-configurable time interval and reports, for each interval, the percentage of time the DDR4 PHY controller is available and the percentage of time it is actually utilized.
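A software model of such a monitor might look like the following. This is a hedged sketch: the per-cycle inputs `available` and `busy` are assumptions standing in for whatever ready and data-valid signals the real hardware samples, and `EfficiencyMonitor` is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EfficiencyMonitor:
    """Per-interval availability/utilization counters.

    The two per-cycle inputs are assumptions: `available` stands for a controller
    ready flag and `busy` for a cycle in which data was actually transferred.
    """
    window_cycles: int  # statistics interval, user-configurable
    reports: List[Tuple[float, float]] = field(default_factory=list)
    _cycles: int = 0
    _available: int = 0
    _busy: int = 0

    def tick(self, available: bool, busy: bool) -> None:
        """Call once per user-clock cycle; emits (available %, utilized %) per window."""
        self._cycles += 1
        self._available += int(available)
        self._busy += int(busy)
        if self._cycles == self.window_cycles:
            self.reports.append((100.0 * self._available / self._cycles,
                                 100.0 * self._busy / self._cycles))
            self._cycles = self._available = self._busy = 0


if __name__ == "__main__":
    mon = EfficiencyMonitor(window_cycles=10)
    for cycle in range(20):
        phase = cycle % 10
        mon.tick(available=phase != 0, busy=1 <= phase <= 7)
    print(mon.reports)  # [(90.0, 70.0), (90.0, 70.0)]
```

Resetting the counters at each window boundary keeps the hardware equivalent to three small counters and one comparator per interval.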
In a second aspect, an embodiment of the present application provides a method for a high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4; the method is based on the above device for a high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4.
In a third aspect, an embodiment of the present application provides a system for a high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4. The system comprises an FPGA multi-path data matrix configuration unit, which is connected to both an FPGA multi-path data access unit and an FPGA multi-path data extraction unit;
the FPGA multi-path data matrix configuration unit is built on the device of the high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4; it stores the multi-channel input data and cross-reads it out as multi-channel video output data, with an internal configuration that connects each output channel to an input channel;
the FPGA multi-path data access unit is configured to decouple the upstream (front-stage) input video data;
the FPGA multi-path data extraction unit is configured to decouple the downstream (back-stage) output video data.
In this embodiment, as shown in the application example of the system in fig. 3, assume that three videos are input to the FPGA chip: video 1 is the input from fisheye camera 1, video 2 is the input from camera 2, and video 3 is an encoded background video, to be composited, that arrives through an external network port. There are two external display devices. Display device 1 must simultaneously show the original video from fisheye camera 1, the de-distorted version of that video, and the background video received over the network. Display device 2 must simultaneously show the enlarged video from camera 2 and the background video received over the network.
In this embodiment, as shown in the application example of the system in fig. 3, the FPGA multi-path data access unit decodes the network input video, scales the video from camera 2, and splits the video from fisheye camera 1 into two paths (original and de-distorted), sending a total of 4 video paths to the FPGA multi-path data matrix configuration unit.
As shown in fig. 3, in this application example the FPGA multi-path data matrix configuration unit stores the 4 video input paths and cross-reads them out as 5 video output paths, i.e. it is configured with M = 4 and N = 5, and internally connects input channel 1 to output channel 2, input channel 2 to output channel 1, input channel 3 to output channel 3, and input channel 4 to both output channel 4 and output channel 5.
As shown in fig. 3, in this application example the FPGA multi-path data extraction unit composites the 5 video output paths. Compositor 1 combines video output channels 1, 2 and 4 into one video stream and sends it to display device 1 outside the FPGA; compositor 2 combines video output channels 3 and 5 into one video stream and sends it to display device 2 outside the FPGA.
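The fig. 3 routing (crossbar plus compositors) can be expressed as two small tables and traced end to end. This is an illustrative sketch with hypothetical names; the tables themselves come directly from the text. Tracing shows that input channel 4 reaches both displays, which is consistent with the network background video being shown on both.

```python
from typing import Dict, List

M, N = 4, 5  # configured write (input) and read (output) channels

# Output channel -> input channel it reads (1-based, from the text).
CROSSBAR: Dict[int, int] = {1: 2, 2: 1, 3: 3, 4: 4, 5: 4}

# Display -> output channels merged by its compositor (from the text).
COMPOSITORS: Dict[str, List[int]] = {"display 1": [1, 2, 4], "display 2": [3, 5]}


def validate(crossbar: Dict[int, int], m: int, n: int) -> None:
    """Every output 1..N needs a source, and each source must be a configured input 1..M."""
    if set(crossbar) != set(range(1, n + 1)):
        raise ValueError("the mapped output channels must be exactly 1..N")
    bad = [out for out, src in crossbar.items() if not 1 <= src <= m]
    if bad:
        raise ValueError(f"outputs {bad} reference unconfigured inputs")


def inputs_per_display(crossbar: Dict[int, int],
                       compositors: Dict[str, List[int]]) -> Dict[str, List[int]]:
    """Trace each display back through its compositor and the crossbar to input channels."""
    return {disp: sorted({crossbar[out] for out in outs}) for disp, outs in compositors.items()}


if __name__ == "__main__":
    validate(CROSSBAR, M, N)
    print(inputs_per_display(CROSSBAR, COMPOSITORS))
    # {'display 1': [1, 2, 4], 'display 2': [3, 4]}
```

Because one input may feed several outputs, the table is keyed by output channel: each read channel has exactly one source, while a write channel may have any number of readers.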
This embodiment is only one application example of the design. Following this template, the FPGA multi-path data access unit, the FPGA multi-path data extraction unit and the FPGA multi-path data matrix configuration unit can be designed flexibly to suit actual requirements.
Verification and testing show that this embodiment simultaneously supports 11 × 4Kp30 + 5 × 1080p60 video inputs and 11 × 4Kp30 + 5 × 1080p60 video outputs, providing strong support and scalability for storing, reading and writing multi-channel high-bandwidth video data.
The units involved in the embodiments of the present application may be implemented in software or in hardware. The name of a unit does not in any way limit the unit itself; for example, the first acquisition unit may also be described as "a unit that acquires at least two Internet Protocol addresses".
It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.
The foregoing describes merely specific embodiments of the disclosure, but the protection scope of the disclosure is not limited thereto; any change or substitution that a person skilled in the art could readily conceive within the technical scope of the disclosure shall fall within the protection scope of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (8)

1. An apparatus of a high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4, the apparatus comprising an FPGA and a plurality of DDR4 chips connected to it, characterized in that:
the FPGA contains a configurable circuit unit comprising write channels and read channels whose number is parameter-configurable from 1 to 16; each write channel is connected to an input buffer and each read channel to an output buffer; the input buffers are connected to a DDR4 page data write buffer through a two-stage input data selection pipeline, and the output buffers are connected to a DDR4 page data read buffer through a two-stage output data selection pipeline; the DDR4 page data write buffer is connected to an arbitration controller through DDR4 user-layer write data control logic, and the DDR4 page data read buffer is connected to the arbitration controller through DDR4 user-layer read data control logic; the arbitration controller is connected to DDR4 user-layer command/address control logic; and the DDR4 page data write buffer, the DDR4 page data read buffer and the DDR4 user-layer command/address control logic are connected to the DDR4 chips through a DDR4 physical-layer control core;
the depth of each input buffer is configurable as required; each input buffer isolates and converts variable-rate input video, reformatting video data of variable bit width and rate from other clock domains into a uniform data stream in the DDR4 user clock domain;
each input buffer is configured to issue a low-priority or a high-priority page-move request to the DDR4 page data write buffer according to its current data fill level (watermark), and to wait for the arbitration controller to answer the request with a channel-occupancy signal; when the channel-occupancy signal is valid, the DMA control logic in the input buffer moves the data to be stored into the DDR4 page data write buffer through the two-stage input data selection pipeline;
for each input buffer, the row and column counts of the written video frames, the number of video frame buffers and the DDR base address of the written video buffer region are all parameter-configurable, and each input buffer performs the corresponding operations according to input control signal flags;
the apparatus further comprises a second controller configured to map the write channels sequentially onto the storage regions in the DDR4 chips and resolve each channel write pointer and channel write base address, and to map the read channels onto the storage regions in the DDR4 chips in any asymmetric arrangement and resolve each channel read pointer and channel read base address;
the apparatus further comprises a DDR4 efficiency monitoring unit which, through parameter configuration, counts the availability percentage and the actual utilization percentage of the DDR4 physical-layer control core within each time interval.
2. The apparatus of the high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4 according to claim 1, wherein the clock rate of the variable-rate input video stream of each write channel is independently configurable, and the channel bit width of each write channel is configurable;
the clock rate of the variable-rate output video stream of each read channel is independently configurable, and the channel bit width of each read channel is parameter-configurable.
3. The apparatus according to claim 1, wherein the depth of each output buffer is configurable as required; each output buffer isolates and converts variable-rate output video, converting the data stream of the DDR4 user clock domain into video data of variable bit width and rate in other clock domains;
each output buffer is configured to issue a low-priority or a high-priority page-move request to the DDR4 page data read buffer according to its current data fill level (watermark), and to wait for the DDR4 user arbitration controller to answer the request with a channel-occupancy signal; when the channel-occupancy signal is valid, the output buffer waits for new data to arrive in the DDR4 page data read buffer, and its internal DMA control logic then moves the data to be read from the DDR4 page data read buffer into the output buffer of that channel through the two-stage output data selection pipeline;
and, for each output buffer, the row and column counts of the read video frames and the DDR base address of the read video buffer region are parameter-configurable, and each output buffer performs the corresponding operations according to output control signal flags.
4. The apparatus according to claim 1, wherein the arbitration controller sorts the received request signals by channel type and priority class into 4 groups, which in descending order of priority are the high-priority write group, the high-priority read group, the low-priority write group and the low-priority read group; request signals within the same group have equal priority, and the arbitration controller grants the requests within a group in round-robin order.
5. The apparatus according to claim 1, wherein, in its active working mode, the DDR4 user-layer write data control logic interfaces directly with the service data of one of the write channels and is decoupled from the multi-channel user data write control layer;
in its active working mode, the DDR4 user-layer read data control logic interfaces directly with the service data of one of the read channels and is decoupled from the multi-channel user data read control layer;
and, in its active working mode, the DDR4 user-layer command/address control logic interfaces directly with the service data of one of the write channels or read channels and is decoupled from the multi-channel user data write control layer or read control layer.
6. The apparatus according to claim 1, wherein the DDR4 physical-layer control core converts the DDR4 user-layer data and control signal buses into the physical-layer signal bus required by the DDR4 chips.
7. A method of a high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4, characterized in that the method is based on the apparatus of the high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4 according to any one of claims 1 to 6.
8. A system of high-bandwidth multi-channel data storage and reading units based on FPGA and DDR4, the system comprising an FPGA multi-path data matrix configuration unit, which is connected to both an FPGA multi-path data access unit and an FPGA multi-path data extraction unit;
the FPGA multi-path data matrix configuration unit is based on the apparatus of the high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4 according to any one of claims 1 to 6, and is configured to store multi-channel input data and cross-read it out as multi-channel video output data, internally configuring which input channel of the multi-channel input data feeds each output channel of the multi-channel video output data;
the FPGA multi-path data access unit is configured to decouple the upstream (front-stage) input video data;
the FPGA multi-path data extraction unit is configured to decouple the downstream (back-stage) output video data.
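Claims 1, 3 and 4 together describe how buffers raise page-move requests from their fill levels and how the arbitration controller serves them. The Python sketch below illustrates one reading of that scheme; it is not the patented circuit. The watermark thresholds `LOW_MARK` and `HIGH_MARK`, and all function and class names, are hypothetical. Only the four-group priority order and the round-robin service within a group come from claim 4.

```python
from typing import Dict, Optional, Tuple

# Hypothetical watermark thresholds, in DDR4 pages. The claims only say that request
# priority depends on the buffer's fill level; these numbers are assumptions.
LOW_MARK = 1
HIGH_MARK = 3


def write_request(filled_pages: int) -> Optional[str]:
    """Input buffer: the more data waiting, the more urgent the move to DDR4."""
    if filled_pages >= HIGH_MARK:
        return "high"
    if filled_pages >= LOW_MARK:
        return "low"
    return None


def read_request(free_pages: int) -> Optional[str]:
    """Output buffer: the more free space (closer to underflow), the more urgent the fetch."""
    return write_request(free_pages)  # same thresholds, applied to free space


# Claim 4 group order, highest priority first.
GROUPS = [("high", "write"), ("high", "read"), ("low", "write"), ("low", "read")]


class RoundRobinArbiter:
    """Strict priority between the 4 groups, round robin among channels inside a group."""

    def __init__(self, num_channels: int) -> None:
        self.num_channels = num_channels
        self.next_start = {group: 0 for group in GROUPS}  # where each group's search resumes

    def grant(self, requests: Dict[Tuple[str, int], Optional[str]]) -> Optional[Tuple[str, int]]:
        """requests maps (direction, channel) to 'high', 'low' or None; returns the winner."""
        for prio, direction in GROUPS:
            start = self.next_start[(prio, direction)]
            for offset in range(self.num_channels):
                ch = (start + offset) % self.num_channels
                if requests.get((direction, ch)) == prio:
                    # The winner moves to the back of the line within its group.
                    self.next_start[(prio, direction)] = (ch + 1) % self.num_channels
                    return direction, ch
        return None


if __name__ == "__main__":
    arb = RoundRobinArbiter(num_channels=3)
    reqs = {("write", 0): write_request(4), ("write", 2): write_request(3),
            ("read", 1): read_request(1)}
    print([arb.grant(reqs) for _ in range(3)])  # [('write', 0), ('write', 2), ('write', 0)]
    reqs[("write", 0)] = reqs[("write", 2)] = None
    print(arb.grant(reqs))                       # ('read', 1)
```

In this static example the low-priority read waits while high-priority writes persist; in a running system each grant moves a page, which lowers that buffer's fill level and hence its urgency, so lower groups are eventually served.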
CN202110842225.1A 2021-07-26 2021-07-26 Method and system for high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4 Active CN113655956B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110842225.1A CN113655956B (en) 2021-07-26 2021-07-26 Method and system for high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110842225.1A CN113655956B (en) 2021-07-26 2021-07-26 Method and system for high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4

Publications (2)

Publication Number Publication Date
CN113655956A CN113655956A (en) 2021-11-16
CN113655956B true CN113655956B (en) 2024-02-02

Family

ID=78478124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110842225.1A Active CN113655956B (en) 2021-07-26 2021-07-26 Method and system for high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4

Country Status (1)

Country Link
CN (1) CN113655956B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114237496B (en) * 2021-12-01 2022-05-13 苏州浪潮智能科技有限公司 Method and device for optimizing memory read-write efficiency of multi-channel system and computer equipment
CN116668985A (en) * 2023-06-25 2023-08-29 成都飞机工业(集团)有限责任公司 Low bit error rate method for wireless transmission of multi-source multi-node acquisition sensing data
CN117573044B (en) * 2024-01-18 2024-04-30 西安智多晶微电子有限公司 Method and device for expanding DDRC bit width by splicing

Citations (10)

Publication number Priority date Publication date Assignee Title
US6138176A (en) * 1997-11-14 2000-10-24 3Ware Disk array controller with automated processor which routes I/O data according to addresses and commands received from disk drive controllers
US6720968B1 (en) * 1998-12-11 2004-04-13 National Instruments Corporation Video acquisition system including a virtual dual ported memory with adaptive bandwidth allocation
CN102012791A (en) * 2010-10-15 2011-04-13 中国人民解放军国防科学技术大学 Flash based PCIE (peripheral component interface express) board for data storage
CN105975416A (en) * 2016-04-28 2016-09-28 西安电子科技大学 GPFA-based multichannel different-speed data transmission system
CN106371790A (en) * 2016-10-12 2017-02-01 深圳市捷视飞通科技股份有限公司 FPGA-based double-channel video multi-image segmentation display method and device
CN106445869A (en) * 2016-09-20 2017-02-22 烟台大学 FPGA (field programmable gate array) and PCIe (peripheral component interface express) based high-speed data exchange architecture
CN111143257A (en) * 2019-12-02 2020-05-12 深圳市奥拓电子股份有限公司 DDR arbitration controller, video cache device and video processing system
CN111782578A (en) * 2020-05-29 2020-10-16 西安电子科技大学 Cache control method, system, storage medium, computer equipment and application
CN112073650A (en) * 2020-09-16 2020-12-11 中航华东光电有限公司 DDR3 video cache control method based on FPGA
CN113076066A (en) * 2021-04-14 2021-07-06 湖南兴天电子科技有限公司 High-capacity high-speed storage device and operation method thereof

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9852779B2 (en) * 2014-03-12 2017-12-26 Futurewei Technologies, Inc. Dual-port DDR4-DIMMs of SDRAM and NVRAM for SSD-blades and multi-CPU servers
US20150261446A1 (en) * 2014-03-12 2015-09-17 Futurewei Technologies, Inc. Ddr4-onfi ssd 1-to-n bus adaptation and expansion controller

Also Published As

Publication number Publication date
CN113655956A (en) 2021-11-16

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant