CN113655956B

CN113655956B - Method and system for high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4

Info

Publication number: CN113655956B
Application number: CN202110842225.1A
Authority: CN
Inventors: 梁文豪; 陈岚; 许端; 王述良; 程建伟
Original assignee: Wuhan Jimu Intelligent Technology Co ltd
Current assignee: Wuhan Jimu Intelligent Technology Co ltd
Priority date: 2021-07-26
Filing date: 2021-07-26
Publication date: 2024-02-02
Anticipated expiration: 2041-07-26
Also published as: CN113655956A

Abstract

Embodiments of the application provide a method and a system for high-bandwidth multi-channel data storage and reading based on FPGA and DDR4. The traditionally complex and varied video input and output display logic is simplified and abstracted into three units: an FPGA multi-path data access unit, an FPGA multi-path data extraction unit and an FPGA multi-path data matrix configuration unit.

Description

Method and system for high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4

Technical Field

The application relates to the field of video data storage, reading and writing, and in particular to a method and a system for a high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4.

Background

In vision fields such as automatic driving, security monitoring, live video broadcasting, video conferencing and machine vision, there is a common requirement: multiple camera sensors capture multi-channel, high-bandwidth real-time video data, which is processed by algorithms such as image enhancement, scaling, stitching and compositing, and deep learning, and is then displayed on multiple display devices. These applications place heavy demands on the bandwidth and scalability of flexible matrix store, read and forward handling of very many high-bandwidth data streams.

However, existing solutions generally use a dedicated chip or an FPGA to store and read 2 to 4 channels of video; storing and reading 8 to 16 or more channels of high-bandwidth video is essentially not addressed. In addition, existing designs for the many kinds of video input and output display requirements are mostly special-purpose: a single change in requirements often forces a large change to the design, so they generalize poorly and are ill suited to fast, efficient customization.

CN 201811643205, "A multimedia control system supporting a multi-display function", covers only reading 16 channels of video from DDR and combining any four of them to synthesize 4 channels of video, which are displayed on 4 displays over HDMI. It does not cover the scenario of simultaneously storing multiple channels of video in DDR with buffered spot-check frame processing. CN 201911183242, "A domestic processor and domestic FPGA multi-channel 4K high-definition video comprehensive display method", solves the storage, reading and composite display of 4 channels of video, but the number of video channels it can read and write and the video formats it supports are insufficient.

As can be seen from the above, although some existing designs exploit the customizable circuitry of the FPGA and the high bandwidth of DDR4 and, to some extent, realize storage or reading of multi-channel video, they do not deeply study or optimize the upper limit on the number of configurable channels, support for arbitrary video formats, frame rate adaptation between input and output video data streams, the DDR control efficiency of storing and reading a multi-channel data matrix, or expandable IP.

Disclosure of Invention

In view of this, the embodiments of the present application provide a method and system for a high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4. The scheme combines the flexible circuit customization of the FPGA with the high bandwidth throughput of DDR4 and supports up to 16 channels of storage and 16 channels of reading, at least partially solving the problems existing in the prior art. Any video matrix stream can be handled as long as the total channel bandwidth stays within capacity.

In a first aspect, an embodiment of the present application provides an apparatus for a high bandwidth multi-way data storage and reading unit based on FPGA and DDR4, the apparatus comprising: an FPGA and a number of DDR4 granule combinations connected thereto;

the FPGA is internally provided with a configurable circuit unit, and comprises a writing channel and a reading channel which can be configured into 1-16 channels by parameters, wherein each writing channel is connected with an input buffer zone, each reading channel is connected with an output buffer zone, the input buffer zone is connected with a DDR4 page data writing buffer zone through a two-stage input data selection pipeline, the output buffer zone is connected with a DDR4 page data reading buffer zone through a two-stage output data selection pipeline, the DDR4 page data writing buffer zone is connected with an arbitration controller through DDR4 user layer writing data control logic, the DDR4 page data reading buffer zone is connected with an arbitration controller through DDR4 user layer reading data control logic, the arbitration controller is connected with DDR4 user layer command address control logic, and the DDR4 page data writing buffer zone, the DDR4 page data reading buffer zone and the DDR4 user layer command address control logic are connected with DDR4 particles through a DDR4 physical layer control kernel.

According to a specific implementation manner of the embodiment of the application, the clock rate of the input variable video stream of each writing channel can be independently configured, and the channel bit width of each writing channel can be configured by parameters;

the clock rate of the output variable video stream of each read channel can be independently configured, and the channel bit width of each read channel can be configured by parameters.

According to a specific implementation manner of the embodiment of the application, the depth of each input buffer can be configured as required; the buffer isolates and converts variable-rate input video, uniformly formatting video data of variable bit width and rate that crosses clock domains into a data stream in the DDR4 user clock domain.

According to a specific implementation manner of the embodiment of the application, each input buffer is configured to raise a low-priority or a high-priority page move request signal toward the DDR4 page data write buffer according to the depth of its current data watermark, and to wait for the arbitration controller to answer the request with a channel occupy signal; when the channel occupy signal is valid, the DMA control logic in the input buffer moves the data to be stored to the DDR4 page data write buffer through the two-stage input data selection pipeline.

According to a specific implementation manner of the embodiment of the application, for each input buffer the number of rows and columns of the written video frame, the number of video buffers, and the DDR base address of the written video buffer can all be configured by parameters, and each input buffer performs the corresponding operations on itself according to the input control signal flags.

According to a specific implementation manner of the embodiment of the application, the depth of each output buffer zone can be configured according to requirements, and is used for isolating and converting variable-rate output video and converting data streams of DDR4 user clock domains into variable-bit-width and variable-rate cross-clock domain video data.

According to a specific implementation manner of the embodiment of the application, each output buffer is configured to raise a low-priority or a high-priority page move request signal toward the DDR4 page data read buffer according to the depth of its current data watermark, and to wait for the DDR4 user arbitration controller to answer the request with a channel occupy signal; when the channel occupy signal is valid, the video output buffer waits for new data to arrive in the DDR4 page data read buffer, and its internal DMA control logic then moves the data to be read from the DDR4 page data read buffer to the video output buffer of the channel through the two-stage output data selection pipeline.

According to a specific implementation manner of the embodiment of the application, each output buffer zone can be configured with parameters of the number of rows and the number of columns of the read video frames, and can be configured with parameters of the DDR base address of the read video buffer zone, and each output buffer zone carries out corresponding operation on the output buffer zone according to an output control signal mark.

According to a specific implementation manner of the embodiment of the application, the arbitration controller divides the received request signals into 4 groups according to different channels and grades, namely a high-priority write signal group, a high-priority read signal group, a low-priority write signal group and a low-priority read signal group with the priority grade from high to low, wherein the request signals of the same group have the same priority, and the arbitration controller performs arbitration response on the request signals in the group in a round robin mode.

According to a specific implementation manner of the embodiment of the present application, the DDR4 user layer write data control logic directly interfaces with the service data of one of the write channels in the active working mode, and is decoupled from the multi-channel user data write control layer.

According to a specific implementation manner of the embodiment of the application, the DDR4 user layer read data control logic directly interfaces with the service data of one of the read channels in the active working mode, and is decoupled from the multi-channel user data read control layer.

According to a specific implementation manner of the embodiment of the present application, the DDR4 user layer command address control logic directly interfaces with the service control signals of one of the write channels or read channels in the active working mode, and is decoupled from the multi-channel user data write control layer or read control layer.

According to a specific implementation manner of the embodiment of the application, the DDR4 physical layer control kernel converts data and control signal buses of the DDR4 user layer into physical layer signal buses required by the DDR4 granule.

According to a specific implementation manner of the embodiment of the application, the apparatus further includes a second controller configured to control the sequential mapping of the write channels onto the memory region arrangement in the DDR4 granules, together with the resolution of each channel's write pointer and write base address, and to control the arbitrary asymmetric mapping of the read channels onto that memory region arrangement, together with the resolution of each channel's read pointer and read base address.

According to a specific implementation manner of the embodiment of the application, the device further comprises a DDR4 efficiency monitoring unit which, over a statistics time interval set by parameter configuration, counts the availability percentage and the actual usage percentage of the DDR4 physical layer control kernel.

In a second aspect, an embodiment of the present application provides a method for a high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4, where the method is based on the device for a high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4.

In a third aspect, an embodiment of the present application provides a system for a high bandwidth multi-way data storage and reading unit based on FPGA and DDR4, the system comprising: the FPGA multi-path data matrix configuration unit is respectively connected with the FPGA multi-path data access unit and the FPGA multi-path data extraction unit;

the device of the high-bandwidth multi-channel data storage and reading unit based on the FPGA and the DDR4 is configured to store and cross-read multi-channel input data and convert it into multi-channel video output data, and to configure internally which input channel of the multi-channel input data each output channel of the multi-channel video output data is connected to.

The FPGA multipath data access unit is configured to decouple the front-stage input video data;

the FPGA multipath data extraction unit is configured to decouple the output video data of the later stage.

With the data matrix scheme of the application, up to 16 channels of data can be stored and 16 channels read. The system abstracts complex multi-channel video input and output display requirements into three units: the FPGA multi-path data access unit, the FPGA multi-path data extraction unit and the FPGA multi-path data matrix configuration unit. Using the FPGA's strength as a general-purpose glue logic circuit, it merges and ingests the different video inputs and converts and composites the different video outputs, and it confines the complex, requirement-specific problems to two units, the FPGA multi-path data access unit and the FPGA multi-path data extraction unit.

The FPGA multi-path data matrix configuration unit carries a large volume of multi-channel data, and its logic design is largely insulated from changes in complex external requirements. It usually needs to be designed only once and can then be reused efficiently by re-customizing its configuration parameters and modifying its internal matrix cross-connections. This greatly reduces the difficulty of designing for complex video input and output requirements and makes it faster to customize for new requirements.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic diagram of an apparatus of a high bandwidth multi-way data storage and reading unit based on FPGA and DDR4 according to an embodiment of the present application;

FIG. 2 is a map of memory region memory arrangement within DDR granules in accordance with an embodiment of the present application;

FIG. 3 is an exemplary diagram of a system application scenario in an embodiment of the present application.

Detailed Description

Embodiments of the present application are described in detail below with reference to the accompanying drawings.

Other advantages and effects of the present application will become apparent to those skilled in the art from the present disclosure, when the following description of the embodiments is taken in conjunction with the accompanying drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. The present application may be embodied or carried out in other specific embodiments, and the details of the present application may be modified or changed from various points of view and applications without departing from the spirit of the present application. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.

It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present application, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.

It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concepts of the application by way of illustration, and only the components related to the application are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.

In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.

The embodiment combines the flexible circuit customization of the FPGA with the high bandwidth throughput of DDR4 and proposes a data matrix scheme with up to 16 channels of storage and 16 channels of reading. It can handle any video matrix stream (CVBS, 720p, 1080p, 4K, etc.) as long as the total channel bandwidth stays within capacity.

Next, with reference to the accompanying drawings, the apparatus and system for high bandwidth multi-way data storage and reading unit based on FPGA and DDR4 according to the embodiments of the present application will be specifically described.

Referring to FIG. 1, an apparatus for a high-bandwidth multi-channel data storage and reading unit based on FPGA and DDR4 according to an embodiment of the present application includes an FPGA and a combination of several DDR4 granules connected to it. This embodiment is described using an MPSoC-series XCZU5EV FPGA from Xilinx combined with four MT40A512M16HA DDR4 granules from Micron; in practice, a suitable combination of FPGA chip and DDR granules can be chosen by trading off requirements against cost.

As shown in FIG. 1, a configurable circuit unit supporting up to 16 write channels and up to 16 read channels over DDR4 is designed inside the FPGA. The number of write channels and the number of read channels can each be set by parameter to 1-16. The clock rate of each variable input video stream can be configured independently, and its channel bit width can be set by parameter to 8 bits, 16 bits, 32 bits and so on. Likewise, the clock rate of each variable output video stream can be configured independently, and its channel bit width can be set by parameter to 8 bits, 16 bits, 32 bits and so on.

As shown in FIG. 1, the depth of the input buffers 1-16 within the FPGA can be flexibly configured as required, the buffers serve to isolate and convert variable rate input video, and the variable bit-width and rate cross-clock domain video data is uniformly formatted into 512-bit wide data streams of the DDR4 user clock domain.
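The width conversion performed by an input buffer can be pictured with a small software model. The sketch below is only an illustration and not the circuit of the embodiment: the function name `pack_to_512` and the choice to place the first sample in the least significant bits are assumptions, since the text does not specify a bit ordering.

```python
WORD_BITS = 512  # width of the DDR4 user-clock-domain data stream (from the text)


def pack_to_512(samples, width):
    """Pack fixed-width samples (8, 16 or 32 bits) into 512-bit words.

    Assumption: the first sample of each word occupies the least
    significant bits. A partially filled last word is zero padded.
    """
    if width <= 0 or WORD_BITS % width != 0:
        raise ValueError("sample width must divide 512")
    per_word = WORD_BITS // width
    mask = (1 << width) - 1
    words = []
    for start in range(0, len(samples), per_word):
        word = 0
        for i, sample in enumerate(samples[start:start + per_word]):
            word |= (sample & mask) << (i * width)
        words.append(word)
    return words
```

With 8-bit samples, 64 samples fill exactly one 512-bit word; a 65th sample starts a second word.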

As shown in FIG. 1, the depth of the output buffers 1-16 within the FPGA can be flexibly configured as required, and the buffers serve to isolate and convert variable rate output video, converting 512 bit wide data streams of the DDR4 user clock domain into variable bit wide and rate output video data across the clock domain.

As shown in FIG. 1, each input buffer in the FPGA can raise page move request (req) signals of two different priorities, low and high, toward the DDR4 page data write buffer according to the depth of its current data watermark, and then waits for the DDR4 user arbitration controller to answer the request with the channel occupy signal.

As shown in FIG. 1, each output buffer in the FPGA can raise page move request (req) signals of two different priorities, low and high, toward the DDR4 page data read buffer according to the depth of its current data watermark, and then waits for the DDR4 user arbitration controller to answer the request with the channel occupy signal. When the channel occupy signal is active, the video output buffer waits for new data to arrive in the DDR4 page data read buffer; its internal DMA control logic then moves the data to be read from the DDR4 page data read buffer to the video output buffer of the channel through the two-stage data selection pipeline (512 mux to 2048).
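The two-level request scheme can be sketched as a pair of threshold comparisons on the buffer fill level. This is a minimal model under stated assumptions: the text does not give threshold values or say whether the low-priority request stays asserted once the high-priority one is raised, so the names `Watermarks`, `input_requests` and `output_requests`, the thresholds, and the "both asserted" behaviour are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Watermarks:
    low: int   # fill level (in words) of the lower threshold
    high: int  # fill level (in words) of the upper threshold


def input_requests(fill, wm):
    """Write side: urgency grows as the input buffer fills up.

    Returns (low_priority_req, high_priority_req).
    """
    return (fill >= wm.low, fill >= wm.high)


def output_requests(fill, wm):
    """Read side: urgency grows as the output buffer drains.

    Returns (low_priority_req, high_priority_req).
    """
    return (fill <= wm.high, fill <= wm.low)
```

For example, with `Watermarks(low=64, high=192)` an input buffer holding 200 words raises both requests, while an output buffer holding 200 words raises none.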

As shown in FIG. 1, for the input buffer of each variable input video in the FPGA, the number of rows and columns of the video frame, the number of video buffers (by default, 4 rotating buffers per channel) and the DDR base address of the video buffer can all be configured by parameters. Control signal flags such as video frame valid and line valid are used to reset and control the input buffer and to carry out operations such as updating the rotating buffer write pointer flag, flagging written data as valid, and switching the buffer write address.

As shown in FIG. 1, for the output buffer of each variable output video in the FPGA, the number of rows and columns of the video frame and the DDR base address of the video buffer to be read can be configured by parameters. Control signal flags such as video frame request and line request are used to reset and control the output buffer and to carry out operations such as updating the rotating buffer read pointer flag, flagging read data as valid, and switching the buffer read address.

As shown in FIG. 1, the 16-channel write/16-channel read arbitration controller in the FPGA receives a high-priority and a low-priority request signal from each of the 16 write channels and from each of the 16 read channels, for a total of 16×2 + 16×2 = 64 request signals. The arbitration control logic divides the 64 request signals into 4 groups of 16: the high-priority write signals of the 16 channels form group 1, the high-priority read signals form group 2, the low-priority write signals form group 3, and the low-priority read signals form group 4. Request signals in the same group have the same priority, and the arbitration controller answers them in round-robin order. Across groups the priority runs from high to low: group 1 is higher than group 2, which is higher than group 3, which is higher than group 4. After granting one of the pending requests, the arbitration controller sends an occupy signal to the write or read channel that has obtained control of the DDR, and notifies the DDR4 user layer write data control logic or the DDR4 user layer read data control logic, together with the input or output two-stage pipeline selection circuit logic, to switch the data and control paths and interface with the buffer circuit of that read or write channel.
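The grouping and round-robin rule can be modelled in a few lines. The following is a behavioural sketch only, not the RTL of the embodiment; the class name `Arbiter`, the group labels, the 0-based channel indexes and the "resume after the last granted channel" round-robin pointer are assumptions made for illustration.

```python
# Groups in descending priority, as described in the text.
GROUPS = ("hi_write", "hi_read", "lo_write", "lo_read")


class Arbiter:
    """Fixed priority between groups, round robin inside a group."""

    def __init__(self, channels=16):
        self.channels = channels
        # Per-group index of the channel examined first on the next grant.
        self.next_start = {group: 0 for group in GROUPS}

    def grant(self, requests):
        """requests maps a group name to the set of requesting channels.

        Returns (group, channel) for the winning request, or None when
        nothing is pending.
        """
        for group in GROUPS:
            pending = requests.get(group, set())
            if not pending:
                continue
            start = self.next_start[group]
            for offset in range(self.channels):
                channel = (start + offset) % self.channels
                if channel in pending:
                    self.next_start[group] = (channel + 1) % self.channels
                    return group, channel
        return None
```

With channels 2 and 5 both holding high-priority write requests, successive calls grant channel 2, then channel 5, then channel 2 again, and a pending low-priority read is served only once no higher group is requesting.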

As shown in fig. 1, the DDR4 user layer write data control logic in the FPGA directly interfaces with the service data of one of the 16 write channels in the effective working mode, and the multichannel user data write control layer is decoupled, which is equivalent to one-time independent and complete DDR4 page data write data logic.

And the DDR4 user layer read data control logic in the FPGA directly interfaces with the service data of one of the 16 read channels in an effective working mode, and the multichannel user data read control layer is decoupled, which is equivalent to one-time independent and complete DDR4 page data read data logic.

As shown in fig. 1, the DDR4 user layer command address control logic in the FPGA directly interfaces with the service control signal of one of the 16 write channels/16 read channels in the active mode, and decouples the multi-channel user write/read control layer, which is equivalent to an independent and complete DDR4 page write/read control logic.

As shown in FIG. 1, the DDR4 MIG PHY is the DDR4 physical layer control IP core from Xilinx; it converts the data and control signal buses of the DDR4 user layer into the physical layer signal buses required by the DDR4 granules.

In this embodiment, the highest granule control clock rate supported by the DDR4 IP core is 1066 MHz, and the clock rate of the DDR4 user layer is fixed at 1/4 of the granule control rate, i.e. 266.7 MHz. The memory mapping mode of the controller is ROW-COLUMN-BANK, and each write or read transfers an integral multiple of 4 DDR4 granule pages, so the minimum write or read is 4 × 16 kbit = 64 kbit. In the design of this embodiment, the theoretical maximum transfer capacity of the Xilinx DDR4 MIG PHY combined with the DDR4 granules is 1066 M × 2 × 64 bits/s × 91% = 124.16768 Gbit/s, where 91% is the theoretical efficiency given by Xilinx for continuous single-channel writing or reading through the DDR4 MIG PHY. With 16 write channels and 16 read channels sharing the same DDR IP core, the multiplexing control and arbitration flow reduce the efficiency to some extent: the achieved DDR efficiency is about 90% of the 91% Xilinx theoretical value, i.e. about 81% overall, giving a total data bandwidth of 1066 M × 2 × 64 bits/s × 81% = 110.52288 Gbit/s.
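The bandwidth figures above can be reproduced with integer arithmetic. This is just a check of the numbers in the text; the constant and function names are hypothetical, and the 91% and 81% efficiencies are taken from the description rather than measured.

```python
CLOCK_HZ = 1_066_000_000   # granule control clock, 1066 MHz (from the text)
TRANSFERS_PER_CLOCK = 2    # double data rate
BUS_WIDTH_BITS = 64        # data bus width used in the text

RAW_BPS = CLOCK_HZ * TRANSFERS_PER_CLOCK * BUS_WIDTH_BITS


def effective_bps(efficiency_percent):
    """Raw bus bandwidth scaled by an efficiency given in whole percent."""
    return RAW_BPS * efficiency_percent // 100


single_channel_bps = effective_bps(91)  # continuous single-channel figure
shared_bps = effective_bps(81)          # 16-write/16-read shared figure

print(single_channel_bps / 1e9, "Gbit/s")
print(shared_bps / 1e9, "Gbit/s")
```

The raw bus rate is 136.448 Gbit/s; the two scaled values match the 124.16768 Gbit/s and 110.52288 Gbit/s quoted in the text.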

In general, 16-bit YUV422 4Kp30 video occupies 3840 × 2160 × 16 bits × 30/s = 3.981312 Gbit/s, and 16-bit YUV422 1080p60 video occupies 1920 × 1080 × 16 bits × 60/s = 1.990656 Gbit/s. Assuming 16 input video channels (eleven 4Kp30 inputs and five 1080p60 inputs) and 16 output video channels (eleven 4Kp30 outputs and five 1080p60 outputs) are configured, the total bandwidth used is (11 × 3.981312 + 5 × 1.990656) × 2 = 107.495424 Gbit/s, which is below the design's maximum available bandwidth of 110.52288 Gbit/s. The design of this embodiment therefore has strong capacity for storing and reading a multi-channel, high-bandwidth data matrix.
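The same budget check can be written as a short calculation, which makes it easy to try other channel mixes. The helper name `video_bps` is hypothetical, and the available bandwidth is simply the 110.52288 Gbit/s figure quoted in the description.

```python
AVAILABLE_BPS = 110_522_880_000  # design bandwidth quoted in the text


def video_bps(width, height, bits_per_pixel, fps):
    """Uncompressed video bandwidth in bits per second."""
    return width * height * bits_per_pixel * fps


uhd_30 = video_bps(3840, 2160, 16, 30)  # 16-bit YUV422 4Kp30
fhd_60 = video_bps(1920, 1080, 16, 60)  # 16-bit YUV422 1080p60

# Eleven 4Kp30 and five 1080p60 streams, counted once for writing and
# once for reading.
total_bps = 2 * (11 * uhd_30 + 5 * fhd_60)
fits = total_bps < AVAILABLE_BPS

print(total_bps / 1e9, "Gbit/s used, fits:", fits)
```

The configuration leaves roughly 3 Gbit/s of headroom, so a twelfth 4Kp30 write and read pair would not fit under the same assumptions.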

According to a specific implementation manner of the embodiment of the present application, the device further includes a second controller, shown in FIG. 1 as the "M-channel write/N-channel read round-robin memory area base address arbitrary cross mapping controller & spot check frame pointer flow controller". The numbers of write data channels and read data channels are each configurable from 1 to 16. Assuming M write channels and N read channels are actually configured, this controller maps the M write channels sequentially onto the M × 4 frame memory areas in the DDR4 granules shown in FIG. 2 and resolves each channel's write pointer and write base address. It also maps the round-robin read pointers of the N read channels onto those M × 4 memory areas in any asymmetric way and resolves each channel's read pointer and read base address; these mappings are configurable through user parameters. The memory actually occupied in the DDR4 granules therefore depends only on the number of write channels: by default it is M × 4 frame memory areas (4 round-robin memory areas per channel by default).
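One way to picture the M × 4 memory arrangement is a simple address calculation. The sketch below is a guess at a workable layout, not the layout of FIG. 2: it assumes that a 16 kbit page is 16 × 1024 bits, that every frame area is rounded up to the 64 kbit minimum transfer unit, and that channels are packed contiguously from a start address. In the embodiment the base addresses are user parameters, and all function names here are hypothetical.

```python
BUFFERS_PER_CHANNEL = 4             # default rotating buffers per channel
BURST_BYTES = 4 * 16 * 1024 // 8    # 4 pages x 16 kbit = 8192 bytes (assumed)


def frame_stride(rows, cols, bits_per_pixel):
    """Bytes reserved for one frame, rounded up to whole transfer units."""
    raw_bytes = rows * cols * bits_per_pixel // 8
    return -(-raw_bytes // BURST_BYTES) * BURST_BYTES  # ceiling division


def channel_bases(frames, start=0):
    """Base address of each write channel when packed back to back.

    frames is a list of (rows, cols, bits_per_pixel), one per channel.
    """
    bases = []
    address = start
    for rows, cols, bpp in frames:
        bases.append(address)
        address += BUFFERS_PER_CHANNEL * frame_stride(rows, cols, bpp)
    return bases


def buffer_address(base, stride, frame_counter):
    """Address of the rotating buffer used for a given frame count."""
    return base + (frame_counter % BUFFERS_PER_CHANNEL) * stride
```

Under these assumptions a 16-bit 1080p channel reserves 4 × 4,153,344 bytes, so a second channel placed directly after it starts at byte address 16,613,376.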

When reading and writing are asynchronous and the frame rates differ, for example 1080p60 frame data on the write channel and 1080p30 frame data on the read channel, the rates are matched by advancing the frame write pointer and the frame read pointer at different rates. How the read and write pointers behave when the buffers are full or empty (stop on full, stop on empty) and how the pointer step is adjusted must be decided according to the specific requirements. In addition, the values of M and N are not directly related: each can be set to any value from 1 to 16, and the cross-connection between them is configurable.
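A simple policy for the 60-to-30 frames-per-second case is to let the reader always take the most recently completed frame, which in effect makes the read pointer skip every other buffer. The sketch below shows that one policy only; the text leaves the actual full, empty and step-adjustment behaviour to the specific requirements, and the class name `FrameRing` and its methods are hypothetical.

```python
class FrameRing:
    """Rotating frame buffers shared by one writer and one reader."""

    def __init__(self, buffers=4):
        self.buffers = buffers
        self.slots = [None] * buffers
        self.frames_written = 0  # doubles as the write pointer

    def write_frame(self, frame_id):
        """Store a completed frame and advance the write pointer by one."""
        self.slots[self.frames_written % self.buffers] = frame_id
        self.frames_written += 1

    def read_latest(self):
        """Return the newest completed frame, or None if none exists yet."""
        if self.frames_written == 0:
            return None
        return self.slots[(self.frames_written - 1) % self.buffers]


ring = FrameRing()
shown = []
for frame_id in range(8):       # writer runs at twice the reader's rate
    ring.write_frame(frame_id)
    if frame_id % 2 == 1:       # reader wakes up after every second frame
        shown.append(ring.read_latest())
```

In this run the reader displays frames 1, 3, 5 and 7, dropping every other written frame.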

As shown in FIG. 1, the efficiency monitoring unit inside the FPGA can be configured by parameter with a statistics time interval; within each interval it counts the availability percentage and the actual usage percentage of the DDR4 PHY controller.
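A possible reading of the two statistics is sketched below. It assumes, without support from the text, that "available" means the controller could accept a transfer in a given user-clock cycle and "used" means a transfer actually took place in that cycle; the function name `window_percentages` is hypothetical.

```python
def window_percentages(cycles):
    """Summarize one statistics window.

    cycles is a list of (available, used) booleans, one entry per
    user-clock cycle. Returns (availability_percent, usage_percent).
    """
    total = len(cycles)
    if total == 0:
        return (0.0, 0.0)
    available = sum(1 for is_available, _ in cycles if is_available)
    used = sum(1 for _, is_used in cycles if is_used)
    return (100.0 * available / total, 100.0 * used / total)
```

For a four-cycle window in which the controller is available for three cycles and transfers data in two, the result is (75.0, 50.0).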

In this embodiment, as shown in the application scenario example of FIG. 3, three videos are input to the FPGA chip: video 1 is the input video of fisheye camera 1, video 2 is the input video of camera 2, and video 3 is an encoded background video, to be composited, that is input through an external network port. There are two external display devices. Display device 1 needs to display simultaneously the original video of fisheye camera 1, its de-distorted video, and the background video from the network. Display device 2 needs to display simultaneously the enlarged video of camera 2 and the background video from the network.

In this embodiment, as shown in the application scenario example of FIG. 3, the FPGA multi-path data access unit decodes the network input video data, scales the video input by camera 2, splits the video of fisheye camera 1 into two channels (the original video and the de-distorted video), and sends 4 channels of video data in total to the FPGA multi-path data matrix configuration unit.

As shown in FIG. 3, in the application scenario example of this embodiment, the FPGA multi-path data matrix configuration unit stores and cross-reads the 4 channels of video input data and converts them into 5 channels of video output data, i.e. M = 4 and N = 5. Internally, input channel 1 is connected to output channel 2, input channel 2 to output channel 1, input channel 3 to output channel 3, and input channel 4 to both output channel 4 and output channel 5.
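The cross-connection of this example can be written down as a small configuration table and checked for consistency. The representation below (a dictionary from output channel to source input channel) and the function name `fan_out` are assumptions for illustration; only the channel pairings themselves come from the text.

```python
MAX_CHANNELS = 16

# Output channel -> input channel, for the M = 4, N = 5 example.
FIG3_READ_SOURCE = {1: 2, 2: 1, 3: 3, 4: 4, 5: 4}


def fan_out(read_source, m, n):
    """Validate a cross mapping and list the outputs fed by each input."""
    if not (1 <= m <= MAX_CHANNELS and 1 <= n <= MAX_CHANNELS):
        raise ValueError("M and N must each be between 1 and 16")
    if sorted(read_source) != list(range(1, n + 1)):
        raise ValueError("each output channel 1..N needs exactly one source")
    outputs = {input_ch: [] for input_ch in range(1, m + 1)}
    for output_ch in sorted(read_source):
        input_ch = read_source[output_ch]
        if input_ch not in outputs:
            raise ValueError("source must be an input channel 1..M")
        outputs[input_ch].append(output_ch)
    return outputs
```

Calling `fan_out(FIG3_READ_SOURCE, 4, 5)` shows that input channel 4 is read twice, by output channels 4 and 5, which is why N can exceed M without any extra storage.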

As shown in fig. 3, in the application scenario example of the design scheme of this embodiment, the FPGA multi-path data extraction unit completes the respective synthesis of 5 paths of video output data. The synthesizer 1 synthesizes the data of the video output channels 1,2 and 4 into one path of video data, and outputs the video data to the display equipment 1 outside the FPGA. The synthesizer 2 synthesizes the data of the video output channels 3 and 5 into one path of video data, and outputs the video data to the display equipment 2 outside the FPGA.

This embodiment is only one application scenario example of the design scheme. Following this template, the FPGA multi-path data access unit, the FPGA multi-path data extraction unit and the FPGA multi-path data matrix configuration unit can be flexibly designed according to actual requirements.

Verification and testing show that the embodiment can simultaneously support 11 paths of 4Kp30 plus 5 paths of 1080p60 video input and 11 paths of 4Kp30 plus 5 paths of 1080p60 video output, and that it offers strong support and expansion capability for storing, reading and writing multi-channel high-bandwidth video data.
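For a sense of scale, the raw pixel bandwidth implied by these stream counts can be estimated. The figures below are a back-of-the-envelope sketch, not results from the patent: the frame sizes (3840x2160 for 4K, 1920x1080 for 1080p) and the 24 bits per pixel are assumptions, and blanking intervals and DDR4 protocol overhead are ignored.

```python
def stream_gbps(width, height, fps, bits_per_pixel=24):
    """Raw active-pixel bandwidth of one video stream in Gbit/s.

    bits_per_pixel=24 is an assumption; the source does not give a pixel format.
    """
    return width * height * fps * bits_per_pixel / 1e9


uhd_30 = stream_gbps(3840, 2160, 30)   # one assumed 4Kp30 stream
fhd_60 = stream_gbps(1920, 1080, 60)   # one assumed 1080p60 stream

one_direction = 11 * uhd_30 + 5 * fhd_60   # all inputs (or all outputs)
total = 2 * one_direction                  # every frame is written once and read once

print(f"per direction: {one_direction:.1f} Gbit/s, write+read: {total:.1f} Gbit/s")
```

Under these assumptions each direction carries roughly 80.6 Gbit/s, so the DDR4 side would have to sustain about 161 Gbit/s of combined write and read traffic.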

The units involved in the embodiments of the present application may be implemented in software or in hardware. The name of a unit does not in any way limit the unit itself; for example, the first acquisition unit may also be described as "a unit that acquires at least two Internet Protocol addresses".

It should be understood that portions of the present disclosure may be implemented in hardware, software, firmware, or a combination thereof.

The foregoing is merely specific embodiments of the disclosure, but the protection scope of the disclosure is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the disclosure are intended to be covered by the protection scope of the disclosure. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims

1. An apparatus of a high-bandwidth multi-way data storage and reading unit based on FPGA and DDR4, the apparatus comprising: an FPGA and a combination of several DDR4 granules connected thereto; characterized in that,

the FPGA is internally provided with a configurable circuit unit comprising write channels and read channels whose number is parameter-configurable from 1 to 16, wherein each write channel is connected to an input buffer and each read channel is connected to an output buffer; the input buffers are connected to a DDR4 page data write buffer through a two-stage input data selection pipeline, and the output buffers are connected to a DDR4 page data read buffer through a two-stage output data selection pipeline; the DDR4 page data write buffer is connected to an arbitration controller through DDR4 user layer write data control logic, and the DDR4 page data read buffer is connected to the arbitration controller through DDR4 user layer read data control logic; the arbitration controller is connected to DDR4 user layer command address control logic; and the DDR4 page data write buffer, the DDR4 page data read buffer and the DDR4 user layer command address control logic are connected to the DDR4 granules through a DDR4 physical layer control core;

the depth of each input buffer is configurable as required; each input buffer is configured to isolate and convert variable-rate input video, uniformly formatting cross-clock-domain video data of variable bit width and rate into a data stream in the DDR4 user clock domain;

each input buffer is configured to issue a low-priority page-move request signal and a high-priority page-move request signal to the DDR4 page data write buffer according to the current data watermark (fill level) of the buffer, and to wait for a channel occupation signal from the arbitration controller in response to the request; when the channel occupation signal is valid, the DMA control logic inside the input buffer moves the data to be stored to the DDR4 page data write buffer through the two-stage input data selection pipeline;

for each input buffer, the number of rows and the number of columns of the written video frames, the number of video frame buffers, and the DDR base address of the written video buffer area are parameter-configurable, and each input buffer performs the corresponding operation on itself according to an input control signal flag;

the apparatus further comprises a second controller configured to control, for the write channels, the sequentially mapped memory layout of the storage areas in the DDR4 granules and the resolution of the mapping between channel write pointers and channel write base addresses, and to control, for the read channels, the arbitrarily and asymmetrically mapped memory layout of the storage areas in the DDR4 granules and the resolution of the mapping between channel read pointers and channel read base addresses;

the apparatus further comprises a DDR4 efficiency monitoring unit which, through parameter configuration, can count the percentage availability and the percentage actual utilization of the DDR4 physical layer control core per unit time interval.
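The write-side parameters recited in claim 1 (rows, columns, number of video buffers, DDR base address) suggest a simple way for a channel write pointer to resolve to a DDR address. The sketch below is one plausible reading and not the claimed mapping: the linear row-major layout, the ring of frame buffers, the `bytes_per_pixel` value and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelConfig:
    """Hypothetical per-channel parameters, named after the claim wording."""
    base_addr: int            # DDR base address of the channel's buffer area
    rows: int                 # rows per video frame
    cols: int                 # columns (pixels) per row
    num_buffers: int          # number of frame buffers kept for the channel
    bytes_per_pixel: int = 4  # assumption; the source gives no pixel format

    @property
    def frame_bytes(self):
        return self.rows * self.cols * self.bytes_per_pixel


def pixel_address(cfg, frame_index, row, col):
    """Byte address of a pixel, assuming a row-major ring of frame buffers."""
    if not (0 <= row < cfg.rows and 0 <= col < cfg.cols):
        raise ValueError("pixel outside the configured frame")
    slot = frame_index % cfg.num_buffers             # wrap around the ring
    offset = (row * cfg.cols + col) * cfg.bytes_per_pixel
    return cfg.base_addr + slot * cfg.frame_bytes + offset


cfg = ChannelConfig(base_addr=0x1000_0000, rows=1080, cols=1920, num_buffers=3)
print(hex(pixel_address(cfg, frame_index=1, row=0, col=0)))
```

Giving a read channel its own `ChannelConfig` whose base address points at another channel's buffer area is one way to picture the asymmetric read mapping, again as an assumption.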

2. The device of the high-bandwidth multi-way data storage and reading unit based on the FPGA and the DDR4 according to claim 1, wherein the clock rate of the input variable video stream of each write channel can be independently configured, and the channel bit width of each write channel can be configured;

the clock rate of the output variable video stream of each read channel can be independently configured, and the channel bit width of each read channel can be configured by parameters.

3. The apparatus of claim 1, wherein the depth of each output buffer is configurable as needed for isolating and converting variable rate output video and converting data streams of DDR4 user clock domains into variable bit-width and rate cross-clock domain video data;

each output buffer is configured to issue a low-priority page-move request signal and a high-priority page-move request signal to the DDR4 page data read buffer according to the current data watermark (fill level) of the buffer, and to wait for a channel occupation signal from the DDR4 user arbitration controller in response to the request; when the channel occupation signal is valid, the output buffer waits until new data has been received in the DDR4 page data read buffer, whereupon the DMA control logic inside the output buffer moves the data to be read from the DDR4 page data read buffer to the output buffer of the channel through the two-stage output data selection pipeline;

and for each output buffer, the number of rows and the number of columns of the read video frames and the DDR base address of the read video buffer area are parameter-configurable, and each output buffer performs the corresponding operation on itself according to an output control signal flag.
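The watermark-driven requests of the input buffers (claim 1) and output buffers (claim 3) can be sketched as two threshold comparisons. The claims state only that the request priority depends on the buffer's data watermark; the two-threshold scheme, the threshold values and the names below are assumptions made for illustration.

```python
from enum import Enum


class Request(Enum):
    NONE = 0
    LOW = 1
    HIGH = 2


def write_request(fill, low_mark, high_mark):
    """Input buffer: fills as video arrives, so urgency grows with fill level."""
    if fill >= high_mark:
        return Request.HIGH   # close to overflow, must be drained to DDR4 soon
    if fill >= low_mark:
        return Request.LOW    # at least one page is ready to move
    return Request.NONE


def read_request(fill, low_mark, high_mark):
    """Output buffer: drains as video leaves, so urgency grows as it empties."""
    if fill <= low_mark:
        return Request.HIGH   # close to underflow, must be refilled from DDR4 soon
    if fill <= high_mark:
        return Request.LOW    # room for at least one more page
    return Request.NONE


print(write_request(200, low_mark=64, high_mark=192))
print(read_request(200, low_mark=64, high_mark=192))
```

The mirrored comparisons capture why the same fill level can be urgent for a write channel and harmless for a read channel.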

4. The device of claim 1, wherein the arbitration controller classifies the received request signals by channel and class into 4 groups, which in descending order of priority are the high-priority write signal group, the high-priority read signal group, the low-priority write signal group and the low-priority read signal group, wherein request signals within the same group have the same priority and the arbitration controller arbitrates among them in a round-robin manner.
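The arbitration policy of claim 4 can be modelled in a few lines: a fixed priority order across the four groups and a round-robin pointer inside each group. This is a behavioural sketch only; the class name, the group labels and the one-grant-per-call interface are assumptions rather than details from the source.

```python
# Priority order across groups, highest first, as listed in claim 4.
GROUP_ORDER = ("high_write", "high_read", "low_write", "low_read")


class Arbiter:
    """Hypothetical model: strict priority between groups, round robin within."""

    def __init__(self, num_channels):
        self.num_channels = num_channels
        # Per-group round-robin pointer: the channel to consider first next time.
        self._next = {group: 0 for group in GROUP_ORDER}

    def grant(self, requests):
        """requests maps group name -> set of requesting channel indices.

        Returns (group, channel) for the single winner, or None if idle.
        """
        for group in GROUP_ORDER:
            pending = requests.get(group, set())
            if not pending:
                continue
            start = self._next[group]
            for step in range(self.num_channels):
                channel = (start + step) % self.num_channels
                if channel in pending:
                    self._next[group] = (channel + 1) % self.num_channels
                    return group, channel
        return None


arbiter = Arbiter(num_channels=4)
print(arbiter.grant({"low_write": {0, 1}, "high_read": {2}}))
```

Advancing the pointer to just past the last winner means a channel that keeps requesting cannot starve the other channels in its group, while a higher-priority group always wins over a lower one.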

5. The device of claim 1, wherein the DDR4 user layer write data control logic, in an active working mode, directly interfaces with the service data of one of the write channels and is decoupled from the multi-channel user data write control layer;

the DDR4 user layer read data control logic, in an active working mode, directly interfaces with the service data of one of the read channels and is decoupled from the multi-channel user data read control layer;

and the DDR4 user layer command address control logic, in an active working mode, directly interfaces with the service data of one of the write channels or read channels and is decoupled from the multi-channel user data write control layer or read control layer.

6. The apparatus of claim 1, wherein the DDR4 physical layer control core converts DDR4 user layer data and control signal buses to physical layer signal buses required by the DDR4 granule.

7. A method of a high-bandwidth multi-way data storage and reading unit based on FPGA and DDR4, characterized in that the method is based on the apparatus of the high-bandwidth multi-way data storage and reading unit based on FPGA and DDR4 according to any one of claims 1-6.

8. A system of a high-bandwidth multi-way data storage and reading unit based on FPGA and DDR4, the system comprising: an FPGA multi-path data matrix configuration unit connected respectively to an FPGA multi-path data access unit and an FPGA multi-path data extraction unit;

the FPGA multi-path data matrix configuration unit is based on the apparatus of the high-bandwidth multi-way data storage and reading unit based on FPGA and DDR4 according to any one of claims 1-6, and is configured to store multiple paths of input data and cross-read them into multiple paths of video output data, and to internally configure which input channel of the input data each output channel of the video output data is connected to;