CN118018665A

CN118018665A - Multichannel image acquisition and processing system based on ZYNQ

Info

Publication number: CN118018665A
Application number: CN202410251481.7A
Authority: CN
Inventors: 张经纬; 王建民
Original assignee: Harbin University of Science and Technology
Current assignee: Harbin University of Science and Technology
Priority date: 2024-03-06
Filing date: 2024-03-06
Publication date: 2024-05-10

Abstract

The invention discloses a multichannel image acquisition and processing system based on ZYNQ. The system uses a Zynq-7000 series SoC, which integrates an FPGA and an ARM processor, as the main control chip. The acquisition channels consist of an Ethernet port and an optical fiber communication port. Video image data acquired by the different channels are stored in different areas of DDR3; a command sent by a test upper computer determines which video channel is read out, and the read video image data is processed by a scaling algorithm and then transmitted to the test upper computer through PCIe communication. The system comprises a data writing module, a data reading module and an image scaling module. The data writing module and the data reading module write and read video data using a purpose-designed AXI_DMA core, and the image scaling module implements image scaling using an improved bilinear interpolation algorithm. The designed system occupies few resources, is simpler and more flexible to use, can read and write the DDR3 at the PS end, and can prevent tearing during video transmission as well as video read-write conflicts.

Description

Multichannel image acquisition and processing system based on ZYNQ

Technical Field

The invention relates to the technical field of image acquisition and processing, in particular to a ZYNQ-based multichannel image acquisition and processing system.

Background

Image acquisition is the first step of image processing. Image acquisition is the process of converting an analog image into a digital image through a device, and image processing is the process of processing that digital image by computer. Both are widely applied in remote sensing mapping, security monitoring, industrial instruments, aerospace, medical imaging, machine vision and other fields, for example continuous positioning and measurement of objects in remote sensing mapping, capture and marking of illegal behavior in security monitoring, and real-time endoscopic imaging of tumors in medicine. As image information develops towards high frame rates and high pixel counts, continuous acquisition forms a video stream that contains a large volume of data and requires high bandwidth, which places higher demands on the bandwidth of the devices that acquire, store and process the video data. How to transmit video image data in real time, at high speed and stably, and how to process the acquired data in real time, have therefore become subjects of increasing attention. The number of video acquisition channels is also an important research topic. In video monitoring, for example, using one acquisition card for each monitored video channel incurs a large cost. If real-time acquisition and transmission of multiple video channels can be realized on one acquisition card, the utilization efficiency of the video monitoring system can be greatly improved.

Traditional transmission technologies, such as serial ports, RS485 and 100 Mbps Ethernet, can only transmit small volumes of data with low bandwidth requirements. Transmission of high-frame-rate, high-pixel video images requires high-speed interface technologies such as the PCIe bus, optical fiber, gigabit Ethernet and USB 3.0. Gigabit Ethernet offers high bandwidth, low delay and high stability, and can transmit image data smoothly without stalling. Optical fiber communication offers wide bandwidth, large transmission capacity, low loss and good resistance to electromagnetic interference, and can meet the requirements of high-speed data transmission. To receive high-speed image data from the optical fiber and gigabit Ethernet channels, a high-speed data bus must be selected for communication with the computer. Compared with the ISA and PCI buses, the PCIe bus is much faster; the single-lane rate of PCIe 2.0 is 500 MB/s. Using gigabit Ethernet communication, optical fiber communication and the PCIe bus together to realize multichannel high-speed image data acquisition and transmission therefore has great application value.

Image processing includes image enhancement, compression, restoration, recognition, encoding, segmentation, description and other aspects, and image scaling belongs to image compression technology. Image scaling changes the resolution of an image: reducing an image converts a high-resolution image into a low-resolution one, and enlarging an image converts a low-resolution image into a high-resolution one. In practical applications, image scaling plays an irreplaceable role. In the medical field, for example, magnifying a captured image allows its details to be observed. In the field of media relay, scaling allows images to be displayed on screens of different resolutions. As images develop towards higher pixel counts, the requirements on the speed and real-time performance of image scaling keep rising, and how to achieve real-time image scaling has become a problem that must be studied.

Image processing is mainly realized in two ways: in software or in hardware. Software processing takes a certain amount of time and cannot meet real-time requirements. Hardware processing mainly relies on dedicated image processing chips, but for image scaling such chips offer poor cost performance and easily waste resources. With the rapid development of programmable logic devices, FPGAs have become a primary implementation platform for digital logic design. An FPGA is internally composed of logic gates, flip-flops and the like. Because its logic operates in parallel, an FPGA runs fast and completes logic operations in less time; compared with the serial execution of a CPU it computes faster and can optimize computation-intensive applications. Implementing the scaling algorithm on an FPGA can therefore meet the real-time requirement of image processing.

Disclosure of Invention

To this end, the invention proposes a multichannel image acquisition and processing system based on ZYNQ, comprising:

a data acquisition channel, used for acquiring video data through the optical fiber communication port and the Ethernet port;

a data writing module, used for writing the video data acquired through the optical fiber communication port and the video data scaled by the image scaling module into the data storage module using the AXI_DMA core;

a data reading module, used for reading the unscaled video data stored in the data storage module using the AXI_DMA core and outputting it to the image scaling module;

a data storage module, used for storing the video data written by the data writing module;

an image scaling module, used for implementing image scaling with an improved bilinear interpolation algorithm;

and a communication module, used for transmitting the video data scaled by the image scaling module to the upper computer after receiving an instruction from the upper computer.

Further, the system also comprises a GTP core communication module, a data alignment module and a video data analysis module, which are respectively used for receiving and sending video data from the optical fiber communication port through the GTP core, carrying out data alignment on the received video data, analyzing the video data and storing the video data into the FIFO, and simultaneously recovering the frame synchronization signals of the video images.

Further, the system also comprises a video stream forming module and a video image data packet module which are respectively used for converting the signals output from the data reading module into video stream form output and sending the video stream packets to the GTP core communication module during testing.

Further, the writing process of the AXI_DMA core in the data writing module includes: storing video data in an AXI write FIFO through a data splicing sub-module; starting an AXI burst write when the amount of data in the AXI write FIFO exceeds one line of data of the original image; and reading data from the AXI write FIFO for writing when the write data channel completes its handshake. The AXI write FIFO has a write data width of 64 bits and a read data width of 64 bits, and the data splicing sub-module converts the 16-bit input into a 64-bit output.
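The width conversion performed by the data splicing sub-module can be illustrated with a small behavioral sketch. It is only a software model under stated assumptions: the function name splice_16_to_64 is hypothetical, and the packing order (first 16-bit word in the lowest 16 bits) is an assumption, since the text does not state the order used in the hardware.

```python
def splice_16_to_64(words):
    """Pack 16-bit words into 64-bit words, four at a time.

    Assumption: the first word of each group goes into bits 15..0,
    the fourth into bits 63..48. The real core may use the opposite order.
    """
    if len(words) % 4 != 0:
        raise ValueError("input length must be a multiple of 4")
    packed = []
    for i in range(0, len(words), 4):
        value = 0
        for k, word in enumerate(words[i:i + 4]):
            if not 0 <= word <= 0xFFFF:
                raise ValueError("each input word must fit in 16 bits")
            value |= word << (16 * k)  # place word k in its 16-bit lane
        packed.append(value)
    return packed
```

With this ordering, four input words 0x1111, 0x2222, 0x3333 and 0x4444 become one FIFO word whose hexadecimal form reads 4444333322221111.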

Further, the writing process of the AXI_DMA core in the data writing module further includes: designing a frame counter to record the number of image frames written, where the frame counter is incremented by 1 when a rising clock edge detects that the falling-edge detection signal of the frame synchronization signal is high, and is cleared and restarts counting when it reaches its maximum value; designing a write frame offset address, where the total number of bytes of one frame of the original image is added to the write frame offset address when the falling-edge detection signal of the frame synchronization signal is high, and the write frame offset address is cleared and restarts when the frame counter reaches its maximum value; and designing a single-bit clock-domain-crossing control sub-module, in which an asynchronous FIFO is invoked to synchronize the frame synchronization signal to the clock signal of the AXI protocol.

Further, the reading process of the AXI_DMA core in the data reading module includes: starting AXI burst reads when the initialization success signal and the read channel enable signal are both pulled high, while buffering image data through a multi-frame buffering framework; writing the data read by the AXI burst reads into an AXI read FIFO, and, when the data buffered in the AXI read FIFO reaches a certain amount, raising a signal that notifies the next logic module to start reading the image data out of the AXI read FIFO. The multi-frame buffering framework ensures that a read only accesses the frame preceding the frame that is currently beginning to be written, and comprises: designing a frame buffer maximum value, where the frame counter is cleared and restarts counting when it reaches the frame buffer maximum value; and designing a read frame offset address, which equals the index of the frame currently being read minus 1, multiplied by the total number of bytes of one frame of the original image.
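The rule that a read only touches the frame preceding the one being written can be sketched as a small address calculation. This is an illustrative model, not the core's logic: the function name and parameters are hypothetical, frame slots are assumed to be numbered from 0, and the wrap-around from slot 0 back to the last slot is an assumption about how the counter reset is handled.

```python
def read_frame_offset(write_slot, frame_bytes, num_slots):
    """Return the byte offset of the frame slot that is safe to read.

    write_slot  : index (0-based, assumed) of the slot currently being written
    frame_bytes : total number of bytes of one original frame
    num_slots   : number of frame slots in the buffer (assumed wrap length)
    """
    if not 0 <= write_slot < num_slots:
        raise ValueError("write_slot out of range")
    read_slot = (write_slot - 1) % num_slots  # the frame written just before
    return read_slot * frame_bytes
```

Because the reader always trails the writer by one slot, the two never access the same frame at the same time, which is the property the multi-frame buffering framework relies on to avoid tearing.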

Further, the image scaling module comprises an image data buffer control sub-module, a coordinate coefficient generation sub-module, an interpolation calculation sub-module and a FIFO control sub-module; wherein the image data buffer control sub-module is designed as follows: four dual-port RAMs are invoked to buffer the four lines of image data required when the ordinate of the target image point is Y and Y+1 respectively; each dual-port RAM has one write data port and one read data port, and the data at two adjacent addresses are read from the read port simultaneously.

Further, the workflow of the image data buffer control sub-module includes: when video data is read, the sub-module first enters initial state 0, in which it judges whether the number of readable data in the FIFO that stores target image data is smaller than the total number of data in one line of the target image minus 10; if so, it remains in initial state 0, and otherwise it enters state 2. In state 2, the enable signal for calculating the ordinate at which the target image maps to the original image is evaluated, that is, it is judged whether the number of lines of image data already read out from the data storage module is smaller than the number of lines that need to be read out; if so, state 1 is entered, and otherwise state 3 is entered. In state 1, data is read from the data storage module and state 2 is entered again. In state 3, the scaling calculation is performed on the target image while the next ordinate src_y at which the target image maps to the original image is calculated.

Further, the solution formula for calculating the ordinate src_y of the target image, which is mapped to the original image, is as follows:

Where src_y is the ordinate of the mapping point, des_y is the ordinate of a certain point of the target image, src_h is the height of the original image, and des_h is the height of the target image.

Further, the coordinate coefficient generation submodule is used for solving the coordinate of a certain point of the target image mapped to the original image, wherein the mapping formula is as follows:

Wherein des_x and des_y are coordinates of a point in the target image, src_h and src_w are vertical length and horizontal length of the original image, and des_h and des_w are vertical length and horizontal length of the target image;

after the coordinate solutions (src_x, src_y) of the mapping points are obtained, coordinate values of four adjacent points of the mapping points are further calculated.

Further, the interpolation computation submodule is used for computing and obtaining a pixel value of a certain point of the target image after coordinate values of four adjacent points of the mapping point are obtained; the FIFO control sub-module is used for caching pixel values of the target image.
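For orientation, the coordinate mapping and the four-neighbor interpolation can be sketched in software. The mapping formulas are not reproduced in this text and the details of the improved algorithm are not given here, so the sketch below is plain textbook bilinear interpolation and not the invention's method. It assumes the common proportional mapping src_x = des_x × src_w / des_w and src_y = des_y × src_h / des_h, clamps neighbors at the image border, and uses floating-point weights; the function name bilinear_resize is hypothetical.

```python
def bilinear_resize(src, des_w, des_h):
    """Resize a grayscale image (list of rows) with standard bilinear interpolation.

    Assumed mapping (not taken from the source): src = des * src_size / des_size.
    """
    src_h, src_w = len(src), len(src[0])
    out = []
    for des_y in range(des_h):
        src_y = des_y * src_h / des_h          # mapped ordinate
        y0 = min(int(src_y), src_h - 1)        # upper neighbor row
        y1 = min(y0 + 1, src_h - 1)            # lower neighbor row, clamped
        v = src_y - y0                         # vertical weight
        row = []
        for des_x in range(des_w):
            src_x = des_x * src_w / des_w      # mapped abscissa
            x0 = min(int(src_x), src_w - 1)    # left neighbor column
            x1 = min(x0 + 1, src_w - 1)        # right neighbor column, clamped
            u = src_x - x0                     # horizontal weight
            top = (1 - u) * src[y0][x0] + u * src[y0][x1]
            bottom = (1 - u) * src[y1][x0] + u * src[y1][x1]
            row.append((1 - v) * top + v * bottom)
        out.append(row)
    return out
```

A hardware implementation would typically replace the floating-point weights with fixed-point coefficients and fetch the four neighbors from line buffers, which is the role the dual-port RAMs play in the image data buffer control sub-module.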

The beneficial technical effects of the invention are as follows:

The invention designs a multichannel image acquisition and processing system based on ZYNQ. The system uses a Zynq-7000 series SoC, which integrates an FPGA and an ARM processor, as the main control chip. The acquisition channels consist of an Ethernet port at the PS end and an optical fiber communication port at the PL end. Video image data acquired by the different channels are stored in different areas of the DDR3 at the PS end; a command sent by a test upper computer determines which video channel is read out, and the read video image data is processed by a scaling algorithm and then transmitted to the test upper computer through PCIe communication. The system comprises a data writing module, a data reading module, an image scaling module, a GTP core communication module, a data alignment module, a video image data analysis module and the like. The data writing module and the data reading module write and read video data using the purpose-designed AXI_DMA core, and the image scaling module implements image scaling using an improved bilinear interpolation algorithm. The AXI_DMA core designed by the invention occupies few resources, is simpler and more flexible to use, can read and write the DDR3 at the PS end, and can prevent tearing during video transmission as well as video read-write conflicts.

Drawings

The invention may be better understood by reference to the following description taken in conjunction with the accompanying drawings, which are included to provide a further illustration of the preferred embodiments of the invention and to explain the principles and advantages of the invention, together with the detailed description below.

Fig. 1 is a schematic structural diagram of a multichannel image acquisition and processing system based on ZYNQ according to an embodiment of the present invention.

Fig. 2 is a block diagram of a multichannel image acquisition and processing system based on ZYNQ according to an embodiment of the present invention.

Fig. 3 is a diagram of the network port channel test framework in an embodiment of the present invention.

Fig. 4 is a diagram of the optical port channel test framework in an embodiment of the present invention.

Fig. 5 is a diagram of the custom AXI_DMA core design framework in an embodiment of the present invention.

Fig. 6 is a diagram of the asynchronous FIFO port signals in an embodiment of the present invention.

Fig. 7 is a simulation diagram of the single-bit clock-domain-crossing control sub-module in an embodiment of the present invention.

Fig. 8 is a diagram of the design framework of the image scaling module in an embodiment of the present invention.

Fig. 9 is a state transition flow chart of the image data buffer control sub-module in an embodiment of the present invention.

Fig. 10 is a state explanatory diagram of the image data buffer control sub-module in an embodiment of the present invention.

Fig. 11 is a solution flow chart for src_y in an embodiment of the present invention.

Fig. 12 is a diagram showing an example of the structure of the interpolation coefficient arithmetic logic in an embodiment of the present invention.

Fig. 13 is a schematic diagram of the port signals of the video stream forming module in an embodiment of the present invention.

Fig. 14 is a state machine transition flow chart of the video data packet transmission module in an embodiment of the present invention.

Detailed Description

In order that those skilled in the art will better understand the present invention, exemplary embodiments or examples of the present invention will be described below with reference to the accompanying drawings. It is apparent that the described embodiments or examples are only implementations or examples of a part of the invention, not all. All other embodiments or examples, which may be made by one of ordinary skill in the art without undue burden, are intended to be within the scope of the present invention based on the embodiments or examples herein.

The invention provides a multichannel image acquisition and processing system based on ZYNQ. The system uses a Zynq-7000 series SoC, which integrates an FPGA and an ARM processor, as the main control chip. The acquisition channels consist of an Ethernet port at the PS end and an optical fiber communication port at the PL end, and PCIe is selected as the host interface protocol. Video image data acquired by the different channels are stored in different areas of the DDR3 at the PS end; a command sent by a test upper computer determines which video channel is read out, and the read video image data is processed by a scaling algorithm and then transmitted to the test upper computer through PCIe communication. In traditional ZYNQ-based image acquisition system designs, read-write control of the DDR3 at the PS end relies on the official Xilinx VDMA IP or DMA IP. These IP cores, however, require consulting the manual and configuring registers, which makes them cumbersome to use. To reduce the complexity of the overall system design, the invention designs an AXI_DMA core based on the AXI_FULL bus principle and the characteristics of video image transmission. Compared with the official Xilinx VDMA IP or DMA IP, this IP core occupies few resources and is simpler and more flexible to use; it can read and write the DDR3 at the PS end and can prevent tearing and read-write conflicts during video transmission.

The embodiment of the invention provides a multichannel image acquisition and processing system based on ZYNQ, as shown in figure 1, which comprises:

a data acquisition channel 110, used for acquiring video data through the optical fiber communication port and the Ethernet port;

a data writing module 120, used for writing the video data acquired through the optical fiber communication port and the video data scaled by the image scaling module into the data storage module using the AXI_DMA core;

a data reading module 130, used for reading the unscaled video data stored in the data storage module using the AXI_DMA core and outputting it to the image scaling module;

a data storage module 140, used for storing the video data written by the data writing module;

an image scaling module 150, used for implementing image scaling with an improved bilinear interpolation algorithm;

and a communication module 160, used for transmitting the video data scaled by the image scaling module to the upper computer after receiving an instruction from the upper computer.

In this embodiment, the system preferably further includes a GTP core communication module 170, a data alignment module 180, and a video data parsing module 190, which are respectively configured to receive and transmit video data from the optical fiber communication port through the GTP core, perform data alignment on the received video data, parse the video data, and store the parsed video data in the FIFO, and simultaneously recover the frame synchronization signal of the video image.

In this embodiment, the system preferably further includes a video stream forming module 191 and a video image data packet module 192, which are respectively used for converting the signal output from the data reading module into a video stream format for outputting and sending the video stream packets to the GTP core communication module during testing.

The following describes embodiments of the present invention in detail.

The design goal of the ZYNQ-based multichannel image acquisition and processing system is to realize that images can be acquired from an optical port channel and an Ethernet channel, video image data acquired by different channels are stored in different areas in DDR3 of a PS end, a command is sent out by a test upper computer to determine which path of video is read out, and the read video image data is transmitted to the test upper computer through PCIe communication (namely a communication module 160) after being processed by a scaling algorithm. The system has strong design expansibility and wide application prospect in the aspect of high-speed image acquisition, transmission and processing.

Image acquisition is realized first. The Ethernet port used for acquiring images is the Ethernet port at the PS end, which communicates with the test upper computer through the TCP protocol. An SDK program is designed at the ARM end; by calling the lwIP protocol stack, it parses the received image data and writes it into the DDR3 at the PS end. The optical fiber acquisition port receives the image data transmitted over the optical fiber through the GTP core in the FPGA. After reception the data is first aligned, then converted into video stream form by the video image analysis module, and finally written into the DDR3 at the PS end by the AXI_DMA core.

Image data is stored in the DDR3 at the PS end (i.e., the data storage module 140). The HP port of the Zynq platform allows external logic to exchange data with the DDR3 through the AXI_FULL bus. Therefore, to simplify the design of the whole system, an AXI_DMA core is designed based on the AXI_FULL bus principle, and the DDR3 can be read and written through this AXI_DMA core.

For the image acquisition channel of the Ethernet port, after the Ethernet port at the PS end acquires an image and writes it into DDR3, two AXI_DMA cores are invoked. One AXI_DMA core is responsible for reading DDR3: its write function is disabled, it sends no frame interrupt signal, and its read function is enabled to read image data from DDR3 out to the scaling algorithm processing module. The other AXI_DMA core is responsible for writing DDR3: its write function is enabled and its read function is disabled. The image data processed by the scaling algorithm passes through the video stream forming module and is then written into DDR3 through this core, which sends a frame interrupt signal to notify XDMA to read the image data in DDR3 and transmit it to the upper computer. For the image acquisition channel of the optical fiber port, two AXI_DMA cores are likewise invoked. One AXI_DMA core is responsible for both reading and writing DDR3, with both functions enabled: the write function writes the image data acquired from the optical fiber port into DDR3 without sending a frame interrupt signal, and the read function reads the image data in DDR3 out to the scaling algorithm processing module. The other AXI_DMA core is responsible for writing DDR3: its write function is enabled and its read function is disabled. The image data processed by the scaling algorithm passes through the video stream forming module and is then written into DDR3 through this core, which sends a frame interrupt signal to notify XDMA to read the image data in DDR3 and transmit it to the upper computer.

In the whole-system test, for the image acquisition channel of the Ethernet port, the image data sent with a network port debugging assistant proved unstable, so a video image sending upper computer was designed with VS2015 and Qt5.8.1 that can send video image data stably. For the image acquisition channel of the optical port, two optical modules are used, and the transmitting and receiving ends of the two modules are connected to form a loop. Image data is then sent through the Ethernet at the PS end and written into the DDR3 at the PS end; the image data is read back out and passed through the SFP loop to serve as the test image data source for the optical port image acquisition channel. The overall framework design of the system is shown in fig. 2, the network port channel test framework design is shown in fig. 3, and the optical port channel test framework design is shown in fig. 4.

The FPGA chip used is the XC7Z100FFG900-2, which belongs to Xilinx's Zynq-7000 series platform. The platform combines a Processor System (PS) with Programmable Logic (PL), balancing high integration with programmability. In the Zynq-7000 series platform, an ARM processor is embedded in the ZYNQ core, and the gigabit Ethernet peripheral and the DDR3 can be used simply by configuring that core. Unlike a pure logic design of gigabit Ethernet communication and DDR3 read-write control, designing Ethernet communication and DDR3 read-write control at the PS end is therefore much simpler and more convenient. These characteristics make Zynq series chips well suited to high-speed data acquisition and processing, so the system is designed around a ZYNQ series chip. ZYNQ series chips share a unified basic architecture divided into a PS part and a PL part. A system designed with ZYNQ can accordingly be divided into software and hardware: the PS part runs the software programs and the operating system, and the PL part implements the logic. In the Zynq-7000 series chip, the FPGA and the ARM communicate over an internal high-speed bus through standard AXI interfaces, a connection scheme that reduces manufacturing cost and saves resources.

1. Data writing module and data reading module (custom AXI_DMA core)

In image acquisition system designs based on the Zynq platform, reading and writing of the DDR3 at the PS end is usually realized with the VDMA IP or DMA IP. VDMA is a soft-core IP provided by Xilinx that converts a data stream in AXI Stream format into Memory Map format, or data in Memory Map format into an AXI Stream data stream, so as to communicate with DDR3. Many video applications require frame buffering to handle frame rate changes or to perform scaling, cropping and other size conversion operations on the image. VDMA can control up to 32 frame buffers and switch between them freely, so double buffering and multi-buffering are easy to realize, which makes VDMA a common choice in Zynq-based image and video processing systems. Its disadvantages in such designs are also obvious, however: using the VDMA IP requires configuring registers with reference to the IP manual, which greatly increases the developer's workload. Because the HP port of Zynq allows the PL end to exchange data with the DDR3 at the PS end through the AXI_FULL protocol, the invention designs a custom AXI_DMA core based on the AXI_FULL protocol and the characteristics of image transmission. The DDR3 can be read and written through this custom IP core, which is convenient to use. The custom AXI_DMA core design framework is shown in fig. 5.

As can be seen from fig. 5, the external input signals of the AXI_DMA write section include wr_en, pre_image_vs, pre_image_de, pre_image_data, pre_image_clk, and wr_vs_req. The wr_en signal is the write channel open enable signal. pre_image_vs is the frame synchronization signal. pre_image_de is the image data valid enable signal. pre_image_data is 24-bit wide RGB image data. pre_image_clk is the external image data write clock signal, and wr_vs_req is an interrupt flag signal asserted each time one frame of image has been written into DDR3. In order to support the writing of 4K image data, the AXI_DMA write FIFO has a write data width of 64 bits, a write data depth of 1024, a read data width of 64 bits, and a read data depth of 1024. Thus, the 16-bit input needs to be converted to a 64-bit output by the data stitching sub-module before external image data is written into the AXI_DMA write FIFO. The pre_image_vs signal is synchronous to the pre_image_clk clock. Because pre_image_vs must be synchronized to the axi_clk clock inside the AXI_DMA write module, a single-bit clock-domain-crossing control sub-module is designed to synchronize the pre_image_vs signal to axi_clk. An asynchronous FIFO is instantiated in the single-bit clock-domain-crossing control sub-module, and its port signals are shown in fig. 6. In fig. 6, the vs_negedge signal is the edge detection signal for the falling edge of pre_image_vs, and the reg_1 signal is connected to the empty flag port of the FIFO: reg_1 is 0 when data has been written into the FIFO and 1 when the FIFO has been read empty. The axi_vs signal is obtained by inverting the reg_1 signal. To represent the principle of the single-bit clock-domain-crossing control sub-module more intuitively, the simulation of this module is shown in fig. 7. It can be seen that the pre_image_clk clock and the axi_clk clock have different frequencies, and that the pre_image_vs signal is synchronized to axi_vs on the axi_clk side.
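The width conversion performed by the data stitching sub-module can be illustrated with a minimal Python sketch. The function name pack_words and the packing order (first word in the least significant bits) are assumptions made for illustration; the description does not state the order used in the actual design.

```python
def pack_words(words16, group=4):
    """Pack consecutive 16-bit words into 64-bit words.

    Assumption: the first word of each group lands in the low 16 bits.
    """
    if len(words16) % group != 0:
        raise ValueError("input length must be a multiple of the group size")
    packed = []
    for i in range(0, len(words16), group):
        value = 0
        for k, word in enumerate(words16[i:i + group]):
            # Mask to 16 bits, then shift into lane k of the 64-bit word.
            value |= (word & 0xFFFF) << (16 * k)
        packed.append(value)
    return packed
```

For example, four input words produce one 64-bit FIFO entry, so a FIFO of depth 1024 holds 4096 input words under this assumption.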

When wr_en is 1, the write channel is open. To record the number of image frames written into DDR3, a frame counter wr_vs_cnt is designed, with its maximum count value set to 5. On each rising edge of axi_clk at which the vs_negedge signal is detected high, wr_vs_cnt is incremented by 1; after reaching 5 it is cleared and counting restarts. In order to prevent image tearing during image transmission, a write frame offset address wr_vs_adder is designed. When the vs_negedge signal is high, image_feibian is added to wr_vs_adder, where image_feibian represents the total number of bytes of one original image frame; when wr_vs_cnt reaches the maximum value of 5, wr_vs_adder is cleared and accumulation restarts. The burst write length is wr_len. In consideration of the image transfer characteristics, the total amount of data written in one burst is equal to one line of data of the original image, so 8×wr_len is equal to the number of bytes in one line of the original image. When the parameter wr_rd_data_count of the AXI write FIFO is greater than or equal to wr_len, meaning that the AXI write FIFO holds at least one line of data of the original image, the axi_start signal is pulled high and the axi_awvalid signal is pulled high in the next cycle. When axi_awvalid completes a handshake with M_AXI_AWREADY, 8×wr_len is added to axi_awaddr and the axi_wvalid signal is pulled high. When axi_vs is high, axi_awaddr is set equal to wr_vs_adder, for the purpose of preventing image tearing during image transmission. While the axi_wvalid signal and the M_AXI_WREADY signal are handshaking, the axi_read_de signal is pulled high; as soon as the handshake is completed, axi_read_de is pulled low. The axi_read_de signal is the read enable of the AXI write FIFO. The axi_bready signal is held high at all times.

From the above analysis, the entire axi_dma write process can be summarized as: firstly, storing image data into an AXI write FIFO through a data splicing sub-module, starting an AXI burst writing process when the data volume in the AXI write FIFO is larger than the data volume of one line of original image data, and reading data from the AXI write FIFO and writing the data into DDR3 when a handshake mechanism is formed by a data writing channel.
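The address arithmetic of the burst write process can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names burst_length and burst_addresses are hypothetical, the bus is taken to be 64 bits (8 bytes per beat) as stated for the FIFO, and the line size of 1440 bytes used in the usage note is an example value, not a figure from the design.

```python
BYTES_PER_BEAT = 8  # 64-bit AXI data width, i.e. 8 bytes per beat


def burst_length(line_bytes):
    """Return wr_len such that 8 * wr_len equals one image line in bytes."""
    if line_bytes % BYTES_PER_BEAT != 0:
        raise ValueError("line size must be a multiple of 8 bytes")
    return line_bytes // BYTES_PER_BEAT


def burst_addresses(frame_base, line_bytes, lines):
    """Start address of each one-line burst within a frame.

    frame_base plays the role of wr_vs_adder; after every address
    handshake the address advances by 8 * wr_len.
    """
    wr_len = burst_length(line_bytes)
    return [frame_base + n * BYTES_PER_BEAT * wr_len for n in range(lines)]
```

With a hypothetical line of 1440 bytes, burst_length returns 180 beats, and successive bursts start 1440 bytes apart from the frame base address.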

As can be seen from fig. 5, the external input signals of the AXI_DMA read section include post_image_de, post_image_data, post_image_clk, sdk_initial_en, sdk_vs_cnt, read_en, and post_start. The post_image_de signal is the read enable with which the next-stage logic module reads data from the AXI read FIFO. post_image_data is the target image data stored in the AXI read FIFO. post_image_clk is the read clock of the next-stage logic module. sdk_initial_en is the initialization success signal transmitted by the ARM side to the AXI_DMA read section through the AXI_LITE bus. sdk_vs_cnt is the frame counter of the ARM side. read_en is the read channel open enable signal, and post_start is the signal that notifies the next-stage logic module to start reading.

In this system, the DDR3 used is the DDR3 on the PS side. Unlike a DDR3 on the PL side, the PS-side DDR3 cannot itself notify the next-stage logic module that reading and writing of DDR3 may begin. Therefore, in order to prevent DDR3 data reads from starting too early and blocking the bus, the ARM side transmits an initialization success signal sdk_initial_en to the AXI_DMA read section through the AXI_LITE bus. After the program has been downloaded into the FPGA and this signal has been received, reading and writing of DDR3 can begin.

As mentioned above, video applications need frame buffering to handle frame rate changes or to perform size conversion operations such as scaling and cropping, and VDMA can control up to 32 frame buffers and switch freely between them, so that double buffering and multi-buffering operations can be easily implemented. Therefore, in order to replace VDMA in this system, the custom AXI_DMA core must also provide a frame buffer function. This function is implemented by designing a multi-frame buffer framework whose design concept is that, when DDR3 is read, only the frame preceding the frame that is currently being written is read. As mentioned above, a write frame counter wr_vs_cnt is designed to record which frame is being written. A frame buffer maximum value VS_MAX is set, which specifies the maximum number of frames that are buffered. When wr_vs_cnt reaches VS_MAX, it is cleared and counting restarts. Reading and writing of DDR3 fall into three cases: writing and reading at the same speed, slow writing with fast reading, and fast writing with slow reading. In the case of equal speeds, assuming a frame rate of 30 for both, the first frame is read while the second frame is written and the second frame is read while the third frame is written, so writing and reading cannot conflict. In the case of slow writing and fast reading, assuming a write frame rate of 30 and a read frame rate of 90, three frames can be read in the time taken to write one frame. Since DDR3 can be read repeatedly, the first frame is read repeatedly while the second frame is being written, and the second frame is read repeatedly while the third frame is being written, so writing and reading do not conflict.
In the case of fast writing and slow reading, assume that VS_MAX is 3, the write frame rate is 90, and the read frame rate is 30, so that three frames are written in the time taken to read one frame. The second frame is read while the third frame is being written. When two thirds of the second frame has been read, the writer has already wrapped around to the position of the second frame, which is still being read, and a read-write conflict occurs. To prevent this, VS_MAX is expanded to 4. Then, when two thirds of the second frame has been read, the writer has only reached the position of the first frame, so the read-write conflict is effectively avoided. In this system the value of VS_MAX is set to 5 according to actual needs.
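The fast-write, slow-read case can be checked with a small Python model. It is a simplified sketch under stated assumptions: frame slots are numbered 1 to VS_MAX, the write-to-read speed ratio is an integer, and the reader starts a frame at the same moment the writer starts one. The function names are hypothetical.

```python
def slots_written_during_read(first_write_slot, vs_max, ratio):
    """Frame slots the writer touches while the reader reads one frame."""
    return [((first_write_slot - 1 + i) % vs_max) + 1 for i in range(ratio)]


def has_conflict(vs_max, first_write_slot=3, ratio=3):
    """True if the writer reaches the slot that is still being read."""
    # The reader always takes the frame just before the one being written.
    read_slot = first_write_slot - 1 if first_write_slot > 1 else vs_max
    written = slots_written_during_read(first_write_slot, vs_max, ratio)
    return read_slot in written
```

With a ratio of 3 and the writer starting on slot 3, the writer visits slots 3, 1, 2 when VS_MAX is 3 and therefore collides with the read of slot 2, while with VS_MAX of 4 it visits slots 3, 4, 1 and no collision occurs.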

In multi-channel image acquisition, if the image is acquired through the PS-side Ethernet port, the AXI write section is closed and the AXI read section is open. The number of written frames is counted by the ARM-side program and then transmitted to the PS-side frame counter sdk_vs_cnt of the AXI read section through the AXI-Lite bus. If the image is acquired through the optical fiber port, both the AXI write section and the AXI read section are open, and the AXI write section passes the wr_vs_cnt signal, which counts the number of written frames, to the AXI read section. In the multi-frame buffer framework, the rd_vs_cnt signal indicates which frame is to be read. When the AXI write section is closed, rd_vs_cnt is equal to sdk_vs_cnt minus one if sdk_vs_cnt is greater than 1; otherwise rd_vs_cnt is equal to VS_MAX. When the AXI write section is open, rd_vs_cnt is equal to wr_vs_cnt minus one if wr_vs_cnt is greater than 1; otherwise rd_vs_cnt is equal to VS_MAX.

In order to prevent image tearing during image transmission, a read frame offset address rd_vs_adder is also designed in the AXI read section. Since the initial value of rd_vs_adder is 0, rd_vs_adder is equal to rd_vs_cnt minus 1, multiplied by the total number of bytes of one original image frame.
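The selection of the frame to read and its address offset can be sketched in Python as follows. The function names are hypothetical, and the offset rule is read here as (rd_vs_cnt minus 1) times the frame size in bytes, which is an interpretation of the description rather than a confirmed detail of the design.

```python
def read_frame_index(write_open, wr_vs_cnt, sdk_vs_cnt, vs_max=5):
    """Return rd_vs_cnt: the frame just before the one being written."""
    # The write counter comes from the AXI write section when it is open,
    # otherwise from the ARM-side counter sdk_vs_cnt.
    cnt = wr_vs_cnt if write_open else sdk_vs_cnt
    return cnt - 1 if cnt > 1 else vs_max


def read_frame_offset(rd_vs_cnt, frame_bytes):
    """Return rd_vs_adder under the assumed rule (rd_vs_cnt - 1) * frame size."""
    return (rd_vs_cnt - 1) * frame_bytes
```

For example, while frame 1 is being written the reader wraps around to frame VS_MAX, so the reader never addresses the slot that the writer currently occupies.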

When the read channel enable signal read_en and the sdk_initial_en signal are both high, the read cycle flag signal rd_cycle_flag is pulled high. When both rd_cycle_flag and the read data flag signal rd_data_flag are high, rd_cmd_flag is pulled high. When the read address channel completes a handshake, rd_cmd_flag is pulled low. When rd_cmd_flag is high and rd_wr_data_count of the AXI read FIFO is less than 4 lines of data, the axi_arvalid signal is pulled high. When the read address channel completes a handshake, axi_arvalid is pulled low and the read data flag signal rd_data_flag is pulled high. When the read data channel completes a handshake, rd_data_flag is pulled low. The axi_rready signal is equal to the rd_data_flag signal. When the axi_rready signal completes a handshake with the M_AXI_RVALID signal, rd_hcnt is incremented by 1 on the rising edge of M_AXI_ACLK; after reaching 179 it is cleared and counting restarts. When rd_hcnt is equal to 179, rd_vcnt is incremented by 1; after reaching 719 it is cleared and counting restarts. When rd_vcnt is equal to 719, rd_base_addr is assigned to the AXI read address axi_araddr. The axi_write_de signal is pulled high; it is the write enable of the AXI read FIFO. In view of the image transfer characteristics, the total amount of data read in one burst is equal to one line of data of the original image, so 8×rd_len is equal to the number of bytes in one line of the original image. When rd_rd_data_count of the AXI read FIFO is greater than 32×rd_len, the post_start signal is pulled high, informing the next-stage logic module that image data can be read out from the AXI read FIFO.

From the above analysis, the entire AXI_DMA read process can be summarized as follows: when the initialization success signal sdk_initial_en and the read channel enable signal read_en are both pulled high, AXI burst reading starts. Meanwhile, the frame buffer function for image data is realized through the multi-frame buffer framework. Data read during the AXI burst read process is written into the AXI read FIFO, and when the data buffered in the AXI read FIFO reaches a set amount, the post_start signal is pulled high to inform the next stage that image data can be read from the AXI read FIFO.

2. Image scaling module

The whole image scaling module is mainly divided into an image data buffer control sub-module, a coordinate coefficient generation sub-module, an interpolation calculation module and a FIFO control sub-module. The overall design framework is shown in fig. 8.

1) Image data buffer control sub-module: the core of the whole algorithm module (image scaling module) design.

When an image algorithm is implemented on an FPGA, buffering a whole frame each time greatly increases the consumption of logic resources and the latency of video transmission. A storage buffer mechanism for image data must therefore be established so that image data is supplied to the interpolation operation accurately and on time. According to the algorithm principle, the gray value of one point of the target image is calculated from the gray values of four points on two adjacent lines of the original image. In order to increase data throughput, 4 dual-port RAMs are therefore instantiated in the image data buffer control sub-module. They buffer the four lines of original image data needed to calculate the pixels of the target image rows with ordinates Y and Y+1. Each dual-port RAM has a write data port and a read data port from which the data at two adjacent addresses can be read simultaneously. The depth of each RAM is 4096 and the data width is 8 bits, so as to meet the requirement of buffering image data at a maximum resolution of 4K. The image data buffer control sub-module is designed so that, through the transitions of a state machine, data is continuously read from DDR3, stored in the RAMs, and then read out. The state machine flow chart and the state machine explanatory diagram are shown in figs. 9 and 10.

The operation of the state machine within the image data buffer control sub-module can be seen in conjunction with figs. 9 and 10. First, when the AXI_DMA logic module notifies the algorithm module that data can be read from DDR3, that is, when the pre_ready signal is pulled high, the state machine enters the initial state 0. In state 0, it is judged whether the number of readable data in the FIFO storing the target image data is less than the total number of data in one line of the target image minus 10. The reason for subtracting 10 is that the calculation of one line of target image pixels starts in state 3, and there is a delay between the start of that calculation and the buffering of the resulting line into the FIFO. The margin prevents the state machine from entering state 2 too early and writing more data into the FIFO than its configured depth allows. The margin is therefore determined, according to the code design, by the number of clock cycles of delay between the start of the calculation and the writing of one line of target image data into the FIFO. When the number of readable data in the FIFO is less than the total number of data in one line of the target image minus 10, the state machine goes from state 0 to state 2.

If the number of readable data in the FIFO storing the target image data is less than the total number of data in one line minus 10, original image data needs to be read from DDR3 for interpolation calculation. According to the principle of the bilinear interpolation algorithm, to solve the pixel value of a point of the target image, the coordinates of the point on the original image to which it maps must be obtained first. Once the ordinate src_y of the mapping point is obtained, it can be determined which two lines of image data need to be read from DDR3. In the logic design, when the ordinate src_y of the mapping point is known, the number of lines of original image data that must have been read from DDR3 is denoted src_Y.

When the state machine goes from state 0 to state 2, an enable signal expect_src_vcnt_de is generated to start calculating the ordinate at which the target image row maps to the original image. In order to reduce latency and increase data throughput, the ordinate src_y at which the next row of the target image maps to the original image can be calculated while the scaling calculation of the current row is performed in state 3. Thus, when state 3 is entered, a second enable signal expect_src_vcnt_de_1 is generated for this calculation.

The ordinate of the mapping point is denoted src_y and is calculated by the following formula:

src_y = (des_y + 0.5) × (src_h / des_h) − 0.5

Where src_y is the ordinate of the mapping point, des_y is the ordinate of a point of the target image, src_h is the height of the original image, and des_h is the height of the target image. The solution flow for src_y is shown in fig. 11.

As shown in fig. 10, in the code design, fixed-point operations are used instead of floating-point operations to represent fractional numbers in the algorithm, in order to reduce the consumption of logic resources. Meanwhile, in order to improve the throughput and the running speed of the system, a pipeline design is adopted in the calculation of src_y. In the code, des_y is replaced with dst_vcnt. Because the algorithm design supports 4K output, the bit width of dst_vcnt, the ordinate of a point in the target image, is set to 14 bits, and dst_vcnt is initially 0. In the code design the fractional part is first specified as 2 bits, so 0.5 is represented as 2. To align with the fractional bits of 0.5, dst_vcnt is extended by 2 fractional bits, so expect_src_y0 has 2 fractional bits. If the calculation starts on entering state 2, expect_src_y0 is equal to the extended dst_vcnt plus 2. If it starts on entering state 3, expect_src_y0 is equal to the extended (dst_vcnt + 1) plus 2, because in state 3 the src_y being solved is the ordinate of the mapping point of the row (dst_vcnt + 1) that follows the row currently being calculated. If the fractional part of the vertical scaling factor sy is defined as 12 bits, then expect_src_y1 has 14 fractional bits. Finally, so that the fixed-point fraction is 12 bits, when expect_src_y2 is calculated, 2 of the 14 fractional bits of expect_src_y1 are truncated to leave 12 bits, and 0.5, represented as 2048, is subtracted.

The calculated expect_src_y2 value cannot be used directly, because the final ordinate must be an integer. The expect_src_y2 value is therefore truncated and the 12-bit fraction is removed: expect_src_y2[25:12] is src_y_0. src_y_0 is the ordinate of the two of the four points adjacent to the mapping point that lie on the side nearer the x-axis. According to the algorithm principle, when src_y_0 is 0, the image data of the 0th and 1st rows of the original image needs to be read; when src_y_0 is 1, the image data of the 1st and 2nd rows needs to be read; and so on, so that when src_y_0 is y, the image data of the y-th and (y+1)-th rows needs to be read. Thus a value src_Y is defined, which represents how many lines of original image data must have been read when src_y_0 is known. When src_y_0 is 0, that is, when the point of the target image maps to the top of the original image, the scaling calculation of the point needs two lines of data, namely the 0th line and the 1st line, so src_Y is equal to src_y_0 plus 2.
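The fixed-point steps described above can be traced with a minimal Python sketch. Several points are assumptions made for illustration and are not stated in the description: the scaling factor sy is taken as src_h / des_h in a format with 12 fractional bits, a negative intermediate result is clamped to 0, and the function names are hypothetical.

```python
FRAC_BITS = 12  # fractional bits of the scale factor and the final result


def scale_factor(src_h, des_h):
    """Assumed form of sy: src_h / des_h with 12 fractional bits."""
    return (src_h << FRAC_BITS) // des_h


def map_row(dst_vcnt, sy):
    """Return (src_y_0, src_Y) for target row dst_vcnt."""
    y0 = (dst_vcnt << 2) + 2                   # 2 fractional bits, 0.5 -> 2
    y1 = y0 * sy                               # 2 + 12 = 14 fractional bits
    y2 = (y1 >> 2) - (1 << (FRAC_BITS - 1))    # 12 bits, subtract 0.5 (2048)
    if y2 < 0:
        y2 = 0                                 # assumption: clamp at the top edge
    src_y_0 = (y2 >> FRAC_BITS) & 0x3FFF       # bits [25:12], integer part
    src_Y = src_y_0 + 2                        # lines that must have been read
    return src_y_0, src_Y
```

For a 720-line source scaled to 360 lines, target row 1 maps to ordinate 2.5, so src_y_0 is 2 and four source lines must have been read before the row can be interpolated.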

After src_Y is obtained, src_vcnt and src_Y are compared in state 2. src_vcnt is the number of lines of image data that have been read from DDR3, and src_Y is the number of lines that need to have been read from DDR3. The expect_src_vcnt_de2 signal is expect_src_vcnt_de | expect_src_vcnt_de_1. When src_vcnt is less than src_Y, state 2 jumps to state 1, and an src_de enable signal is generated and held high for SRC_IW clock cycles. src_de is assigned to the external data request signal pre_req, so data is read continuously from DDR3 while src_de is high. When the read data counter src_hcnt reaches its maximum value SRC_IW-1, one line of original image data has been read from DDR3, src_vcnt is incremented by 1, and src_de is pulled low. As mentioned above, in order to increase data throughput, the ordinate at which the next line of the target image maps to the original image is calculated while the scaling calculation of the current line is performed. Therefore, in state 3 an src_de enable signal can also be generated and held high for SRC_IW clock cycles, so that one line of original image data is read from DDR3.

In state 2, two lines of original image data are required for the mapping point of row dst_vcnt, and in state 3 two lines of original image data are also required for the mapping point of row (dst_vcnt + 1), so four dual-port RAMs are required to buffer four lines of original image data. In order to reduce the amount of code, the four dual-port RAMs that buffer the original image data are generated in the algorithm module by a for loop statement with the loop control condition (i=0; i<4; i=i+1). For the i-th dual-port RAM, the write address enable signal is wr_addr_de[i], the write port address signal is wr_addr[i], the write port data signal is douta[i], the read port address signal is rd_addr[i], and the read port data signal is doutb[i]. Data can be written into a RAM when its write address enable signal wr_addr_de[i] is pulled high, so wr_addr_de[i] determines which dual-port RAM one line of original image data is written into. The external read data request signal pre_req is equal to the src_de enable signal, so pre_req is pulled high at the same time as src_de. While src_de is high, the write address counter wr_addr_cnt counts up from an initial value of 0, incrementing by 1 on every rising clock edge, and is cleared after reaching its maximum value SRC_IW-1. The write RAM selection signal wr_addr_sel has an initial value of 0. When wr_addr_cnt reaches SRC_IW-1, meaning that one line of original image data has been read from DDR3, wr_addr_sel is incremented by 1, and it is cleared after reaching its maximum value of 3. When the wr_addr_de[i] signal is high, one line of image data is to be written into the i-th RAM. While wr_addr_de[i] is high, pre_wr_addr[i] counts up from 0 in steps of 1 and is cleared after reaching the maximum value SRC_IW-1; meanwhile, pre_wr_addr[i] is assigned to the write RAM address signal wr_addr[i].

From the foregoing analysis, when original image data is read from DDR3, line 0 is written to RAM[0], line 1 to RAM[1], line 2 to RAM[2], line 3 to RAM[3], line 4 to RAM[0], line 5 to RAM[1], and so on. Starting from line 0, the lines are written into the four RAMs in turn with a period of four lines.

In state 2, if src_vcnt is no longer less than src_Y, the number of original image lines read from DDR3 meets the interpolation requirement, so the state machine enters state 3. In state 3, while the scaling calculation of one line of target image data has not yet been completed, an enable signal dst_de is generated, and when dst_de is pulled high the calculation in the coordinate coefficient module starts. As described above, obtaining the pixel value of a point in the target image requires the image data of four points on two adjacent lines of the original image, and obtaining those requires the coordinates of the four points. The ordinate of a line of image data is already known before the line is written into the RAM, and the storage address of each pixel of that line in the RAM can be regarded as its abscissa. Therefore, to read the original image data of two points on the same line from a dual-port RAM, the abscissas src_x0 and src_x1 of the two points are used as the read addresses of the two ports of the RAM. src_x0 and src_x1 are calculated by the coordinate coefficient module.

As mentioned above, the wr_addr_de[i] signal determines which dual-port RAM one line of original image data is written into. Correspondingly, when the value of src_y0 is known (src_y0 and src_y_0 both denote the ordinate, on the side nearer the x-axis, of the four points adjacent to the mapping point), the signal src_mod determines which two dual-port RAMs are read. src_mod is the remainder of src_y0 divided by 4, so its value always cycles from 0 to 3. src_y0 is the ordinate of the upper of the two adjacent rows around the mapping point on the original image, and the other ordinate src_y1 is src_y0 plus 1. A frame is read from top to bottom. When src_y0 is 0, src_y1 is 1 and src_mod is 0, and the data of rows 0 and 1 needs to be read from RAM[0] and RAM[1]. When src_y0 is 1, src_mod is 1, and the data of rows 1 and 2 needs to be read from RAM[1] and RAM[2]. When src_y0 is 2, src_mod is 2, and the data of rows 2 and 3 needs to be read from RAM[2] and RAM[3]. When src_y0 is 3, src_mod is 3, and the data of rows 3 and 4 needs to be read from RAM[3] and RAM[0]. By analogy, when src_mod is 0, RAM[0] and RAM[1] are read; when src_mod is 1, RAM[1] and RAM[2] are read; when src_mod is 2, RAM[2] and RAM[3] are read; and when src_mod is 3, RAM[3] and RAM[0] are read. This read pattern corresponds to the distribution of the original image lines across the RAMs obtained in the previous analysis. Therefore, whatever the value of src_y0, the corresponding two lines of original image data can always be obtained from two of the dual-port RAMs.
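The correspondence between the write distribution and the read selection can be expressed in a short Python sketch. The function names are hypothetical; the modulo-4 rule follows the description.

```python
NUM_RAMS = 4  # four dual-port line buffers


def write_ram_index(line):
    """RAM that receives a given original image line (one cycle every 4 lines)."""
    return line % NUM_RAMS


def read_ram_pair(src_y0):
    """RAMs holding rows src_y0 and src_y0 + 1, selected through src_mod."""
    src_mod = src_y0 % NUM_RAMS
    return src_mod, (src_mod + 1) % NUM_RAMS
```

For any src_y0, the pair returned by read_ram_pair is exactly the pair of RAMs into which rows src_y0 and src_y0 + 1 were written, which is why the two rows needed for interpolation are always available.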

While the dst_de enable signal is high, the column counter is incremented by 1 continuously. When the column counter reaches its maximum value DST_IW-1, the scaling calculation of one line of the target image is complete. The state machine then jumps to state 0 and again judges whether the number of data in the FIFO is less than the total number of data in one line minus 10.
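The state transitions described for the image data buffer control sub-module can be summarized in a Python sketch. This is only an illustrative model: the return from state 1 to state 2 after one line has been read is an assumption, since the description does not state it explicitly, and the function and argument names are hypothetical.

```python
def next_state(state, *, pre_ready=True, fifo_rd_count=0, dst_iw=0,
               src_vcnt=0, src_Y=0, line_read_done=False, col_cnt=0,
               margin=10):
    """Return the next state of the buffer control state machine."""
    if state == 0:
        # Wait until the target FIFO has room for another line.
        if pre_ready and fifo_rd_count < dst_iw - margin:
            return 2
        return 0
    if state == 2:
        # Read more source lines until enough are buffered.
        return 1 if src_vcnt < src_Y else 3
    if state == 1:
        # Assumption: go back to state 2 once one line has been read.
        return 2 if line_read_done else 1
    if state == 3:
        # Scale one target line, then return to the idle check.
        return 0 if col_cnt == dst_iw - 1 else 3
    raise ValueError("unknown state")
```

A typical cycle under these assumptions is 0, 2, 1, 2, 3, 0: the FIFO check, one line fetch, the comparison again, and then the scaling of one target line.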

2) Coordinate coefficient generation submodule

According to the principle of the bilinear interpolation algorithm, to obtain the pixel value of a point of the target image, the coordinates to which the point maps on the original image must be obtained first. The formula is as follows:

src_x = (des_x + 0.5) × (src_w / des_w) − 0.5
src_y = (des_y + 0.5) × (src_h / des_h) − 0.5

Where des_x and des_y are the coordinates of a point in the target image, src_h and src_w are the vertical length and horizontal length of the original image, and des_h and des_w are the vertical length and horizontal length of the target image. The mapping point coordinates (src_x, src_y) are obtained from the above formula, and after the coordinates of the mapping point have been solved, the coordinates of the four points adjacent to the mapping point are obtained. The four points lie on two adjacent rows and two adjacent columns. The ordinates of the two rows are src_y0 and src_y1, and the abscissas of the two columns are src_x0 and src_x1, where src_y0 plus 1 is equal to src_y1 and src_x0 plus 1 is equal to src_x1. The mapping point coordinates (src_x, src_y) have fractional parts. According to the algorithm principle, removing the fractional parts of (src_x, src_y) gives src_x0 and src_y0, from which src_x1 and src_y1 can be further derived.

In order to improve the data throughput and the image transmission speed of the system, an arithmetic logic structure with pipeline characteristics is designed, as shown in fig. 12. As can be seen from fig. 12, the dst_de enable signal is generated when state 3 is entered, and when dst_de is pulled high the calculation of src_xf0 starts. In order to reduce the consumption of logic resources, fixed-point calculation is adopted for fractional numbers. In the first step, the calculation of src_xf0, the fraction is fixed at 2 bits, so 0.5 is represented as 2, and the target image column counter is extended by 2 fractional bits to align with 0.5 during the calculation. Since the fractional part of the horizontal scaling factor sx is 12 bits, src_xf1 has 14 fractional bits. When src_xf2 is calculated, the fraction is set to 12 bits, so 0.5 is represented as 2048, and 2 fractional bits of src_xf1 are truncated. When src_x0 is calculated, the 12 fractional bits of src_xf2 are truncated, so that src_xf2[25:12] is an integer, and src_x0 is obtained. The solution flow for src_y0 and src_y1 is the same as that of fig. 12, the only difference being that the first step of the solution for src_y0 starts from the dst_vcnt counter.

3) Interpolation calculation module

When the coefficient generation sub-module obtains the pixel values of four adjacent points of the mapping point, the pixel value of one point of the target image can be calculated by combining the coordinates of the four points. The calculation formula is as follows:

f(x,y)＝f(Q₁₁)W₁₁+f(Q₂₁)W₂₁+f(Q₁₂)W₁₂+f(Q₂₂)W₂₂

Where f(Q₁₁) is the pixel value at coordinates (src_x0, src_y0) and W₁₁ is (x₁−x)(y₁−y); f(Q₂₁) is the pixel value at coordinates (src_x1, src_y0) and W₂₁ is (x−x₀)(y₁−y); f(Q₁₂) is the pixel value at coordinates (src_x0, src_y1) and W₁₂ is (x₁−x)(y−y₀); f(Q₂₂) is the pixel value at coordinates (src_x1, src_y1) and W₂₂ is (x−x₀)(y−y₀). Here (x, y) is the mapping point (src_x, src_y), and x₀, x₁, y₀, and y₁ are src_x0, src_x1, src_y0, and src_y1 respectively.

When the coefficient generation sub-module has solved src_x0 and src_x1, it outputs them to the image data buffer control sub-module. When the RAMs are read, src_x0 and src_x1 are used as the read addresses of the two ports of each RAM, so that the pixel values of the four points at coordinates (src_x0, src_y0), (src_x0, src_y1), (src_x1, src_y0), and (src_x1, src_y1) are obtained. The pixel value of one point of the target image can then be calculated with the above formula.
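The interpolation of one target pixel can be illustrated with a floating-point Python sketch. It is not the fixed-point hardware implementation: it assumes the centre-aligned mapping suggested by the 0.5 offsets in the fixed-point description, clamps the mapping point to the image border (border handling is not described), and uses the hypothetical function name bilinear_sample with the image given as a list of rows.

```python
def bilinear_sample(image, des_x, des_y, des_w, des_h):
    """Interpolate target pixel (des_x, des_y) from a 2-D list of gray values."""
    src_h, src_w = len(image), len(image[0])
    # Map the target point onto the original image (assumed centre-aligned).
    x = (des_x + 0.5) * src_w / des_w - 0.5
    y = (des_y + 0.5) * src_h / des_h - 0.5
    # Assumption: clamp mapping points that fall outside the image.
    x = min(max(x, 0.0), src_w - 1.0)
    y = min(max(y, 0.0), src_h - 1.0)
    x0, y0 = int(x), int(y)              # drop the fractional parts
    x1 = min(x0 + 1, src_w - 1)
    y1 = min(y0 + 1, src_h - 1)
    fx, fy = x - x0, y - y0              # x - x0 and y - y0
    w11 = (1 - fx) * (1 - fy)            # (x1 - x)(y1 - y) with x1 = x0 + 1
    w21 = fx * (1 - fy)                  # (x - x0)(y1 - y)
    w12 = (1 - fx) * fy                  # (x1 - x)(y - y0)
    w22 = fx * fy                        # (x - x0)(y - y0)
    return (image[y0][x0] * w11 + image[y0][x1] * w21
            + image[y1][x0] * w12 + image[y1][x1] * w22)
```

Because the four weights sum to 1, a uniform region of the original image keeps its value after scaling.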

4) FIFO control submodule

The pixel values of the target image calculated by the interpolation calculation module are not output directly to the next logic module, but are first buffered in a FIFO. The fifo_write_depth value of the FIFO is set to be greater than DST_IW, where DST_IW is the amount of data in the horizontal direction of the target image, which means that the FIFO can buffer at least one line of target image data. When rd_data_count of the FIFO is greater than DST_IW, the post_ready signal is pulled high to inform the next logic module that it can start reading target image data from the FIFO.

3. PCIE communication module

In the present system, PCIe communication is implemented by instantiating the XDMA core. The DMA Subsystem for PCI Express IP provided by Xilinx is a high-performance, configurable scatter-gather (SG) mode DMA suitable for PCIe 2.0 and PCIe 3.0, and it provides a user-selectable AXI4 interface or AXI4-Stream interface. The AXI4 interface is generally attached to the bus interconnect, is suitable for asynchronous transmission of large amounts of data, and is typically used together with DDR. The AXI4-Stream interface is suitable for low-latency data stream transmission. XDMA is a scatter-gather DMA, not a block DMA. In SG mode, the host organizes the data to be transferred into a linked list and passes the head address of the linked list to the XDMA through a BAR; the XDMA then completes the transfer tasks specified by the linked list in sequence, starting from that head address.

The read-write section is divided into two types: read-write of data and read-write of configuration data. In the data read-write part, for a DDR3 on the PL side the XDMA controls the DDR3 through the MIG, and for the DDR3 on the PS side the XDMA controls the DDR3 through the AXI-FULL bus. Configuration data read-write is completed through an AXI-Lite bus connection to a BRAM, in which the XDMA stores the PCIe configuration information. When configuration information is read or written, the incoming host address is mapped to a user logical address and then processed together with the offset address (physical address = (segment address << 4) + offset address), so when the BRAM is set up, its offset address must be the same as the offset address to which the host address is mapped. Since the system uploads data to the host computer through the XDMA, each time the AXI_DMA write section has written one frame of image, a frame interrupt signal wr_vs_req must be output to the XDMA core to inform the host computer to read the data.

4. GTP core communication module

Video data received from the fiber communication port is first received through the GTP core.

Data transmission is generally implemented with a parallel bus, that is, a clock line and a parallel data bus, where data is transmitted on one clock edge or on both clock edges. However, when the sampling clock frequency is high, the quality of the clock and the data becomes unstable during transmission, which is the limitation of the parallel bus. A high-speed serial bus does not need to transmit data on the edges of a separate clock, because the clock signal can be recovered from the transmitted data. Compared with a parallel bus, a serial bus offers higher transmission efficiency and accuracy. GTP is a high-speed serial transceiver from Xilinx characterized by versatility, ease of use, low power consumption, and low cost. It is often used for board-level communication, and the maximum line rate it supports is 6.6 Gb/s. GTP transceivers allow the physical layer to support various protocols, such as PCI Express Revision 1.1/2.0, Serial RapidIO (SRIO), Serial Digital Interface (SDI), and 10Gb Attachment Unit Interface (XAUI).

There are eight sets of transceivers (one for each lane) for the ZYNQ7000 series FPGA. Four GTPE2_CHANNEL primitives and one GTPE2_COMMON primitive are combined into a unit called a Quad. Xilinx transceivers are organized in units of Quads; each Quad comprises four sets of transceivers and a pair of PLLs, and each set of transceivers consists of a transmit channel (TX) and a receive channel (RX). Each Quad shares two phase-locked loops, PLL0 and PLL1, and each set of transceivers in the Quad can use either phase-locked loop as the drive clock for the GTP IP. The GTP is a hard core, and the development platform used in this system is Vivado 2019.1. The next logic module transmits and receives the optical fiber data through the GTP user interface signals.

5. Data alignment module

After receiving through the GTP core, data alignment is performed.

The external user data interface of the GTP transceiver is 32 bits wide, and the internal data width is 20 bits (8b/10b encoding). In actual tests, the received 32-bit data may be shifted by 16 bits relative to the transmitted 32-bit data.

When the GTP transmits the synchronization signals and the useless data, a K-code control word is added and the byte_ctrl signal is set to 0001. If a 16-bit data shift occurs, the K-code control word is shifted as well when the synchronization signals and the useless data are received, and the byte_ctrl signal becomes 0100. Therefore, in the design, whether the received GTP data is shifted is determined from the value of the byte_ctrl signal: if the received byte_ctrl is 0100, the received data is shifted and must be realigned; otherwise the received data is output directly without shifting. To correct the shift, the input data is registered for one beat into rx_data_in_d0 and then for another beat into rx_data_in_d1; when byte_ctrl is 0100, the final output data rx_data_out is equal to {rx_data_in_d0[15:0], rx_data_in_d1[31:16]}.
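The realignment described above can be illustrated with a small behavioural model in Python. This is only a sketch under stated assumptions, not the RTL of the invention: the helper shift16 (a hypothetical model of the fault) and the latched flag shifted are assumptions introduced here, since data words carry a byte_ctrl of 0000 and the shift decision has to be remembered between control words.

```python
MASK32 = 0xFFFFFFFF


def shift16(tx_words):
    """Hypothetical fault model: every received word carries the low half of
    the current transmitted word in its upper 16 bits and the high half of
    the previous transmitted word in its lower 16 bits."""
    prev = 0
    rx_words = []
    for word in tx_words:
        rx_words.append((((word & 0xFFFF) << 16) | (prev >> 16)) & MASK32)
        prev = word
    return rx_words


def realign(rx_words, rx_ctrl):
    """Behavioural model of the data alignment module.

    rx_words: received 32-bit words; rx_ctrl: matching 4-bit byte_ctrl values.
    """
    d0 = d1 = 0          # rx_data_in_d0 (newer word) and rx_data_in_d1 (older word)
    shifted = False      # assumption: the shift decision is latched
    out = []
    for word, ctrl in zip(rx_words, rx_ctrl):
        d1, d0 = d0, word            # two-stage register chain
        if ctrl == 0b0100:           # K code seen in byte 2: data is shifted
            shifted = True
        elif ctrl == 0b0001:         # K code seen in byte 0: data is aligned
            shifted = False
        if shifted:
            # {rx_data_in_d0[15:0], rx_data_in_d1[31:16]}
            out.append(((d0 & 0xFFFF) << 16) | (d1 >> 16))
        else:
            out.append(d0)
    return out
```

In this model the realigned stream reproduces the transmitted words with a latency of one word, which is the cost of waiting for the second half of each word to arrive.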

6. Video data analysis module

The next logic module after the video data analysis module is the AXI_DMA write part, whose input signals are pre_image_vs, pre_image_de, pre_image_data and pre_image_clk, so the output of the video data analysis module also needs to be in the form of a video stream. Only part of the 32-bit data received from the data alignment module is video image data; the rest consists of the frame synchronization signals, the data start signals, the data end signals and the useless data signal. The video data analysis module therefore parses out the video image data, stores it in a FIFO with a 32-bit input and a 16-bit output, and restores the frame synchronization signal of the video image. In the video image data packet module, seven constants are defined: image frame synchronization signal 0, image frame synchronization signal 1, data start signal 0, data start signal 1, data end signal 0, data end signal 1, and the useless data signal. When the image frame synchronization signal is received, the video image frame signal vs is pulled high. When a data start signal is received, writing of the received data into the FIFO is started. When the data end signal is received, the FIFO read enable signal rd_en is pulled high, the data is read out from the FIFO, and the valid data signal de of the video stream is equal to rd_en.
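The parsing behaviour can be sketched in Python as follows. The constant values are the control words given in the description of the video image data packet module; everything else is an assumption made for illustration: the function name parse_stream is hypothetical, the FIFO and the rd_en/de timing are not modelled, and the upper 16 bits of each 32-bit word are assumed to leave the 32-to-16-bit FIFO first.

```python
# Control words listed for the video image data packet module
VS0, VS1 = 0x55A101BC, 0x55A102BC          # image frame synchronization 0 / 1
START0, START1 = 0x55A105BC, 0x55A106BC    # data start 0 / 1
END0, END1 = 0x55A107BC, 0x55A108BC        # data end 0 / 1
IDLE = 0x55A109BC                          # useless data
K_CTRL = 0b0001                            # byte_ctrl of an aligned control word


def parse_stream(words):
    """Split an aligned (data, byte_ctrl) stream into frames of 16-bit pixels.

    Hypothetical helper: returns a list of frames, each a list of pixels.
    """
    frames = []
    current = None       # pixels of the frame being received
    writing = False      # True between data start and data end
    for data, ctrl in words:
        if ctrl == K_CTRL:
            if data == VS0:              # frame sync: a new frame begins
                current = []
                frames.append(current)
            elif data == START1:         # start sequence complete
                writing = True
            elif data == END0:           # end sequence begins
                writing = False
            # VS1, START0, END1 and IDLE carry no pixel data
        elif writing and current is not None:
            current.append(data >> 16)       # assumed: upper half first
            current.append(data & 0xFFFF)
    return frames
```

Checking byte_ctrl before comparing the data value keeps a pixel word that happens to equal one of the constants from being mistaken for a control word.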

7. Video stream forming module

For the optical fiber communication port, the optical port channel test scheme is as follows: an image is sent by the test upper computer, transmitted through the PS-side Ethernet, written into DDR3, read out by the AXI_DMA core, and looped back through the SFP to serve as the test data source for the optical port channel. As can be seen from the AXI read part in 2.3.3, the signals of the AXI read part that connect to the next-stage logic module include post_image_de, post_image_data, post_image_clk and post_start. Since the video image data packet module takes its input in video stream format, the video stream forming module is used to convert the post_image_de, post_image_data, post_image_clk and post_start signals into video stream format output. The port signal diagram of the video stream forming module is shown in fig. 13.

The design concept of the video stream forming module is as follows. When the start signal is pulled high, the AXI_DMA core can start reading data from DDR3. While the start signal is high, the row counter hcnt is incremented by 1 on each rising edge of clk; when it reaches total_iw-1 it is cleared and counting restarts, where total_iw is the total line length of one frame of image. The column counter vcnt is incremented by 1 each time the row counter hcnt counts to total_iw-1, and it is cleared to restart counting when it reaches total_ih-1. When vcnt is greater than or equal to 2, the frame signal vs is pulled high, otherwise it is pulled low. When the row counter hcnt is greater than or equal to h_start and less than or equal to h_start+active_iw (h_start is the row valid start position and active_iw is the row valid length), the row valid signal h_de is pulled high, otherwise it is pulled low. When the column counter vcnt is greater than or equal to v_start and less than or equal to v_start+active_ih (v_start is the column valid start position and active_ih is the column valid height), the column valid signal v_de is pulled high, otherwise it is pulled low.

When the row valid signal h_de and the column valid signal v_de are both high, the req signal is pulled high and data can be read from the AXI_DMA read FIFO. The req signal is delayed by one clock period to obtain the de signal, and the externally read data req_data is assigned to data through a D flip-flop.
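The counter scheme of the video stream forming module can be illustrated with a cycle-by-cycle model in Python. This is a sketch under assumptions rather than the actual logic: the function name video_timing is hypothetical, the start signal is assumed to stay high, total_ih is assumed to be the total number of lines per frame, and the upper bounds of the valid windows are taken as exclusive so that exactly active_iw samples per line and active_ih lines per frame are requested.

```python
def video_timing(total_iw, total_ih, h_start, active_iw, v_start, active_ih, cycles):
    """Return one (vs, req, de) tuple per clock cycle."""
    hcnt = vcnt = 0
    req_prev = 0
    samples = []
    for _ in range(cycles):
        vs = 1 if vcnt >= 2 else 0
        h_de = 1 if h_start <= hcnt < h_start + active_iw else 0
        v_de = 1 if v_start <= vcnt < v_start + active_ih else 0
        req = h_de & v_de          # request data from the AXI_DMA read FIFO
        de = req_prev              # de is req delayed by one clock period
        samples.append((vs, req, de))
        req_prev = req
        # Counter update on the rising clock edge
        if hcnt == total_iw - 1:
            hcnt = 0
            vcnt = 0 if vcnt == total_ih - 1 else vcnt + 1
        else:
            hcnt += 1
    return samples
```

For example, calling video_timing(8, 6, 2, 4, 2, 3, 48) models one frame of 8 by 6 clock slots containing a 4 by 3 active window, so req is high for 12 cycles and de follows it one cycle later.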

8. Video image data packet module

First, before a frame of image starts to be transmitted, the GTP sends a synchronization signal. The amount of data in the FIFO is then checked: if the FIFO does not yet contain a certain amount of video data, the GTP sends useless data; once the FIFO contains a certain amount of data, the GTP sends a data start signal and then sends the video image data. When the FIFO is about to be read empty, the GTP sends a data end signal and returns to the initial state, where it continues to check whether the amount of data in the FIFO meets the sending requirement.

The initial state is tx_unuse_data. In state tx_unuse_data, the transmit data signal gt_tx_data is 32'h55a109bc and the K-code control word signal gt_tx_ctrl is 4'b0001. The rising edge of the frame signal is detected by the signal vs_pose; when vs_pose is high, a frame of image is about to enter the transmission process, and the state machine jumps from tx_unuse_data to tx_vs_pose0. In state tx_vs_pose0, the state machine jumps immediately to tx_vs_pose1, while gt_tx_data is 32'h55a101bc and gt_tx_ctrl is 4'b0001. In state tx_vs_pose1, the state machine jumps immediately back to tx_unuse_data, while gt_tx_data is 32'h55a102bc and gt_tx_ctrl is 4'b0001.

In state tx_unuse_data, when (data_start && ~data_start_d0) is equal to 1, a certain amount of image data has been stored in the FIFO, so the image data transmission process is entered and the state machine jumps to tx_data_start0. In state tx_data_start0, the state machine jumps immediately to tx_data_start1, while gt_tx_data is 32'h55a105bc and gt_tx_ctrl is 4'b0001. In state tx_data_start1, the state machine jumps immediately to tx_send_data, while gt_tx_data is 32'h55a106bc and gt_tx_ctrl is 4'b0001. In state tx_send_data, if the signal fifo_almost_empty is 0, reading of the FIFO starts: the signal rd_en is 1, gt_tx_data is connected to the FIFO read data port dout, and gt_tx_ctrl is 4'b0000. When (fifo_almost_empty & ~fifo_empty) is 1, the FIFO is about to be read empty, the signal rd_en is 0, and reading of the FIFO stops. When (fifo_almost_empty & ~fifo_empty) is 0, the state machine jumps to tx_data_end0. In state tx_data_end0, the state machine jumps immediately to tx_data_end1, while gt_tx_data is 32'h55a107bc and gt_tx_ctrl is 4'b0001. In state tx_data_end1, the state machine jumps immediately to tx_unuse_data, while gt_tx_data is 32'h55a108bc and gt_tx_ctrl is 4'b0001. The state machine transitions are shown in fig. 14.
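The word sequence produced by the state machine described above can be sketched in Python. This is a simplified illustration under assumptions, not the actual state machine: the function name packetize_frame is hypothetical, each entry of bursts stands for the data sent between one data start and one data end, and the FIFO threshold and almost-empty flags are abstracted away, with a single word of useless data standing for the waiting time in tx_unuse_data.

```python
# Control words and K-code flags taken from the state descriptions
IDLE = 0x55A109BC                          # tx_unuse_data
VS0, VS1 = 0x55A101BC, 0x55A102BC          # tx_vs_pose0 / tx_vs_pose1
START0, START1 = 0x55A105BC, 0x55A106BC    # tx_data_start0 / tx_data_start1
END0, END1 = 0x55A107BC, 0x55A108BC        # tx_data_end0 / tx_data_end1
K = 0b0001                                 # gt_tx_ctrl for control words
D = 0b0000                                 # gt_tx_ctrl for image data


def packetize_frame(bursts):
    """Return the (gt_tx_data, gt_tx_ctrl) sequence for one frame.

    bursts: list of lists of 32-bit image data words (hypothetical input).
    """
    out = [(IDLE, K), (VS0, K), (VS1, K)]       # idle, then frame sync pair
    for burst in bursts:
        out.append((IDLE, K))                   # waiting for the FIFO to fill
        out.append((START0, K))
        out.append((START1, K))
        out.extend((word & 0xFFFFFFFF, D) for word in burst)
        out.append((END0, K))
        out.append((END1, K))
    out.append((IDLE, K))                       # back to the initial state
    return out
```

Because the image data words are sent with gt_tx_ctrl equal to 0000, the receiver can tell them apart from control words even when a data value coincides with one of the constants.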

While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of the above description, will appreciate that other embodiments are contemplated within the scope of the invention as described herein. The disclosure of the present invention is intended to be illustrative, but not limiting, of the scope of the invention, which is defined by the appended claims.

Claims

1. A ZYNQ-based multichannel image acquisition and processing system, comprising:

2. The ZYNQ-based multichannel image acquisition and processing system of claim 1, further comprising a GTP core communication module, a data alignment module, and a video data parsing module configured to receive and transmit video data from the fiber communication port through the GTP core, to data align the received video data, to parse the video data, and to store the parsed video data in the FIFO, respectively, while recovering frame synchronization signals of the video image.

3. The ZYNQ-based multichannel image acquisition and processing system of claim 2, further comprising a video stream forming module and a video image data packetizing module which, during testing, are respectively used for converting the signals output from the data reading module into video stream format output, and for packetizing the video stream and transmitting it to the GTP core communication module.

4. A ZYNQ-based multichannel image acquisition and processing system according to claim 1 or 2, wherein the process of writing in the data writing module using the AXI_DMA core comprises: storing video data in an AXI write FIFO through a data splicing sub-module, starting an AXI burst writing process when the data volume in the AXI write FIFO is larger than the data volume of one line of data of an original image, and reading data from the AXI write FIFO for writing when the write data channel completes its handshake; the AXI write FIFO has a write data width of 64 bits and a read data width of 64 bits, and the data splicing sub-module is used for converting 16-bit data width input into 64-bit output.

5. The ZYNQ-based multichannel image acquisition and processing system of claim 4, wherein the process of writing in the data writing module using the AXI_DMA core further comprises: designing a frame counter to record the number of image frames written, adding 1 to the frame counter when, at a clock rising edge, the falling edge detection signal corresponding to the frame synchronization signal is detected to be high, and clearing the frame counter to restart counting when it reaches its maximum value; designing a write frame offset address, adding the total number of bytes of one frame of the original image to the write frame offset address when the falling edge detection signal corresponding to the frame synchronization signal is high, and resetting the write frame offset address to restart counting when the frame counter counts to its maximum value; and designing a single-bit clock-domain-crossing control sub-module, in which an asynchronous FIFO port is invoked to synchronize the frame synchronization signal with the clock signal of the AXI protocol.

6. A ZYNQ-based multichannel image acquisition and processing system according to claim 1 or 2, wherein the process of reading with the AXI_DMA core in the data reading module comprises: when the initialization success signal and the read channel enable signal are both pulled high, starting AXI burst reading while buffering the image data through a multi-frame buffer framework; writing the data read in the AXI burst reading process into an AXI read FIFO, and, when the data cached in the AXI read FIFO reaches a certain amount, raising a signal that notifies the next logic module to start reading, so that the image data is read out from the AXI read FIFO externally; the multi-frame buffer framework ensures that only the frame preceding the frame currently being written is read, and comprises: designing a frame buffer maximum value, and resetting the frame counter to restart counting when the frame counter counts to the frame buffer maximum value; and designing a read frame offset address, which is equal to the number of the frame currently being read minus 1, multiplied by the total number of bytes of one frame of the original image.

7. The ZYNQ-based multichannel image acquisition and processing system of claim 1 or 2, wherein the image scaling module comprises an image data buffer control sub-module, a coordinate coefficient generation sub-module, an interpolation calculation sub-module, and a FIFO control sub-module; wherein the image data buffer control submodule is designed to: 4 dual-port rams are called and used for caching four lines of image data required when the ordinate of the image point of the target is respectively Y and Y+1; each dual port ram has one write data port and one read data port from which two adjacent address data are read simultaneously.

8. The ZYNQ-based multichannel image acquisition and processing system of claim 7, wherein the workflow of the image data buffer control sub-module comprises: when video data is read, an initial state 0 is entered first; in the initial state 0 it is judged whether the number of readable data in the FIFO for storing target image data is smaller than the total number of one line of data of the target image minus 10; if not, the initial state 0 is maintained, and if so, state 2 is entered; in state 2, the enable signal for the ordinate of the target image mapped to the original image is evaluated, namely it is judged whether the number of lines of image data already read out from the data storage module is smaller than the number of lines of image data that need to be read out from the data storage module; if so, state 1 is entered, and otherwise state 3 is entered; in state 1, data is read from the data storage module and state 2 is entered; in state 3, the scaling calculation is carried out for the target image, and at the same time the ordinate src_y at which the next line of the target image is mapped to the original image is calculated; the solving formula for calculating the ordinate src_y of the next image mapped to the original image is as follows:

9. The system for multi-channel image acquisition and processing based on ZYNQ according to claim 7, wherein the coordinate coefficient generating submodule is configured to calculate coordinates of a point of the target image mapped to the original image, and the mapping formula is as follows:

10. The system for collecting and processing the multichannel image based on ZYNQ according to claim 9, wherein the interpolation computation submodule is used for computing and obtaining a pixel value of a certain point of the target image after obtaining coordinate values of four adjacent points of the mapping point; the FIFO control sub-module is used for caching pixel values of the target image.