CN113962842B - Dynamic stepless despinning system and method based on high-level synthesis of large-scale integrated circuit

Info

Publication number
CN113962842B
Authority
CN
China
Prior art keywords
video
module
data
despinning
axi
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111223132.7A
Other languages
Chinese (zh)
Other versions
CN113962842A (en)
Inventor
张弘
宋剑波
杨一帆
邢万里
袁丁
李旭亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Application filed by Beihang University
Priority to CN202111223132.7A
Publication of CN113962842A
Application granted
Publication of CN113962842B
Legal status: Active

Classifications

    • G PHYSICS > G06 COMPUTING; CALCULATING OR COUNTING > G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
        • G06T1/00 General purpose image data processing
            • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
            • G06T1/60 Memory management
        • G06T5/00 Image enhancement or restoration
            • G06T5/90 Dynamic range modification of images or parts thereof
        • G06T2200/00 Indexing scheme for image data processing or generation, in general
            • G06T2200/12 Indexing scheme for image data processing or generation, in general involving antialiasing
            • G06T2200/28 Indexing scheme for image data processing or generation, in general involving image processing hardware
    • H ELECTRICITY > H04 ELECTRIC COMMUNICATION TECHNIQUE > H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
        • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
            • H04N19/42 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
                • H04N19/436 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to a dynamic stepless despinning system and method based on high-level synthesis (HLS) for large-scale integrated circuits. The system comprises a video acquisition module, a video decoding module, a video storage module, a data communication module, a video encoding module, a dynamic stepless despinning module, and a pixel merging module (the four-in-one module) designed to reduce algorithm delay and improve bus bandwidth utilization. The invention uses high-level synthesis to implement dynamic stepless despinning, performs real-time despinning of video images acquired on an electro-optical platform, and exploits the parallel acceleration and pipeline optimization of an FPGA (field programmable gate array). It offers high video resolution, a large despinning range, high despinning precision, clear output images free of jagged edges, low output delay, high system stability, ease of manufacture, low power consumption and small size.

Description

Dynamic stepless despinning system and method based on large-scale integrated circuit high-level synthesis
Technical Field
The invention relates to the field of intelligent embedded video processing, and in particular to a dynamic stepless despinning system and method based on high-level synthesis for large-scale integrated circuits.
Background
When an airborne pod television camera records video and aims at a target, its outer frame structure inevitably rolls, which moves the optical system relative to the camera and rotates the image. Likewise, a fighter aircraft in flight frequently rolls through large angles (up to 360 degrees), so the television picture rotates through large angles and seriously impairs the operator's view. To eliminate image rotation caused by aircraft attitude changes in optical sighting devices and electro-optical pod systems, the original video image acquired by the television system must be counter-rotated, that is, despun, so that the image stays upright and stable for the operator and for subsequent target detection, recognition and tracking. In current engineering practice there are three common despinning approaches: electronic, optical and physical. Optical despinning is the most widely used at present; it corrects the image by rotating a despinning prism in the imaging optical path. Although this approach has low delay and fast response, it is complex to manufacture, its despinning angle precision is low, and the system is large and power-hungry. With the rapid development of large-scale integrated circuits and digital signal processing, electronic despinning implemented by real-time video image processing algorithms has become the mainstream research direction; it overcomes the shortcomings of optical despinning systems and is increasingly widely used.
With continuing progress in computer vision and in the performance of processing chips, electronic despinning based on video image processing has become the mainstream research direction among despinning techniques, and using it to eliminate image rotation caused by aircraft attitude changes has become the first choice in current engineering applications.
Disclosure of Invention
The problem solved by the invention: to overcome the shortcomings of the prior art, the invention provides a dynamic stepless despinning system and method based on high-level synthesis for large-scale integrated circuits. Building on high-level synthesis and exploiting FPGA parallel acceleration and pipeline optimization, it achieves stepless despinning with high precision, a large range, high real-time performance and high output image quality. The precision reaches 0.001 degrees, so even very small angles can be despun; the despinning range is 0-360 degrees, so any angle can be despun; one frame is processed in less than 12 ms, so despinning runs in real time; and bilinear interpolation is used, so the output image is smooth, free of jagged edges, and of high quality. In the prior art, real-time performance, precision, range and image quality conflict with one another, so existing solutions achieve only one or some of these indices and cannot meet all of them simultaneously. The invention therefore has high engineering application value.
The technical solution of the invention is a dynamic stepless despinning system designed with a high-level synthesis method for large-scale integrated circuits. Taken as a whole, its core innovations are: 1) high-level synthesis, that is, using C++ and other high-level languages for FPGA algorithm design optimization and resource scheduling; 2) pipeline acceleration of the algorithm, which raises data throughput, greatly reduces delay and improves the real-time performance of image despinning; 3) high-bandwidth, real-time parallel use of multiple AXI buses, which improves data read/write efficiency and the real-time performance of the algorithm; 4) a four-in-one module for four-pixel merging, in which the four 8-bit pixels needed for bilinear interpolation are merged into one 32-bit word so that all four can later be read in a single access, greatly reducing the high delay caused by repeated reads.
The system comprises a video acquisition module, a video decoding module, a core processing module and a video encoding module. The core processing module is a heterogeneous system on chip with an FPGA + ARM architecture, namely a Zynq UltraScale+ MPSoC 15EG chip. The FPGA contains a dynamic stepless despinning module, a video-to-AXI-bus-video-stream module, an AXI video stream DDR read/write module, and the four-in-one module, a pixel merging module designed to reduce algorithm delay and improve bus bandwidth utilization. The ARM side contains the video storage module (DDR) and an RS422 serial communication module. Data communication between the FPGA and the ARM uses an AXI control bus.
The video acquisition module uses a camera to acquire the original video image, which is the data to be despun. The acquired image is passed to the video decoding module.
The video decoding module converts the serial video acquired by the camera into parallel video data together with a set of explicit video synchronization signals, and sends both to the FPGA.
In the FPGA, the video-to-AXI-bus-video-stream module first converts the video data into AXI bus video stream data, which has lower delay and is better suited to data synchronization and pipeline acceleration. The stream then enters the four-in-one module. Because the subsequent despinning uses bilinear interpolation, processing each pixel requires reading its four adjacent pixels from the DDR. Each read incurs considerable delay, and multiple reads incur more. The four-in-one module therefore buffers the data stream in on-chip memory two lines at a time and merges the four 8-bit pixels surrounding each pixel position into one 32-bit word. When the four neighbors of a pixel are needed later, the merged 32-bit word is read once and split into four independent 8-bit values, so four pixels are obtained in a single read. This makes full use of the AXI bus bandwidth and reduces the read delay to one quarter of its original value. The merged 32-bit video stream is cached in the DDR on the ARM side through the AXI video stream DDR read/write module.
The dynamic stepless despinning module despins the video data cached in the DDR according to the despinning command and despinning angle sent by the host computer through the RS422 serial communication module. During despinning it works together with the four-in-one module: each 32-bit word read from the DDR is split into four 8-bit values for bilinear interpolation, and the processed video image is stored back in the DDR. The AXI video stream DDR read/write module then reads the cached despun image from the DDR into an AXI video stream, the AXI-bus-video-stream-to-video module converts the stream into parallel video data with explicit synchronization signals, and the video encoding module encodes the data and outputs it to a display or capture card for real-time display.
The electronic image despinning algorithm based on bilinear interpolation is as follows:
(1) According to the despinning angle sent by the host computer, compute for each pixel (x', y') of the despun image the corresponding coordinates (x, y) in the image before despinning. The formula is:
Figure BDA0003313358630000031
where θ is the rotation angle and
Figure BDA0003313358630000032
is the rotation matrix.
The center of rotation is generally set to the image center (x_0, y_0), in which case the formula is rewritten as:
Figure BDA0003313358630000033
In scalar form:
Figure BDA0003313358630000034
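The equation images referenced above are not reproduced in this text. As a hedged reconstruction only, assuming the conventional two-dimensional rotation matrix with a counterclockwise-positive angle θ (the sign convention of the original figures may differ), the three formulas would take a form such as:

```latex
\begin{aligned}
\begin{bmatrix} x \\ y \end{bmatrix}
  &= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
     \begin{bmatrix} x' \\ y' \end{bmatrix} \\[4pt]
\begin{bmatrix} x - x_0 \\ y - y_0 \end{bmatrix}
  &= \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}
     \begin{bmatrix} x' - x_0 \\ y' - y_0 \end{bmatrix} \\[4pt]
x &= (x' - x_0)\cos\theta - (y' - y_0)\sin\theta + x_0 \\
y &= (x' - x_0)\sin\theta + (y' - y_0)\cos\theta + y_0
\end{aligned}
```

The first line is the rotation about the origin, the second shifts the center of rotation to (x_0, y_0), and the last two lines are the same relation written as scalars.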
(2) Perform pixel mapping using bilinear interpolation. The coordinates (x, y) in the original image computed in step (1) are usually not integers, so pixels cannot be mapped one to one. Non-integer coordinates arising in the mapping are generally handled by resampling.
According to image reconstruction theory, three interpolation methods are commonly used for image mapping: nearest-neighbor, bilinear and cubic interpolation. Nearest-neighbor interpolation performs poorly, and the despun image shows obvious jagged edges and burrs. Bilinear and cubic interpolation both perform well, giving continuous gray levels without jagged edges. Cubic interpolation, however, is algorithmically complex and slow to compute, so it struggles to meet real-time requirements in practical engineering applications. As a compromise between despinning precision and real-time performance, the invention therefore uses an image despinning algorithm based on bilinear interpolation.
Fig. 2 illustrates the principle of the electronic despinning algorithm based on bilinear interpolation. The method interpolates linearly in the x and y directions from the gray values of the four integer-coordinate points surrounding the non-integer sampling point. In fig. 2, (x, y) is the coordinate of the sampling point to be interpolated, expressed relative to its four neighbors so that x and y lie between 0 and 1; f(x, y) is the gray value at (x, y); and f(0,0), f(1,0), f(0,1) and f(1,1) are the gray values of the four surrounding points. The bilinear interpolation formula is therefore:
f(x,y)=[f(1,0)-f(0,0)]x+[f(0,1)-f(0,0)]y+[f(1,1)-f(1,0)-f(0,1)+f(0,0)]xy+f(0,0)
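As a minimal sketch of this interpolation (the function name `bilinear` and its argument order are chosen here for illustration and do not come from the patent), the formula can be evaluated directly. Note that the coefficient of xy is f(1,1) - f(1,0) - f(0,1) + f(0,0), which is what makes the result reduce to the four corner values at the corners of the cell:

```python
def bilinear(f00, f10, f01, f11, x, y):
    """Interpolate inside a unit cell.

    f00, f10, f01, f11 are the grey values at (0,0), (1,0), (0,1), (1,1);
    x and y are the fractional offsets of the sampling point, in [0, 1].
    """
    return ((f10 - f00) * x
            + (f01 - f00) * y
            + (f11 - f10 - f01 + f00) * x * y
            + f00)


# At the centre of the cell the result is the mean of the four corners.
print(bilinear(10, 20, 30, 40, 0.5, 0.5))  # 25.0
```

The single-expression form mirrors the patent's formula term by term; a hardware version would typically use fixed-point weights instead of floats.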
(3) Determine the image boundary after despinning. The size of the image generally changes after rotation, so the image boundary must be re-determined. With (x_i, y_i), i = 1, ..., 4, denoting the coordinates of the four image corners after rotation (x increasing to the right and y increasing upward), the left, right, top and bottom boundaries are computed as:
left=min(x_1, x_2, x_3, x_4)
right=max(x_1, x_2, x_3, x_4)
top=max(y_1, y_2, y_3, y_4)
bottom=min(y_1, y_2, y_3, y_4)
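A small sketch of this boundary computation follows. It assumes that the four points are the image corners rotated about the image center, that x increases to the right and y upward (so the left boundary is the smallest x and the right boundary the largest), and a counterclockwise-positive angle; the function name `rotated_bounds` is hypothetical.

```python
import math


def rotated_bounds(width, height, theta_deg):
    """Bounding box of a width x height image rotated about its centre.

    Returns (left, right, top, bottom) with y increasing upward.
    """
    t = math.radians(theta_deg)
    cx, cy = width / 2.0, height / 2.0
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    xs, ys = [], []
    for px, py in corners:
        dx, dy = px - cx, py - cy
        # Rotate each corner about the centre.
        xs.append(cx + dx * math.cos(t) - dy * math.sin(t))
        ys.append(cy + dx * math.sin(t) + dy * math.cos(t))
    return min(xs), max(xs), max(ys), min(ys)


left, right, top, bottom = rotated_bounds(1920, 1080, 90)
print(round(right - left), round(top - bottom))  # 1080 1920
```

A 90-degree rotation swaps the width and height of the bounding box, which is a convenient sanity check for the sign conventions.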
(4) Fix the image resolution. In practical engineering applications the output resolution is usually fixed, but despinning the original video by different angles changes the image size. The invention therefore crops the despun image about the image center so that the output resolution, and hence the output image size, stays constant.
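The center crop can be sketched as follows, assuming the despun image is held as a list of rows and is at least as large as the requested output; `center_crop` is an illustrative name, not one used in the patent.

```python
def center_crop(image, out_w, out_h):
    """Cut an out_w x out_h window around the centre of a row-major image."""
    in_h, in_w = len(image), len(image[0])
    x0 = (in_w - out_w) // 2  # first kept column
    y0 = (in_h - out_h) // 2  # first kept row
    return [row[x0:x0 + out_w] for row in image[y0:y0 + out_h]]


# 4 x 4 test image whose pixel value encodes its position (row * 4 + column).
img = [[r * 4 + c for c in range(4)] for r in range(4)]
print(center_crop(img, 2, 2))  # [[5, 6], [9, 10]]
```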
The invention focuses on realizing dynamic stepless despinning through high-level synthesis for large-scale integrated circuits, which is key to guaranteeing real-time performance at high resolution and is the most important innovation of the invention.
Compared with the prior art, the invention has the advantages that:
(1) The invention introduces the four-in-one module, which takes full advantage of the high-bandwidth data stream. The incoming stream is buffered in on-chip memory two lines at a time, the four 8-bit pixels surrounding each pixel position are merged into one 32-bit word, and the words are streamed into the DDR (double data rate memory). When a pixel is later despun by bilinear interpolation, one 32-bit word is fetched and split into the four 8-bit pixels the interpolation needs, so four pixels are read in one access. This reduces the algorithm's read delay to one quarter of the original, making the processing delay the same as that of nearest-neighbor despinning while giving a much better result.
(2) Pipeline acceleration of the algorithm. Compared with a general embedded system, a major advantage of an FPGA is that algorithms can be optimized as data pipelines. The algorithm is therefore written in a pipelined style: during development in the Vivado HLS tool, the PIPELINE directive (the pipeline optimization pragma) is applied, and the code follows the pipelined programming principle that each data item is input once, used once and output once. That is, each item enters exactly once, is consumed exactly once, and must be output exactly once, which prevents the data stream from stalling. The algorithm is thus pipelined at the cost of additional hardware logic resources.
Specifically, pipelining lets operations execute concurrently: a new operation can start before all steps of the previous one have completed. Pipelining applies to functions and loops. Taking loop pipelining as an example, each loop iteration involves three operations: read, compute and write. Without pipelining, the three operations execute serially, an input is read every 3 clock cycles, and a value is output after 2 clock cycles. With pipelining, a read is issued every clock cycle and several data items are in flight at once. Fig. 3 shows the delay before and after pipelining: without it, two successive reads are 3 clock cycles apart and the last write executes after 8 clock cycles; with it, successive reads are 1 clock cycle apart and the last write executes after 4 clock cycles. Pipelining the algorithm therefore raises data throughput, greatly reduces delay and improves the real-time performance of image despinning.
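The cycle counts quoted for fig. 3 can be reproduced with a small model. This is a sketch under the assumption, not stated in the text, that the figure shows three loop iterations of three single-cycle stages (read, compute, write); `last_write_cycle` is an illustrative helper.

```python
def last_write_cycle(iterations, stages, interval):
    """Clock cycle (counted from 0) in which the final write stage runs.

    interval is the number of cycles between the starts of two successive
    iterations: equal to stages for a sequential loop, 1 when a new
    iteration is started every clock cycle.
    """
    return (iterations - 1) * interval + (stages - 1)


print(last_write_cycle(3, 3, 3))  # 8: sequential read/compute/write
print(last_write_cycle(3, 3, 1))  # 4: pipelined, one new read per cycle
```

The same expression shows why the benefit grows with loop length: for a long loop the sequential cost approaches three cycles per item, while the pipelined cost approaches one.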
(3) Real-time parallel use of multiple high-bandwidth AXI buses. The invention targets real-time despinning of high-resolution images. The on-chip memory (BRAM) of the FPGA is too small to hold a full high-resolution frame, so a 64-bit, 128 MB DDR is attached to the ARM side for frame buffering. Unlike buffering directly in BRAM, this requires the FPGA to read and write the ARM-side DDR over the AXI bus. Analysis and measurement of the delay show that, because the algorithm has already been pipelined as described in (2) and the delay of the despinning computation itself is low, the remaining delay comes mainly from DDR reads and writes over the AXI bus. The FPGA + ARM chip used, the Zynq UltraScale+ MPSoC 15EG, has abundant AXI bus resources (seven 128-bit AXI buses). The invention therefore reads, writes and processes several pixels simultaneously over multiple high-bandwidth AXI buses in parallel, which greatly reduces delay, increases data throughput and improves real-time performance. In the final design, two 128-bit buses and one 64-bit bus operate in parallel. For 1080p grayscale images, the total delay of the bilinear-interpolation despinning algorithm over the full 360-degree range is 12 ms, so the despinning of each frame completes within one frame period for both 30 fps and 60 fps video; that is, high-resolution images are despun in real time.
Moreover, the invention uses only 36% of the bus resources to despin 1080p images, so the resolution that can be despun in real time can be raised further by using more of the buses.
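The figures in this passage can be checked with simple arithmetic. The sketch below compares the 12 ms delay with the frame period at 30 and 60 fps, and shows one reading that reproduces the quoted 36%: the total width of the buses used (two 128-bit plus one 64-bit) divided by the total width available (seven 128-bit). That interpretation of "bus resources" is an assumption; the patent does not say how the percentage was computed.

```python
def frame_period_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def bus_share(used_widths_bits, available_buses, bus_width_bits):
    """Fraction of the total available bus width that is in use."""
    return sum(used_widths_bits) / (available_buses * bus_width_bits)


print(round(frame_period_ms(30), 2), round(frame_period_ms(60), 2))  # 33.33 16.67
print(round(bus_share([128, 128, 64], 7, 128) * 100, 1))  # 35.7
```

Both frame periods exceed 12 ms, consistent with the claim that despinning completes within one frame at either rate.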
(4) Algorithm design optimization and resource scheduling through high-level synthesis. The Zynq UltraScale+ MPSoC 15EG used by the invention is a heterogeneous embedded chip from Xilinx and is developed with the Vivado design suite, which includes the high-level synthesis tool Vivado HLS. Within the HLS framework, algorithms are developed and optimized in a high-level language (C/C++/SystemC) according to the tool's coding rules, and the HLS tool then converts the high-level program into a hardware description language (Verilog HDL/VHDL). Developing with a high-level synthesis tool makes algorithm optimization and dynamic scheduling of logic resources convenient, greatly improves development efficiency, fully exploits the parallel multi-AXI-bus capability of the FPGA + ARM architecture and the acceleration offered by multiple pipelines, and significantly improves the performance of the despinning algorithm. The design balances logic resource usage, delay and throughput; because the chip has ample logic resources, the invention chooses to spend logic resources in exchange for lower algorithm delay and higher data throughput. The invention exploits the strengths of HLS to improve the despinning algorithm through data type optimization and data throughput optimization.
Specifically, for data type optimization: the algorithm uses 20-bit data in many places, but the bit widths of standard C data types are integer multiples of 8 bits. Using 32-bit integers directly would waste logic resources and fail to exploit the high performance and strong parallelism of the FPGA. The invention therefore defines a 20-bit data type using the arbitrary-bit-width types provided by the HLS tool, which greatly reduces logic resource usage. For data throughput optimization: following the principle of trading area for speed, loops are pipelined and unrolled, improving the throughput and performance of the algorithm at the cost of logic resources.
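The patent does not say which quantities are held in 20-bit variables. Purely as an illustration of how a custom bit width is sized, the sketch below computes the smallest unsigned width for a given range and emulates a fixed-width register by masking; the example of counting 0.001-degree steps over a full turn is an assumption chosen because it uses numbers from the text, not a description of the actual design.

```python
def bits_needed(max_value):
    """Smallest unsigned bit width that can represent 0..max_value."""
    return max(1, max_value.bit_length())


def wrap_unsigned(value, width):
    """Emulate storing value in an unsigned register of the given width."""
    return value & ((1 << width) - 1)


angle_steps = 360 * 1000  # number of 0.001-degree steps in a full turn
print(bits_needed(angle_steps - 1))       # 19
print(wrap_unsigned((1 << 20) + 5, 20))   # 5
```

A width in this range falls between the standard 16-bit and 32-bit types, which is the situation where arbitrary-width types save logic compared with rounding up to 32 bits.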
(5) Practical tests show that 1920 × 1080 visible-light images can be despun in real time over a range of 0-360 degrees with a delay of less than 12 ms, a despinning angle precision of 0.001 degrees and a maximum pixel error of less than 1 pixel. The system as a whole offers high video resolution, a large despinning range, high despinning precision, clear output images free of jagged edges, low output delay, high system stability, ease of manufacture, low power consumption and small size.
Drawings
FIG. 1 is a schematic block diagram of the dynamic stepless despinning system based on high-level synthesis for large-scale integrated circuits;
FIG. 2 is a schematic diagram of the principle of the image despinning algorithm based on bilinear interpolation;
FIG. 3 shows the effect of pipeline optimization on delay;
FIG. 4 is a flow chart of the dynamic stepless despinning processing module;
FIG. 5 demonstrates the effect of the dynamic stepless despinning system, where (a) is before despinning and (b) is after despinning.
Detailed Description
Embodiments of the invention are described below with reference to the accompanying drawings.
As shown in fig. 1, the despinning system of the invention comprises a video acquisition module, a video decoding module, a core processing module and a video encoding module. The core processing module is a heterogeneous system on chip with an FPGA + ARM architecture. The FPGA contains a dynamic stepless despinning module, a video-to-AXI-bus-video-stream module, an AXI video stream DDR (double data rate memory) read/write module, and a pixel merging module (the four-in-one module) designed to reduce algorithm delay and improve bus bandwidth utilization. The ARM side contains the video storage module (DDR) and an RS422 serial communication module. Data communication between the FPGA and the ARM uses an AXI control bus.
The video acquisition module is an industrial camera with a resolution of 1920 × 1080 and a frame rate of 30 Hz or 60 Hz; the video output format is not restricted. The video decoding module is a video decoding chip that converts the input serial video signal into parallel video data, a data enable signal DE, a line synchronization signal HSYNC and a field synchronization signal VSYNC, all of which are sent to the FPGA for subsequent processing. The video storage module uses four 16-bit, 128 MB DDR4 devices to form one 64-bit, 128 MB DDR. Despinning requires a full-frame buffer, and the FPGA's on-chip memory is too small to hold a full frame, so external memory is needed; the DDR is attached to the ARM side of the Zynq chip, which simplifies subsequent operations. The data communication module has two parts. The first is the RS422-based link between the electronic despinning system and the host computer; this stable, low-speed protocol is sufficient for transmitting despinning angles. The second is the link between the FPGA side and the ARM side inside the Zynq chip, which uses the AXI bus communication protocol provided by Xilinx to carry command and image data. The video encoding module is a video encoding chip that converts the parallel video data, the data enable signal DE, the line synchronization signal HSYNC and the field synchronization signal VSYNC into a serial video signal, which is output to a display or capture card for real-time display.
The core processing module is a Zynq UltraScale+ MPSoC 15EG. A Zynq-architecture chip combines the parallel acceleration of the FPGA side with the control and scheduling functions of the ARM side, and is one of the mainstream heterogeneous system-on-chip devices today. The core of the invention is the four-in-one module and the dynamic stepless despinning module. The despinning algorithm is deployed on the FPGA side, while memory scheduling and communication with the host computer are handled on the ARM side.
The invention specifically comprises the following steps:
Step one: video acquisition and decoding
The invention uses an industrial camera to acquire video images and a decoding chip to decode them into parallel video data, a data enable signal DE, a line synchronization signal HSYNC and a field synchronization signal VSYNC. Because the design is based on FPGA AXI data streams, the decoded signals are sent to the video-to-AXI-bus-video-stream module, which converts the parallel video data into AXI bus video stream data so that pipeline acceleration can be applied efficiently in later stages.
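The conversion from parallel video to a stream can be sketched in software. The patent does not specify the stream's framing, so the start-of-frame and end-of-line flags below are an assumption modeled on common streaming-video conventions, and `video_to_stream` and the tuple layouts are hypothetical. The sketch keeps only pixels for which DE is high, marks the first pixel after a VSYNC pulse as start of frame, and marks the last pixel of each line as end of line.

```python
def video_to_stream(samples):
    """Convert per-clock (vsync, de, data) samples into stream beats.

    Returns a list of (data, start_of_frame, end_of_line) tuples.
    """
    beats = []
    frame_start_pending = False
    prev_de = 0
    for vsync, de, data in samples:
        if vsync:
            frame_start_pending = True
        if de:
            beats.append([data, 1 if frame_start_pending else 0, 0])
            frame_start_pending = False
        elif prev_de and beats:
            beats[-1][2] = 1  # DE just fell: the previous pixel ended a line
        prev_de = de
    if prev_de and beats:
        beats[-1][2] = 1  # input ended while a line was still active
    return [tuple(b) for b in beats]


# One VSYNC pulse followed by two lines of two pixels each.
clocks = [(1, 0, 0), (0, 1, 10), (0, 1, 11), (0, 0, 0),
          (0, 1, 12), (0, 1, 13), (0, 0, 0)]
print(video_to_stream(clocks))
# [(10, 1, 0), (11, 0, 1), (12, 0, 0), (13, 0, 1)]
```

Software can patch the previous beat once DE falls; a hardware implementation would instead delay the stream by one pixel to know that a pixel is the last of its line.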
Step two: adjacent-pixel merging
The four-in-one module buffers the data stream in on-chip memory two lines at a time and merges the four adjacent 8-bit pixels around each pixel position into one 32-bit word. When a pixel is later despun by bilinear interpolation, one 32-bit word is fetched and split into the four 8-bit pixels the interpolation needs, so four pixels are read in one access and the algorithm's read delay is reduced to one quarter of the original.
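The packing and unpacking can be illustrated with a short sketch. The byte order inside the 32-bit word is not given in the patent, so the layout used here (top-left pixel in the lowest byte) is an assumption, as are the names `pack_rows` and `unpack`.

```python
def pack_rows(row_a, row_b):
    """Merge each 2x2 neighbourhood of two buffered lines into a 32-bit word.

    Assumed layout: bits 7..0 top-left, 15..8 top-right,
    23..16 bottom-left, 31..24 bottom-right.
    """
    words = []
    for i in range(len(row_a) - 1):
        words.append(row_a[i]
                     | (row_a[i + 1] << 8)
                     | (row_b[i] << 16)
                     | (row_b[i + 1] << 24))
    return words


def unpack(word):
    """Split one 32-bit word back into its four 8-bit pixels."""
    return (word & 0xFF, (word >> 8) & 0xFF,
            (word >> 16) & 0xFF, (word >> 24) & 0xFF)


words = pack_rows([1, 2, 3], [4, 5, 6])
print(hex(words[0]))     # 0x5040201
print(unpack(words[1]))  # (2, 3, 5, 6)
```

Storing one word per neighborhood quadruples the buffered data volume but turns four memory reads per output pixel into one, which is the trade the text describes.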
Step three: video data storage
Caching the 32-bit video stream data merged in the step two into DDR of the ARM through the AXI video stream DDR read-write module;
step four: real-time dynamic non-polar despinning processing of video data
The flow chart of the dynamic non-polar despinning module is shown in Fig. 4. The invention designs the dynamic non-polar despinning algorithm using Vivado high-level synthesis (HLS) and packages it as an IP core. The IP core defines two m_axi (AXI master) ports, used for reading from and writing to the DDR4 respectively. The m_axi read port reads the original pixel data from one frame buffer in the DDR4 over the AXI bus; after dynamic non-polar despinning by the despinning algorithm, the result is written to another frame buffer in the DDR4 through the m_axi write port, completing the image despinning process.
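The read, despin, and write flow can be modeled in software as an inverse rotation followed by bilinear interpolation. The sketch below is illustrative only: the sign convention of the rotation, the choice of image center, the fill value for pixels without a source, and the use of fractional offsets as interpolation weights are all assumptions, and the function names are hypothetical rather than taken from the IP core.

```python
import math

def source_coords(xp, yp, theta_deg, x0, y0):
    """Map an output pixel (xp, yp) back to source coordinates (x, y).

    Rotation about the center (x0, y0); the sign convention is an assumption.
    """
    t = math.radians(theta_deg)
    dx, dy = xp - x0, yp - y0
    x = dx * math.cos(t) + dy * math.sin(t) + x0
    y = -dx * math.sin(t) + dy * math.cos(t) + y0
    return x, y

def bilinear(f00, f10, f01, f11, u, v):
    """Bilinear interpolation on the unit square, with u, v in [0, 1]."""
    return ((f10 - f00) * u + (f01 - f00) * v
            + (f11 + f00 - f10 - f01) * u * v + f00)

def despin(frame, theta_deg, fill=0):
    """Read from one frame buffer and return a new, despun frame buffer."""
    h, w = len(frame), len(frame[0])
    x0, y0 = (w - 1) / 2, (h - 1) / 2            # assumed image center
    out = [[fill] * w for _ in range(h)]
    for yp in range(h):
        for xp in range(w):
            x, y = source_coords(xp, yp, theta_deg, x0, y0)
            xi, yi = math.floor(x), math.floor(y)
            if 0 <= xi < w - 1 and 0 <= yi < h - 1:
                u, v = x - xi, y - yi            # fractional offsets
                val = bilinear(frame[yi][xi], frame[yi][xi + 1],
                               frame[yi + 1][xi], frame[yi + 1][xi + 1], u, v)
                out[yp][xp] = int(round(val))
    return out
```

Iterating over output pixels and sampling the source means every output pixel is written exactly once, which suits a streaming write port; the four neighbors fetched per pixel are the ones a four-in-one word would deliver in a single read.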
Step five: video encoding and output display
After the despinning in step four, the despun image is held in a buffer area of the DDR. The AXI video stream DDR read-write module is used again to read the cached despun video image from the DDR into an AXI video stream; the AXI bus video stream-to-video module then converts the AXI video stream into parallel video data with explicit synchronization signals, which is sent to a video encoding chip for encoding and output to a monitor or capture card for real-time display of the despinning result.
Following the above steps, the host computer can specify any despinning angle, and the system outputs the despinning result in real time. For example, with a despinning angle of 0.625° clockwise issued by the host computer, the images before and after processing by the despinning system are shown in Fig. 5. Fig. 5(a) is the original image before despinning: the image is tilted relative to the horizontal, that is, the optical axis is not accurately leveled and there is a counterclockwise rotation, measured by the host computer as 0.625°. The host computer therefore issues a despinning angle of 0.625° to the despinning system. As Fig. 5(b) shows, the despun image is level in the horizontal direction and shows no sawtooth (aliasing) artifacts. The despinning angle accuracy reaches 0.001°, and the processing time per video frame is less than 12 ms, giving high real-time performance.
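To put these figures in perspective, the following back-of-the-envelope calculation estimates how far a corner pixel moves for a given angle and what pixel rate a 12 ms frame time implies. The frame resolution is not stated in this section, so the 1920 x 1080 value used in the usage note is purely an assumption; the function names are hypothetical.

```python
import math

def corner_shift_px(width, height, angle_deg):
    """Arc length, in pixels, traced by an image corner rotating about the center."""
    radius = math.hypot(width / 2, height / 2)
    return radius * math.radians(angle_deg)

def required_pixel_rate(width, height, frame_time_s):
    """Average pixels per second needed to finish one frame in frame_time_s."""
    return width * height / frame_time_s
```

For an assumed 1920 x 1080 frame, `corner_shift_px(1920, 1080, 0.625)` is about 12 pixels, while `corner_shift_px(1920, 1080, 0.001)` is about 0.02 pixels, so resolving 0.001° requires sub-pixel interpolation rather than nearest-pixel copying. Under the same assumption, `required_pixel_rate(1920, 1080, 0.012)` is roughly 173 million pixels per second.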
Details not described in the present specification are prior art known to those skilled in the art.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.

Claims (6)

1. A dynamic non-polar (stepless) despinning system based on high-level synthesis of a large-scale integrated circuit, characterized in that: the system comprises a video acquisition module, a video decoding module, a core processing module, and a video encoding module; the core processing module is a heterogeneous system on chip with an FPGA + ARM architecture; the FPGA comprises a dynamic non-polar despinning module, a video-to-AXI bus video stream module, an AXI video stream DDR read-write module, and a pixel merging module, namely a four-in-one module, designed to reduce algorithm delay and improve bus bandwidth utilization; the ARM comprises a video storage module (DDR) and an RS422 serial communication module, and data communication between the FPGA and the ARM is carried out over an AXI control bus;
the video acquisition module is used for capturing an original video image with a camera, the video image being the data to be despun; the captured original video image is passed to the video decoding module;
the video decoding module is used for converting the serial video captured by the camera into parallel video data and obtaining a set of explicit video synchronization signals; the decoded parallel video data and synchronization signals are sent to the FPGA;
in the FPGA, the video-to-AXI bus video stream module first converts the video data into AXI bus video stream data, which has lower delay and is better suited to data synchronization and pipeline acceleration optimization; the data in AXI bus video stream format then flows into the four-in-one module, which caches the data stream in an on-chip buffer each time two lines have arrived and merges the four 8-bit pixels around each pixel into one 32-bit word; when the four pixels adjacent to a given pixel are subsequently needed, the merged 32-bit word only needs to be read once and split into four independent 8-bit values, so that all four pixels are read in a single access; this processing exploits the AXI bus bandwidth to reduce the delay to one quarter of the original; the merged 32-bit video stream data is cached into the DDR of the ARM through the AXI video stream DDR read-write module;
the dynamic non-polar despinning module is used for dynamically performing non-polar despinning on the video data cached in the DDR according to the despinning instruction and despinning angle sent by the host computer through the RS422 serial communication module; during despinning it works together with the four-in-one module, splitting each 32-bit word read from the DDR into four 8-bit values for bilinear interpolation, and the processed video image is again stored in the DDR; the AXI video stream DDR read-write module is then used again to read the cached despun video image from the DDR into an AXI video stream, the AXI bus video stream-to-video module converts the AXI video stream into parallel video data with explicit synchronization signals, and the parallel video data is sent to the video encoding module for encoding and output to a display or capture card for real-time display.
2. The dynamic non-polar despinning system based on high-level synthesis of a large-scale integrated circuit according to claim 1, characterized in that: the four-in-one module and the dynamic non-polar despinning module are developed with the high-level synthesis tool Vivado HLS, and the algorithm is pipelined using the precompiler pipeline directive, namely the pipeline optimization directive; provided that each datum is input once, used once, and output once, data that originally needed 8 clock cycles to process can be processed in only 4 clock cycles.
3. The dynamic non-polar despinning system based on high-level synthesis of a large-scale integrated circuit according to claim 1, characterized in that: the system further improves the performance of the despinning algorithm through data type optimization, namely user-defined bit-width data types, and through data throughput optimization; multiple AXI high-bandwidth buses are used for real-time parallel optimization, and multiple pixels are read, written, and processed simultaneously by parallel computation.
4. The dynamic non-polar despinning system based on high-level synthesis of a large-scale integrated circuit according to claim 1, characterized in that: in the dynamic non-polar despinning module, an electronic image despinning algorithm based on bilinear interpolation is used for real-time despinning, specifically comprising the following steps:
(1) According to the despinning angle sent by the host computer, for each pixel (x', y') of the despun video image, the coordinates (x, y) of the corresponding pixel in the video image before despinning are computed:
[Equation image FDA0003313358620000021]
where θ is the despinning angle, and x_0 and y_0 are the horizontal and vertical coordinates of the image center;
(2) Pixel mapping using bilinear interpolation:
f(x,y)=[f(1,0)-f(0,0)]x+[f(0,1)-f(0,0)]y+[f(1,1)+f(0,0)-f(1,0)-f(0,1)]xy+f(0,0)
where x and y are the integer coordinates obtained by rounding the pixel coordinates computed in step (1); f(0,0), f(1,0), f(0,1), and f(1,1) are the gray values of the 4 pixels surrounding (x, y); and f(x, y) is the gray value obtained by bilinear interpolation at (x, y);
(3) Determining the boundary of the despun image: the size of the rotated image generally differs from that before rotation, so the boundary of the video image must be determined again; the four boundary positions of the video image (top, bottom, left, and right) are calculated as follows:
left = max(x_1, x_2, x_3, x_4)
right = min(x_1, x_2, x_3, x_4)
top = max(y_1, y_2, y_3, y_4)
bottom = min(y_1, y_2, y_3, y_4)
(4) Fixing the image resolution: the despun video image is cropped about the center of the video image so that the output resolution is fixed, that is, the output image keeps the same size.
5. The dynamic non-polar despinning system based on high-level synthesis of a large-scale integrated circuit according to claim 1, characterized in that: the heterogeneous system on chip with the FPGA + ARM architecture used by the core processing module is a Zynq UltraScale+ MPSoC 15EG chip.
6. A dynamic non-polar despinning method based on high-level synthesis of a large-scale integrated circuit, characterized by comprising the following steps:
(1) Converting the serial video captured by a camera into parallel video data, obtaining a set of explicit video synchronization signals, and sending the decoded parallel video data and synchronization signals to an FPGA;
(2) In the FPGA, converting the video data, through a video-to-AXI bus video stream module, into AXI bus video stream data, which has lower delay and is better suited to data synchronization and pipeline acceleration optimization;
(3) The data in AXI bus video stream format then flows into a four-in-one module; because the subsequent bilinear-interpolation despinning must read the four pixels adjacent to each processed pixel from the DDR, the four-in-one module caches the data stream in an on-chip buffer each time two lines have arrived and merges the four 8-bit pixels around each pixel into one 32-bit word; when the four pixels adjacent to a given pixel are subsequently needed, the merged 32-bit word only needs to be read once and split into four independent 8-bit values, so that all four pixels are read in a single access; this processing makes full use of the AXI bus bandwidth to reduce the delay to one quarter of the original;
(4) Caching the merged 32-bit video stream data into the DDR of the ARM through an AXI video stream DDR read-write module;
(5) The dynamic non-polar despinning module then dynamically performs non-polar despinning on the video data cached in the DDR according to the despinning instruction and despinning angle sent by the host computer through the RS422 serial communication module; during despinning it works together with the four-in-one module, splitting each 32-bit word read from the DDR into four 8-bit values for bilinear interpolation, and the processed video image is again stored in the DDR;
(6) Using the AXI video stream DDR read-write module again to read the cached despun video image from the DDR into an AXI video stream, converting the AXI video stream into parallel video data with explicit synchronization signals with the AXI bus video stream-to-video module, and sending the parallel video data to the video encoding module for encoding and output to a display or capture card for real-time display;
in steps (3) and (5), the four-in-one module and the dynamic non-polar despinning module are developed with the high-level synthesis tool Vivado HLS, and the algorithm is pipelined using the precompiler pipeline directive, so that the program satisfies the condition that each datum is input once, used once, and output once; with this pipelining, data that originally needed 8 clock cycles to process is processed in 4 clock cycles; in addition, the performance of the despinning algorithm is improved through data type optimization and data throughput optimization; meanwhile, multiple AXI high-bandwidth buses are invoked for real-time parallel optimization, and multiple pixels are read, written, and processed simultaneously by parallel computation.
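The benefit of loop pipelining mentioned in claims 2 and 6 can be illustrated with the usual cycle-count estimate for a pipelined loop. This is a generic model, not a reproduction of the 8-cycle and 4-cycle figures in the claims, whose derivation is not given in the text; the function name and its parameters are hypothetical.

```python
def loop_cycles(trip_count, iteration_latency, initiation_interval=None):
    """Estimate the clock cycles needed to run a loop.

    Without pipelining, each iteration waits for the previous one to finish.
    With pipelining, a new iteration starts every initiation_interval cycles,
    so iterations overlap.
    """
    if trip_count == 0:
        return 0
    if initiation_interval is None:
        return trip_count * iteration_latency
    return iteration_latency + initiation_interval * (trip_count - 1)
```

For example, a loop over 1000 pixels whose body takes 8 cycles needs `loop_cycles(1000, 8)` = 8000 cycles without pipelining, but only `loop_cycles(1000, 8, 1)` = 1007 cycles when a new pixel can enter the pipeline every cycle. Reaching an initiation interval of 1 generally requires that each datum be read once and written once per iteration, which matches the input-once, use-once, output-once condition stated in the claims.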
CN202111223132.7A 2021-10-20 2021-10-20 Dynamic non-polar despinning system and method based on high-level synthesis of large-scale integrated circuit Active CN113962842B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111223132.7A CN113962842B (en) 2021-10-20 2021-10-20 Dynamic non-polar despinning system and method based on high-level synthesis of large-scale integrated circuit

Publications (2)

Publication Number Publication Date
CN113962842A CN113962842A (en) 2022-01-21
CN113962842B true CN113962842B (en) 2022-12-09

Family

ID=79465107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111223132.7A Active CN113962842B (en) 2021-10-20 2021-10-20 Dynamic non-polar despinning system and method based on high-level synthesis of large-scale integrated circuit

Country Status (1)

Country Link
CN (1) CN113962842B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant