CN113962842B - Dynamic non-polar despinning system and method based on high-level synthesis of large-scale integrated circuit
- Publication number
- CN113962842B (CN202111223132.7A)
- Authority
- CN
- China
- Prior art keywords
- video
- module
- data
- despinning
- axi
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/90—Dynamic range modification of images or parts thereof
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
- H04N19/436—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation using parallelised computational arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/12—Indexing scheme for image data processing or generation, in general involving antialiasing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2200/00—Indexing scheme for image data processing or generation, in general
- G06T2200/28—Indexing scheme for image data processing or generation, in general involving image processing hardware
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Image Processing (AREA)
Abstract
The invention relates to a dynamic stepless despinning system and method based on high-level synthesis (HLS) of large-scale integrated circuits. The system comprises a video acquisition module, a video decoding module, a video storage module, a data communication module, a video coding module, a dynamic stepless despinning module, and a pixel merging module (the four-in-one module) designed to reduce algorithm delay and improve bus bandwidth utilization. The invention uses high-level synthesis to implement dynamic stepless despinning, can despin video images acquired on an electro-optical platform in real time, and makes full use of the parallel acceleration and pipeline optimization of an FPGA (field-programmable gate array). It offers high video resolution, a large despinning range, high despinning precision, clear processed images free of jagged edges, low output delay, strong system stability, ease of manufacture, low power consumption, and small volume.
Description
Technical Field
The invention relates to the field of intelligent embedded video processing, and in particular to a dynamic stepless despinning system and method based on high-level synthesis of large-scale integrated circuits.
Background
While an airborne pod television camera is recording and aiming, its outer frame structure inevitably rolls, which moves the optical system relative to the camera and rotates the image. Likewise, a fighter aircraft in flight often rolls through large angles (up to 360 degrees), so the television picture rotates through large angles and seriously impairs the operator's view. To eliminate the image rotation caused by aircraft attitude changes in optical sighting devices and electro-optical pod systems, the original video image acquired by the television system must be counter-rotated, that is, despun, so that the image stays upright and stable for the operator and for subsequent target detection, identification, and tracking. Three despinning approaches are common in engineering practice: electronic, optical, and physical. Optical despinning is currently the most widely used; it corrects the image by rotating a despinning prism in the imaging optical path. Although it has low delay and fast response, its manufacturing process is complex, its despinning angle precision is low, and the system is large and power-hungry. With the rapid development of large-scale integrated circuits and digital signal processing, electronic despinning implemented by real-time video image processing algorithms overcomes the shortcomings of optical despinning systems and is increasingly widely applied.
With continued progress in computer vision and in the performance of processing chips, electronic despinning based on video image processing has become the mainstream research direction among despinning technologies, and it is now the first choice in engineering applications for eliminating image rotation caused by aircraft attitude changes.
Disclosure of Invention
Problem solved by the invention: to overcome the shortcomings of the prior art, the invention provides a dynamic stepless despinning system and method based on high-level synthesis of large-scale integrated circuits. Using high-level synthesis together with FPGA parallel acceleration and pipeline optimization, it achieves stepless despinning with high precision, large range, high real-time performance, and high output image quality. The precision reaches 0.001 degrees, so extremely small angles can be despun; the despinning range is 0 to 360 degrees, so any angle can be despun; one frame is processed in less than 12 ms, so despinning runs in real time; and bilinear interpolation is used, so the output image is smooth, free of jagged edges, and of high quality. In the prior art, real-time performance, precision, range, and image quality conflict with one another, so existing approaches achieve only one or some of these targets and cannot meet all of them simultaneously. The invention therefore has high engineering application value.
The technical solution of the invention is a dynamic stepless despinning system designed by high-level synthesis of large-scale integrated circuits. Taken as a whole, its core innovations are: 1) using high-level synthesis, that is, high-level languages such as C++, for FPGA algorithm design optimization and resource scheduling; 2) pipeline acceleration of the algorithm, which raises data throughput, greatly reduces delay, and improves the real-time performance of image despinning; 3) high-bandwidth, real-time parallel use of multiple AXI buses, which improves data read/write efficiency and algorithm real-time performance; 4) a four-in-one module for four-pixel merging, which combines the four 8-bit pixels needed for bilinear interpolation into one 32-bit word so that all four pixels can later be read in a single access, greatly reducing the delay caused by repeated reads.
The system comprises a video acquisition module, a video decoding module, a core processing module, and a video coding module. The core processing module is a heterogeneous system on chip with an FPGA + ARM architecture, specifically a Zynq UltraScale+ MPSoC 15EG chip. The FPGA side comprises a dynamic stepless despinning module, a video-to-AXI-bus video stream module, an AXI video stream DDR read/write module, and a pixel merging module (the four-in-one module) designed to reduce algorithm delay and improve bus bandwidth utilization. The ARM side comprises the video storage module (DDR) and an RS422 serial communication module. Data communication between the FPGA and the ARM uses an AXI control bus.
The video acquisition module uses a camera to acquire the original video image, which is the data to be despun; the acquired image is passed to the video decoding module.
The video decoding module converts the serial video acquired by the camera into parallel video data and a set of explicit video synchronization signals, both of which are sent to the FPGA.
In the FPGA, the video-to-AXI-bus video stream module first converts the video data into AXI bus video stream data, which has lower delay and is better suited to data synchronization and pipeline acceleration. The AXI video stream then enters the four-in-one module. Because the subsequent despinning uses bilinear interpolation, every output pixel requires the four adjacent source pixels to be read from DDR; each read incurs considerable delay, and reading four times multiplies it. The four-in-one module therefore buffers the data stream on chip each time two lines have arrived and merges the four adjacent 8-bit pixels around each pixel position into one 32-bit word. When the four neighbors of a pixel are needed later, a single read of the merged 32-bit word, split into four independent 8-bit values, delivers all four at once. This makes full use of the AXI bus bandwidth and reduces the read delay to one quarter of the original. The merged 32-bit video stream is cached in the DDR on the ARM side through the AXI video stream DDR read/write module.
The dynamic stepless despinning module despins the video data cached in DDR according to the despinning command and despinning angle sent by the host computer through the RS422 serial communication module. During despinning it works with the four-in-one module: each 32-bit word read from DDR is split into four 8-bit values for bilinear interpolation, and the processed video image is stored back in DDR. The AXI video stream DDR read/write module then reads the despun image from DDR back into an AXI video stream, the AXI-bus-video-stream-to-video module converts it into parallel video data with explicit synchronization signals, and the result is sent to the video coding module for encoding and output to a display or capture card for real-time display.
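The four-pixel merging and splitting described for the four-in-one module can be sketched in a few lines of Python. This is only an illustration: the function names `pack4`, `unpack4`, and `merge_two_lines`, and the byte order (first pixel in the least significant byte), are assumptions; the text states only that four 8-bit pixels are combined into one 32-bit word and later split again.

```python
def pack4(p00, p10, p01, p11):
    """Merge four 8-bit pixels into one 32-bit word (byte order is an assumption)."""
    for p in (p00, p10, p01, p11):
        if not 0 <= p <= 0xFF:
            raise ValueError("pixel out of 8-bit range")
    return p00 | (p10 << 8) | (p01 << 16) | (p11 << 24)


def unpack4(word):
    """Split a 32-bit word back into its four 8-bit pixels."""
    return tuple((word >> shift) & 0xFF for shift in (0, 8, 16, 24))


def merge_two_lines(upper, lower):
    """Build one packed word per 2x2 neighbourhood from two buffered lines."""
    return [
        pack4(upper[x], upper[x + 1], lower[x], lower[x + 1])
        for x in range(len(upper) - 1)
    ]
```

With this layout a single 32-bit read followed by `unpack4` yields the four neighbours that bilinear interpolation needs, instead of four separate 8-bit reads.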
The bilinear-interpolation-based electronic image despinning algorithm used is as follows:
(1) According to the despinning angle sent by the host computer, compute, for each pixel (x', y') of the despun image, the coordinates (x, y) of the corresponding pixel in the image before despinning. The formula is as follows:
The rotation center is generally taken to be the image center (x_0, y_0), in which case the above formula should be rewritten as:
Written in scalar form:
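The equations for step (1) are not reproduced in this text. As a hedged sketch only, a standard inverse rotation consistent with the description is given below: first about the origin, then about the image center, then in scalar form. The symbol θ for the despinning angle and the sign convention are assumptions, not taken from the source.

```latex
\[
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x' \\ y' \end{bmatrix}
\]
\[
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} x' - x_0 \\ y' - y_0 \end{bmatrix}
+
\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}
\]
\[
\begin{aligned}
x &= (x' - x_0)\cos\theta + (y' - y_0)\sin\theta + x_0 \\
y &= -(x' - x_0)\sin\theta + (y' - y_0)\cos\theta + y_0
\end{aligned}
\]
```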
(2) Perform pixel mapping by bilinear interpolation. The coordinates (x, y) in the original image computed in step (1) are usually not integers, so pixels cannot be mapped one to one. Non-integer pixel coordinates arising in the mapping are generally handled by resampling.
According to image reconstruction theory, three interpolation methods are commonly used for image mapping: nearest neighbor, bilinear, and cubic interpolation. Nearest neighbor interpolation performs poorly: the despun image shows obvious jagged edges and burrs. Bilinear and cubic interpolation both perform well, giving continuous gray levels without jagged edges. Cubic interpolation, however, is algorithmically complex and too slow to meet real-time requirements in practical engineering applications. Weighing despinning precision against real-time performance, the invention therefore adopts the image despinning algorithm based on bilinear interpolation.
The principle of the bilinear-interpolation-based electronic despinning algorithm is shown in fig. 2. The method interpolates linearly in the x and y directions using the gray values of the 4 integer-coordinate points surrounding the non-integer sampling point. In fig. 2, (x, y) is the coordinate of the pixel to be interpolated, f(x, y) is the pixel gray value at (x, y), and f(0,0), f(1,0), f(0,1), and f(1,1) are the gray values of the 4 points surrounding (x, y). The bilinear interpolation formula is therefore:
f(x,y)=[f(1,0)-f(0,0)]x+[f(0,1)-f(0,0)]y+[f(1,1)-f(1,0)-f(0,1)+f(0,0)]xy+f(0,0)
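A minimal Python sketch of bilinear interpolation on the unit cell follows. It uses the standard expansion in which the coefficient of xy is f(1,1) - f(1,0) - f(0,1) + f(0,0), which reproduces the four corner values exactly. The function name `bilinear` and its argument order are assumptions for illustration.

```python
def bilinear(f00, f10, f01, f11, x, y):
    """Interpolate at fractional offsets 0 <= x, y <= 1 inside a unit cell.

    f00, f10, f01, f11 are the grey values at (0,0), (1,0), (0,1), (1,1).
    """
    return (
        (f10 - f00) * x
        + (f01 - f00) * y
        + (f11 - f10 - f01 + f00) * x * y
        + f00
    )
```

Evaluating at the corners returns the corner values themselves, and the cell center returns the mean of the four, which is a quick sanity check on the signs of the terms.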
(3) Determine the image boundary after despinning. The image size after rotation generally differs from that before rotation, so the image boundary must be determined again. With (x_i, y_i), i = 1 to 4, denoting the coordinates of the four image corners after rotation, the left, right, top, and bottom boundaries are calculated as:
left=min(x_1,x_2,x_3,x_4)
right=max(x_1,x_2,x_3,x_4)
top=max(y_1,y_2,y_3,y_4)
bottom=min(y_1,y_2,y_3,y_4)
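The boundary computation can be illustrated by rotating the four image corners about the image center and taking the extremes of the resulting coordinates. The sketch below is an assumption-laden illustration: the function name `rotated_bounds`, the counterclockwise-positive angle, and the neutral return order (x_min, x_max, y_min, y_max) are not from the source, and which extreme counts as "top" depends on the direction of the y axis.

```python
import math


def rotated_bounds(width, height, angle_deg):
    """Axis-aligned extent of a width x height image rotated about its centre."""
    theta = math.radians(angle_deg)
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    corners = [(0, 0), (width - 1, 0), (0, height - 1), (width - 1, height - 1)]
    xs, ys = [], []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        # Forward rotation of each corner about the image centre.
        xs.append(cx + dx * math.cos(theta) - dy * math.sin(theta))
        ys.append(cy + dx * math.sin(theta) + dy * math.cos(theta))
    return min(xs), max(xs), min(ys), max(ys)
```

For a 5 x 3 image rotated by 90 degrees the extents swap, giving a bounding box 3 pixels wide and 5 pixels tall, which is why the boundary has to be recomputed for each angle.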
(4) Fix the image resolution. In practical engineering applications the output resolution is usually fixed, but despinning the original video image by different angles changes the image size. The invention therefore crops the despun image about the image center so that the output resolution, and hence the output image size, stays constant.
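Steps (1), (2), and (4) can be combined in one short software sketch: iterate over a fixed-size output grid centered on the image center, map each output pixel back into the source, and interpolate. Iterating over a fixed output grid is equivalent to rotating and then center-cropping. This is a plain Python illustration under stated assumptions (function name `despin`, sign convention of the angle, zero fill outside the source frame, a small tolerance for rounding at the frame edge); it is not the hardware implementation described in the source.

```python
import math


def despin(image, angle_deg, fill=0):
    """Despin a greyscale image (list of rows) about its centre, keeping its size."""
    h, w = len(image), len(image[0])
    cx, cy = (w - 1) / 2.0, (h - 1) / 2.0
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    eps = 1e-9  # tolerance for floating-point rounding at the frame edge
    out = [[fill] * w for _ in range(h)]
    for yo in range(h):
        for xo in range(w):
            # Inverse mapping: locate the source position of this output pixel.
            dx, dy = xo - cx, yo - cy
            xs = cx + dx * c + dy * s
            ys = cy - dx * s + dy * c
            if xs < -eps or ys < -eps or xs > w - 1 + eps or ys > h - 1 + eps:
                continue  # source position lies outside the frame: keep fill
            xs = min(max(xs, 0.0), w - 1.0)
            ys = min(max(ys, 0.0), h - 1.0)
            x0, y0 = int(xs), int(ys)
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = xs - x0, ys - y0
            f00, f10 = image[y0][x0], image[y0][x1]
            f01, f11 = image[y1][x0], image[y1][x1]
            # Bilinear interpolation over the four neighbouring pixels.
            value = (f00 + (f10 - f00) * fx + (f01 - f00) * fy
                     + (f11 - f10 - f01 + f00) * fx * fy)
            out[yo][xo] = int(round(value))
    return out
```

The output always has the same dimensions as the input, whatever the angle; corners whose source position falls outside the original frame receive the fill value.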
The invention focuses on implementing dynamic stepless despinning through high-level synthesis of large-scale integrated circuits, which is essential to the real-time performance of a high-resolution system and is the most important innovation of the invention.
Compared with the prior art, the invention has the advantages that:
(1) The invention introduces a four-in-one module that exploits high-bandwidth data streaming. Each time two lines of the data stream have arrived they are buffered on chip, and the four adjacent 8-bit pixels around each pixel position are merged into one 32-bit word that is streamed into DDR (double data rate memory). When a pixel is later despun by bilinear interpolation, the 32-bit word is fetched and split into four 8-bit pixels, which are exactly the four pixels the interpolation needs. Reading four pixels in one access reduces the algorithm's read delay to one quarter of the original, so the processing delay equals that of nearest neighbor despinning while the image quality is much better.
(2) Pipeline acceleration of the algorithm. Compared with a general embedded system, a large-scale integrated circuit such as an FPGA has the great advantage that an algorithm can be optimized as a data pipeline. The algorithm is therefore written in a pipelined style: during development in the Vivado HLS tool, the PIPELINE pragma (the pipelining optimization directive) is applied, and the code follows the pipelining rule that each datum is input once, used once, and output once, which prevents the data flow from stalling. In this way the algorithm is pipelined at the cost of additional hardware logic resources.
Specifically, pipelining lets operations execute concurrently: a step does not have to wait for all operations of the previous iteration to finish before the next one starts. Pipelining applies to both functions and loops. Take loop pipelining as an example, where each iteration performs three operations: read, compute, and write. Without pipelining, the three operations run serially, an input is read once every 3 clock cycles, and each value is output 2 clock cycles after it is read. With pipelining, a read is issued every clock cycle and several data items are processed in parallel. The delays before and after pipeline optimization are shown in fig. 3. Before optimization, 3 clock cycles separate two reads and the last write can execute after 8 clock cycles; after optimization, 1 clock cycle separates two reads and the last write can execute after 4 clock cycles. Pipelining the algorithm thus raises data throughput, greatly reduces delay, and improves the real-time performance of image despinning.
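The cycle counts quoted for fig. 3 can be reproduced with a small timing model. The sketch assumes a loop of three iterations, each with three one-cycle stages (read, compute, write); the iteration count is an assumption chosen because it matches the quoted figures, and the function name `last_write_cycle` is hypothetical.

```python
def last_write_cycle(iterations, depth, interval):
    """Cycle index (0-based) at which the final iteration starts its last stage.

    depth    -- number of one-cycle stages per iteration (read, compute, write)
    interval -- cycles between the starts of consecutive iterations
    """
    return (iterations - 1) * interval + (depth - 1)


serial = last_write_cycle(iterations=3, depth=3, interval=3)     # no pipelining
pipelined = last_write_cycle(iterations=3, depth=3, interval=1)  # one read per cycle
print(serial, pipelined)  # 8 4
```

The gap widens with the iteration count: the serial form grows by the full depth per iteration, the pipelined form by only one cycle.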
(3) Real-time parallel use of multiple high-bandwidth AXI buses. The invention targets real-time despinning of high-resolution images. The on-chip memory (BRAM) of the FPGA is too small to buffer a whole high-resolution frame, so a 64-bit, 128 MB external DDR memory is attached on the ARM side for image buffering. Unlike buffering directly in BRAM, this requires the FPGA to read and write the ARM-side DDR over the AXI bus. Analysis and measurement of the delay show that, because the algorithm has already been pipelined as described in (2) and the delay of the despinning algorithm itself is low, the delay comes mainly from reading and writing DDR over the AXI bus. The FPGA + ARM chip used by the invention, the Zynq UltraScale+ MPSoC 15EG, has abundant AXI bus resources (seven 128-bit AXI buses), so the invention uses several high-bandwidth AXI buses in parallel to read, write, and process multiple pixels simultaneously, which greatly reduces delay, increases data throughput, and improves the algorithm's real-time performance. The final design uses two 128-bit buses and one 64-bit bus in parallel. For 1080p grayscale images, the total delay of the bilinear interpolation despinning algorithm over the full 360-degree range is 12 ms, so for both 30 fps and 60 fps video the despinning completes within one frame period; that is, high-resolution images are despun in real time.
Moreover, the invention despins 1080p images while occupying only 36% of the bus resources, so the resolution that can be despun in real time can be raised further by using more buses.
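The frame-budget and bus-utilization figures can be checked with simple arithmetic. Measuring utilization as used bus width over available bus width is an assumption, though it does reproduce the quoted 36%; the variable names are illustrative only.

```python
used_bits = 2 * 128 + 1 * 64   # two 128-bit buses and one 64-bit bus in use
available_bits = 7 * 128       # seven 128-bit AXI buses on the chip
utilisation = used_bits / available_bits

latency_ms = 12.0              # quoted despinning delay per 1080p frame
frame_period_ms = {fps: 1000.0 / fps for fps in (30, 60)}
fits = {fps: latency_ms < period for fps, period in frame_period_ms.items()}

print(round(utilisation * 100), fits)  # 36 {30: True, 60: True}
```

At 60 fps the frame period is about 16.7 ms, so a 12 ms delay leaves roughly 4.7 ms of margin per frame.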
(4) Algorithm design optimization and resource scheduling through high-level synthesis. The Zynq UltraScale+ MPSoC 15EG processing chip used by the invention is a heterogeneous embedded chip from Xilinx and is developed with the Vivado development suite, which includes the high-level synthesis tool Vivado HLS. Within the HLS framework, algorithms are developed and optimized in a high-level language (C/C++/SystemC) following specific coding rules, and the HLS tool then converts the high-level program into a hardware description language (Verilog HDL/VHDL) program. Developing with a high-level synthesis tool makes algorithm design optimization and dynamic scheduling of logic resources convenient, greatly improves development efficiency, fully exploits the parallel computing capability of the multiple AXI buses of the FPGA + ARM architecture and the acceleration offered by multiple pipelines, and markedly improves the performance of the despinning algorithm. The design trades off logic resource usage, delay, and throughput; because the chip has abundant logic resources, the invention chooses to spend logic resources to obtain lower algorithm delay and higher data throughput. The invention takes full advantage of HLS to improve the despinning algorithm through data type optimization and data throughput optimization.
Specifically, for data type optimization, the design uses 20-bit-wide data in many places. Standard C data types have widths that are integer multiples of 8 bits, and using 32-bit integers directly would waste logic resources and fail to exploit the high performance and strong parallelism of the FPGA. The invention therefore defines a 20-bit-wide data type using the arbitrary-bit-width types provided by the HLS tool, which greatly reduces logic resource usage. For data throughput optimization, loops are pipelined and unrolled following the principle of trading area for speed, which raises the throughput and performance of the algorithm at the cost of logic resources.
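The saving from a 20-bit type can be illustrated by emulating a fixed-width unsigned field in Python. The source does not state which quantities are stored in 20 bits; the angle count used below (0.001-degree steps over 360 degrees) is a purely hypothetical example, and the helper names are assumptions.

```python
def bits_needed(count):
    """Smallest unsigned width that can index `count` distinct values."""
    return max(1, (count - 1).bit_length())


def wrap_unsigned(value, bits):
    """Keep only the low `bits` bits, as a fixed-width hardware register would."""
    return value & ((1 << bits) - 1)


angle_steps = 360 * 1000          # hypothetical: 0.001-degree steps over 360 degrees
width = bits_needed(angle_steps)  # 19, so the value fits in 20 bits but not in 16
saved = 1 - 20 / 32               # share of a 32-bit register a 20-bit type avoids
print(width, saved)               # 19 0.375
```

A value of this range overflows a 16-bit type and wastes more than a third of a 32-bit one, which is the kind of gap that arbitrary-width types close.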
(5) Practical tests show that 1920 × 1080 visible-light images can be despun in real time with a despinning range of 0 to 360 degrees, a delay of less than 12 ms, a despinning angle precision of 0.001 degrees, and a maximum pixel error of less than 1 pixel. The whole system offers high video resolution, a large despinning range, high despinning precision, clear processed images free of jagged edges, low output delay, strong system stability, ease of manufacture, low power consumption, and small size.
Drawings
FIG. 1 is a schematic block diagram of the dynamic stepless despinning system based on high-level synthesis of large-scale integrated circuits;
FIG. 2 is a schematic diagram of the principle of the image despinning algorithm based on bilinear interpolation;
FIG. 3 shows the effect of pipeline optimization on delay;
FIG. 4 is a flow diagram of the dynamic stepless despinning processing module;
FIG. 5 demonstrates the effect of the dynamic stepless despinning system, wherein (a) is before despinning and (b) is after despinning.
Detailed Description
The following description of the embodiments of the present invention will be made with reference to the accompanying drawings.
As shown in fig. 1, the despinning system of the invention comprises a video acquisition module, a video decoding module, a core processing module, and a video coding module. The core processing module is a heterogeneous system on chip with an FPGA + ARM architecture. The FPGA side comprises a dynamic stepless despinning module, a video-to-AXI-bus video stream module, an AXI video stream DDR (double data rate memory) read/write module, and a pixel merging module (the four-in-one module) designed to reduce algorithm delay and improve bus bandwidth utilization. The ARM side comprises the video storage module (DDR) and an RS422 serial communication module. Data communication between the FPGA and the ARM uses an AXI control bus.
The video acquisition module is an industrial camera with a resolution of 1920 × 1080 and a frame rate of 30 Hz or 60 Hz; the video output format is not restricted. The video decoding module is a video decoding chip that converts the input serial video signal into parallel video data, a data enable signal DE, a line synchronization signal HSYNC, and a field synchronization signal VSYNC, all of which are passed to the FPGA for subsequent processing. The video storage module uses four DDR4 devices (16-bit, 128 MB) combined into one 64-bit, 128 MB DDR memory. Because despinning requires a whole frame to be buffered and the FPGA's on-chip memory is too small to hold one, external memory is needed, and the DDR is attached on the ARM side of the Zynq chip, which simplifies subsequent operations. The data communication module has two parts. The first is RS422-based communication between the electronic despinning system and the host computer; this stable, low-speed protocol is sufficient for transmitting the despinning angle. The second is communication between the FPGA side and the ARM side inside the Zynq chip, which uses the AXI bus protocol as implemented in Xilinx devices to carry command and image data. The video coding module is a video coding chip that converts the parallel video data, the data enable signal DE, the line synchronization signal HSYNC, and the field synchronization signal VSYNC into a serial video signal, which is output to a display or capture card for real-time display.
The core processing module of the system is a Zynq UltraScale+ MPSoC 15EG. This Zynq-architecture chip makes full use of the parallel acceleration of the FPGA side and the master control and scheduling of the ARM side, and is one of today's mainstream heterogeneous systems on chip. The core of the invention is the four-in-one module and the dynamic stepless despinning module; the despinning algorithm is deployed on the FPGA side, while memory scheduling and communication with the host computer are handled on the ARM side.
The invention specifically comprises the following steps:
Step one: video capture and decoding
The invention uses an industrial camera to capture video images and a decoding chip to decode them, yielding parallel video, the data enable signal DE, the line synchronization signal HSYNC and the field synchronization signal VSYNC. Because the design is built around AXI data streams in the FPGA, the decoded signals are fed to a video-to-AXI-bus-video-stream module, which converts the parallel video data into AXI bus video stream data; this makes the later pipeline acceleration optimization efficient to implement.
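The conversion from DE-gated parallel video to a stream can be modelled in software. The following is a minimal sketch, not the module's actual implementation: it assumes the stream follows the common AXI4-Stream video convention of flagging the first pixel of a frame on `tuser` and the last pixel of each line on `tlast`, which the text does not state. The names `VideoSample`, `StreamBeat` and `video_to_axis` are hypothetical, and HSYNC is omitted because DE alone delimits the active pixels in this simplified model.

```python
from typing import Iterable, Iterator, NamedTuple, Optional


class VideoSample(NamedTuple):
    data: int     # parallel pixel value on this clock
    de: bool      # data enable: the pixel is valid
    vsync: bool   # field (vertical) synchronization pulse


class StreamBeat(NamedTuple):
    tdata: int
    tuser: bool   # assumed convention: start of frame
    tlast: bool   # assumed convention: end of line


def video_to_axis(samples: Iterable[VideoSample]) -> Iterator[StreamBeat]:
    """Turn clocked parallel video samples into stream beats."""
    frame_start = False
    pending: Optional[StreamBeat] = None  # one-sample lookahead to detect DE falling
    for s in samples:
        if s.vsync:
            frame_start = True
        if pending is not None:
            # The previous active pixel ends its line if DE is low now.
            yield pending._replace(tlast=not s.de)
            pending = None
        if s.de:
            pending = StreamBeat(s.data, frame_start, False)
            frame_start = False
    if pending is not None:
        yield pending._replace(tlast=True)
```

Feeding it one VSYNC pulse followed by two three-pixel lines separated by blanking yields six beats, with `tuser` set only on the first beat and `tlast` set on the third and sixth.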
Step two: adjacent-pixel merging
The invention introduces a four-in-one module. Each time two lines of the data stream have arrived, they are buffered in on-chip cache, and the four 8-bit pixels adjacent to each pixel are merged into one 32-bit word. Later, when bilinear-interpolation despinning is performed for a pixel, this 32-bit word is fetched and split back into four 8-bit pixels, which are exactly the four pixels that bilinear interpolation needs. Four pixels are thus read in a single access, reducing the algorithm delay to one quarter of the original.
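The merging and splitting described above can be illustrated with a short sketch. It rests on assumptions that the text does not specify: the byte order inside the 32-bit word, the choice of the 2 × 2 neighbourhood (the pixel itself, its right neighbour, and the two pixels below them), and clamping at the right edge. The function names are hypothetical.

```python
def pack4(p00: int, p01: int, p10: int, p11: int) -> int:
    """Merge four 8-bit pixels into one 32-bit word (assumed byte order: p00 lowest)."""
    return (p00 & 0xFF) | ((p01 & 0xFF) << 8) | ((p10 & 0xFF) << 16) | ((p11 & 0xFF) << 24)


def unpack4(word: int) -> tuple:
    """Split a 32-bit word back into the four 8-bit pixels, in pack4 order."""
    return tuple((word >> shift) & 0xFF for shift in (0, 8, 16, 24))


def merge_rows(upper: list, lower: list) -> list:
    """Merge two buffered lines into one 32-bit word per column."""
    width = len(upper)
    words = []
    for c in range(width):
        c1 = min(c + 1, width - 1)  # clamp at the right edge (assumption)
        words.append(pack4(upper[c], upper[c1], lower[c], lower[c1]))
    return words


words = merge_rows([10, 20, 30], [40, 50, 60])
neighbours = unpack4(words[0])  # the four pixels needed for one interpolation
```

A single 32-bit read of `words[0]` returns all four neighbours of the first pixel, which is the property the four-in-one module relies on to replace four separate 8-bit reads.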
Step three: video data storage
The 32-bit video stream data merged in step two is cached in the DDR on the ARM side through the AXI video stream DDR read-write module.
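As a rough check on the storage requirement, the frame buffer size can be worked out from the figures given in the text. This sketch assumes one merged 32-bit word per pixel position, two frame buffers (one read by the despinning step and one written by it), and that "128 MB" means 128 × 1024 × 1024 bytes; none of these details is stated explicitly.

```python
WIDTH, HEIGHT = 1920, 1080
BYTES_PER_WORD = 4                 # one merged 32-bit word per pixel position (assumption)
DDR_BYTES = 128 * 1024 * 1024      # 128 MB, read as binary megabytes (assumption)
BUFFERS = 2                        # source frame buffer plus despun frame buffer

frame_bytes = WIDTH * HEIGHT * BYTES_PER_WORD
total_bytes = frame_bytes * BUFFERS
fits = total_bytes <= DDR_BYTES

print(f"one frame: {frame_bytes} bytes, two buffers: {total_bytes} bytes, fits: {fits}")
```

Under these assumptions a merged frame occupies about 8.3 million bytes, far more than typical on-chip cache but a small fraction of the external DDR.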
Step four: real-time dynamic stepless despinning of the video data
The flow chart of the dynamic stepless despinning module is shown in Fig. 4. The invention implements the dynamic stepless despinning algorithm using Vivado high-level synthesis (HLS) and packages it as an IP core. The IP core defines two m_axi (AXI master) ports, one for reading and one for writing the DDR4. The m_axi read port reads the original pixel data from one frame buffer in the DDR4 over the AXI bus; after the despinning algorithm has processed the data, the m_axi write port writes the result to another frame buffer in the DDR, completing the image despinning process.
Step five: video encoding and output display
After the despinning in step four, the despun image is held in a buffer in the DDR. The AXI video stream DDR read-write module reads the buffered despun video image from the DDR back into an AXI video stream, and the AXI-bus-video-stream-to-video module converts that stream into parallel video data with explicit synchronization signals. This data is sent to the video encoding chip, encoded, and output to a monitor or capture card so that the despun result is displayed in real time.
With the above steps, the host computer can issue any despinning angle and the system outputs the despun result in real time. For example, with a host-specified rotation of 0.625° clockwise, the images before and after processing by the despinning system are shown in Fig. 5. Fig. 5(a) is the original image before despinning. The image is tilted relative to the horizontal, i.e., the optical axis is not accurately levelled, and there is a counterclockwise rotation, measured by the host computer as 0.625°. The host computer therefore issues a despinning angle of 0.625° to the despinning system. As Fig. 5(b) shows, the despun image is level in the horizontal direction and shows no jagged edges. The despinning angle precision reaches 0.001°, and the processing time per video frame is less than 12 ms, giving high real-time performance.
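The real-time claim can be related to the camera frame rates mentioned earlier with a small calculation. This is only an illustrative check: it takes the stated 12 ms as an upper bound on per-frame processing time and compares it with the frame period at 30 Hz and 60 Hz. The helper name `frame_period_ms` is hypothetical.

```python
PROCESSING_MS = 12.0  # stated upper bound on per-frame processing time


def frame_period_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at the given frame rate."""
    return 1000.0 / fps


for fps in (30, 60):
    period = frame_period_ms(fps)
    headroom = period - PROCESSING_MS
    print(f"{fps} Hz: period {period:.2f} ms, headroom {headroom:.2f} ms")
```

At both frame rates the processing bound is shorter than the frame period, so a frame can be despun before the next one arrives.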
Details not described in the present specification are prior art known to those skilled in the art.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.
Claims (6)
1. A dynamic stepless despinning system based on high-level synthesis for large-scale integrated circuits, characterized in that: the system comprises a video acquisition module, a video decoding module, a core processing module and a video encoding module; the core processing module is a heterogeneous system on chip with an FPGA + ARM architecture; the FPGA comprises a dynamic stepless despinning module, a video-to-AXI-bus-video-stream module, an AXI video stream DDR read-write module, and a purpose-designed pixel merging module, namely the four-in-one module, for reducing algorithm delay and improving bus bandwidth utilization; the ARM comprises a video storage module DDR and an RS422 serial communication module, and data communication between the FPGA and the ARM is carried out over an AXI control bus;
the video acquisition module is used for acquiring original video images with a camera, the video images being the data to be despun; the acquired original video images enter the video decoding module;
the video decoding module is used for converting the serial video acquired by the camera into parallel video data and obtaining a set of explicit video synchronization signals; the decoded parallel video data and synchronization signals are sent to the FPGA;
in the FPGA, the video-to-AXI-bus-video-stream module first converts the video data into AXI bus video stream data, which has lower delay and is better suited to data synchronization and pipeline acceleration optimization; the data in AXI bus video stream format then flows into the four-in-one module, which buffers the data stream in on-chip cache each time two lines have arrived and merges the four 8-bit pixels around each pixel into one 32-bit word; when the four pixels adjacent to a given pixel are subsequently needed, the merged 32-bit word is read only once and split into four independent 8-bit values, i.e., four pixels are read in a single access, and this processing uses the AXI bus bandwidth to reduce the delay to one quarter of the original; the merged 32-bit video stream data is cached in the DDR of the ARM through the AXI video stream DDR read-write module;
the dynamic stepless despinning module is used for dynamically performing stepless despinning on the video data in the video data stream cached in the DDR, according to the despinning command and despinning angle sent by the host computer through the RS422 serial communication module; during despinning it works together with the four-in-one module, splitting each 32-bit word read from the DDR into four 8-bit values for bilinear interpolation, and the processed video image is again stored in the DDR; the AXI video stream DDR read-write module then reads the cached despun video image from the DDR back into an AXI video stream, the AXI-bus-video-stream-to-video module converts the AXI video stream into parallel video data with explicit synchronization signals, and the parallel video data is sent to the video encoding module to be encoded and output to a display or a capture card for real-time display.
2. The dynamic stepless despinning system based on large-scale integrated circuit high-level synthesis according to claim 1, characterized in that: the four-in-one module and the dynamic stepless despinning module are developed with the high-level synthesis tool Vivado HLS and are pipelined using the precompiler directive pipeline, i.e., the pipeline optimization directive; the code is written to satisfy the requirement that data is input once, used once and output once, i.e., each datum is input only once, used only once and output only once, so that data that originally required 8 clock cycles to process is processed in only 4 clock cycles.
3. The dynamic stepless despinning system based on large-scale integrated circuit high-level synthesis according to claim 1, characterized in that: the system further improves the performance of the despinning algorithm through data type optimization, namely custom bit-width data types, and through data throughput optimization; multiple high-bandwidth AXI buses are used for real-time parallel optimization, and multiple pixels are read, written and processed simultaneously by parallel computation.
4. The dynamic stepless despinning system based on large-scale integrated circuit high-level synthesis according to claim 1, characterized in that: in the dynamic stepless despinning module, an electronic image despinning algorithm based on bilinear interpolation is used for real-time despinning, specifically comprising the following steps:
(1) According to the despinning angle sent by the host computer, for each pixel (x', y') of the video image after despinning, the coordinates (x, y) of the corresponding pixel of the video image before despinning are solved
where θ denotes the despinning angle, and $x_0$ and $y_0$ denote the horizontal and vertical coordinates of the image center, respectively;
(2) Pixel mapping using bilinear interpolation
$$f(x,y)=[f(1,0)-f(0,0)]\,x+[f(0,1)-f(0,0)]\,y+[f(1,1)-f(1,0)-f(0,1)+f(0,0)]\,xy+f(0,0)$$
where x and y are the integer coordinates obtained by rounding the despun pixel coordinates obtained in step (1), f(0,0), f(1,0), f(0,1) and f(1,1) are the gray values of the 4 pixels around (x, y), and f(x, y) is the gray value obtained by bilinear interpolation at the coordinates (x, y);
(3) Determine the boundary of the despun image: the size of the rotated image generally differs from that before rotation, so the boundary of the video image must be determined again; the upper, lower, left and right boundary positions of the video image are calculated according to the following formulas:
$\text{left}=\max(x_1,x_2,x_3,x_4)$
$\text{right}=\min(x_1,x_2,x_3,x_4)$
$\text{top}=\max(y_1,y_2,y_3,y_4)$
$\text{bottom}=\min(y_1,y_2,y_3,y_4)$
(4) Fix the image resolution: crop the despun video image about the center of the video image so that the output resolution is fixed, i.e., the output image keeps the same size.
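The steps of claim 4 can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the rotation sign convention in the inverse mapping is assumed, because the mapping formula is not reproduced in the text; the interpolation weights are taken to be the fractional offsets of (x, y) inside the unit cell formed by its four neighbours; and writing directly onto an output grid of the input size stands in for the boundary determination and center crop. The function names are hypothetical.

```python
import math


def bilinear(f00, f10, f01, f11, u, v):
    """Interpolate inside the unit cell; u and v in [0, 1] are fractional offsets."""
    return (f10 - f00) * u + (f01 - f00) * v + (f11 - f10 - f01 + f00) * u * v + f00


def despin(image, theta_deg):
    """Rotate a grayscale image (list of equal-length rows) about its center.

    The output keeps the input resolution; output pixels whose source
    coordinate falls outside the input frame are left at 0.
    """
    h, w = len(image), len(image[0])
    x0, y0 = (w - 1) / 2.0, (h - 1) / 2.0
    theta = math.radians(theta_deg)
    c, s = math.cos(theta), math.sin(theta)
    out = [[0] * w for _ in range(h)]
    for yp in range(h):
        for xp in range(w):
            dx, dy = xp - x0, yp - y0
            # Assumed sign convention for the inverse mapping (x', y') -> (x, y).
            x = c * dx + s * dy + x0
            y = -s * dx + c * dy + y0
            if x < 0 or y < 0 or x > w - 1 or y > h - 1:
                continue
            xi, yi = int(x), int(y)  # floor, since x and y are non-negative here
            x1, y1 = min(xi + 1, w - 1), min(yi + 1, h - 1)
            val = bilinear(image[yi][xi], image[yi][x1],
                           image[y1][xi], image[y1][x1], x - xi, y - yi)
            out[yp][xp] = int(round(val))
    return out
```

With an angle of 0 the function returns the input unchanged, and a uniform image stays uniform wherever the source coordinate lies inside the frame, which are two quick sanity checks on the mapping and the interpolation.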
5. The dynamic stepless despinning system based on large-scale integrated circuit high-level synthesis according to claim 1, characterized in that: the heterogeneous system on chip with FPGA + ARM architecture used by the core processing module is a Zynq UltraScale+ MPSoC 15EG chip.
6. A dynamic stepless despinning method based on high-level synthesis for large-scale integrated circuits, characterized by comprising the following steps:
(1) Converting the serial video acquired by a camera into parallel video data and obtaining a set of explicit video synchronization signals, and sending the decoded parallel video data and synchronization signals to an FPGA;
(2) In the FPGA, converting the video data, by means of a video-to-AXI-bus-video-stream module, into AXI bus video stream data, which has lower delay and is better suited to data synchronization and pipeline acceleration optimization;
(3) The data in AXI bus video stream format then flows into a four-in-one module; because the subsequent bilinear-interpolation despinning requires, for every pixel processed, the four pixels adjacent to that pixel to be read from the DDR, the four-in-one module buffers the data stream in on-chip cache each time two lines have arrived and merges the four 8-bit pixels around each pixel into one 32-bit word; when the four pixels adjacent to a given pixel are subsequently needed, the merged 32-bit word is read only once and split into four independent 8-bit values, i.e., four pixels are read in a single access, and this processing makes full use of the AXI bus bandwidth to reduce the delay to one quarter of the original;
(4) Caching the merged 32-bit video stream data in the DDR of an ARM through an AXI video stream DDR read-write module;
(5) The dynamic stepless despinning module then dynamically performs stepless despinning on the video data in the video data stream cached in the DDR, according to the despinning command and despinning angle sent by a host computer through the RS422 serial communication module; during despinning it works together with the four-in-one module, splitting each 32-bit word read from the DDR into four 8-bit values for bilinear interpolation, and the processed video image is again stored in the DDR;
(6) Reading the cached despun video image from the DDR back into an AXI video stream using the AXI video stream DDR read-write module, converting the AXI video stream into parallel video data with explicit synchronization signals using the AXI-bus-video-stream-to-video module, and sending the parallel video data to the video encoding module to be encoded and output to a display or a capture card for real-time display;
in steps (3) and (5), the four-in-one module and the dynamic stepless despinning module are developed with the high-level synthesis tool Vivado HLS, and the pipeline precompiler directive is used to pipeline the algorithm; the program is written so that data is input once, used once and output once, i.e., each datum is input only once, used only once and output only once, so that data that originally required 8 clock cycles to process is processed in 4 clock cycles; in addition, the performance of the despinning algorithm is improved through data type optimization and data throughput optimization; and multiple high-bandwidth AXI buses are used for real-time parallel optimization, with multiple pixels read, written and processed simultaneously by parallel computation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111223132.7A CN113962842B (en) | 2021-10-20 | 2021-10-20 | Dynamic non-polar despinning system and method based on high-level synthesis of large-scale integrated circuit |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113962842A | 2022-01-21 |
CN113962842B | 2022-12-09 |
Family
ID=79465107
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111223132.7A Active CN113962842B (en) | 2021-10-20 | 2021-10-20 | Dynamic non-polar despinning system and method based on high-level synthesis of large-scale integrated circuit |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113962842B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5832119A (en) * | 1993-11-18 | 1998-11-03 | Digimarc Corporation | Methods for controlling systems using control signals embedded in empirical data |
US5832119C1 (en) * | 1993-11-18 | 2002-03-05 | Digimarc Corp | Methods for controlling systems using control signals embedded in empirical data |
CN106342328B (en) * | 2008-05-23 | 2012-07-25 | 中国航空工业集团公司洛阳电光设备研究所 | Electronics racemization method for parallel processing based on TIDSP |
CN109658337A (en) * | 2018-11-21 | 2019-04-19 | 中国航空工业集团公司洛阳电光设备研究所 | A kind of FPGA implementation method of image real-time electronic racemization |
Non-Patent Citations (1)
Title |
---|
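Electronic despinning system for real-time images (实时图像的电子消旋系统); Zeng Xiangping et al.; Opto-Electronic Engineering (《光电工程》); 2005-10-30 (No. 10); full text * |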
Also Published As
Publication number | Publication date |
---|---|
CN113962842A (en) | 2022-01-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US10282805B2 (en) | Image signal processor and devices including the same | |
CN108616717B (en) | Real-time panoramic video splicing display device and method thereof | |
CN109658337B (en) | FPGA implementation method for real-time electronic despinning of images | |
CN109857702B (en) | Laser radar data read-write control system and chip based on robot | |
CN111064906A (en) | Domestic processor and domestic FPGA multi-path 4K high-definition video comprehensive display method | |
US10861243B1 (en) | Context-sensitive augmented reality | |
CN104717485A (en) | VGA interface naked-eye 3D display system based on FPGA | |
CN108053385A (en) | A kind of real-time correction system of flake video and method | |
CN109587421B (en) | HD-SDI/3G-SDI transceiving and real-time picture-in-picture switching output processing method | |
CN112367537A (en) | Video acquisition-splicing-display system based on ZYNQ | |
US20100165014A1 (en) | Display system having resolution conversion | |
CN109873998B (en) | Infrared video enhancement system based on multi-level guide filtering | |
CN104883517A (en) | Three-path high-resolution video stream blending system and method | |
CN109708662B (en) | High-frame-frequency high-precision injection type star atlas simulation test platform based on target identification | |
CN110738594A (en) | FPGA-based onboard electronic instrument image generation method | |
CN111145133A (en) | ZYNQ-based infrared and visible light co-optical axis image fusion system and method | |
CN111770342B (en) | Video stepless scaling method | |
CN113962842B (en) | Dynamic non-polar despinning system and method based on high-level synthesis of large-scale integrated circuit | |
CN101901278A (en) | High-speed data acquisition card and data acquisition method | |
CN111639046B (en) | System and method for caching and transmitting data of far ultraviolet aurora imager in real time | |
CN101793557A (en) | High-resolution imager data real-time acquisition system and method | |
Guo et al. | An FPGA implementation of multi-channel video processing and 4K real-time display system | |
WO2023184754A1 (en) | Configurable real-time disparity point cloud computing apparatus and method | |
CN115002304B (en) | Video image resolution self-adaptive conversion device | |
CN109688314B (en) | Camera system and method with low delay, less cache and controllable data output mode |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||