CN117440168A - Hardware architecture for realizing parallel spiral search algorithm - Google Patents

Publication number
CN117440168A
Authority
CN
China
Prior art keywords
module
pixel
motion estimation
sub
merge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311752857.4A
Other languages
Chinese (zh)
Other versions
CN117440168B (en)
Inventor
陈志峰
施隆照
王诗鑫
杨小玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou Shixin Technology Co ltd
Original Assignee
Fuzhou Shixin Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou Shixin Technology Co ltd
Priority to CN202311752857.4A
Publication of CN117440168A
Application granted
Publication of CN117440168B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/56 Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being a pixel
    • H04N19/513 Processing of motion vectors
    • H04N19/517 Processing of motion vectors by encoding
    • H04N19/52 Processing of motion vectors by encoding by predictive encoding
    • H04N19/567 Motion estimation based on rate distortion criteria
    • H04N19/70 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by syntax aspects related to video coding, e.g. related to compression standards

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to a hardware architecture for realizing a parallel spiral search algorithm. The architecture comprises a sub-pixel interpolation storage module, a UMVP control module, a motion estimation control module, a Merge control module, a cost calculation comparison module and a brightness prediction pixel reconstruction module. Whole pixel motion estimation, sub-pixel motion estimation and Merge share the same cost calculation comparison module for cost calculation and comparison, so hardware resource utilization is high. Whole pixel motion estimation is built around a spiral search algorithm, sub-pixel motion estimation adopts a motion vector grouping strategy, and Merge prunes the length of the Merge candidate list while combining column raster scanning with interleaved scanning of CU blocks, which effectively reduces the computational complexity and the number of clock cycles required by the hardware implementation.

Description

Hardware architecture for realizing parallel spiral search algorithm
Technical Field
The invention belongs to the technical field of video encoding and decoding, and particularly relates to a hardware architecture for realizing a parallel spiral search algorithm.
Background
Building on H.264/AVC, the HEVC video coding standard adds a new set of picture partitioning structures, namely the coding unit (CU), the prediction unit (PU) and the transform unit (TU). Compared with H.264, HEVC can save 25% to 50% of the bitstream at the same PSNR.
HEVC owes its coding efficiency to an advanced coding structure and a range of advanced techniques, but these also make HEVC far more complex than H.264. Inter prediction accounts for up to 80% of the complexity of the whole encoding process, and motion estimation takes about 70% of the inter prediction time, so reducing motion estimation time effectively reduces the complexity of the whole encoding process. The TZsearch algorithm adopted in the HM16.7 test model reduces complexity by more than 93% at a performance loss of only 0.28%; however, its search points jump over large distances, so data is hard to read quickly, the search is time-consuming, and the algorithm is poorly suited to hardware implementation. The full search algorithm has a fixed search order, but its complexity is far too high for real-time applications. In HEVC, a common inter-frame motion estimation search algorithm must iterate over CU blocks repeatedly to obtain the optimal MV and cost of every PU block, which does not suit video coding in which CTU sizes keep growing, and it recomputes pixel residual sums in each iteration, so many pixels are processed redundantly. In the HEVC coding standard, after a PU block has determined the best point of whole pixel motion estimation, the 8 sub-pixel points at 1/2 precision around that point are searched first; once the best 1/2 sub-pixel point is determined, the 8 sub-pixel points at 1/4 precision around it are searched. Each PU block therefore needs 16 sub-pixel search points to obtain the final result.
Although sub-pixel motion estimation is an order of magnitude less complex than whole pixel motion estimation at the algorithm level, it still requires a significant number of clock cycles and logic resources in a hardware implementation. In HEVC, the conventional Merge implementation contains a long feedback loop: the dependence of each block on the motion information of its neighbours breaks the hardware pipeline, so adjacent PU blocks must wait many clock cycles between Merge prediction calculations.
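The two-stage sub-pixel refinement described in the background can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: motion vectors are held in quarter-pixel units, and `two_stage_subpel_search`, `neighbours` and the `cost` callback are hypothetical names that do not come from the patent or from the HM software.

```python
def neighbours(center, step):
    """Return the 8 positions surrounding `center` at distance `step`
    (all coordinates are in quarter-pixel units, an assumption of this sketch)."""
    cx, cy = center
    return [(cx + dx * step, cy + dy * step)
            for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            if (dx, dy) != (0, 0)]


def two_stage_subpel_search(best_int_mv, cost):
    """Refine a whole pixel MV with a 1/2-pixel pass and then a 1/4-pixel pass.

    best_int_mv: (x, y) in whole pixels.
    cost: hypothetical callback mapping a quarter-pixel MV to a cost value.
    Returns the refined MV (quarter-pixel units) and the number of
    sub-pixel points that were evaluated.
    """
    best = (best_int_mv[0] * 4, best_int_mv[1] * 4)
    evaluated = 0
    for step in (2, 1):  # 2 quarter-pixels = 1/2 pixel, then 1/4 pixel
        candidates = neighbours(best, step)
        evaluated += len(candidates)
        # Keep the current best unless a neighbour is strictly cheaper.
        best = min([best] + candidates, key=cost)
    return best, evaluated
```

Whatever the cost function, the loop evaluates 8 + 8 = 16 sub-pixel points per PU block, which is the per-block workload the background refers to.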
To solve these problems, the invention provides a hardware architecture for realizing a parallel spiral search algorithm. Whole pixel motion estimation in the architecture is built around a parallel spiral search algorithm, which has a fixed search order and a high data reuse rate, effectively eliminates redundant computation, reduces computational complexity, and completes the calculation of one search point every four clock cycles. Sub-pixel motion estimation adopts a motion vector grouping strategy: prediction blocks with the same motion vector undergo sub-pixel motion estimation at the same time, which effectively reduces computational complexity. Merge prunes the length of the Merge candidate list and combines column raster scanning with interleaved scanning of CU blocks, which achieves fully pipelined calculation and reduces the number of clock cycles required by the hardware implementation. Whole pixel motion estimation, sub-pixel motion estimation and Merge share the same cost calculation comparison module for cost calculation and comparison, so hardware resource utilization is high.
Disclosure of Invention
In view of this, an object of the present invention is to provide a hardware architecture that implements a parallel spiral search algorithm. Whole pixel motion estimation in the architecture is built around a parallel spiral search algorithm, which has a fixed search order and a high data reuse rate, effectively eliminates redundant computation, reduces computational complexity, and completes the calculation of one search point every four clock cycles. Sub-pixel motion estimation adopts a motion vector grouping strategy: prediction blocks with the same motion vector undergo sub-pixel motion estimation at the same time, which effectively reduces computational complexity. Merge prunes the length of the Merge candidate list and combines column raster scanning with interleaved scanning of CU blocks, which achieves fully pipelined calculation and reduces the number of clock cycles required by the hardware implementation. Whole pixel motion estimation, sub-pixel motion estimation and Merge share the same cost calculation comparison module for cost calculation and comparison, so hardware resource utilization is high.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a hardware architecture for implementing a parallel spiral search algorithm, comprising the following features:
the architecture comprises a sub-pixel interpolation storage module, a UMVP control module, a motion estimation control module, a Merge control module, a cost calculation comparison module and a brightness prediction pixel reconstruction module; the motion estimation control module and the Merge control module share the cost calculation comparison module to calculate and compare rate distortion costs;
the sub-pixel interpolation storage module crops the reference pixels transmitted from the encoder top-layer module into a search window, stores the search window in the storage module, performs interpolation filtering in advance, and stores all 1/2 and 1/4 sub-pixel values within the search window for the cost calculation of motion estimation and Merge;
the UMVP control module calculates the MVP value of the largest CU of the current CTU as the starting point of whole pixel motion estimation;
the motion estimation control module controls the whole pixel motion estimation pass and the two sub-pixel motion estimation passes, and outputs the minimum rate distortion cost, the partition mode and the motion vector of all CU blocks at the four depths;
the Merge control module builds a candidate list for each PU block, calculates the rate distortion cost of each candidate in turn, compares it with the motion estimation result and updates the optimal result;
the cost calculation and comparison module calculates the rate distortion cost of the current matching block and compares the rate distortion costs of each PU block during motion estimation and Merge;
the brightness prediction pixel reconstruction module extracts the predicted pixel values of all inter-frame blocks from the storage module according to the coding information of the current CTU, for use by the subsequent reconstruction module.
Further, the sub-pixel interpolation storage module specifically includes:
the sub-pixel interpolation storage module comprises a sub-pixel interpolation storage control module, a reference pixel storage matrix module, a whole pixel buffer module, a sub-pixel interpolation filtering module, a read address decoding module and a reference pixel screening module;
the sub-pixel interpolation storage control module controls the reading of reference pixels and the writing of sub-pixels, and also starts the sub-pixel interpolation filtering module;
the whole pixel buffer module buffers the reference pixels input by the reference pixel storage matrix module;
the sub-pixel interpolation filtering module performs interpolation filtering on the reference whole pixels to obtain the sub-pixels required in sub-pixel motion estimation;
the read address decoding module translates the received MV signals into read addresses of the storage matrix;
the reference pixel screening module screens the output reference pixels according to the position information to obtain reference pixels of size 32×32 for the cost calculation of the inter-frame prediction module.
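The interpolation filtering performed by the sub-pixel interpolation filtering module can be pictured with the Python sketch below. The patent does not list filter coefficients; the sketch assumes the standard HEVC 8-tap luma filters and 8-bit samples, and it applies a single one-dimensional pass with immediate rounding, which is simpler than the two-pass, higher-precision procedure a real encoder uses for two-dimensional positions. The names `HEVC_LUMA_TAPS` and `interpolate_row` are hypothetical.

```python
# HEVC luma interpolation taps for the 1/4, 1/2 and 3/4 positions.
# Each set of taps sums to 64.
HEVC_LUMA_TAPS = {
    1: (-1, 4, -10, 58, 17, -5, 1, 0),    # 1/4 position
    2: (-1, 4, -11, 40, 40, -11, 4, -1),  # 1/2 position
    3: (0, 1, -5, 17, 58, -10, 4, -1),    # 3/4 position
}


def interpolate_row(row, frac):
    """Interpolate the sample at phase frac/4 between row[i] and row[i + 1].

    Only positions with full 8-sample support (row[i - 3] .. row[i + 4])
    are produced, so the output is shorter than the input.
    """
    taps = HEVC_LUMA_TAPS[frac]
    out = []
    for i in range(3, len(row) - 4):
        acc = sum(t * row[i - 3 + k] for k, t in enumerate(taps))
        # Round, divide by 64 and clip to the 8-bit sample range.
        out.append(min(255, max(0, (acc + 32) >> 6)))
    return out
```

Because the filters are fixed, all 1/2 and 1/4 values of a search window can be computed once and stored, which is the pre-interpolation strategy the module description relies on.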
Further, the motion estimation control module specifically includes:
the motion estimation control module comprises a whole pixel motion estimation control module, a sub-pixel motion estimation control module, a partition mode selection module, and an optimal cost, partition mode and MV storage module;
the whole pixel motion estimation control module calculates the motion vectors required by whole pixel motion estimation, and the sub-pixel interpolation storage module outputs the reference pixels required by whole pixel motion estimation according to these motion vectors; whole pixel motion estimation is built around a spiral search algorithm that searches outward from the starting point in a spiral order, which gives the algorithm a fixed search order and a high data reuse rate, and it calculates the costs of all PU blocks in a CTU in parallel;
the sub-pixel motion estimation control module calculates the motion vectors required by sub-pixel motion estimation, and the sub-pixel interpolation storage module outputs the reference pixels required by sub-pixel motion estimation according to these motion vectors; sub-pixel motion estimation adopts a motion vector grouping strategy that combines prediction blocks with the same motion vector and performs sub-pixel motion estimation on them together;
the partition mode selection module determines the optimal partition mode of the current CU;
the optimal cost, partition mode and MV storage module stores the minimum cost, the optimal partition mode and the corresponding motion vector of each CU block.
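As a rough illustration of a spiral search order, the Python sketch below enumerates the offsets of a square spiral around the starting point. The exact order used by the invention is defined by FIG. 4, which is not reproduced here, so the starting direction and turning sense are assumptions of this sketch, and `spiral_offsets` is a hypothetical name.

```python
def spiral_offsets(radius):
    """Return (dx, dy) search offsets in a square spiral around (0, 0).

    The list covers every point with |dx| <= radius and |dy| <= radius
    exactly once, starting at the centre.
    """
    total = (2 * radius + 1) ** 2
    offsets = [(0, 0)]
    x = y = 0
    dx, dy = 1, 0   # first step direction: an arbitrary choice for this sketch
    run = 1         # straight-run lengths follow 1, 1, 2, 2, 3, 3, ...
    while len(offsets) < total:
        for _ in range(2):
            for _ in range(run):
                x, y = x + dx, y + dy
                if len(offsets) < total:
                    offsets.append((x, y))
            dx, dy = -dy, dx  # turn by 90 degrees
        run += 1
    return offsets
```

Consecutive search points differ by a single pixel step, so the reference block for one point overlaps almost entirely with the block for the previous point. That overlap is what makes a fixed spiral order friendly to data reuse in hardware, in contrast to the large jumps of TZsearch.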
Further, the Merge control module specifically includes:
the Merge control module comprises an MV selection control module, a time domain MV expansion module, a time domain and space domain reference MV memory module, a CU partition mode table module, an MV lookup table module and a FIFO for caching data;
the MV selection control module directly extracts candidate motion vectors from the time domain and space domain reference MV memory module and the MV lookup table module according to the coordinates and partition mode of each PU block, and the sub-pixel interpolation storage module outputs the reference pixels required by Merge calculation according to the motion vectors; Merge prunes the length of the Merge candidate list: PU blocks at depth 0 and depth 1 construct a complete candidate list, the first row of PU blocks at depth 2 and depth 3 does not use the A0 and B2 blocks, and the PU blocks in the other rows use only A1, the time domain block and the zero vector; Merge also combines column raster scanning with interleaved scanning of CU blocks at different depths, and the specific scanning order is as follows: depth 0 CU, depth 1 first CU, depth 2 first column CU, depth 3 first and second column CU, depth 1 second CU, depth 2 second column CU, depth 3 third and fourth column CU, depth 1 third CU, depth 2 third column CU, depth 3 fifth and sixth column CU, depth 1 fourth CU, depth 2 fourth column CU, depth 3 seventh and eighth column CU;
the time domain MV expansion module scales the time domain MVs;
the time domain and space domain reference MV memory module stores the time domain and space domain reference MVs required by Merge calculation;
the CU partition mode table module stores the CU partition modes required by Merge calculation;
the MV lookup table module stores the spatial reference MVs of the current PU block required by Merge calculation, which are supplied by motion estimation.
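The candidate list pruning rule can be written down as a small Python lookup. This is a sketch of one reading of the rule, not the patented logic itself: the composition of the complete list (A1, B1, B0, A0, B2, temporal, zero) follows the usual HEVC spatial neighbour names and is an assumption, as are the function name `merge_candidates` and the zero-based `row` index.

```python
# Assumed composition of a complete Merge candidate list, using the
# conventional HEVC neighbour names; the patent does not spell this out.
FULL_LIST = ("A1", "B1", "B0", "A0", "B2", "temporal", "zero")


def merge_candidates(depth, row):
    """Return the candidate positions a PU block may use.

    depth: CU depth, 0 to 3.
    row: zero-based row index of the PU block at that depth.
    """
    if depth in (0, 1):
        return FULL_LIST                      # complete candidate list
    if row == 0:
        # First row at depth 2 or 3: drop A0 and B2.
        return tuple(c for c in FULL_LIST if c not in ("A0", "B2"))
    # Remaining rows at depth 2 or 3: only A1, temporal and zero vector.
    return ("A1", "temporal", "zero")
```

One plausible motivation, given that blocks are scanned column by column, is that the block directly above in the same column has only just been processed, so avoiding the above-neighbour candidates for the later rows removes a pipeline dependency.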
Further, the cost calculation and comparison module specifically includes:
the cost calculation and comparison module comprises an original pixel buffer module, a reference pixel buffer module, an SAD/SATD calculation module, an MVD bit number calculation module and a cost comparison module; whole pixel motion estimation, sub-pixel motion estimation and Merge share the cost calculation and comparison module for cost calculation and cost comparison;
the original pixel buffer module buffers the original pixel data required by motion estimation and Merge;
the reference pixel buffer module buffers the reference pixel data required by motion estimation and Merge;
the SAD/SATD calculation module computes either the SAD or the SATD of the current matching block, as required by motion estimation and Merge;
the MVD bit number calculation module calculates the number of header bits of the current matching block, and the distortion of the current matching block is added to the number of header bits to obtain the rate distortion cost;
the cost comparison module compares the rate distortion costs of each PU block and selects the motion information with the minimum rate distortion cost for each PU block.
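The cost computed by this module can be sketched in Python as a distortion term plus a rate term. The sketch makes several assumptions: blocks are 4×4 lists of integers, SATD uses an unnormalized 4×4 Hadamard transform, and the weighting factor `lam` is a hypothetical parameter (the text only says that the header bits are added to the distortion, which corresponds to `lam=1`).

```python
def sad(orig, ref):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(o - r) for ro, rr in zip(orig, ref) for o, r in zip(ro, rr))


def hadamard4(v):
    """4-point Hadamard transform (butterfly form, rows in permuted order)."""
    a, b, c, d = v
    s0, s1, d0, d1 = a + b, c + d, a - b, c - d
    return [s0 + s1, d0 + d1, s0 - s1, d0 - d1]


def satd4x4(orig, ref):
    """Sum of absolute Hadamard-transformed differences of a 4x4 block.

    No normalization is applied; practical encoders usually scale the result.
    """
    diff = [[o - r for o, r in zip(ro, rr)] for ro, rr in zip(orig, ref)]
    rows = [hadamard4(r) for r in diff]          # transform each row
    cols = [hadamard4(c) for c in zip(*rows)]    # then each column
    return sum(abs(x) for c in cols for x in c)


def rd_cost(distortion, header_bits, lam=1):
    """Rate distortion cost: distortion plus (weighted) header bits."""
    return distortion + lam * header_bits
```

Since only the distortion function differs between the passes, a single datapath with a SAD/SATD selector can serve whole pixel motion estimation, sub-pixel motion estimation and Merge, which is the sharing the text describes.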
Compared with the prior art, the invention has the following beneficial effects:
whole pixel motion estimation is built around a parallel spiral search algorithm, which has a fixed search order and a high data reuse rate, effectively eliminates redundant computation, reduces computational complexity, and completes the calculation of one search point every four clock cycles; sub-pixel motion estimation adopts a motion vector grouping strategy in which prediction blocks with the same motion vector undergo sub-pixel motion estimation at the same time, which effectively reduces computational complexity; Merge prunes the length of the Merge candidate list and combines column raster scanning with interleaved scanning of CU blocks, which achieves fully pipelined calculation and reduces the number of clock cycles required by the hardware implementation; whole pixel motion estimation, sub-pixel motion estimation and Merge share the same cost calculation comparison module for cost calculation and comparison, so hardware resource utilization is high.
Drawings
FIG. 1 is a block diagram of a hardware architecture of the method of the present invention;
FIG. 2 is a schematic diagram of a sub-pixel interpolation storage module architecture according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a motion estimation control module architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a parallel spiral search sequence in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a Merge control module architecture according to an embodiment of the present invention;
FIG. 6 is a column raster scan schematic of Merge in an embodiment of the invention;
FIG. 7 is a schematic diagram of the interleaved CU block scan of Merge in an embodiment of the invention;
FIG. 8 is a schematic diagram of a cost calculation comparison module architecture according to an embodiment of the present invention;
FIG. 9 is a schematic illustration of a pipeline in an embodiment of the invention;
FIG. 10 is a schematic diagram of a sequential state machine in an embodiment of the invention.
Detailed Description
The invention will be further described with reference to the accompanying drawings and examples.
Referring to fig. 1, the present invention provides a hardware architecture for implementing a parallel spiral search algorithm, where the architecture includes a sub-pixel interpolation storage module, a UMVP control module, a motion estimation control module, a Merge control module, a cost calculation comparison module, and a brightness prediction pixel reconstruction module;
referring to fig. 2, the sub-pixel interpolation storage module includes a sub-pixel interpolation storage control module, a reference pixel storage matrix module, a whole pixel buffer module, a sub-pixel interpolation filtering module, a read address decoding module and a reference pixel screening module;
the sub-pixel interpolation storage control module is used for controlling reading of reference pixels and writing of sub-pixels and also used for starting the sub-pixel interpolation filtering module;
the whole pixel buffer module is used for buffering the reference pixels input by the reference pixel storage matrix module;
the sub-pixel interpolation filtering module is used for carrying out interpolation filtering on the reference whole pixel so as to obtain a sub-pixel required in sub-pixel motion estimation;
the read address decoding module is used for translating the received MV signals into read addresses of the storage matrix;
the reference pixel screening module screens the output reference pixels according to the position information to obtain reference pixels of size 32×32 for the cost calculation of the inter-frame prediction module.
Referring to fig. 3, the motion estimation control module includes a whole pixel motion estimation control module, a sub-pixel motion estimation control module, a partition mode selection module, and an optimal cost, partition mode and MV storage module;
the whole pixel motion estimation control module calculates the motion vectors required by whole pixel motion estimation, and the sub-pixel interpolation storage module outputs the reference pixels required by whole pixel motion estimation according to these motion vectors; whole pixel motion estimation is built around a spiral search algorithm that searches outward from the starting point in a spiral order, as shown in fig. 4, which gives the algorithm a fixed search order and a high data reuse rate, and it calculates the costs of all PU blocks in a CTU in parallel;
the sub-pixel motion estimation control module is used for calculating a motion vector required by sub-pixel motion estimation, the sub-pixel interpolation storage module outputs a reference pixel required by the sub-pixel motion estimation according to the motion vector, and the sub-pixel motion estimation adopts a motion vector grouping strategy to combine prediction blocks with the same motion vector to carry out the sub-pixel motion estimation;
the partition mode selection module is used for determining an optimal partition mode of the current CU;
the optimal cost, partition mode and MV storage module stores the minimum cost, the optimal partition mode and the corresponding motion vector of each CU block.
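The motion vector grouping strategy used for sub-pixel motion estimation can be sketched in Python as a grouping of PU blocks by their best whole pixel motion vector. The data layout is an assumption made for illustration: `best_int_mvs` is a hypothetical dictionary from PU identifiers to motion vectors, and `group_by_mv` is not a name taken from the patent.

```python
from collections import defaultdict


def group_by_mv(best_int_mvs):
    """Group PU blocks that share the same best whole pixel motion vector.

    best_int_mvs: dict mapping a PU identifier to its (mvx, mvy).
    Returns a dict mapping each distinct MV to the list of PU identifiers
    that selected it, in insertion order.
    """
    groups = defaultdict(list)
    for pu, mv in best_int_mvs.items():
        groups[mv].append(pu)
    return dict(groups)
```

Each group needs its sub-pixel reference data fetched only once, so the number of sub-pixel passes scales with the number of distinct motion vectors rather than with the number of PU blocks.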
Referring to fig. 5, the Merge control module includes an MV selection control module, a time domain MV expansion module, a time domain and space domain reference MV memory module, a CU partition mode table module, an MV lookup table module, and a FIFO for caching data;
the MV selection control module directly extracts candidate motion vectors from the time domain and space domain reference MV memory module and the MV lookup table module according to the coordinates and partition mode of each PU block, and the sub-pixel interpolation storage module outputs the reference pixels required by Merge calculation according to the motion vectors; Merge prunes the length of the Merge candidate list: PU blocks at depth 0 and depth 1 construct a complete candidate list, the first row of PU blocks at depth 2 and depth 3 does not use the A0 and B2 blocks, and the PU blocks in the other rows use only A1, the time domain block and the zero vector; Merge also combines column raster scanning with interleaved scanning of CU blocks at different depths, and the specific scanning order is as follows: depth 0 CU, depth 1 first CU, depth 2 first column CU, depth 3 first and second column CU, depth 1 second CU, depth 2 second column CU, depth 3 third and fourth column CU, depth 1 third CU, depth 2 third column CU, depth 3 fifth and sixth column CU, depth 1 fourth CU, depth 2 fourth column CU, depth 3 seventh and eighth column CU, see fig. 6 and 7;
the time domain MV expansion module scales the time domain MVs;
the time domain and space domain reference MV memory module is used for storing time domain and space domain reference MVs required by Merge calculation;
the CU partition mode table module is used for storing CU partition modes required by Merge calculation;
the MV lookup table module is used for storing the spatial reference MVs of the current PU block required by Merge calculation, and the spatial reference MVs of the current PU block required by Merge calculation are given by motion estimation.
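The interleaved scanning order of Merge appears to follow a regular pattern: after the depth 0 CU, each round visits one depth 1 CU, one depth 2 column and two depth 3 columns. The Python sketch below generates the order under that reading; the tuple layout and the name `merge_scan_order` are assumptions of this sketch, and FIG. 6 and FIG. 7 remain the authoritative description.

```python
def merge_scan_order():
    """Return the Merge scan steps as (depth, unit, indices) tuples.

    Indices are 1-based, matching the "first", "second", ... wording
    of the description.
    """
    steps = [(0, "CU", (1,))]
    for k in range(1, 5):
        steps.append((1, "CU", (k,)))                    # k-th depth 1 CU
        steps.append((2, "column", (k,)))                # k-th depth 2 column
        steps.append((3, "column", (2 * k - 1, 2 * k)))  # two depth 3 columns
    return steps


if __name__ == "__main__":
    for depth, unit, idx in merge_scan_order():
        print(f"depth {depth}: {unit} {', '.join(map(str, idx))}")
```

Interleaving the depths this way gives the motion information of one block time to settle while blocks at other depths are processed, which is consistent with the stated goal of keeping the pipeline full.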
Referring to fig. 8, the cost calculation and comparison module includes an original pixel buffer module, a reference pixel buffer module, a SAD/SATD calculation module, an MVD bit number calculation module and a cost comparison module;
the original pixel buffer module buffers the original pixel data required by motion estimation and Merge;
the reference pixel buffer module buffers the reference pixel data required by motion estimation and Merge;
the SAD/SATD calculation module computes either the SAD or the SATD of the current matching block, as required by motion estimation and Merge;
the MVD bit number calculation module calculates the number of header bits of the current matching block, and the distortion of the current matching block is added to the number of header bits to obtain the rate distortion cost;
the cost comparison module is used for comparing the rate distortion cost of each PU block and selecting the motion information with the minimum rate distortion cost of each PU block.
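The text does not say how the MVD bit number is obtained. One common approximation, assumed here purely for illustration, is the length of a signed Exp-Golomb code for each component of the difference between the motion vector and its predictor. HEVC itself entropy-codes MVDs with CABAC, so this is an estimate rather than an exact bit count, and `se_golomb_bits` and `mvd_bits` are hypothetical names.

```python
def se_golomb_bits(v):
    """Length in bits of the signed Exp-Golomb code for integer v."""
    # Map the signed value to an unsigned code number: 0, 1, -1, 2, -2, ...
    code_num = 2 * abs(v) - (1 if v > 0 else 0)
    # An unsigned Exp-Golomb code for k uses 2 * floor(log2(k + 1)) + 1 bits.
    return 2 * ((code_num + 1).bit_length() - 1) + 1


def mvd_bits(mv, mvp):
    """Estimated header bits for the MV difference between mv and mvp."""
    return se_golomb_bits(mv[0] - mvp[0]) + se_golomb_bits(mv[1] - mvp[1])
```

For example, `mvd_bits((3, -1), (1, -1))` gives 6: five bits for the horizontal difference of 2 and one bit for the vertical difference of 0. Search points close to the predictor therefore receive a smaller rate term in the rate distortion cost.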
Referring to fig. 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10, the implementation of the present embodiment includes the following steps:
step S1: the encoder top-layer module inputs the original pixel, reference pixel, coordinates, temporal and spatial reference MV list of the current CTU.
Step S2: the sub-pixel interpolation storage module cuts reference pixels transmitted from the top layer into search frames and stores the search frames into the reference pixel storage matrix module, after the required reference pixels are stored, the sub-pixel interpolation storage control module starts the sub-pixel interpolation filtering module, and then the sub-pixel interpolation filtering module reads whole pixel points from the reference pixel storage matrix module and interpolates and filters out all 1/2 and 1/4 sub-pixel values of the search frames for motion estimation and calculation cost of Merge;
step S3: starting a UMVP control module to calculate the MVP value of the current CTU as a starting point of whole pixel motion estimation;
step S4: the motion estimation control module sequentially carries out integral pixel motion estimation and twice sub-pixel motion estimation, the motion vectors calculated by the integral pixel motion estimation control module and the sub-pixel motion estimation control module are input to a read address decoding module in a sub-pixel interpolation storage module, the read address decoding module translates received MV signals into read addresses of a reference pixel storage matrix module, the reference pixel storage matrix module outputs reference pixels to a reference pixel screening module according to the read addresses, the reference pixel screening module screens and outputs reference pixels with the size of 32 multiplied by 32 to a cost calculation comparison module for cost calculation, the cost calculation comparison module calls a SAD/SATD calculation module and an MVD bit number calculation module to calculate rate distortion cost at a current search point, then the rate distortion cost of each PU block at the current search point is compared with the stored minimum rate distortion cost of each PU block, the optimal cost and a corresponding MV (current CU) are reserved and output to a partition mode selection module in the motion estimation module, and the optimal partition mode of each CU block is determined by the partition mode selection module, and the optimal cost of each CU and the corresponding motion vector and the optimal CU block are stored in the partition mode and the optimal CU;
Step S5: the Merge control module is started. The MV selection control module traverses each PU block in turn and, using the coordinates and partition mode of the PU block, takes candidate MVs directly from the time domain and space domain reference MV memory module and the MV lookup table module. Reference pixels are read from the reference pixel storage matrix module into the cost calculation comparison module according to the candidate MVs, and the cost calculation comparison module calculates the rate-distortion cost of each candidate in turn, compares it with the motion estimation result, and updates the optimal result in the optimal cost, partition mode and MV storage module. After Merge is finished, the optimal cost, partition mode and MV storage module outputs the optimal cost, optimal partition mode and corresponding motion vectors of all CU blocks at the four depths;
Step S6: after the Merge mode is finished, the brightness prediction pixel reconstruction module is started; according to the coding information of the current CTU, it extracts the predicted pixel values of all inter-frame blocks from the storage module for use by the subsequent reconstruction module.
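The pre-interpolation of step S2 can be illustrated with a short Python sketch of one-dimensional fractional-sample filtering. The patent does not state the filter taps; the sketch assumes the standard HEVC luma interpolation filters (8 taps, coefficients summing to 64), and the function name `interpolate_row`, the border replication, and the 8-bit default are illustrative choices, not part of the disclosed design. A real implementation would filter in both directions and keep higher intermediate precision.

```python
# Assumed taps: the standard HEVC luma interpolation filters (each sums to 64).
HALF_TAPS = (-1, 4, -11, 40, 40, -11, 4, -1)     # 1/2-sample position
QUARTER_TAPS = (-1, 4, -10, 58, 17, -5, 1, 0)    # 1/4-sample position


def interpolate_row(pixels, taps, bit_depth=8):
    """Return the fractional sample lying between pixels[i] and pixels[i+1] for every i.

    pixels : list of integer (whole) pixel values from one row of the search frame.
    taps   : 8 filter coefficients applied to pixels[i-3] .. pixels[i+4].
    """
    max_val = (1 << bit_depth) - 1
    # Replicate the border pixels so every output has 8 input samples (hypothetical padding rule).
    padded = [pixels[0]] * 3 + list(pixels) + [pixels[-1]] * 4
    out = []
    for i in range(len(pixels)):
        acc = sum(t * padded[i + k] for k, t in enumerate(taps))
        # Round, divide by 64, and clip to the valid pixel range.
        out.append(min(max_val, max(0, (acc + 32) >> 6)))
    return out


# Usage: a step edge; the half-sample value at the edge lands midway between 0 and 255.
row = [0, 0, 0, 0, 255, 255, 255, 255]
half_samples = interpolate_row(row, HALF_TAPS)
quarter_samples = interpolate_row(row, QUARTER_TAPS)
```

Computing and storing these values once per search frame, as step S2 describes, lets later sub-pixel motion estimation and Merge read fractional samples by address instead of filtering again at every search point.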
The hardware circuit of this embodiment is described at the RTL level in Verilog HDL, and is synthesized, placed and routed on the Xilinx Vivado 2017 platform using an FPGA device of the VCU118 model; the hardware resource consumption of the system is given in Table 1.
Integer-pixel motion estimation, sub-pixel motion estimation and Merge share the same cost calculation comparison module for cost calculation and comparison, so the utilization of hardware resources is high. Integer-pixel motion estimation searches with the spiral search algorithm as its core; sub-pixel motion estimation adopts a motion vector grouping strategy; and Merge adopts a strategy of pruning the length of the Merge candidate list together with column raster scanning and interleaved scanning of CU blocks. Together these effectively reduce the computational complexity and the number of clock cycles required by the hardware implementation.
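To make the idea of a fixed spiral search order concrete, the following Python sketch enumerates search points ring by ring around the starting point. The exact traversal used in the patent is not given in this section, so the clockwise ring walk, the function name `spiral_offsets` and the `search_range` parameter are assumptions for illustration only.

```python
def spiral_offsets(search_range):
    """Yield (dx, dy) offsets from the starting point outward, one square ring at a time."""
    yield (0, 0)                      # the starting point itself (e.g. the MVP position)
    for ring in range(1, search_range + 1):
        x, y = -ring, -ring           # top-left corner of the current ring
        # Walk the four sides of the ring clockwise; each side has 2 * ring points.
        for step_x, step_y in ((1, 0), (0, 1), (-1, 0), (0, -1)):
            for _ in range(2 * ring):
                yield (x, y)
                x, y = x + step_x, y + step_y


# Usage: every point of a +/-2 window is visited exactly once, nearest rings first.
points = list(spiral_offsets(2))
```

Because the order depends only on the search range and not on the image content, the sequence of read addresses is known in advance, which is what allows consecutive search points to reuse most of the reference pixels already fetched.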
Table 1
The foregoing description is only of the preferred embodiments of the invention, and all changes and modifications that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (5)

1. A hardware architecture for realizing a parallel spiral search algorithm, characterized in that: the architecture comprises a sub-pixel interpolation storage module, a UMVP control module, a motion estimation control module, a Merge control module, a cost calculation comparison module and a brightness prediction pixel reconstruction module; the motion estimation control module and the Merge control module share the cost calculation comparison module to perform rate-distortion cost calculation and cost comparison;
the sub-pixel interpolation storage module is used for cutting reference pixels transmitted from the encoder top-layer module into search frames, storing the search frames in the storage module, performing interpolation filtering in advance, and storing all 1/2 and 1/4 sub-pixel values in the search frames for the cost calculation of motion estimation and Merge;
the UMVP control module calculates the MVP value of the maximum CU of the current CTU as a starting point of whole pixel motion estimation;
the motion estimation control module controls the integer-pixel motion estimation process and the two sub-pixel motion estimation processes, and outputs the minimum rate-distortion cost, the partition mode and the motion vector of all CU blocks at the four depths;
the Merge control module establishes a candidate list of each PU block, sequentially calculates rate-distortion cost, compares the rate-distortion cost with a motion estimation result and updates an optimal result;
the cost calculation comparison module is used for calculating the rate-distortion cost of the current matching block and comparing the rate-distortion cost of each PU block during motion estimation and Merge;
the brightness prediction pixel reconstruction module extracts the prediction pixel values of all the inter-frame blocks from the storage module according to the coding information of the current CTU for the subsequent reconstruction module to use.
2. The hardware architecture for implementing a parallel spiral search algorithm according to claim 1, wherein the sub-pixel interpolation storage module comprises a sub-pixel interpolation storage control module, a reference pixel storage matrix module, an integer pixel buffer module, a sub-pixel interpolation filter module, a read address decoding module and a reference pixel screening module;
the sub-pixel interpolation storage control module is used for controlling reading of reference pixels and writing of sub-pixels and also used for starting the sub-pixel interpolation filtering module;
the integer pixel buffer module is used for buffering the reference pixels input from the reference pixel storage matrix module;
the sub-pixel interpolation filtering module is used for performing interpolation filtering on the reference integer pixels to obtain the sub-pixels required in sub-pixel motion estimation;
the read address decoding module is used for translating the received MV signals into read addresses of the storage matrix;
and the reference pixel screening module screens the output reference pixels according to the position information to obtain reference pixels of size 32×32, which are used for cost calculation by the inter-frame prediction module.
3. The hardware architecture for implementing a parallel spiral search algorithm according to claim 1, wherein the motion estimation control module comprises an integer pixel motion estimation control module, a sub-pixel motion estimation control module, a partition mode selection module, and an optimal cost and partition mode and MV storage module;
the integer pixel motion estimation control module is used for calculating the motion vectors required by integer-pixel motion estimation, and the sub-pixel interpolation storage module outputs the reference pixels required by integer-pixel motion estimation according to those motion vectors; integer-pixel motion estimation takes the spiral search algorithm as its core and searches outward from the starting point in a spirally extending order; the algorithm has a fixed search order and a high data reuse rate, and integer-pixel motion estimation calculates the costs of all PU blocks in a CTU in parallel;
the sub-pixel motion estimation control module is used for calculating a motion vector required by sub-pixel motion estimation, the sub-pixel interpolation storage module outputs a reference pixel required by the sub-pixel motion estimation according to the motion vector, and the sub-pixel motion estimation adopts a motion vector grouping strategy to combine prediction blocks with the same motion vector to carry out the sub-pixel motion estimation;
the partition mode selection module is used for determining an optimal partition mode of the current CU;
the optimal cost and partition mode and MV storage module is used for storing the minimum cost, the optimal partition mode and the corresponding motion vector of each CU block.
4. The hardware architecture for implementing a parallel spiral search algorithm according to claim 1, wherein the Merge control module includes a MV selection control module, a time domain MV expansion module, a time domain and space domain reference MV memory module, a CU partition pattern table module, a MV lookup table module, and a FIFO for caching data;
the MV selection control module directly extracts candidate motion vectors from the time domain and space domain reference MV memory module and the MV lookup table module using the coordinates and partition mode of the PU blocks, and the sub-pixel interpolation storage module outputs the reference pixels required by Merge calculation according to the motion vectors; Merge adopts a strategy of pruning the Merge candidate list length: PU blocks at depth 0 and depth 1 construct a complete candidate list, the first row of PU blocks at depth 2 and depth 3 does not use the A0 and B2 blocks, and the PU blocks in the other rows use only A1, the time-domain block and the zero vector; Merge also adopts a strategy of column raster scanning and interleaved scanning of CU blocks of different depths, with the specific scanning order being: depth 0 → depth 1 first column CU → depth 2 first column CU → depth 3 second column CU → depth 2 second column CU → depth 3 third and fourth column CUs → depth 1 third CU → depth 2 third column CU → depth 3 fifth and sixth column CUs → depth 1 fourth column CU → depth 2 fourth column CU → depth 3 seventh and eighth column CUs;
the time domain MV expansion module is used for carrying out expansion transformation on the time domain MVs;
the time domain and space domain reference MV memory module is used for storing time domain and space domain reference MVs required by Merge calculation;
the CU partition mode table module is used for storing CU partition modes required by Merge calculation;
the MV lookup table module is used for storing the spatial reference MVs of the current PU block required by Merge calculation, which are given by motion estimation.
5. The hardware architecture for implementing a parallel spiral search algorithm according to claim 1, wherein the cost calculation comparison module includes an original pixel buffer module, a reference pixel buffer module, a SAD/SATD calculation module, an MVD bit number calculation module, and a cost comparison module, where the whole pixel motion estimation, the sub-pixel motion estimation, and the Merge share the cost calculation comparison module to perform cost calculation and cost comparison;
the original pixel buffer module is used for buffering original pixel data required by motion estimation and Merge;
the reference pixel buffer module is used for buffering reference pixel data required by motion estimation and Merge;
the SAD/SATD calculation module can select SAD or SATD calculation of the current matching block according to motion estimation and Merge requirements;
the MVD bit number calculation module is used for calculating the header bit number of the current matching block, and the distortion of the current matching block is added to the header bit number to obtain the rate-distortion cost;
the cost comparison module is used for comparing the rate distortion cost of each PU block and selecting the motion information with the minimum rate distortion cost of each PU block.
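The cost calculation and comparison recited in claim 5 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed circuit: distortion is taken as SAD, the header bit number is approximated by the length of a signed Exp-Golomb code of each MVD component (a stand-in estimate; the patent does not say how the bits are counted), and the two are simply added as the claim wording describes. All function names are hypothetical.

```python
def sad(orig, ref):
    """Sum of absolute differences between two equally sized 2-D pixel blocks."""
    return sum(abs(o - r) for row_o, row_r in zip(orig, ref) for o, r in zip(row_o, row_r))


def se_golomb_bits(value):
    """Bit length of the signed Exp-Golomb code of `value` (assumed bit estimate)."""
    code_num = 2 * abs(value) - (1 if value > 0 else 0)
    return 2 * (code_num + 1).bit_length() - 1


def rd_cost(orig, ref, mv, mvp):
    """Distortion plus header bits for one matching block; MVD = MV - MVP."""
    mvd_bits = se_golomb_bits(mv[0] - mvp[0]) + se_golomb_bits(mv[1] - mvp[1])
    return sad(orig, ref) + mvd_bits


def update_best(best, pu_id, cost, mv):
    """Keep only the motion information with the minimum cost for each PU block."""
    if pu_id not in best or cost < best[pu_id][0]:
        best[pu_id] = (cost, mv)


# Usage: two candidate search points for one (tiny, 2x2) PU block.
orig = [[10, 12], [8, 9]]
best = {}
update_best(best, "pu0", rd_cost(orig, [[11, 12], [6, 9]], mv=(2, -1), mvp=(1, -1)), (2, -1))
update_best(best, "pu0", rd_cost(orig, [[10, 12], [8, 9]], mv=(9, 9), mvp=(1, -1)), (9, 9))
```

In the usage example the second candidate matches the block perfectly but lies far from the predictor, so its header bits outweigh the distortion saved and the first candidate is retained.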

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311752857.4A CN117440168B (en) 2023-12-19 2023-12-19 Hardware architecture for realizing parallel spiral search algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311752857.4A CN117440168B (en) 2023-12-19 2023-12-19 Hardware architecture for realizing parallel spiral search algorithm

Publications (2)

Publication Number Publication Date
CN117440168A true CN117440168A (en) 2024-01-23
CN117440168B CN117440168B (en) 2024-03-08

Family

ID=89551935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311752857.4A Active CN117440168B (en) 2023-12-19 2023-12-19 Hardware architecture for realizing parallel spiral search algorithm

Country Status (1)

Country Link
CN (1) CN117440168B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090168887A1 (en) * 2009-01-12 2009-07-02 Mediatek Usa Inc. One step sub-pixel motion esitmation
CN101815218A (en) * 2010-04-02 2010-08-25 北京工业大学 Method for coding quick movement estimation video based on macro block characteristics
KR101356821B1 (en) * 2013-07-04 2014-01-28 상명대학교 천안산학협력단 A motion estimation method
CN112188207A (en) * 2014-10-31 2021-01-05 三星电子株式会社 Video encoding apparatus and video decoding apparatus using high-precision skip coding and methods thereof
KR20190050207A (en) * 2017-11-02 2019-05-10 한밭대학교 산학협력단 System and method for motion estimation for high-performance hevc encoder
CN110139106A (en) * 2019-04-04 2019-08-16 中南大学 A kind of video encoding unit dividing method and its system, device, storage medium
CN112291561A (en) * 2020-06-18 2021-01-29 珠海市杰理科技股份有限公司 HEVC maximum coding block motion vector calculation method, device, chip and storage medium
WO2022061613A1 (en) * 2020-09-23 2022-03-31 深圳市大疆创新科技有限公司 Video coding apparatus and method, and computer storage medium and mobile platform
CN113489986A (en) * 2021-05-28 2021-10-08 杭州博雅鸿图视频技术有限公司 Integer pixel motion estimation method and device, electronic equipment and medium
CN113489987A (en) * 2021-06-11 2021-10-08 翱捷科技股份有限公司 HEVC sub-pixel motion estimation method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LONG-ZHAO SHI, ZHIYONG ZHANG, LONG LUO, XIUZHI YANG, ZHIFENG CHEN, XIAOLING YANG, CHEN FU: "Parallel spiral search algorithm applied to integer motion estimation", SIGNAL PROCESSING: IMAGE COMMUNICATION 95 (2021), 16 April 2021 (2021-04-16), pages 1 - 10 *
傅晨, 郑明魁, 陈志峰, 施隆照, 王炎: "An efficient hardware design for CABAC entropy coding", Journal of Fuzhou University (Natural Science Edition), 30 April 2020 (2020-04-30), pages 174 - 180 *

Also Published As

Publication number Publication date
CN117440168B (en) 2024-03-08

Similar Documents

Publication Publication Date Title
CN100471275C (en) Motion estimating method for H.264/AVC coder
Wang et al. A fast algorithm and its VLSI architecture for fractional motion estimation for H.264/MPEG-4 AVC video coding
CN103414895A (en) Encoder intra-frame prediction device and method applicable to HEVC standards
CN113489987B (en) HEVC sub-pixel motion estimation method and device
US8509567B2 (en) Half pixel interpolator for video motion estimation accelerator
CN102148990B (en) Device and method for predicting motion vector
CN107087171A (en) HEVC integer pixel motion estimation methods and device
CN1589028B (en) Predicting device and method based on pixel flowing frame
CN101860747B (en) Sub-pixel movement estimation system and method
CN113436057B (en) Data processing method and binocular stereo matching method
CN102647595B (en) AVS (Audio Video Standard)-based sub-pixel motion estimation device
CN101778280B (en) Circuit and method based on AVS motion compensation interpolation
CN117440168B (en) Hardware architecture for realizing parallel spiral search algorithm
CN113810715A (en) Video compression reference image generation method based on void convolutional neural network
Kao et al. A memory-efficient and highly parallel architecture for variable block size integer motion estimation in H.264/AVC
CN100568920C (en) The method and apparatus of the video image brightness interpolating of serial input and line output
CN109889851A (en) Block matching method, device, computer equipment and the storage medium of Video coding
CN100469146C (en) Video image motion compensator
CN102625091B (en) Inter prediction method based on AVS
CN102625095B (en) AVS-based interframe prediction method
CN102625093B (en) Interframe prediction method base on AVS
KR100571907B1 (en) Method for determining the number of processing element in moving picture estimating algorithm
JP2868457B2 (en) Motion vector search device
KR100424686B1 (en) Method and apparatus for encoding object image
CN102625094B (en) AVS-based interframe prediction method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant