WO2022143215A1 - Inter-frame prediction method and apparatus, electronic device, computer-readable storage medium, and computer program product - Google Patents


Info

Publication number
WO2022143215A1
Authority
WO
WIPO (PCT)
Prior art keywords
reference frame
frame type
prediction
block
sub
Application number
PCT/CN2021/139051
Other languages
English (en)
French (fr)
Inventor
张宏顺
Original Assignee
Tencent Technology (Shenzhen) Company Limited (腾讯科技(深圳)有限公司)
Application filed by Tencent Technology (Shenzhen) Company Limited
Priority to EP21913956.5A (published as EP4246970A4)
Priority to JP2023518518 (published as JP2023543200A)
Publication of WO2022143215A1
Priority to US18/079,216 (published as US20230107111A1)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/107 Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/105 Selection of the reference unit for prediction within a chosen coding or prediction mode, e.g. adaptive choice of position and number of pixels used for prediction
    • H04N19/137 Motion inside a coding unit, e.g. average field, frame or block difference
    • H04N19/159 Prediction type, e.g. intra-frame, inter-frame or bidirectional frame prediction
    • H04N19/176 Adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/513 Processing of motion vectors
    • H04N19/567 Motion estimation based on rate distortion criteria
    • H04N19/61 Transform coding in combination with predictive coding

Definitions

  • the present application relates to video coding technology, and relates to an inter-frame prediction method, apparatus, electronic device, computer-readable storage medium, and computer program product.
  • Video coding is widely used in video transmission, and the future development trend of video is high definition, high frame rate, and high compression rate.
  • the video frame needs to be divided into coding units (Coding Unit, CU).
  • prediction can then be performed on the basis of a reference frame to obtain a predicted value.
  • Embodiments of the present application provide an inter-frame prediction method, apparatus, electronic device, computer-readable storage medium, and computer program product, which can improve the efficiency of video coding.
  • An embodiment of the present application provides an inter-frame prediction method, including:
  • determining a historical prediction mode corresponding to a current prediction block when the current prediction mode of the current prediction block is a preset prediction mode; the historical prediction mode is a prediction mode in which prediction was completed before the preset prediction mode;
  • acquiring adjacent block information of adjacent blocks of the current prediction block, sub-prediction block information of sub-prediction blocks, and a historical optimal reference frame type of the current prediction block in the historical prediction mode; the sub-prediction blocks are obtained by dividing the current prediction block using a sub-block division type that precedes the current sub-block division type;
  • generating a reference frame template based on the historical optimal reference frame type, the adjacent block information, the sub-prediction block information, and the frame type corresponding to the current prediction block;
  • determining a reference frame of the preset prediction mode using the reference frame template, and performing inter-frame prediction on the current prediction block using the reference frame to obtain a prediction value corresponding to the current prediction block.
  • An embodiment of the present application provides an inter-frame prediction apparatus, including:
  • a mode determination module configured to determine a historical prediction mode corresponding to the current prediction block when the current prediction mode of the current prediction block is a preset prediction mode; the historical prediction mode is a prediction mode in which prediction was completed before the preset prediction mode;
  • an information acquisition module configured to acquire adjacent block information of adjacent blocks of the current prediction block, sub-prediction block information of sub-prediction blocks, and a historical optimal reference frame type of the current prediction block in the historical prediction mode; the sub-prediction blocks are obtained by dividing the current prediction block using a sub-block division type that precedes the current sub-block division type;
  • a template generation module configured to generate a reference frame template based on the historically optimal reference frame type, the adjacent block information, the sub-prediction block information, and the frame type corresponding to the current prediction block;
  • an information prediction module configured to use the reference frame template to determine a reference frame of the preset prediction mode, and to use the reference frame to perform inter-frame prediction on the current prediction block to obtain a prediction value corresponding to the current prediction block.
  • An embodiment of the present application provides an electronic device for inter-frame prediction, including:
  • a memory for storing executable instructions; and a processor configured to implement the inter-frame prediction method provided by the embodiments of the present application when executing the executable instructions stored in the memory.
  • Embodiments of the present application provide a computer-readable storage medium storing executable instructions for causing a processor to execute the inter-frame prediction method provided by the embodiments of the present application.
  • Embodiments of the present application provide a computer program product, including computer programs or instructions, which, when executed by a processor, implement the inter-frame prediction method provided by the embodiments of the present application.
  • The embodiments of the present application have the following beneficial effects: when the current prediction mode is a preset prediction mode, the historical prediction mode corresponding to the current prediction block is determined, and then the historical optimal reference frame type of the current prediction block in the historical prediction mode, the adjacent block information of the current prediction block in the historical prediction mode, and the sub-prediction block information of the sub-prediction blocks obtained by dividing the current prediction block in the historical prediction mode are acquired. The current prediction block thus directly inherits the information produced by the historical prediction modes that were executed before the preset prediction mode, and a reference frame template is adaptively generated for the current prediction block in the preset prediction mode. In this way, the characteristics of video coding in the preset prediction mode are fully considered, and existing information is directly reused to generate the reference frame template, which greatly reduces computational complexity and improves the efficiency of video coding.
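The inheritance described above can be sketched in a few lines; the function name and the reference frame labels below are illustrative assumptions, not identifiers from any real encoder.

```python
# Hypothetical sketch of assembling a reference frame template from
# information inherited before the preset prediction mode runs.

def build_reference_frame_template(historical_best, neighbor_refs, sub_block_refs):
    """Union the reference frame types seen by earlier prediction stages.

    historical_best: best reference frame type found by an earlier mode (or None)
    neighbor_refs:   reference frame types used by adjacent blocks
    sub_block_refs:  reference frame types used by earlier sub-block divisions
    """
    template = set()
    if historical_best is not None:
        template.add(historical_best)
    template.update(neighbor_refs)
    template.update(sub_block_refs)
    return template

template = build_reference_frame_template("LAST_FRAME",
                                          ["LAST_FRAME", "GOLDEN_FRAME"],
                                          ["ALTREF_FRAME"])
```

Only the frames in the template need to be searched by the preset prediction mode, which is where the complexity saving comes from.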
  • FIG. 1 is an example diagram of the encoding framework of AV1;
  • FIG. 2 is an example diagram of the partition rules of a CU;
  • FIG. 3 is a schematic diagram of MVP selection in different single reference frame modes;
  • FIG. 4 is a schematic diagram of the process of selecting the most suitable mode combination;
  • FIG. 5 is a schematic diagram of an optional architecture of a video coding system provided by an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of an electronic device for inter-frame prediction provided by an embodiment of the present application.
  • FIG. 7 is a first optional schematic flowchart of an inter-frame prediction method provided by an embodiment of the present application;
  • FIG. 8 is a schematic diagram of a positional relationship between a current prediction block and an adjacent block provided by an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a sequence of 10 different sub-block division types provided by an embodiment of the present application.
  • FIG. 10 is a second optional schematic flowchart of the inter-frame prediction method provided by an embodiment of the present application;
  • FIG. 11 is a schematic diagram of a reference relationship between an I frame, a P frame, a B frame, a b frame, and a non-reference B frame provided by an embodiment of the present application;
  • FIG. 12 is a schematic diagram of the reference relationship of GOP16 provided by the embodiment of the present application.
  • FIG. 13 is a schematic diagram of a process of generating a reference frame template for NEWMV mode provided by an embodiment of the present application
  • FIG. 14 is a first schematic diagram of the process of generating an initialization template;
  • FIG. 15 is a second schematic diagram of the process of generating an initialization template;
  • FIG. 16 is a schematic diagram of a process of applying a reference frame template in a NEWMV mode provided by an embodiment of the present application.
  • The terms “first/second” are only used to distinguish similar objects and do not denote a specific ordering of objects. It is understood that, where permitted, the specific order or sequence of “first/second” may be interchanged, so that the embodiments of the application described herein can be implemented in sequences other than those illustrated or described herein.
  • Intra-frame coding is a coding method that uses the correlation existing between adjacent pixels in a video frame of a video image to reduce the spatial redundancy between adjacent pixels.
  • Inter-frame coding is a coding method that uses the similarity between adjacent frames in a video image to eliminate temporal redundancy between adjacent frames, thereby improving video coding efficiency.
  • A motion vector (Motion Vector, MV) is a vector that marks the positional relationship between the current block and the reference block during inter-frame prediction. Because the image contents of adjacent frames are correlated in inter-frame coding, the frame image is divided into several blocks, the position of each block is searched for in adjacent video frames, and the relative offset of the spatial positions between the two is calculated; the obtained relative offset is the motion vector.
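As a toy illustration of this definition (the block positions are invented for the example), the motion vector is simply the displacement between the two block positions:

```python
# A motion vector is the spatial offset between the current block's
# position and the best-matching block's position in a reference frame.

def motion_vector(current_pos, reference_pos):
    """Return (dx, dy): how far the matched reference block is displaced."""
    cx, cy = current_pos
    rx, ry = reference_pos
    return (rx - cx, ry - cy)

# Current block at (64, 32); best match found at (60, 30) in the reference.
mv = motion_vector((64, 32), (60, 30))  # (-4, -2)
```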
  • Motion Estimation which refers to the process of estimating motion vectors.
  • The motion vector predictor (MVP) refers to the initial MV position derived from the adjacent blocks of the current block.
  • The motion vector difference (Motion Vector Difference, MVD) is the difference between the MV predictor and the actual MV; encoding the MVD rather than the full MV reduces the number of bits consumed.
  • The rate distortion cost (Rate Distortion Cost, RDCost) is used to select among multiple coding modes. It is computed as RDCost = dist + λ × bit, where dist is the distortion, that is, the residual signal between the original pixels and the predicted pixels of the pixel block; bit is the number of bits consumed by encoding; and λ is the Lagrange multiplier.
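The cost formula can be written directly as a helper; the numbers below are invented to show how the λ-weighted bit cost can flip a mode decision:

```python
# Lagrangian rate-distortion cost: RDCost = dist + lambda * bits.

def rd_cost(dist, bits, lam):
    """Distortion plus lambda-weighted bit cost."""
    return dist + lam * bits

# Mode A: low distortion but many bits; mode B: the reverse.
cost_a = rd_cost(dist=100, bits=50, lam=2.0)  # 100 + 2.0*50 = 200.0
cost_b = rd_cost(dist=160, bits=10, lam=2.0)  # 160 + 2.0*10 = 180.0
best = "B" if cost_b < cost_a else "A"        # B wins despite higher distortion
```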
  • The sum of absolute transformed differences (Sum of Absolute Transformed Difference, SATD) is one way of calculating distortion: the residual signal is Hadamard-transformed, and the absolute values of the resulting elements are summed. Compared with the sum of absolute differences (SAD), SATD requires more computation but gives higher accuracy.
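A minimal sketch of SATD on a 4×4 residual block, with SAD shown for comparison. No normalization factor is applied here; real codecs typically scale the transformed sum.

```python
# SATD: Hadamard-transform the residual (rows, then columns) and sum the
# absolute values of the transformed coefficients.

def hadamard_1d(v):
    """Fast Walsh-Hadamard transform of a power-of-two-length sequence."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), h * 2):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v

def satd(block):
    rows = [hadamard_1d(r) for r in block]        # transform each row
    cols = [hadamard_1d(c) for c in zip(*rows)]   # then each column
    return sum(abs(x) for col in cols for x in col)

def sad(block):
    return sum(abs(x) for row in block for x in row)

residual = [[1, -1, 0, 0],
            [0,  2, 0, 0],
            [0,  0, 0, 0],
            [0,  0, 0, 0]]
```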
  • The sum of squared errors (Sum of the Squared Errors, SSE), another way to calculate distortion, is the sum of the squares of the errors between the original pixels and the reconstructed pixels.
  • Computing SSE requires transforming, quantizing, inverse-quantizing, and inverse-transforming the residual signal. Although its computational complexity is high, the estimated codeword count matches the real encoding, so the selected coding mode saves the most codewords.
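A simplified sketch of the SSE path: the transform steps are omitted and a plain scalar quantizer stands in for the real one, so the numbers are only illustrative.

```python
# SSE compares original pixels against a reconstruction built from the
# quantized/dequantized residual (transforms omitted for brevity).

def quantize(residual, step):
    return [round(r / step) for r in residual]

def dequantize(levels, step):
    return [l * step for l in levels]

def sse(original, reconstructed):
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed))

orig_pixels = [10, 12, 15, 11]
pred_pixels = [9, 13, 13, 11]
residual = [o - p for o, p in zip(orig_pixels, pred_pixels)]  # [1, -1, 2, 0]
step = 2
recon_residual = dequantize(quantize(residual, step), step)
recon_pixels = [p + r for p, r in zip(pred_pixels, recon_residual)]
distortion = sse(orig_pixels, recon_pixels)
```

Because the distortion is measured after the full quantization round trip, it reflects what the decoder will actually see, unlike SAD or SATD.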
  • Video coding is widely used in video transmission.
  • the future development trend of video is high definition, high frame rate, and high compression rate, which requires the continuous upgrading of the compression efficiency of video coding.
  • The first-generation video coding standard AV1 (AOMedia Video 1) has attracted enormous attention since its introduction.
  • Compared with other video coding technologies, such as High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC), AV1 achieves a higher compression rate and can reduce bandwidth by 30%. AV1 can be used for encoding and transmission of both streaming media and pictures, and is widely applicable to screen sharing and video game streaming.
  • Figure 1 is an example of the coding framework of AV1.
  • The electronic device first divides the incoming current video frame 1-1 into multiple 128×128 coding tree units (Coding Tree Unit, CTU), then divides each CTU into rectangular coding units (Coding Unit, CU) according to 10 different partition rules; each CU contains multiple prediction modes and transform units (Transform Unit, TU).
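For example, splitting a frame into 128×128 CTUs is a ceiling division over each dimension (edge CTUs are padded when the frame dimensions are not multiples of 128):

```python
# Count the CTU grid covering a frame of the given dimensions.

CTU = 128

def ctu_grid(width, height):
    cols = (width + CTU - 1) // CTU   # ceiling division
    rows = (height + CTU - 1) // CTU
    return rows, cols

rows, cols = ctu_grid(1920, 1080)     # a full-HD frame
num_ctus = rows * cols
```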
  • the electronic device performs inter-frame prediction 1-2 or intra-frame prediction 1-3 for each CU to obtain a prediction value.
  • inter-frame prediction 1-2 includes motion estimation (Motion Estimation, ME) 1-21 and motion compensation (Motion Compensation, MC) 1-22, which need to use reference frames 1-12;
  • Intra-frame prediction 1-3 includes prediction Mode selection 1-31 and prediction 1-32.
  • The electronic device subtracts the predicted value from the input value of each CU to obtain a residual value, transforms (1-4) and quantizes (1-5) the residual value to obtain residual coefficients, and then performs entropy encoding (1-6) on the residual coefficients to obtain the output code stream.
  • The electronic device also performs inverse quantization (1-7) and inverse transform (1-8) on the residual coefficients to obtain the residual value of the reconstructed image; by adding this residual value to the predicted value, the reconstructed image is obtained, and intra-frame prediction mode selection (1-31) and prediction (1-32) are performed according to the reconstructed image.
  • The electronic device also filters (1-9) the reconstructed image; the filtered reconstructed image is the reconstructed frame 1-11 corresponding to the current video frame 1-1, and the reconstructed frame 1-11 enters the reference frame queue to serve as a reference frame for subsequent video frames, which are encoded in sequence.
  • FIG. 2 is an example diagram of the segmentation rules for CU.
  • A CU has 10 partition rules in total: no split (NONE) 2-1, quad split (SPLIT) 2-2, horizontal bisection (HORZ) 2-3, vertical bisection (VERT) 2-4, horizontal quarter split (HORZ_4) 2-5, first horizontal ternary split (HORZ_A) 2-6, second horizontal ternary split (HORZ_B) 2-7, first vertical ternary split (VERT_A) 2-8, second vertical ternary split (VERT_B) 2-9, and vertical quarter split (VERT_4) 2-10.
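The 10 partition rules above can be tabulated by the number of sub-blocks each produces (the counts follow Figure 2; the dictionary itself is just an illustration):

```python
# AV1 CU partition types and the sub-block count each yields.

PARTITIONS = {
    "NONE":   1,  # no split
    "SPLIT":  4,  # four quarters
    "HORZ":   2,  # two horizontal halves
    "VERT":   2,  # two vertical halves
    "HORZ_4": 4,  # four horizontal strips
    "HORZ_A": 3,  # two quarters on top, one half below
    "HORZ_B": 3,  # one half on top, two quarters below
    "VERT_A": 3,  # two quarters on the left, one half on the right
    "VERT_B": 3,  # one half on the left, two quarters on the right
    "VERT_4": 4,  # four vertical strips
}

num_types = len(PARTITIONS)
```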
  • each CU includes multiple prediction modes, that is, each CU includes an intra-frame prediction mode and an inter-frame prediction mode.
  • the electronic device first compares different prediction modes within the same prediction type to obtain the optimal prediction mode, and then compares the intra prediction mode and the inter prediction mode to find the optimal prediction mode for each CU.
  • The electronic device also needs to select an optimal TU from the multiple TUs included in each CU, thereby completing the division of the current video frame into CUs.
  • Intra prediction modes include the following: mean prediction based on upper and left reference pixels (DC_PRED), combined horizontal-and-vertical-difference prediction (SMOOTH_PRED), vertical interpolation prediction (SMOOTH_V_PRED), horizontal interpolation prediction (SMOOTH_H_PRED), minimum-gradient direction prediction (PAETH_PRED), and prediction in 8 different main directions: vertical prediction (V_PRED), horizontal prediction (H_PRED), 45-degree prediction (D45_PRED), 67-degree prediction (D67_PRED), 113-degree prediction (D113_PRED), 135-degree prediction (D135_PRED), 157-degree prediction (D157_PRED), and 203-degree prediction (D203_PRED). Each main direction also includes 6 angular offsets: plus or minus 3 degrees, plus or minus 6 degrees, and plus or minus 9 degrees. In some cases, intra prediction modes may also include palette prediction modes and intra block copy prediction.
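As an example of the first mode, DC_PRED fills the whole block with the mean of the upper and left reference pixels. This is a simplified sketch; real codecs use integer rounding rules of their own.

```python
# DC_PRED: every pixel in the block is predicted as the mean of the
# reconstructed pixels directly above and to the left of the block.

def dc_pred(above, left, w, h):
    refs = list(above) + list(left)
    dc = round(sum(refs) / len(refs))
    return [[dc] * w for _ in range(h)]

block = dc_pred(above=[100, 102, 98, 100], left=[96, 98, 100, 106], w=4, h=4)
```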
  • There are 4 single reference frame modes and 8 combined reference frame modes for inter prediction.
  • The 4 single reference frame modes use a single reference frame for prediction and include NEARESTMV, NEARMV, GLOBALMV, and NEWMV;
  • The 8 combined reference frame modes use combined reference frames for prediction and include NEAREST_NEARESTMV, NEAR_NEARMV, NEAREST_NEWMV, NEW_NEARESTMV, NEAR_NEWMV, NEW_NEARMV, GLOBAL_GLOBALMV, and NEW_NEWMV.
  • NEARESTMV and NEARMV mean that the MV of the predicted block is derived from surrounding block information, so no MVD needs to be transmitted;
  • NEWMV means that the MV is obtained from a derived MVP plus a transmitted MVD;
  • GLOBALMV means that the MV information of the predicted block is derived from global motion. It can be seen that NEARESTMV, NEARMV, and NEWMV all rely on the derivation of the MVP, and for a given reference frame, AV1 calculates 4 MVPs according to its MVP derivation rules.
  • the derivation rules for MVP are as follows:
  • The electronic device scans, in a skipping manner, the block information in columns 1, 3, and 5 to the left of the current block and rows 1, 3, and 5 above it. It first selects blocks that use the same reference frame as the current block and de-duplicates their MVs; if fewer than 8 MVs remain after de-duplication, it selects blocks whose reference frames are in the same direction and continues to add MVs; if there are still fewer than 8, it fills with the global motion vector until 8 MVs are selected. The electronic device then sorts the 8 selected MVs and picks the 4 most important ones according to the sorting result. Finally, from these 4 MVs, it selects the corresponding MVP for each of the three single reference frame modes NEARESTMV, NEARMV, and NEWMV.
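A hedged sketch of this candidate-list construction; the ranking below is by frequency of occurrence, which only approximates AV1's actual weighting rules, and all names are illustrative.

```python
# Build an MVP candidate list: de-duplicate neighbor MVs, pad with the
# global motion vector up to 8 entries, rank, and keep the top 4.

from collections import Counter

def build_mvp_list(neighbor_mvs, global_mv, max_candidates=8, keep=4):
    counts = Counter(neighbor_mvs)                  # frequency as a rough "importance"
    unique = list(dict.fromkeys(neighbor_mvs))      # de-duplicate, keep scan order
    while len(unique) < max_candidates:
        unique.append(global_mv)                    # pad with global motion
    ranked = sorted(unique[:max_candidates], key=lambda mv: -counts[mv])
    return ranked[:keep]

neighbors = [(4, 0), (4, 0), (0, 2), (4, 0), (-2, 1), (0, 2)]
mvps = build_mvp_list(neighbors, global_mv=(0, 0))
```

The 0th entry would then serve NEARESTMV, while NEARMV and NEWMV pick from the remaining ranked candidates.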
  • FIG. 3 is a schematic diagram of MVP selection in different single reference frame modes.
  • The electronic device selects the blocks that use the ref1 reference frame, obtains 8 MVs, and then selects the 4 most important MV1s (MV2 and MV3 are the MVs of ref2 and ref3, respectively). It then uses the 0th MV1 as the MVP for NEARESTMV, one of the 1st to 3rd MV1s as the MVP for NEARMV, and one of the 0th to 2nd MV1s as the MVP for NEWMV.
  • the electronic device can also determine ZEROMV as ⁇ 0, 0 ⁇ .
  • FIG. 4 is a schematic diagram of the process of selecting the most suitable mode combination.
  • the process of selecting the most suitable mode combination may include:
  • n is the serial number of the current MVP.
  • under the optimal motion vector, the optimal interpolation direction is selected.
  • Embodiments of the present application provide an inter-frame prediction method, apparatus, electronic device, computer-readable storage medium, and computer program product, which can improve the efficiency of video coding.
  • The following describes the electronic device for inter-frame prediction provided by the embodiments of the present application.
  • the electronic device provided in this embodiment of the present application may be implemented as a terminal, and may also be implemented as a server.
  • The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, and cloud communications.
  • Terminals may be, but are not limited to, smartphones, tablets, laptops, desktop computers, smart speakers, smart watches, smart home appliances, vehicle-mounted terminals, and the like.
  • the terminal and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
  • FIG. 5 is a schematic diagram of an optional architecture of a video encoding system provided by an embodiment of the present application.
  • The electronic device 400 reads in the video frame 200 to be encoded, divides the video frame 200 into a plurality of image blocks, and selects one of them as the current prediction block 300.
  • The electronic device 400 first determines the current prediction mode of the current prediction block 300; when the current prediction mode of the current prediction block is the preset prediction mode, it determines the historical prediction mode corresponding to the current prediction block, that is, the prediction mode in which the current prediction block completed prediction before the preset prediction mode.
  • The electronic device 400 obtains the adjacent block information of the current prediction block 300, the sub-prediction block information of the sub-prediction blocks, and the historical optimal reference frame type of the current prediction block in the historical prediction mode, where the sub-prediction blocks are obtained by dividing the current prediction block using a sub-block division type that precedes the current sub-block division type.
  • The electronic device 400 generates the reference frame template 500 based on the historical optimal reference frame type, the adjacent block information, the sub-prediction block information, and the frame type corresponding to the current prediction block.
  • the electronic device 400 uses the reference frame template 500 to determine the reference frame corresponding to the preset prediction mode, and uses the reference frame to perform inter-frame prediction on the current prediction block to obtain the prediction value corresponding to the current prediction block.
  • The electronic device 400 calculates the residual corresponding to the predicted value, and then obtains the final code stream by transforming, quantizing, and entropy-encoding the residual.
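The decision flow above can be condensed as follows; NEWMV as the preset mode and all function and frame names are assumptions for illustration only.

```python
# Gate the reference-frame search on the preset mode: only then is the
# inherited template used to narrow the candidate set.

PRESET_MODE = "NEWMV"

def candidate_reference_frames(current_mode, history, all_reference_frames):
    if current_mode != PRESET_MODE:
        return sorted(all_reference_frames)          # fall back: search all frames
    template = {history["best_ref"],
                *history["neighbor_refs"],
                *history["sub_block_refs"]}
    return sorted(f for f in all_reference_frames if f in template)

history = {"best_ref": "LAST",
           "neighbor_refs": {"LAST", "GOLDEN"},
           "sub_block_refs": {"ALTREF"}}
candidates = candidate_reference_frames("NEWMV", history,
                                        ["LAST", "LAST2", "GOLDEN", "ALTREF"])
```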
  • FIG. 6 is a schematic structural diagram of an electronic device for inter-frame prediction provided by an embodiment of the present application.
  • The electronic device 400 shown in FIG. 6 includes: at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430.
  • the various components in electronic device 400 are coupled together by bus system 440 .
  • The bus system 440 is used to implement connection and communication between these components.
  • In addition to the data bus, the bus system 440 also includes a power bus, a control bus, and a status signal bus.
  • For clarity of illustration, however, the various buses are labeled as bus system 440 in FIG. 6.
  • The processor 410 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
  • DSP Digital Signal Processor
  • User interface 430 includes one or more output devices 431 that enable presentation of media content, including one or more speakers and/or one or more visual display screens.
  • User interface 430 also includes one or more input devices 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, and other input buttons and controls.
  • Memory 450 may be removable, non-removable, or a combination thereof.
  • Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like.
  • Memory 450 optionally includes one or more storage devices that are physically remote from processor 410 .
  • Memory 450 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM, Read Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory).
  • ROM read-only memory
  • RAM random access memory
  • the memory 450 described in the embodiments of the present application is intended to include any suitable type of memory.
  • memory 450 is capable of storing data to support various operations, examples of which include programs, modules, and data structures, or subsets or supersets thereof, as exemplified below.
  • the operating system 451 includes system programs for processing various basic system services and performing hardware-related tasks, such as framework layer, core library layer, driver layer, etc., for implementing various basic services and processing hardware-based tasks;
  • a presentation module 453 for enabling presentation of information (e.g., a user interface for operating peripherals and displaying content and information) via one or more output devices 431 (e.g., a display screen, speakers, etc.) associated with the user interface 430;
  • an input processing module 454 for detecting one or more user inputs or interactions from one or more of the input devices 432 and translating the detected inputs or interactions.
  • the inter-frame prediction apparatus provided by the embodiments of the present application may be implemented in software.
  • FIG. 6 shows the inter-frame prediction apparatus 455 stored in the memory 450, which may be software in the form of a program, a plug-in, or the like, and includes the following software modules: a mode determination module 4551, an information acquisition module 4552, a template generation module 4553, and an information prediction module 4554. These modules are logical and may therefore be arbitrarily combined or further divided according to the functions implemented. The function of each module is explained below.
  • the inter-frame prediction apparatus provided by the embodiments of the present application may be implemented in hardware.
  • The inter-frame prediction apparatus provided by the embodiments of the present application may be a processor in the form of a hardware decoding processor, programmed to execute the inter-frame prediction method provided by the embodiments of the present application. For example, the processor in the form of a hardware decoding processor may adopt one or more Application-Specific Integrated Circuits (ASIC), Digital Signal Processors (DSP), Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA), or other electronic components.
  • an electronic device for inter-frame prediction including:
  • the processor is configured to implement the inter-frame prediction method provided by the embodiment of the present application when executing the executable instructions stored in the memory.
  • the inter-frame prediction method provided by the embodiment of the present application will be described with reference to the exemplary application and implementation of the electronic device provided by the embodiment of the present application.
  • the embodiments of the present application may be implemented by means of cloud technology.
  • cloud technology is a hosting technology that integrates hardware, software, network and other series of resources in a wide area network or a local area network to realize the calculation, storage, processing and sharing of data.
  • FIG. 7 is a first schematic flowchart of an optional inter-frame prediction method provided by an embodiment of the present application, which will be described with reference to the steps shown in FIG. 7 .
  • the embodiments of the present application are implemented in the scenario of encoding video.
  • the electronic device first divides the input video frame into multiple image blocks, and the current prediction block is the image block that is predicted at the current moment among the multiple image blocks.
  • the electronic device uses different reference frame modes each time to predict the current prediction block, and the current prediction mode is the reference frame mode used to predict the current prediction block at the current moment.
  • When the electronic device determines that the current prediction mode is the preset prediction mode, it collects the prediction modes whose prediction was completed before the preset prediction mode (for which the corresponding reference frame templates have been determined), and determines these prediction modes as historical prediction modes. That is, a historical prediction mode is a prediction mode whose prediction was completed before the preset prediction mode.
  • the preset prediction mode may be a single reference frame mode of NEWMV, or may be a combined reference frame mode including NEWMV, which is not limited in this embodiment of the present application.
  • The historical prediction mode may be any one of NEARESTMV, NEARMV, and GLOBALMV, or a combination of NEARESTMV, NEARMV, and GLOBALMV.
  • The electronic device can directly obtain the historical optimal reference frame type.
  • A sub-prediction block is obtained by dividing the current prediction block into blocks using a sub-block division type that precedes the current sub-block division type.
  • Since the prediction of the sub-prediction blocks has been completed, the sub-prediction block information is known; likewise, the adjacent block information corresponding to the adjacent blocks is known. Therefore, in this embodiment of the present application, the electronic device can directly acquire the sub-prediction block information and the adjacent block information.
  • The adjacent block information may include the motion vectors of the adjacent blocks and the reference frame types corresponding to the adjacent blocks, that is, the adjacent reference frame types, and may also include the number of adjacent blocks, etc., which is not limited in this application.
  • the sub-prediction block information may include the reference frame type of the sub-prediction block, that is, the historical sub-reference frame type, and may also include the number of sub-prediction blocks, etc., which is not limited in this application.
  • Adjacent blocks may refer to the image blocks to the left, upper left, above, and upper right of the current prediction block, and may also refer to the image blocks in the 1st, 3rd, and 5th columns to the left of the current prediction block and the image blocks in the 1st, 3rd, and 5th rows above the current prediction block, which is not limited in this application.
  • the embodiment of the present application provides a schematic diagram of the positional relationship between the current prediction block and the adjacent blocks.
  • The image block E is the current prediction block, and image block A, image block B, image block C, and image block D are all adjacent blocks of the current prediction block.
  • the optimal reference frame type selected by different prediction modes of the same image block may all be the same reference frame type.
  • the historical optimal reference frame type can be directly inherited as the reference frame part of the template.
  • the adjacent block is relatively close to the current prediction block, and its content may be similar. Therefore, a part of the reference frame template for the current prediction block can be generated based on the information such as the reference frame type and motion vector of the adjacent block.
  • the sub-prediction block is determined by using the sub-block division type before the current sub-block division type, and is closely related to the current prediction block. Therefore, the sub-prediction block information can also be directly used to generate a part of the reference frame template.
  • For the current prediction block, there may be 10 different sub-block division types, namely NONE division prediction, HORZ division prediction, VERT division prediction, SPLIT division prediction, HORZ_4 division prediction, HORZ_A division prediction, HORZ_B division prediction, VERT_A division prediction, VERT_B division prediction, and VERT_4 division prediction.
  • the order of the 10 sub-block division types is not fixed, and there can be a combination of various orders.
  • This embodiment of the present application provides a schematic diagram of a sequence of the 10 different sub-block division types. Referring to FIG. 9, the 10 sub-block division types are ordered as NONE division prediction 9-1, HORZ division prediction 9-2, VERT division prediction 9-3, SPLIT division prediction 9-4, HORZ_A division prediction 9-5, HORZ_B division prediction 9-6, VERT_A division prediction 9-7, VERT_B division prediction 9-8, HORZ_4 division prediction 9-9, and VERT_4 division prediction 9-10. When the current sub-block division type is HORZ_A division prediction 9-5, the electronic device can directly obtain the sub-prediction block information of the sub-prediction blocks corresponding to NONE division prediction 9-1, HORZ division prediction 9-2, VERT division prediction 9-3, and SPLIT division prediction 9-4.
  • S104 Determine a reference frame of a preset prediction mode by using a reference frame template, and perform inter-frame prediction on the current prediction block by using the reference frame to obtain a prediction value corresponding to the current prediction block.
  • After the electronic device determines the reference frame template, it checks each candidate reference frame type for consistency with the reference frame template. When a candidate reference frame type is consistent with the reference frame template, that candidate reference frame type is used as a reference frame of the current prediction block, and inter-frame prediction is then performed on the current prediction block according to the reference frame to obtain the prediction value corresponding to the current prediction block, until all candidate reference frame types have been cycled through. Thus, the application of the reference frame template is completed.
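As an illustrative sketch (not the patent's actual implementation), the consistency check between the candidate reference frame types and the reference frame template can be modeled as a bitmask membership test; the seven AV1-style type names and their bit positions below are assumptions:

```python
# Hypothetical sketch: the reference frame template is a bitmask over seven
# AV1-style reference frame types; names and bit positions are illustrative.
(LAST_FRAME, LAST2_FRAME, LAST3_FRAME, GOLDEN_FRAME,
 BWDREF_FRAME, ALTREF2_FRAME, ALTREF_FRAME) = range(7)

CANDIDATE_REF_TYPES = [LAST_FRAME, LAST2_FRAME, LAST3_FRAME, GOLDEN_FRAME,
                       BWDREF_FRAME, ALTREF2_FRAME, ALTREF_FRAME]

def select_reference_frames(template_mask):
    """Cycle over all candidate types, keeping those consistent with the template."""
    return [ref for ref in CANDIDATE_REF_TYPES if template_mask & (1 << ref)]

# A template that admits LAST_FRAME and ALTREF_FRAME only.
mask = (1 << LAST_FRAME) | (1 << ALTREF_FRAME)
print(select_reference_frames(mask))  # [0, 6]
```

Each type that survives this filter would then be tried as a reference frame for inter-frame prediction of the current prediction block.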
  • In the embodiments of the present application, when the current prediction mode is a preset prediction mode, the historical prediction modes corresponding to the preset prediction mode are determined, and then the historical optimal reference frame type of the current prediction block in the historical prediction modes, the adjacent block information of the adjacent blocks of the current prediction block in the historical prediction modes, and the sub-prediction block information of the sub-prediction blocks obtained by dividing the current prediction block in the historical prediction modes are obtained, so that the current prediction block directly inherits the information of the historical prediction modes completed before the preset prediction mode. A reference frame template is thus adaptively generated for the current prediction block in the preset prediction mode. Since the characteristics of video encoding are fully considered and existing information is reused directly to generate the reference frame template, the computational complexity is greatly reduced and the efficiency of video encoding is improved.
  • the adjacent block information includes: the motion vector of the adjacent block, the reference frame type of the adjacent block, and the number of adjacent blocks;
  • The sub-prediction block information includes: the reference frame type of the sub-prediction block and the number of sub-prediction blocks.
  • FIG. 10 is a second optional schematic flowchart of the inter-frame prediction method provided by the embodiment of the present application. Generating a reference frame template based on the historical optimal reference frame type, the adjacent block information, the sub-prediction block information, and the frame type corresponding to the current prediction block, that is, the specific implementation process of S103, may include S1031-S1034, as follows:
  • the range of candidate reference frame types is actually limited.
  • The electronic device first uses the reference frame types of the sub-prediction blocks and the reference frame types of the adjacent blocks to determine the number of times each candidate reference frame type has been selected, and then, based on the selection times, selects from the candidate reference frame types a suitable reference frame type for the current prediction block in the preset prediction mode.
  • The electronic device can also directly inherit the historical optimal reference frame type, that is, the reference frame type that best matches the current prediction block in the historical prediction mode.
  • the electronic device determines the selected appropriate reference frame type and the inherited optimal reference frame type as the initial template of the current prediction block.
  • The electronic device constructs, according to the motion vectors of the adjacent blocks, an error measure for prediction with each candidate reference frame type, and then, according to the error measure, determines from the candidate reference frame types the one that produces a smaller error when predicting the current prediction block; this candidate reference frame type is used as the main template corresponding to the current prediction block.
  • The electronic device may first classify the candidate reference frame types, for example into a forward reference class, a backward reference class, and a long-term reference class; the most suitable candidate reference frame type is then selected from each class, and the main template corresponding to the current prediction block is generated from the selected candidate reference frame types.
  • S1033 Determine an enhancement template corresponding to the current prediction block by using the frame type of the current prediction block, the number of adjacent blocks and the number of sub-prediction blocks.
  • In addition to determining the initial template and the main template, the electronic device also determines an enhancement template for the current prediction block, so that when both the initial template and the main template are ineffective, the enhancement template can still ensure the quality of the final reference frame template. The electronic device first determines a threshold by using the frame type of the current prediction block, then calculates the number of inter-frame prediction blocks by using the number of adjacent blocks and the number of sub-prediction blocks, and compares the number of inter-frame prediction blocks with the threshold to determine whether certain candidate reference frame types are to be used as the enhancement template. In this way, the electronic device completes the generation of the enhancement template.
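The threshold comparison described above might be sketched as follows; how the threshold is derived from the frame type, and which candidate types count as enhancement candidates, are illustrative assumptions:

```python
def build_enhancement_template(threshold, num_adjacent, num_sub_blocks, candidate_refs):
    """Enable the candidate reference frame types (given as bit positions) only
    when the number of inter-predicted neighbouring blocks reaches the threshold."""
    num_inter_blocks = num_adjacent + num_sub_blocks
    mask = 0
    if num_inter_blocks >= threshold:
        for ref in candidate_refs:
            mask |= 1 << ref
    return mask

# With 2 adjacent and 3 sub-prediction blocks against an assumed threshold of 4,
# the enhancement template enables bit 3 (e.g. a long-term reference type).
print(build_enhancement_template(4, 2, 3, [3]))  # 8
```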
  • After obtaining the initial template, the main template, and the enhanced template, the electronic device integrates the initial template, the main template, and the enhanced template into one set, and this set constitutes the reference frame template of the current prediction block in the preset prediction mode.
  • the electronic device may further fuse the initial template, the main template and the enhanced template to obtain a reference frame template corresponding to the current prediction block.
  • The electronic device can use the inherited information and parameters to generate a main template, an initial template, and an enhanced template for the current prediction block respectively, and then integrate these three templates to obtain the reference frame template corresponding to the current prediction block, so as to subsequently use the reference frame template to determine the reference frame.
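Treating each partial template as a bitmask over reference frame types, the integration step can be sketched as a simple union; this bitmask representation is an assumption for illustration, not mandated by the description:

```python
def build_reference_frame_template(mask_init, mask_main, mask_enh):
    """Integrate the initial, main and enhanced templates into one set
    (here: the union of three bitmasks over reference frame types)."""
    return mask_init | mask_main | mask_enh

print(build_reference_frame_template(0b001, 0b010, 0b100))  # 7
```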
  • Determining the initial template corresponding to the current prediction block, that is, the specific implementation process of S1031, can include S1031a-S1031c, as follows:
  • S1031a Determine a first initial template according to the reference frame type of the sub-prediction block and the reference frame type of the adjacent block.
  • According to the reference frame types of the sub-prediction blocks and the reference frame types of the adjacent blocks, the electronic device can determine, from all the candidate reference frame types, which candidate reference frame types were selected by the sub-prediction blocks and the adjacent blocks, and then, according to the selection times of the selected candidate reference frame types, determine whether the first initial template is a selected reference frame type or an empty template.
  • The electronic device directly inherits the historical optimal reference frame type and uses it as the second initial template. It can be understood that, when the historical prediction modes are the NEARESTMV mode, the NEARMV mode, and the GLOBALMV mode, the electronic device can sequentially compare the values of the optimal reference frame types of these three modes (the optimal reference frame types belong to the 7 reference frame types given in Table 1) with 0, and when the value of the optimal reference frame type of a certain mode is greater than 0, add the optimal reference frame type of that mode to the second initial template. For example, when the value of the optimal reference frame type in the NEARESTMV mode is greater than 0, the optimal reference frame type in the NEARESTMV mode is added to the second initial template.
  • S1031c Determine the initial template corresponding to the current prediction block by using the first initial template and the second initial template.
  • After obtaining the first initial template and the second initial template, the electronic device collects the first initial template and the second initial template into one set, and the obtained set is the initial template of the current prediction block.
  • the electronic device may further weight the first initial template and the second initial template to obtain the initial template corresponding to the current prediction block.
  • The electronic device can first determine a part of the initial template by using the reference frame types of the sub-prediction blocks and the reference frame types of the adjacent blocks, and then use the historical optimal reference frame type as another part of the initial template.
  • In this way, the first initial template is determined, the historical optimal reference frame type is inherited as the second initial template, and the first initial template and the second initial template are then integrated into the initial template, so that the electronic device realizes the process of determining the initial template.
  • the first initial template is determined according to the reference frame type of the sub-prediction block and the reference frame type of the adjacent block, that is, the specific implementation process of S1031a may include: S201-S203, as follows:
  • S201 Determine at least one historically selected reference frame type by using the reference frame type of the adjacent block and the reference frame type of the sub-prediction block.
  • The electronic device can merge the reference frame types of the adjacent blocks and the reference frame types of the sub-prediction blocks to clarify which of all the candidate reference frame types were selected in the historical prediction modes, and the candidate reference frame types selected in the historical prediction modes are determined as the historically selected reference frame types. Since more than one candidate reference frame type may have been selected, the electronic device can obtain at least one historically selected reference frame type.
  • After obtaining at least one historically selected reference frame type, the electronic device counts the selection times of each historically selected reference frame type in the historical prediction modes, so as to obtain the selection times of each historically selected reference frame type.
  • When counting the selection times of the historically selected reference frame types, the electronic device separately counts the number of times each historically selected reference frame type is used as the reference frame type of a sub-prediction block and the number of times it is used as the reference frame type of an adjacent block, and then, for each historically selected reference frame type, adds the two counts to obtain its selection times.
  • For example, LAST_FRAME is a historically selected reference frame type that is selected 3 times by the sub-prediction blocks and 2 times by the adjacent blocks, so the selection times of LAST_FRAME is 5.
  • In some embodiments, when counting the selection times of the historically selected reference frame types, the electronic device may instead average the number of times each historically selected reference frame type is used as the reference frame type of a sub-prediction block and the number of times it is used as the reference frame type of an adjacent block, to obtain the selection times of each historically selected reference frame type.
  • The electronic device can sort the selection times of the historically selected reference frame types by size, thereby determining the order of the selection times, and then select the historically selected reference frame type corresponding to the largest selection times as the first initial template.
  • the electronic device may also compare the selection times of each historically selected reference frame type with the set threshold, and use the historically selected reference frame type whose selection times are greater than the set threshold as the first initial template.
  • The electronic device can first determine at least one historically selected reference frame type, and then select the first initial template from the historically selected reference frame types according to the selection times of each historically selected reference frame type, so that the historically selected reference frame type chosen by most adjacent blocks and sub-prediction blocks is used as the first initial template, making the first initial template more accurate.
  • Using the selection times to select the first initial template from the historically selected reference frame types, that is, the specific implementation process of S203, may include S2031-S2034, as follows:
  • S2032 Enlarge the selection times of each historically selected reference frame type to obtain the enlarged selection times.
  • the electronic device can amplify the selection times according to a preset multiple, or can amplify the selection times according to a random multiple.
  • That is, as long as a certain historically selected reference frame type has been selected a sufficient number of times, it can be used as part of the reference frame template; by amplifying its selection times by a preset multiple, it can be selected as the first initial template and thereby added to the reference frame template.
  • the preset multiple can be set to 4, can also be set to 6, and can also be set to other values as required, which is not limited in this application.
  • The comparison result indicates whether the amplified selection times are greater than or equal to the maximum selection times.
  • The electronic device compares the amplified selection times with the maximum selection times, so as to determine the relationship between the amplified selection times of each historically selected reference frame type and the largest selection times among all historically selected reference frame types, and uses the historically selected reference frame types whose amplified selection times are greater than or equal to the maximum selection times as the first initial template.
  • For example, the selection times of each historically selected reference frame type can be represented by ref_num[i], and the maximum selection times by ref_num[0] (the first element after sorting by size). When the preset multiple is 4, if ref_num[i] × 4 ≥ ref_num[0], then (1 << i) (where i is the frame number of the historically selected reference frame type that satisfies the condition) is stored in the number used to record the first initial template.
  • the electronic device can amplify the selection times of each historically selected reference frame type and compare it with the largest selection times, so as to select the first initial template from each historically selected reference frame type , to facilitate subsequent generation of the initial template.
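The count-then-amplify selection described above can be sketched as follows; the bitmask output, the dictionary layout, and the helper name are illustrative assumptions, not the patent's implementation:

```python
def first_initial_template(sub_refs, adj_refs, multiple=4):
    """sub_refs / adj_refs: reference frame types (bit positions) selected by the
    sub-prediction blocks and the adjacent blocks in the historical prediction modes."""
    counts = {}
    for ref in sub_refs + adj_refs:  # selection times = sub count + adjacent count
        counts[ref] = counts.get(ref, 0) + 1
    max_count = max(counts.values())  # ref_num[0] after sorting by size
    mask = 0
    for ref, n in counts.items():
        if n * multiple >= max_count:  # amplified selection times vs. maximum
            mask |= 1 << ref           # record (1 << i) in the first initial template
    return mask

# LAST_FRAME (bit 0), chosen 3 times by sub-blocks and 2 times by adjacent blocks,
# dominates; the minority types fail the 4x-amplified comparison (1 * 4 < 5).
print(first_initial_template([0, 0, 0, 1], [0, 0, 6]))  # 1
```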
  • The method may also include S1031d, as follows:
  • When the initial template is empty, in order to ensure the validity of the final reference frame template, the electronic device corrects and supplements the initial template. At this time, the electronic device adds the set at least one preset reference frame type to the empty initial template, and the initial template to which the at least one preset reference frame type has been added is recorded as the corrected initial template.
  • the reference frame template corresponding to the current prediction block is generated by using the initial template, the main template and the enhanced template, that is, the implementation process of S1034 becomes: using the corrected initial template, the main template and the enhanced template to generate The reference frame template corresponding to the current prediction block.
  • The at least one preset reference frame type may be a selected candidate reference frame type, for example, LAST_FRAME, BWDREF_FRAME, or ALTREF_FRAME, or a certain video frame selected from the video frames, for example, the first video frame, which is not limited in this application.
  • At least one preset reference frame type may include only one preset reference frame type, for example, only LAST_FRAME; may also include three preset reference frame types, for example, including LAST_FRAME, BWDREF_FRAME, ALTREF_FRAME, This application is not limited here.
  • the electronic device can also add at least one preset reference frame type to the initial template when the initial template is empty, so as to correct and supplement the initial template to ensure the validity of the final reference frame template.
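A minimal sketch of this correction step, assuming the preset fallback set {LAST_FRAME, BWDREF_FRAME, ALTREF_FRAME} at bit positions 0, 4, and 6 (the positions are illustrative):

```python
def correct_initial_template(mask_init, preset_refs=(0, 4, 6)):
    """If the initial template is empty, add the preset reference frame types."""
    if mask_init == 0:
        for ref in preset_refs:
            mask_init |= 1 << ref
    return mask_init

print(correct_initial_template(0))  # 81  (bits 0, 4 and 6 set)
print(correct_initial_template(2))  # 2   (non-empty templates are left as-is)
```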
  • the adjacent block information includes: the reference frame type of the adjacent block
  • the sub-prediction block information includes: the reference frame type of the sub-prediction block
  • Obtaining the adjacent block information of the adjacent blocks of the current prediction block and the sub-prediction block information of the sub-prediction blocks, that is, the specific implementation process of S102, may include S1021-S1023, as follows:
  • the first judgment result represents whether the optimal prediction mode of the adjacent block is the first preset mode
  • the second judgment result represents whether the optimal prediction mode of the sub-prediction block is the second preset mode
  • When the electronic device obtains the adjacent block information corresponding to an adjacent block, it first judges the optimal prediction mode of the adjacent block, and records the reference frame type of the adjacent block only when the optimal prediction mode of the adjacent block is the first preset mode. Similarly, the electronic device records the reference frame type of a sub-prediction block only when it determines that the optimal prediction mode of the sub-prediction block is the second preset mode. The adjacent block information and the sub-prediction block information are obtained in this way. When the optimal prediction mode of the adjacent block is not the first preset mode, the reference frame type of the adjacent block is empty; when the optimal prediction mode of the sub-prediction block is not the second preset mode, the reference frame type of the sub-prediction block is empty.
  • the first preset mode may be a NEARMV mode or a NEARESTMV mode
  • the second preset mode may be a NEARESTMV mode or a NEARMV mode, which is not limited in this application.
  • the electronic device may obtain the reference frame type of the adjacent block and the reference frame type of the sub-prediction block only when the optimal prediction mode of the adjacent block and the sub-prediction block satisfies the conditions, so as to realize the adjacent block information and sub-prediction The process of acquiring block information.
  • the main template corresponding to the current prediction block is generated according to the motion vector of the adjacent block, that is, the specific implementation process of S1032 may include: S1032a-S1032d, as follows:
  • S1032a Calculate selection parameters for each candidate reference frame type in the full amount of candidate reference frame types by using the motion vectors of adjacent blocks.
  • the full candidate reference frame types represent all available reference frame types during inter-frame prediction.
  • The full candidate reference frame types may be the seven reference frame types given in Table 1, or may be several reference frame types selected from those seven.
  • The selection parameter characterizes the difference between the input value and the predicted value of an adjacent block, and can be calculated using SAD (Sum of Absolute Differences) or SATD (Sum of Absolute Transformed Differences).
  • The accuracy of SATD is higher than that of SAD, but its computational complexity is correspondingly larger.
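For illustration, SAD is a plain sum of absolute residuals, while SATD applies a Hadamard transform to the residual before summing; the 2x2 transform below is a toy example (real encoders typically operate on 4x4 or 8x8 blocks):

```python
def sad(block, pred):
    """Sum of Absolute Differences between a source block and its prediction."""
    return sum(abs(a - b)
               for row_a, row_b in zip(block, pred)
               for a, b in zip(row_a, row_b))

def satd_2x2(block, pred):
    """Toy 2x2 SATD: Hadamard-transform the residual, then sum absolute values."""
    r = [[block[i][j] - pred[i][j] for j in range(2)] for i in range(2)]
    t00 = r[0][0] + r[0][1] + r[1][0] + r[1][1]
    t01 = r[0][0] - r[0][1] + r[1][0] - r[1][1]
    t10 = r[0][0] + r[0][1] - r[1][0] - r[1][1]
    t11 = r[0][0] - r[0][1] - r[1][0] + r[1][1]
    return (abs(t00) + abs(t01) + abs(t10) + abs(t11)) // 2

src, prd = [[1, 2], [3, 4]], [[0, 0], [0, 0]]
print(sad(src, prd), satd_2x2(src, prd))  # 10 8
```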
  • S1032b Divide the full number of candidate reference frame types into candidate forward reference frame types, candidate backward reference frame types, and candidate long-term reference frame types, respectively.
  • the electronic device will divide the full number of candidate reference frame types into three groups according to the reference direction of each candidate reference frame type, namely the candidate forward reference frame type, the candidate backward reference frame type and the candidate long-term reference frame type, so as to Subsequently, the selection process of the reference frame type is performed for the three groups, that is, the forward reference frame type, the backward reference frame type and the long-term reference frame type are respectively selected from the three groups.
  • For example, according to the reference direction of each candidate reference frame type, the electronic device divides LAST_FRAME, LAST2_FRAME, and LAST3_FRAME into the candidate forward reference frame types, divides BWDREF_FRAME, ALTREF2_FRAME, and ALTREF_FRAME into the candidate backward reference frame types, and divides GOLDEN_FRAME into the candidate long-term reference frame types.
  • The electronic device compares the sizes of the selection parameters of the candidate reference frame types in each group, and, according to the comparison results, selects the forward reference frame type from the candidate forward reference frame types, selects the backward reference frame type from the candidate backward reference frame types, and selects the long-term reference frame type from the candidate long-term reference frame types.
  • The electronic device collects the selected forward reference frame type, backward reference frame type, and long-term reference frame type into one set, and the obtained set is the main template.
  • the electronic device may further perform weighted fusion of the forward reference frame type, the backward reference frame type, and the long-term reference frame type to obtain the main template of the current prediction block.
  • For example, the forward reference frame type is represented by ref_list0 and the backward reference frame type by ref_list1; the main template mask_main is obtained by combining ref_list0, ref_list1, and the selected long-term reference frame type.
  • The electronic device can calculate a selection parameter for each candidate reference frame type according to the motion vectors of the adjacent blocks, and then, according to the selection parameters, select the forward reference frame type, the backward reference frame type, and the long-term reference frame type from the candidate forward reference frame types, candidate backward reference frame types, and candidate long-term reference frame types obtained by dividing the full set of candidate reference frame types according to reference direction; these selected reference frame types are then aggregated into the main template.
  • The forward reference frame type, the backward reference frame type, and the long-term reference frame type are selected by using the selection parameters corresponding to the candidate forward reference frame types, the selection parameters corresponding to the candidate backward reference frame types, and the selection parameters corresponding to the candidate long-term reference frame types; that is, the specific implementation process of S1032c may include S301-S303, as follows:
  • the electronic device compares the selection parameters of each candidate reference frame type in the candidate forward reference frame types, selects the smallest selection parameter, and then uses the candidate reference frame type corresponding to the smallest selection parameter as the forward reference frame type.
  • the electronic device compares the selection parameters of each candidate reference frame type in the candidate backward reference frame types, selects the smallest selection parameter, and uses the candidate reference frame type corresponding to the smallest selection parameter among the candidate backward reference frame types as the backward reference frame type.
  • the electronic device may perform S302 first and then S301, or execute S301 and S302 simultaneously.
  • a candidate reference frame type whose selection parameter is less than the sum of the selection parameter corresponding to the forward reference frame type and the selection parameter corresponding to the backward reference frame type is used as the long-term reference frame type.
  • the electronic device sums the selection parameter of the previously selected forward reference frame type and the selection parameter of the previously selected backward reference frame type to obtain a summation result, then compares the selection parameter of each candidate reference frame type among the candidate long-term reference frame types with the summation result, so as to select a candidate reference frame type whose selection parameter is smaller than the summation result as the long-term reference frame type.
  • the selection parameter of the forward reference frame type is represented by sad_list0
  • the selection parameter of the backward reference frame type is represented by sad_list1.
  • when there is only one candidate long-term reference frame type, GOLDEN_FRAME, and the selection parameter of GOLDEN_FRAME is less than sad_list0 + sad_list1, GOLDEN_FRAME is used as the long-term reference frame type.
  • the electronic device may find the smallest selection parameter among the candidate forward reference frame types and among the candidate backward reference frame types, respectively, so as to determine the forward reference frame type and the backward reference frame type, and then determine the long-term reference frame type according to them.
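The grouped selection described in S301-S303 can be sketched in C as follows; the frame-index enum mirrors the seven AV1-style reference frames of Table 1, but the function names, the SAD-array layout and the grouping arrays are illustrative assumptions, not the patent's actual implementation:

```c
#include <stdint.h>

/* Hypothetical reference frame indices, modeled after the seven single
 * reference frames named in Table 1 of the text. */
enum {
    LAST_FRAME, LAST2_FRAME, LAST3_FRAME,       /* candidate forward types   */
    BWDREF_FRAME, ALTREF2_FRAME, ALTREF_FRAME,  /* candidate backward types  */
    GOLDEN_FRAME,                               /* candidate long-term type  */
    REF_FRAMES
};

/* Pick the index with the smallest selection parameter (SAD) in a group. */
int pick_min(const int32_t sad[], const int *group, int n) {
    int best = group[0];
    for (int i = 1; i < n; i++)
        if (sad[group[i]] < sad[best]) best = group[i];
    return best;
}

/* S301-S303: select one frame per group; GOLDEN_FRAME joins the main
 * template only if its SAD is below sad_list0 + sad_list1. */
void build_main_template(const int32_t sad[REF_FRAMES],
                         int *ref_list0, int *ref_list1, int *use_golden) {
    static const int fwd[] = { LAST_FRAME, LAST2_FRAME, LAST3_FRAME };
    static const int bwd[] = { BWDREF_FRAME, ALTREF2_FRAME, ALTREF_FRAME };
    *ref_list0 = pick_min(sad, fwd, 3);  /* forward reference frame type  */
    *ref_list1 = pick_min(sad, bwd, 3);  /* backward reference frame type */
    int64_t sum = (int64_t)sad[*ref_list0] + sad[*ref_list1];
    *use_golden = (int64_t)sad[GOLDEN_FRAME] < sum;
}
```

The main template would then be the set {ref_list0, ref_list1} plus GOLDEN_FRAME when `use_golden` is set.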
  • the motion vectors of adjacent blocks are used to calculate selection parameters for each candidate reference frame type in the full set of candidate reference frame types, that is, the specific implementation process of S1032a may include: S401-S403, as follows:
  • the matching result indicates whether there is a matching motion vector for each candidate reference frame type.
  • the electronic device can take a candidate reference frame type and match it against the reference frame type of an adjacent block, to determine whether the reference frame type of the adjacent block is the same as the candidate reference frame type. When they are the same, the electronic device considers that the candidate reference frame type matches the motion vector of the adjacent block, that is, a matching motion vector exists. Following this process, the electronic device assigns the motion vectors of the adjacent blocks to each candidate reference frame type, so as to obtain a matching result for each candidate reference frame type.
  • when the matching result shows that there is no matching motion vector for a certain candidate reference frame type, the electronic device initializes the selection parameter of that candidate reference frame type, that is, uses a preset value as the selection parameter of that candidate reference frame type.
  • the preset value may be INT32_MAX, that is, the maximum value of a 32-bit signed integer, or may be another value such as 50000, which is not limited in this application.
  • when the matching result of a candidate reference frame type indicates that a matching motion vector exists for it, the electronic device uses the prediction value obtained by predicting the adjacent block with the motion vector of the adjacent block, together with the input value of the adjacent block itself, to calculate the selection parameter of that candidate reference frame type.
  • there may be multiple matched motion vectors for a candidate reference frame type.
  • in this case, each matched motion vector can be regarded as a sub-vector, with "matched motion vectors" used as a general term; therefore, the matched motion vectors include multiple sub-vectors.
  • each sub-vector corresponds to a predicted value, and the predicted value corresponding to each sub-vector is recorded as a sub-predicted value, so that the predicted value includes multiple sub-predicted values, and the multiple sub-vectors and the multiple sub-predicted values correspond to each other.
  • the selection parameter of each candidate reference frame type is calculated, that is, the specific implementation process of S403 may include: S4031-S4033, as follows:
  • S4031 Calculate the difference between the pixels of the sub-prediction value corresponding to each sub-vector and the pixels of the input value of the adjacent block, to obtain the pixel difference value corresponding to each sub-vector.
  • S4032 Accumulate the absolute values of the pixel difference values corresponding to each sub-vector to obtain a temporary selection parameter corresponding to each sub-vector.
  • for example, the temporary selection parameter can be calculated by formula (2): sad = Σ_{i=0}^{m-1} Σ_{j=0}^{n-1} |dst(i,j) − src(i,j)|, where:
  • (i,j) is the pixel position
  • (m,n) is the size of the adjacent block
  • dst(i,j) is the predicted value obtained for the candidate reference frame type when a certain sub-vector is used for prediction
  • src(i,j) is the input value of the adjacent block
  • sad is the calculated temporary selection parameter.
  • for each candidate reference frame type, the electronic device selects the smallest temporary selection parameter from the temporary selection parameters of its sub-vectors, and uses the smallest temporary selection parameter as the selection parameter of that candidate reference frame type.
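The computation in S4031-S4033 (per-sub-vector SAD from formula (2), then a minimum over the sub-vectors) can be sketched as follows; `block_sad`, `selection_parameter` and the buffer layout are hypothetical names introduced here for illustration, not the encoder's real API:

```c
#include <stdint.h>
#include <stdlib.h>

/* Formula (2): sad = sum over the (m x n) adjacent block of
 * |dst(i,j) - src(i,j)|, where dst is the prediction obtained with one
 * sub-vector and src is the input block itself. */
int32_t block_sad(const uint8_t *dst, const uint8_t *src,
                  int m, int n, int stride) {
    int32_t sad = 0;
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)
            sad += abs((int)dst[i * stride + j] - (int)src[i * stride + j]);
    return sad;
}

/* S4031-S4033: one temporary SAD per matched sub-vector; the smallest
 * becomes the selection parameter of the candidate reference frame type.
 * With no matched sub-vector, the preset value INT32_MAX is kept. */
int32_t selection_parameter(const uint8_t *const preds[], int num_subvectors,
                            const uint8_t *src, int m, int n, int stride) {
    int32_t best = INT32_MAX;  /* preset value when nothing matches */
    for (int k = 0; k < num_subvectors; k++) {
        int32_t sad = block_sad(preds[k], src, m, n, stride);
        if (sad < best) best = sad;
    }
    return best;
}
```

As the text notes, SATD could replace SAD here for more accuracy at higher computational cost.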
  • each candidate reference frame type in the full set of candidate reference frame types is matched with the motion vectors of the adjacent blocks to obtain a matching result, that is, the specific implementation process of S401 may include: S4011, as follows:
  • S4011 When the adjacent block is available, the optimal prediction mode of the adjacent block is the second preset mode, and the reference frame type of the adjacent block is the same as a candidate reference frame type, determine that a matching motion vector exists for that candidate reference frame type.
  • when the electronic device matches each candidate reference frame type against the motion vectors of an adjacent block, it first determines whether the adjacent block is available; when the adjacent block is available, it determines whether the optimal prediction mode of the adjacent block is the second preset mode, and whether the reference frame type of the adjacent block is the same as each candidate reference frame type. Since there is a correspondence between the motion vectors of the adjacent block and the reference frame type of the adjacent block, when the electronic device determines that the optimal prediction mode of the adjacent block is indeed the second preset mode and the reference frame type of the adjacent block is the same as a candidate reference frame type, the electronic device considers that the motion vector corresponding to the reference frame type of the adjacent block matches that candidate reference frame type, so that a matching motion vector exists for that candidate reference frame type.
  • the electronic device may first match the motion vectors of the adjacent blocks with each candidate reference frame type, and calculate the selection parameter of each candidate reference frame type according to whether a matching motion vector exists for it, so that the electronic device obtains the selection parameter corresponding to each candidate reference frame type.
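The matching test described above (adjacent block available, best mode is the preset inter mode, same reference frame type) can be expressed as a small predicate; the `NeighborInfo` structure and the `MODE_INTER` constant are assumptions for illustration, not the encoder's actual data structures:

```c
#include <stdbool.h>

/* Illustrative neighbour record; the field names are assumptions. */
typedef struct {
    bool available;  /* the neighbour position exists              */
    int  best_mode;  /* the neighbour's optimal prediction mode    */
    int  ref_frame;  /* the neighbour's reference frame type       */
} NeighborInfo;

#define MODE_INTER 1  /* stands in for the "second preset mode" */

/* S4011: a candidate reference frame type has a matching motion vector
 * when the neighbour is available, inter-coded, and uses that type. */
bool has_matching_mv(const NeighborInfo *nb, int candidate_ref) {
    return nb->available
        && nb->best_mode == MODE_INTER
        && nb->ref_frame == candidate_ref;
}
```

A candidate type for which this predicate never fires over all neighbours keeps the preset selection parameter (e.g. INT32_MAX).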
  • the frame type of the current prediction block, the number of adjacent blocks and the number of sub-prediction blocks are used to determine the enhancement template corresponding to the current prediction block, that is, the specific implementation process of S1033 may include: S1033a-S1033d, as follows:
  • S1033a Determine the frame type weight corresponding to the current prediction block according to the frame type of the current prediction block and the preset frame type weight correspondence.
  • the video frame to which the current prediction block belongs is determined, and the frame type of each video frame is determined before prediction. Different frame types have different reference relationships and thus different importance. The importance of video frames referenced by more video frames is higher than that of video frames referenced by fewer video frames.
  • the electronic device may determine the importance according to the frame type of the video frame corresponding to the current prediction block, and then determine the corresponding frame type weight according to the importance.
  • FIG. 11 is a schematic diagram of a reference relationship between an I frame, a P frame, a B frame, a b frame, and a non-reference B frame provided by an embodiment of the present application.
  • the importance order determined for these frame types is: I frame > P frame > B frame > b frame > non-reference B frame.
  • the importance is also related to the structure of the Group of Pictures (GOP).
  • Figure 12 is a schematic diagram of the reference relationship of GOP16 provided by an embodiment of the present application. It can be seen from Figure 12 that POC16 refers to POC0; POC8 refers to POC0 and POC16; POC4 refers to POC0 and POC8; POC2 refers to POC0 and POC4; and the odd-numbered POC frames are not referenced by other frames.
  • according to this reference relationship, the weight levels shown in Table 2 can be determined.
  • the order of the weights of each video frame in GOP16 is: POC0>POC16>POC8>POC4>POC2>POC1.
  • the electronic device can select the frame type weight according to the frame type.
  • the electronic device can then generate an enhancement threshold from the frame type weight of the video frame to which the current prediction block belongs.
  • the frame type weight corresponds to a threshold parameter, which is used to generate the enhancement threshold.
  • an embodiment of the present application provides a formula for generating an enhancement threshold, see formula (3):
  • param is the threshold parameter
  • thr is the generated enhancement threshold
  • slice_level is the frame type weight; therefore, the enhancement threshold can be generated by looking up the table.
  • the electronic device adds the number of adjacent blocks and the number of sub-prediction blocks to obtain a sum result, and then compares the sum result with the enhancement threshold.
  • when the electronic device finds that the sum result is less than or equal to the enhancement threshold, it acquires at least one preset reference frame type and uses the at least one preset reference frame type as the enhancement template. In this way, the electronic device completes the process of generating the enhancement template.
  • the at least one preset reference frame type may be LAST_FRAME, BWDREF_FRAME, ALTREF_FRAME, or may be LAST_FRAME, BWDREF_FRAME, GOLDEN_FRAME, which is not limited in this application.
  • the electronic device may first determine the frame type weight according to the frame type of the current prediction block, then generate the enhancement threshold according to the frame type weight, and compare the sum of the number of adjacent blocks and the number of sub-prediction blocks with the generated enhancement threshold to decide whether to use at least one preset reference frame type as the enhancement template.
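A minimal sketch of this enhancement-template decision, assuming formula (3) is a lookup from the frame-type weight into a threshold-parameter table; the `param[]` values and all names here are placeholders for illustration, not the patent's actual numbers:

```c
#include <stdint.h>

/* Hypothetical mapping from the frame-type weight (slice_level) to the
 * enhancement threshold thr; the table values are placeholders. */
int enhancement_threshold(int slice_level) {
    static const int param[] = { 0, 2, 4, 6, 8, 10 };
    return param[slice_level];  /* thr looked up from the table */
}

/* S1033b-S1033d: when the collected inter-block count is at or below the
 * threshold, fall back to a preset set of reference frames (a bit mask). */
uint32_t build_enhancement_template(int slice_level,
                                    int num_adjacent, int num_subblocks,
                                    uint32_t preset_mask) {
    int inter_total_num = num_adjacent + num_subblocks;
    if (inter_total_num <= enhancement_threshold(slice_level))
        return preset_mask;  /* mask_add = preset reference frame types */
    return 0;                /* mask_add stays empty */
}
```

The preset mask would encode, for example, LAST_FRAME, BWDREF_FRAME and ALTREF_FRAME as mentioned in the text.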
  • the embodiments of the present application are implemented in a scenario where an encoder (electronic device) generates a reference frame template for a NEWMV mode (preset prediction mode).
  • an encoder is an example of an electronic device
  • NEARESTMV and NEARMV are historical prediction modes
  • GLOBALMV is also a historical prediction mode (a prediction mode whose prediction is completed before the preset prediction mode)
  • the optimal reference frames of these modes can be inherited, as well as the information of adjacent blocks and of the different CU partitions already predicted; according to the adjacent MVs of each reference frame, the SAD corresponding to each MV is calculated, the smallest SAD is used as the SAD of the reference frame, the reference frames are divided into three groups (forward reference, backward reference and long-term reference), and the reference frame with the smallest SAD is selected in each group.
  • FIG. 13 is a schematic diagram of a process of generating a reference frame template for NEWMV mode provided by an embodiment of the present application. Referring to FIG. 13 , the process may include:
  • the encoder initializes the SAD corresponding to each reference frame to INT32_MAX (the preset value), where INT32_MAX is the maximum value of a 32-bit signed integer.
  • S501 consists of 3 parts:
  • the positional relationship between the current block (current prediction block) and the adjacent blocks may be as shown in FIG. 8 .
  • the electronic device judges each adjacent block position in turn; if the current adjacent block position is available, the optimal mode is inter prediction, and the reference frame is the same as the current reference frame (each candidate reference frame type), the current MV is recorded. The reference frame of the current adjacent block may have multiple MVs or none at all (this is how the matching motion vectors are determined for each candidate reference frame type; the matched motion vectors of a candidate reference frame type may include multiple sub-vectors).
  • S5012 Calculate the SAD (temporary selection parameter) corresponding to each MV (sub-vector) in turn.
  • the encoder can calculate the SAD according to equation (2).
  • SATD is more accurate, but the computational complexity is also greater.
  • S5013 Select the minimum SAD (minimum temporary selection parameter) corresponding to each reference frame.
  • if no MV was recorded for the current reference frame, the SAD of the current reference frame remains INT32_MAX (that is, when there is no matching motion vector for a candidate reference frame type, the preset value is used as the selection parameter of that candidate reference frame type).
  • S502 includes three parts, namely:
  • the positional relationship between the current block and the adjacent blocks is shown in FIG. 8 . If the adjacent block exists and the optimal mode is the inter mode (the first preset mode), the reference frame information of the adjacent block is recorded (obtaining the reference frame type of the adjacent block, that is, obtaining the adjacent block information).
  • the division is shown in Figure 9.
  • the encoding and prediction process is performed in sequence; therefore, when the current division type (the current sub-block division type) is being processed, other CU division types (the previous sub-block division types) may already have been completed. For example, when the current HORZ_A division type is used, the NONE division type has already been processed, so the division information of the NONE division type can be used.
  • count the number of inter prediction blocks among the adjacent blocks and the different CU divisions (the sum of the number of adjacent blocks and the number of sub-prediction blocks), and denote it as inter_total_num.
  • S503 includes the following steps:
  • the initialization template is denoted mask_init; it is generated according to the collected reference frame information (that is, the reference frame types of the adjacent blocks and the reference frame types of the sub-prediction blocks).
  • the generation process mainly includes:
  • FIG. 14 provides a schematic diagram 1 of the process of generating an initialization template. Referring to Figure 14, the process includes:
  • S603. Determine whether the serial number is less than or equal to the number of reference frame types.
  • the number of reference frame types refers to the number of types of reference frame types in the collected reference frame information.
  • S604. Determine whether 4 times the number of selections (the number of times of selection after magnification) is greater than or equal to the maximum number of selections (the maximum number of times of selection).
  • the serial number of each selected reference frame is determined after the reference frames are sorted in descending order of their selection counts; therefore, as the selection count becomes smaller, the reference value of the frame also becomes smaller, and adding such frames to the initialization template would introduce unnecessary reference frames and slow down encoding.
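The screening rule described above (keep a reference frame type only while four times its selection count still reaches the maximum selection count) might be sketched like this; the function name and the bit-mask representation of the template are illustrative assumptions:

```c
#include <stdint.h>

/* S601-S605 sketch: each collected reference frame type has a selection
 * count; a type enters the initialization template mask_init only while
 * 4 * count >= max_count, so rarely-selected frames are dropped. */
uint32_t screen_init_template(const int ref_type[], const int count[], int n) {
    int max_count = 0;
    for (int i = 0; i < n; i++)
        if (count[i] > max_count) max_count = count[i];

    uint32_t mask_init = 0;
    for (int i = 0; i < n; i++)
        if (4 * count[i] >= max_count)       /* the amplified selection count */
            mask_init |= 1u << ref_type[i];  /* add this type to the template */
    return mask_init;
}
```

With counts {8, 3, 1}, for example, only the first two types survive, matching the intent that low-count frames have low reference value.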
  • Figure 15 provides a second schematic diagram of the process of generating an initialization template. Referring to Figure 15, the process includes:
  • when mask_init is 0, the initial template is empty.
  • the values corresponding to LAST_FRAME, BWDREF_FRAME and ALTREF_FRAME can be written into mask_init (for example, the value of LAST_FRAME can be recorded in mask_init by left-shifting by LAST_FRAME and OR-ing the result into mask_init), so as to add these frames to the initialization template.
  • mask_init may be 0 because no reference frame information was collected and no other single-reference-frame mode was selected as the optimal mode; at this time, LAST_FRAME, BWDREF_FRAME and ALTREF_FRAME need to be forcibly added.
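The bit operations on mask_init described above can be sketched as follows; the frame index values in the enum are illustrative, and the forced default set follows the LAST_FRAME/BWDREF_FRAME/ALTREF_FRAME fallback named in the text:

```c
#include <stdint.h>

enum { LAST_FRAME = 1, BWDREF_FRAME = 5, ALTREF_FRAME = 7 }; /* illustrative */

/* Writing a reference frame type into the template: set the bit whose
 * position equals the frame's index (left shift, then OR into the mask). */
uint32_t mask_add_frame(uint32_t mask, int frame) {
    return mask | (1u << frame);
}

/* S701-S702 sketch: if mask_init is still 0 (no reference frame
 * information was collected), force the three default frames in. */
uint32_t fix_empty_init_template(uint32_t mask_init) {
    if (mask_init == 0) {
        mask_init = mask_add_frame(mask_init, LAST_FRAME);
        mask_init = mask_add_frame(mask_init, BWDREF_FRAME);
        mask_init = mask_add_frame(mask_init, ALTREF_FRAME);
    }
    return mask_init;
}
```

A non-empty mask_init passes through unchanged, so the fallback only fires when no information was collected.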
  • the 7 reference frames (full candidate reference frame types) in Table 1 are divided into three types: forward reference frame, backward reference frame and long-term reference frame.
  • the forward reference frames include LAST_FRAME, LAST2_FRAME and LAST3_FRAME (these are the candidate forward reference frame types);
  • backward reference frame includes: BWDREF_FRAME, ALTREF2_FRAME and ALTREF_FRAME (these are candidate backward reference frame types);
  • long-term reference frame includes GOLDEN_FRAME (ie, candidate long-term reference frame type). Then find the reference frame with the smallest SAD for each type.
  • Step1 Find the forward reference frame (forward reference frame type).
  • Step2 Find the backward reference frame (backward reference frame type).
  • Step3 Find the long-term reference frame (long-term reference frame type).
  • if the SAD corresponding to the current reference frame is INT32_MAX, the reference frame is not important and can be skipped directly.
  • the enhancement template is denoted mask_add and is initialized to 0; it is related to the frame type weight and to the number of collected inter blocks.
  • the current frame type weight is obtained to generate the threshold thr.
  • the current frame type has been determined before the prediction, therefore, the weight of the current frame type can also be determined, denoted as slice_level, and then the threshold value can be generated according to formula (3).
  • FIG. 16 is a schematic diagram of a process of applying a reference frame template in the NEWMV mode provided by an embodiment of the present application. Referring to FIG. 16 , the process may include:
  • the encoder does not introduce new calculations, fully considers the characteristics of the NEWMV mode, and directly uses the collected information to generate the reference frame template. Compared with the encoding speed in the related art, it can speed up the encoding of 65 video frames by 15%, a considerable acceleration. Moreover, the obtained reference frame template has a high adaptive capability, and reference frames do not need to be forcibly eliminated during generation, so the quality of the code stream is also guaranteed.
  • the software modules stored in the inter-frame prediction apparatus 455 of the memory 450 can include:
  • the mode determination module 4551 is configured to determine the historical prediction mode corresponding to the current prediction block when the current prediction mode of the current prediction block is a preset prediction mode; the historical prediction mode is a prediction mode whose prediction is completed before the preset prediction mode;
  • Information acquisition module 4552 configured to acquire adjacent block information of adjacent blocks of the current prediction block, sub-prediction block information of sub-prediction blocks, and historical optimal reference of the current prediction block in the historical prediction mode frame type; the sub-prediction block is obtained by dividing the current prediction block into blocks by using the sub-block division type before the current sub-block division type;
  • Template generation module 4553 configured to generate a reference frame template based on the historical optimal reference frame type, the adjacent block information, the sub-prediction block information, and the frame type corresponding to the current prediction block;
  • the information prediction module 4554 is configured to use the reference frame template to determine the reference frame of the preset prediction mode, and use the reference frame to perform inter-frame prediction on the current prediction block to obtain the prediction value corresponding to the current prediction block.
  • the adjacent block information includes: the motion vector of the adjacent block, the reference frame type of the adjacent block, the number of the adjacent block; the sub-prediction block information Including: the reference frame type of the sub-prediction block, the number of the sub-prediction block;
  • the template generation module 4553 is further configured to: determine the initial template corresponding to the current prediction block based on the historical optimal reference frame type, the reference frame type of the sub-prediction block, and the reference frame type of the adjacent block; generate the main template corresponding to the current prediction block according to the motion vector of the adjacent block; determine the enhancement template corresponding to the current prediction block by using the frame type of the current prediction block, the number of the adjacent blocks and the number of the sub-prediction blocks; and generate the reference frame template corresponding to the current prediction block by using the initial template, the main template and the enhancement template.
  • the template generation module 4553 is further configured to determine a first initial template according to the reference frame type of the sub-prediction block and the reference frame type of the adjacent block; The historical optimal reference frame type is used as the second initial template; the initial template corresponding to the current prediction block is determined by using the first initial template and the second initial template.
  • the template generation module 4553 is further configured to determine at least one historically selected reference frame type by using the reference frame type of the adjacent block and the reference frame type of the sub-prediction block; The number of times of selection of each historically selected reference frame type in the at least one historically selected reference frame type is counted; and the first initial template is screened from each of the historically selected reference frame types by using the number of times of selection.
  • the template generation module 4553 is further configured to: select the maximum selection count from the selection counts of each historically selected reference frame type; amplify the selection count of each historically selected reference frame type to obtain an amplified selection count; compare the amplified selection count with the maximum selection count to obtain a comparison result corresponding to each historically selected reference frame type, the comparison result indicating whether the amplified selection count is greater than or equal to the maximum selection count; and when the comparison result indicates that the amplified selection count is greater than or equal to the maximum selection count, determine the corresponding historically selected reference frame type as part of the first initial template.
  • when the initial template is empty, the template generation module 4553 is further configured to add at least one preset reference frame type to the initial template to obtain a corrected initial template;
  • the template generation module 4553 is further configured to generate the reference frame template corresponding to the current prediction block by using the corrected initial template, the main template and the enhanced template.
  • the adjacent block information includes: a reference frame type of the adjacent block
  • the sub-prediction block information includes: a reference frame type of the sub-prediction block
  • the template generation module 4553 is further configured to judge the optimal prediction mode of the adjacent block to obtain a first judgment result, and judge the optimal prediction mode of the sub-prediction block to obtain a second judgment result
  • the first judgment The result represents whether the optimal prediction mode of the adjacent block is the first preset mode
  • the second judgment result represents whether the optimal prediction mode of the sub-prediction block is the second preset mode; when the first judgment result indicates that the optimal prediction mode of the adjacent block is the first preset mode, the reference frame type of the adjacent block is obtained; when the second judgment result indicates that the optimal prediction mode of the sub-prediction block is the second preset mode, the reference frame type of the sub-prediction block is acquired.
  • the template generation module 4553 is further configured to use the motion vectors of the adjacent blocks to calculate selection parameters for each candidate reference frame type in the full number of candidate reference frame types;
  • the full set of candidate reference frame types represents all available candidate reference frame types during inter-frame prediction, and the selection parameter represents the difference between the input value and the predicted value of the adjacent blocks;
  • the full set of candidate reference frame types is divided into candidate forward reference frame types, candidate backward reference frame types, and candidate long-term reference frame types; the selection parameters corresponding to the candidate forward reference frame types, the selection parameters corresponding to the candidate backward reference frame types, and the selection parameters corresponding to the candidate long-term reference frame types are used to select the forward reference frame type, the backward reference frame type and the long-term reference frame type; and the forward reference frame type, the backward reference frame type and the long-term reference frame type are aggregated into the main template corresponding to the current prediction block.
  • the template generation module 4553 is further configured to select, among the candidate forward reference frame types, the candidate reference frame type with the smallest selection parameter as the forward reference frame type; select, among the candidate backward reference frame types, the candidate reference frame type with the smallest selection parameter as the backward reference frame type; and use, among the candidate long-term reference frame types, a candidate reference frame type whose selection parameter is smaller than the sum of the selection parameter corresponding to the forward reference frame type and the selection parameter corresponding to the backward reference frame type as the long-term reference frame type.
  • the template generation module 4553 is further configured to match each candidate reference frame type in the full set of candidate reference frame types with the motion vectors of the adjacent block to obtain a matching result, the matching result indicating whether a matching motion vector exists for each candidate reference frame type; when the matching result indicates that no matching motion vector exists for a candidate reference frame type, use the preset value as the selection parameter of that candidate reference frame type; and when the matching result indicates that a matching motion vector exists for a candidate reference frame type, use the prediction value obtained when predicting the adjacent block with its motion vector, together with the input value of the adjacent block, to calculate the selection parameter of that candidate reference frame type.
  • the template generation module 4553 is further configured to: calculate the difference between the pixels of the sub-prediction value corresponding to each of the plurality of sub-vectors and the pixels of the input value of the adjacent block, to obtain the pixel difference value corresponding to each sub-vector; accumulate the absolute values of the pixel difference values corresponding to each sub-vector to obtain a temporary selection parameter corresponding to each sub-vector; and use the smallest temporary selection parameter among the temporary selection parameters of the sub-vectors as the selection parameter corresponding to each candidate reference frame type.
  • the template generation module 4553 is further configured to: determine the frame type weight corresponding to the current prediction block according to the frame type of the current prediction block and the preset frame type weight correspondence; generate an enhancement threshold according to the frame type weight; sum the number of the adjacent blocks and the number of the sub-prediction blocks to obtain a sum result; and when the sum result is less than or equal to the enhancement threshold, use at least one preset reference frame type as the enhancement template corresponding to the current prediction block.
  • the description of the inter-frame prediction apparatus provided by the embodiments of the present application is similar to the description of the inter-frame prediction method provided by the embodiments of the present application, and it has similar beneficial effects.
  • Embodiments of the present application provide a computer program product or computer program, where the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the computer device (electronic device for inter-frame prediction) reads the computer instruction from the computer-readable storage medium, and the processor executes the computer instruction, so that the computer device executes the inter-frame prediction method in the embodiment of the present application.
  • the embodiments of the present application provide a computer-readable storage medium storing executable instructions which, when executed by a processor, cause the processor to execute the inter-frame prediction method provided by the embodiments of the present application, for example, the method shown in FIG. 7.
  • the computer-readable storage medium may be a memory such as FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disk, or CD-ROM; it may also be any device including one of the foregoing memories or any combination thereof.
  • executable instructions may take the form of programs, software, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as a stand-alone program or as a module, component, subroutine, or other unit suitable for use in a computing environment.
  • executable instructions may, but need not, correspond to files in a file system; they may be stored as part of a file that holds other programs or data, for example, in one or more scripts within a Hyper Text Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple cooperating files (e.g., files that store one or more modules, subroutines, or code sections).
  • executable instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.

Abstract

This application provides an inter-frame prediction method, apparatus, electronic device, computer-readable storage medium, and computer program product. The method includes: when the current prediction mode of a current prediction block is a preset prediction mode, determining the historical prediction mode corresponding to the current prediction block; obtaining adjacent-block information of the current prediction block's adjacent blocks, sub-prediction-block information of its sub-prediction blocks, and the historical optimal reference frame type of the current prediction block under the historical prediction mode; generating a reference frame template based on the historical optimal reference frame type, the adjacent-block information, the sub-prediction-block information, and the frame type corresponding to the current prediction block; and using the reference frame template to determine the reference frame of the preset prediction mode, and performing inter-frame prediction on the current prediction block with the reference frame to obtain the prediction value corresponding to the current prediction block. This application can improve the efficiency of video encoding.

Description

Inter-frame prediction method, apparatus, electronic device, computer-readable storage medium, and computer program product
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202011629460.2 filed on December 31, 2020, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to video coding technology, and in particular to an inter-frame prediction method, apparatus, electronic device, computer-readable storage medium, and computer program product.
Background
Video coding is widely used in video transmission, and the trend of future video is toward high definition, high frame rate, and high compression rate. When a video frame is encoded using inter-frame prediction, it must be partitioned into coding units (CU). When a CU computes a prediction value by inter-frame prediction, a suitable reference frame must be selected for each prediction mode of the CU before prediction can begin and a prediction value can be obtained.
When selecting a suitable reference frame, some reference frame modes deemed unimportant are first forcibly eliminated; the prediction modes are then combined with the remaining reference frame modes, and each mode combination goes through a high-complexity selection process before a suitable reference frame mode is obtained. However, the computational complexity of this selection process is large, making video encoding inefficient.
Summary
Embodiments of the present application provide an inter-frame prediction method, apparatus, electronic device, computer-readable storage medium, and computer program product, which can improve the efficiency of video encoding.
The technical solutions of the embodiments of the present application are implemented as follows:
An embodiment of the present application provides an inter-frame prediction method, including:
when the current prediction mode of a current prediction block is a preset prediction mode, determining the historical prediction mode corresponding to the current prediction block, the historical prediction mode being a prediction mode whose prediction was completed before the preset prediction mode;
obtaining adjacent-block information of adjacent blocks of the current prediction block, sub-prediction-block information of sub-prediction blocks, and the historical optimal reference frame type of the current prediction block under the historical prediction mode, the sub-prediction blocks being obtained by partitioning the current prediction block using sub-block partition types preceding the current sub-block partition type;
generating a reference frame template based on the historical optimal reference frame type, the adjacent-block information, the sub-prediction-block information, and the frame type corresponding to the current prediction block;
using the reference frame template to determine the reference frame of the preset prediction mode, and performing inter-frame prediction on the current prediction block with the reference frame to obtain the prediction value corresponding to the current prediction block.
An embodiment of the present application provides an inter-frame prediction apparatus, including:
a mode determination module, configured to determine, when the current prediction mode of a current prediction block is a preset prediction mode, the historical prediction mode corresponding to the current prediction block, the historical prediction mode being a prediction mode whose prediction was completed before the preset prediction mode;
an information acquisition module, configured to obtain adjacent-block information of adjacent blocks of the current prediction block, sub-prediction-block information of sub-prediction blocks, and the historical optimal reference frame type of the current prediction block under the historical prediction mode, the sub-prediction blocks being obtained by partitioning the current prediction block using sub-block partition types preceding the current sub-block partition type;
a template generation module, configured to generate a reference frame template based on the historical optimal reference frame type, the adjacent-block information, the sub-prediction-block information, and the frame type corresponding to the current prediction block;
an information prediction module, configured to determine the reference frame of the preset prediction mode using the reference frame template, and to perform inter-frame prediction on the current prediction block with the reference frame to obtain the prediction value corresponding to the current prediction block.
An embodiment of the present application provides an electronic device for inter-frame prediction, including:
a memory for storing executable instructions;
a processor for implementing, when executing the executable instructions stored in the memory, the inter-frame prediction method provided by the embodiments of the present application.
An embodiment of the present application provides a computer-readable storage medium storing executable instructions which, when executed by a processor, implement the inter-frame prediction method provided by the embodiments of the present application.
An embodiment of the present application provides a computer program product, including a computer program or instructions which, when executed by a processor, implement the inter-frame prediction method provided by the embodiments of the present application.
The embodiments of the present application have the following beneficial effects: when the current prediction mode is the preset prediction mode, the historical prediction mode corresponding to the preset prediction mode is determined; then the historical optimal reference frame type of the current prediction block under the historical prediction mode, the information of the current prediction block's adjacent blocks under the historical prediction mode, and the sub-prediction-block information of the sub-prediction blocks obtained by partitioning the current prediction block under the historical prediction mode are obtained, so that the current prediction block directly inherits the various information of the historical prediction modes already carried out before the preset prediction mode; a reference frame template is then adaptively generated for the current prediction block in the preset prediction mode. In this way, the characteristics of the preset prediction mode in video encoding are fully considered, existing information is used directly to generate the reference frame template, and the computational complexity is greatly reduced, thereby improving the efficiency of video encoding.
Brief description of the drawings
FIG. 1 is an example diagram of the AV1 encoding framework;
FIG. 2 is an example diagram of CU partitioning rules;
FIG. 3 is a schematic diagram of MVP selection for different single reference frame modes;
FIG. 4 is a schematic diagram of the process of selecting the most suitable mode combination;
FIG. 5 is a schematic diagram of an optional architecture of a video encoding system provided by an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an electronic device for inter-frame prediction provided by an embodiment of the present application;
FIG. 7 is a first optional flowchart of an inter-frame prediction method provided by an embodiment of the present application;
FIG. 8 is a schematic diagram of the positional relationship between the current prediction block and adjacent blocks provided by an embodiment of the present application;
FIG. 9 is a schematic diagram of one ordering of 10 different sub-block partition types provided by an embodiment of the present application;
FIG. 10 is a second optional flowchart of an inter-frame prediction method provided by an embodiment of the present application;
FIG. 11 is a schematic diagram of the reference relationships among I frames, P frames, B frames, b frames, and non-reference B frames provided by an embodiment of the present application;
FIG. 12 is a schematic diagram of the reference relationships of GOP16 provided by an embodiment of the present application;
FIG. 13 is a schematic diagram of the process of generating a reference frame template for NEWMV mode provided by an embodiment of the present application;
FIG. 14 is a first schematic diagram of the process of generating an initialization template;
FIG. 15 is a second schematic diagram of the process of generating an initialization template;
FIG. 16 is a schematic diagram of the process of applying the NEWMV-mode reference frame template provided by an embodiment of the present application.
Detailed description
To make the objectives, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be regarded as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, "some embodiments" describes a subset of all possible embodiments; it may be the same subset or different subsets of all possible embodiments, and the subsets may be combined with each other where no conflict arises.
In the following description, the terms "first/second" merely distinguish similar objects and do not imply a particular ordering of the objects. It is understood that "first/second" may be interchanged in a particular order or sequence where permitted, so that the embodiments described herein can be implemented in an order other than that illustrated or described herein.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field of the present application. The terms used herein are only for describing the embodiments of the present application and are not intended to limit the present application.
Before the embodiments of the present application are described in further detail, the nouns and terms involved in the embodiments are explained; the following interpretations apply to them.
1) Intra-frame coding: a coding method that exploits the correlation between adjacent pixels within a single video frame to reduce the spatial redundancy between adjacent pixels.
2) Inter-frame coding: a coding method that exploits the similarity between adjacent frames of a video to eliminate the temporal redundancy between adjacent frames, thereby improving video coding efficiency.
3) Motion vector (MV): a vector that marks the positional relationship between the current block and a reference block during inter-frame prediction. Since the image content of adjacent frames is correlated in inter-frame coding, the frame is divided into blocks, the position of each block in a neighboring video frame is searched, and the relative spatial offset between the two is computed; the resulting relative offset is the motion vector.
4) Motion estimation (ME): the process of estimating motion vectors.
5) Motion vector prediction (MVP): the initial MV position derived from the adjacent blocks of the current block.
6) Motion vector difference (MVD): the difference between the MV and the MVP, i.e., MVD = MV − MVP. In practice, the difference between the predicted and actual MV (the MVD) can be encoded to reduce the number of bits consumed.
7) Rate-distortion cost (RDCost): used to select among multiple coding modes. RDCost is computed as in equation (1):
RDCost = dist + λ × bit     (1)
where dist is the distortion, i.e., the residual signal between the original and predicted pixels of a pixel block, bit is the number of bits needed to encode the information, and λ is the Lagrange multiplier.
8) Sum of absolute differences (SAD): reflects the time-domain difference of the residual signal; it cannot effectively reflect the size of the bitstream.
9) Sum of absolute transformed differences (SATD): a way of computing distortion, obtained by applying a Hadamard transform to the residual signal and then summing the absolute values of the elements. Compared with SAD, SATD requires more computation but is more accurate.
10) Sum of squared errors (SSE): another way of computing distortion, the sum of the squared errors between original and reconstructed pixels. Computing SSE requires transforming, quantizing, inverse-quantizing, and inverse-transforming the residual signal; although this is computationally complex, the estimated number of codewords matches the real encoding, so the selected coding mode saves the most codewords.
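The three distortion metrics defined above can be sketched as follows. This is an illustrative sketch, not taken from the patent: the 4×4 Hadamard matrix used for SATD and the absence of any normalization are assumptions; real encoders differ in the exact scaling they apply.

```python
def sad(src, dst):
    # Sum of absolute differences between input and prediction.
    return sum(abs(s - d) for row_s, row_d in zip(src, dst)
               for s, d in zip(row_s, row_d))

def sse(src, dst):
    # Sum of squared errors.
    return sum((s - d) ** 2 for row_s, row_d in zip(src, dst)
               for s, d in zip(row_s, row_d))

# 4x4 Hadamard matrix (assumed form).
H4 = [[1, 1, 1, 1],
      [1, -1, 1, -1],
      [1, 1, -1, -1],
      [1, -1, -1, 1]]

def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def satd4x4(src, dst):
    # Hadamard-transform the 4x4 residual, then sum absolute coefficients.
    r = [[src[i][j] - dst[i][j] for j in range(4)] for i in range(4)]
    t = _matmul(_matmul(H4, r), H4)
    return sum(abs(v) for row in t for v in row)
```

For a constant residual, SATD concentrates all the energy in the DC coefficient, which is why it tracks the bit cost of the transformed residual better than SAD does.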
Video coding is widely used in video transmission, and the trend of future video is toward high definition, high frame rate, and high compression rate, which requires continuous upgrades in video coding compression efficiency. The first-generation video coding standard AV1 has attracted enormous attention since its release.
Compared with other video coding technologies such as High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC), AV1 achieves a higher compression rate: at the same transmission quality, the bandwidth occupied can be reduced by 30%. AV1 can be used for encoding and transmitting both streaming media and pictures, and can be widely used in screen sharing and video game streaming.
FIG. 1 is an example diagram of the AV1 encoding framework. As shown in FIG. 1, the electronic device first partitions the incoming current video frame 1-1 into multiple 128×128 coding tree units (CTU), and then partitions each CTU into rectangular coding units (CU) according to 10 different partitioning rules; each CU contains multiple prediction modes and transform units (TU). The electronic device performs inter-frame prediction 1-2 or intra-frame prediction 1-3 on each CU to obtain a prediction value. Inter-frame prediction 1-2 includes motion estimation (ME) 1-21 and motion compensation (MC) 1-22 and requires a reference frame 1-12; intra-frame prediction 1-3 includes prediction mode selection 1-31 and prediction 1-32. The electronic device subtracts the prediction value from the input value of each CU to obtain a residual value, then transforms 1-4 and quantizes 1-5 the residual value to obtain residual coefficients, and entropy-codes 1-6 the residual coefficients to obtain the output bitstream. At the same time, the electronic device also inverse-quantizes 1-7 and inverse-transforms 1-8 the residual coefficients to obtain the residual value of the reconstructed image; adding this residual value to the prediction value yields the reconstructed image, based on which intra prediction mode selection 1-31 and intra prediction 1-32 are performed. The electronic device also filters 1-9 the reconstructed image; the filtered reconstructed image is the reconstructed frame 1-11 corresponding to the current video frame 1-1, which enters the reference frame queue to serve as a reference frame for the next video frame, so that encoding proceeds frame by frame.
Further, there is more than one CU partitioning rule. FIG. 2 is an example diagram of the CU partitioning rules. As shown in FIG. 2, there are 10 partitioning rules: no split (NONE) 2-1, quarter split (SPLIT) 2-2, horizontal halving (HORZ) 2-3, vertical halving (VERT) 2-4, horizontal quartering (HORZ_4) 2-5, first horizontal ternary split (HORZ_A) 2-6, second horizontal ternary split (HORZ_B) 2-7, first vertical ternary split (VERT_A) 2-8, second vertical ternary split (VERT_B) 2-9, and vertical quartering (VERT_4) 2-10. These 10 partitioning rules correspond to 22 block sizes: 4×4, 4×8, 8×4, 8×8, 8×16, 16×8, 16×16, 16×32, 32×16, 32×32, 32×64, 64×32, 64×64, 64×128, 128×64, 128×128, 4×16, 16×4, 8×32, 32×8, 16×64, and 64×16.
During prediction, each CU contains multiple prediction modes; that is, each CU contains both intra-frame prediction modes and inter-frame prediction modes. The electronic device first compares different prediction modes within the same prediction type to obtain the optimal prediction mode of that type, and then compares the intra-frame and inter-frame prediction modes to find the optimal prediction mode of each CU. Since each CU also includes multiple TUs, the electronic device needs to select the optimal TU among those included in each CU; the current video frame is thus divided into individual CUs.
The intra-frame prediction modes include the following: mean prediction based on the reference pixels above and to the left (DC_PRED), combined horizontal-vertical interpolation prediction (SMOOTH_PRED), vertical interpolation prediction (SMOOTH_V_PRED), horizontal interpolation prediction (SMOOTH_H_PRED), minimum-gradient-direction prediction (PAETH_PRED), and predictions along 8 different principal directions: vertical prediction (V_PRED), horizontal prediction (H_PRED), 45-degree prediction (D45_PRED), 67-degree prediction (D67_PRED), 113-degree prediction (D113_PRED), 135-degree prediction (D135_PRED), 157-degree prediction (D157_PRED), and 203-degree prediction (D203_PRED). Each principal direction further includes 6 angle offsets: plus or minus 3, 6, and 9 degrees. In some cases, the intra-frame prediction modes may also include a palette prediction mode and intra block copy prediction.
Inter-frame prediction has 4 single reference frame modes and 8 combined reference frame modes. The 4 single reference frame modes predict using a single reference frame and include NEARESTMV, NEARMV, GLOBALMV, and NEWMV; the 8 combined reference frame modes predict using combined reference frames and include NEAREST_NEARESTMV, NEAR_NEARMV, NEAREST_NEWMV, NEW_NEARESTMV, NEAR_NEWMV, NEW_NEARMV, GLOBAL_GLOBALMV, and NEW_NEWMV. NEARESTMV and NEARMV mean that the MV of the prediction block is derived from surrounding block information and no MVD needs to be transmitted; NEWMV means the MV is obtained from a transmitted MVD; GLOBALMV means the MV information of the prediction block is derived from global motion. Thus NEARESTMV, NEARMV, and NEWMV all depend on MVP derivation, and for a given reference frame AV1 computes 4 MVPs according to the MVP derivation rules, which are as follows:
The electronic device scans, in a skipping pattern, the block information of columns 1, 3, and 5 to the left of the current block and rows 1, 3, and 5 above it, first selecting blocks that use the same reference frame as the current block and deduplicating their MVs. If fewer than 8 deduplicated MVs are obtained, reference frames in the same direction are selected and more MVs are added; if there are still fewer than 8, global motion vectors are used for padding until 8 MVs are selected. The electronic device then sorts the 8 selected MVs and, according to the sorting result, selects the 4 most important ones. From these 4 MVs, the electronic device selects the corresponding MVPs for the three single reference frame modes NEARESTMV, NEARMV, and NEWMV. FIG. 3 is a schematic diagram of MVP selection for different single reference frame modes. Referring to FIG. 3, there are multiple reference frames in the single reference frame list, namely ref1, ref2, ref3, and so on. The electronic device selects the blocks that use reference frame ref1, picks out 8 MVs, and then selects the 4 most important MV1s (MV2 and MV3 are the MVs of ref2 and ref3, respectively). The 0th MV1 serves as the MVP for NEARESTMV, one of the 1st-3rd MV1s serves as the MVP for NEARMV, and one of the 0th-2nd MV1s serves as the MVP for NEWMV. Meanwhile, the electronic device can also set ZEROMV to {0, 0}.
For each of the 4 single reference frame modes of inter-frame prediction, there are 7 corresponding reference frame types, shown with their meanings in Table 1:
Table 1
[Table 1 is an image in the source. Following the AV1 convention, the 7 reference frame types are: LAST_FRAME (nearest forward reference frame), LAST2_FRAME (second-nearest forward reference frame), LAST3_FRAME (third-nearest forward reference frame), GOLDEN_FRAME (long-term "golden" reference frame), BWDREF_FRAME (nearest backward reference frame), ALTREF2_FRAME (intermediate backward reference frame), ALTREF_FRAME (farthest backward alternate reference frame).]
Each of the 8 combined reference frame modes of inter-frame prediction has the combinations {LAST_FRAME, ALTREF_FRAME}, {LAST2_FRAME, ALTREF_FRAME}, {LAST3_FRAME, ALTREF_FRAME}, {GOLDEN_FRAME, ALTREF_FRAME}, {LAST_FRAME, BWDREF_FRAME}, {LAST2_FRAME, BWDREF_FRAME}, {LAST3_FRAME, BWDREF_FRAME}, {GOLDEN_FRAME, BWDREF_FRAME}, {LAST_FRAME, ALTREF2_FRAME}, {LAST2_FRAME, ALTREF2_FRAME}, {LAST3_FRAME, ALTREF2_FRAME}, {GOLDEN_FRAME, ALTREF2_FRAME}, {LAST_FRAME, LAST2_FRAME}, {LAST_FRAME, LAST3_FRAME}, {LAST_FRAME, GOLDEN_FRAME}, and {BWDREF_FRAME, ALTREF_FRAME}.
Thus, for inter-frame prediction there are 156 mode combinations in total (i.e., 7×4 + 16×8 combinations of prediction modes and reference frame modes). Each mode combination corresponds to at most 3 MVPs, and for the current MVP four processes are performed: motion estimation (performed only when the prediction mode contains NEWMV), inter_inter selection, interpolation selection, and motion mode selection, so as to select the most suitable mode combination.
Exemplarily, FIG. 4 is a schematic diagram of the process of selecting the most suitable mode combination. Referring to FIG. 4, the process may include:
S1. Start.
S2. Obtain the total number of MVPs ref_set and set n = 0.
Here, n is the index of the current MVP.
S3. Judge whether n is less than ref_set; if yes, execute S4, otherwise execute S5.
S4. Obtain the MVP and increment n by 1.
S5. End.
S6. Judge whether the current prediction mode contains NEWMV; if not, execute S7; if yes, execute S8, i.e., perform motion estimation.
This is because motion estimation is computationally heavy and slow, and not all prediction modes require it; motion estimation is needed only when the prediction mode contains NEWMV.
S7. Judge whether there are dual reference frames; if not, execute S9; if yes, execute S10.
S8. Motion estimation.
S9. Fast exit.
After S9, S11 is executed next.
S10. Inter (inter_inter) selection.
S11. Interpolation direction selection under the optimal motion vector.
S12. Motion mode selection.
After motion mode selection, the process returns to S3 and loops until the most suitable reference frame mode is selected.
From the above analysis, the selection process for each mode combination is computationally very expensive, especially for mode combinations containing NEWMV mode, which additionally require motion estimation; this slows down encoding and lowers video encoding efficiency. If the number of mode combinations is reduced by forcibly eliminating some reference frame modes, the characteristics of the prediction modes themselves and of the video scene receive less consideration, the adaptivity to the prediction scene is low, and thus the adaptive capability of the reference frame template is low; moreover, the coding loss is very likely to grow as encoding time increases, and the resulting bitstream quality is not high.
Embodiments of the present application provide an inter-frame prediction method, apparatus, electronic device, computer-readable storage medium, and computer program product that can improve video encoding efficiency. The following describes exemplary applications of the electronic device for inter-frame prediction provided by the embodiments of the present application; the electronic device may be implemented as a terminal or as a server. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. The terminal may be, but is not limited to, a smartphone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, smart home appliance, or in-vehicle terminal. The terminal and the server may be connected directly or indirectly by wired or wireless communication, which is not limited in the present application. An exemplary application of the electronic device is described below.
Referring to FIG. 5, FIG. 5 is a schematic diagram of an optional architecture of the video encoding system provided by an embodiment of the present application. To support an inter-frame prediction application, in the video encoding system 100 the electronic device 400 reads in a video frame 200 awaiting encoding, partitions the video frame 200 into multiple image blocks, and selects one of them as the current prediction block 300. The electronic device 400 first determines the current prediction mode of the current prediction block 300; when the current prediction mode is the preset prediction mode, it determines the historical prediction mode corresponding to the current prediction block, i.e., the prediction mode whose prediction was completed before the preset prediction mode. The electronic device 400 obtains the adjacent-block information of the current prediction block 300, the sub-prediction-block information of the sub-prediction blocks, and the historical optimal reference frame type of the current prediction block under the historical prediction mode, where the sub-prediction blocks are obtained by partitioning the current prediction block using sub-block partition types preceding the current sub-block partition type. Based on the historical optimal reference frame type, the adjacent-block information, the sub-prediction-block information, and the frame type corresponding to the current prediction block, the electronic device 400 generates a reference frame template 500. Then, using the reference frame template 500, the electronic device 400 determines the reference frame corresponding to the preset prediction mode and performs inter-frame prediction on the current prediction block with the reference frame to obtain the prediction value of the current prediction block. After obtaining the prediction value, the electronic device 400 computes the residual corresponding to the prediction value, and then transforms, quantizes, and entropy-codes the residual to obtain the final bitstream.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the electronic device for inter-frame prediction provided by an embodiment of the present application. The electronic device 400 shown in FIG. 6 includes at least one processor 410, a memory 450, at least one network interface 420, and a user interface 430. The components of the electronic device 400 are coupled together by a bus system 440. It is understood that the bus system 440 implements connection and communication among these components; in addition to a data bus, it includes a power bus, a control bus, and a status signal bus. For clarity, however, the various buses are all labeled as the bus system 440 in FIG. 6.
The processor 410 may be an integrated circuit chip with signal processing capability, such as a general-purpose processor, a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, where the general-purpose processor may be a microprocessor or any conventional processor.
The user interface 430 includes one or more output apparatuses 431 enabling the presentation of media content, including one or more speakers and/or one or more visual displays. The user interface 430 also includes one or more input apparatuses 432, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, and other input buttons and controls.
The memory 450 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid-state memory, hard disk drives, optical disc drives, and the like. The memory 450 optionally includes one or more storage devices physically located away from the processor 410.
The memory 450 includes volatile memory or non-volatile memory, and may include both. Non-volatile memory may be read-only memory (ROM) and volatile memory may be random access memory (RAM). The memory 450 described in the embodiments of the present application is intended to include any suitable type of memory.
In some embodiments, the memory 450 can store data to support various operations; examples of such data include programs, modules, and data structures, or subsets or supersets thereof, exemplified below.
Operating system 451, including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, core library layer, and driver layer, for implementing various basic services and handling hardware-based tasks;
Network communication module 452, for reaching other computing devices via one or more (wired or wireless) network interfaces 420; exemplary network interfaces 420 include Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), and the like;
Presentation module 453, for enabling the presentation of information via one or more output apparatuses 431 (e.g., display screens, speakers) associated with the user interface 430 (e.g., a user interface for operating peripherals and displaying content and information);
Input processing module 454, for detecting one or more user inputs or interactions from one of the one or more input apparatuses 432 and translating the detected inputs or interactions.
In some embodiments, the inter-frame prediction apparatus provided by the embodiments of the present application may be implemented in software. FIG. 6 shows the inter-frame prediction apparatus 455 stored in the memory 450, which may be software in the form of programs, plug-ins, and the like, including the following software modules: a mode determination module 4551, an information acquisition module 4552, a template generation module 4553, and an information prediction module 4554. These modules are logical, so they may be combined arbitrarily or further split according to the functions implemented. The functions of each module are described below.
In other embodiments, the inter-frame prediction apparatus provided by the embodiments of the present application may be implemented in hardware. As an example, it may be a processor in the form of a hardware decoding processor programmed to execute the inter-frame prediction method provided by the embodiments of the present application; for example, the processor in the form of a hardware decoding processor may employ one or more application-specific integrated circuits (ASIC), DSPs, programmable logic devices (PLD), complex programmable logic devices (CPLD), field-programmable gate arrays (FPGA), or other electronic components.
Exemplarily, an embodiment of the present application provides an electronic device for inter-frame prediction, including:
a memory for storing executable instructions;
a processor for implementing, when executing the executable instructions stored in the memory, the inter-frame prediction method provided by the embodiments of the present application.
The inter-frame prediction method provided by the embodiments of the present application is described below with reference to exemplary applications and implementations of the electronic device. It should be noted that the embodiments of the present application may be implemented by means of cloud technology, which is a hosting technology that unifies hardware, software, network, and other resources within a wide-area or local-area network to implement the computation, storage, processing, and sharing of data.
Referring to FIG. 7, FIG. 7 is a first optional flowchart of the inter-frame prediction method provided by an embodiment of the present application, described with reference to the steps shown in FIG. 7.
S101. When the current prediction mode of the current prediction block is a preset prediction mode, determine the historical prediction mode corresponding to the current prediction block.
The embodiments of the present application are implemented in the scenario of encoding video. The electronic device first partitions the input video frame into multiple image blocks; the current prediction block is the image block being predicted at the current moment. The electronic device predicts the current prediction block with a different reference frame mode each time, and the current prediction mode is the reference frame mode used for the current prediction block at the current moment. When the electronic device determines that the current prediction mode is the preset prediction mode, it collects the prediction modes that were already completed (i.e., whose corresponding reference frame templates have been determined) before the preset prediction mode, and determines them as the historical prediction modes. In other words, a historical prediction mode is a prediction mode whose prediction was completed before the preset prediction mode.
It should be noted that in the embodiments of the present application, the preset prediction mode may be the single reference frame mode NEWMV, or a combined reference frame mode containing NEWMV, which is not limited herein.
It is understood that, since the 4 single reference frame modes of inter-frame prediction are predicted in a certain order (for example, before NEWMV mode is predicted, NEARESTMV, NEARMV, and GLOBALMV have already completed prediction), the historical prediction mode in the embodiments of the present application may be any one of the NEARESTMV, NEARMV, and GLOBALMV modes, or a combination of several of them.
S102. Obtain the adjacent-block information of the adjacent blocks of the current prediction block, the sub-prediction-block information of the sub-prediction blocks, and the historical optimal reference frame type of the current prediction block under the historical prediction mode.
Since the current prediction block has already been predicted with the historical prediction mode, and that prediction determined the historical optimal reference frame type of the current prediction block under the historical prediction mode, the electronic device can obtain the historical optimal reference frame type directly. Meanwhile, since there are multiple different sub-block partitioning manners for the current prediction block, and the sub-prediction blocks are obtained by partitioning the current prediction block with sub-block partition types preceding the current sub-block partition type, the sub-prediction blocks and their information are known; likewise, the adjacent-block information corresponding to the adjacent blocks is known. Therefore, in the embodiments of the present application, the electronic device can obtain the sub-prediction-block information and adjacent-block information directly.
It should be noted that the adjacent-block information may include the motion vectors of the adjacent blocks and the reference frame types of the adjacent blocks (i.e., the adjacent reference frame types), and may also include the number of adjacent blocks, etc., which is not limited herein.
The sub-prediction-block information may include the reference frame types of the sub-prediction blocks (i.e., the historical sub reference frame types), and may also include the number of sub-prediction blocks, etc., which is not limited herein.
It is understood that the adjacent blocks may be the image blocks to the left, upper-left, above, and upper-right of the current prediction block, or the image blocks in columns 1, 3, and 5 to the left of the current prediction block and in rows 1, 3, and 5 above it, which is not limited herein.
Exemplarily, an embodiment of the present application provides a schematic diagram of the positional relationship between the current prediction block and adjacent blocks. Referring to FIG. 8, image block E is the current prediction block, and image blocks A, B, C, and D are all adjacent blocks of the current prediction block.
S103. Generate a reference frame template based on the historical optimal reference frame type, the adjacent-block information, the sub-prediction-block information, and the frame type corresponding to the current prediction block.
The optimal reference frame types selected by different prediction modes of the same image block may well all be the same reference frame type; thus, to reduce computational complexity, the historical optimal reference frame type can be inherited directly as part of the reference frame template. In addition, the adjacent blocks are close to the current prediction block and their content may be similar; therefore, part of the reference frame template can be generated for the current prediction block based on information such as the reference frame types and motion vectors of the adjacent blocks. Meanwhile, the sub-prediction blocks are determined using sub-block partition types preceding the current sub-block partition type and are closely related to the current prediction block, so the sub-prediction-block information can also be used directly to generate part of the reference frame template.
Exemplarily, there may be 10 different sub-block partition types for the current prediction block: NONE partition prediction, HORZ partition prediction, VERT partition prediction, SPLIT partition prediction, HORZ_4 partition prediction, HORZ_A partition prediction, HORZ_B partition prediction, VERT_A partition prediction, VERT_B partition prediction, and VERT_4 partition prediction. The order of these 10 sub-block partition types is not fixed; many orderings are possible. An embodiment of the present application provides a schematic diagram of one ordering of the 10 different sub-block partition types. Referring to FIG. 9, the 10 sub-block partition types may proceed in the order NONE partition prediction 9-1, HORZ partition prediction 9-2, VERT partition prediction 9-3, SPLIT partition prediction 9-4, HORZ_A partition prediction 9-5, HORZ_B partition prediction 9-6, VERT_A partition prediction 9-7, VERT_B partition prediction 9-8, HORZ_4 partition prediction 9-9, and VERT_4 partition prediction 9-10. When the current sub-block partition type is HORZ_A partition prediction 9-5, the electronic device can directly obtain the sub-prediction-block information of the sub-prediction blocks corresponding to the sub-block partition types NONE partition prediction 9-1, HORZ partition prediction 9-2, VERT partition prediction 9-3, and SPLIT partition prediction 9-4.
S104. Use the reference frame template to determine the reference frame of the preset prediction mode, and perform inter-frame prediction on the current prediction block with the reference frame to obtain the prediction value corresponding to the current prediction block.
After determining the reference frame template, the electronic device compares each candidate reference frame type against the reference frame template for consistency. When a candidate reference frame type is consistent with the reference frame template, it is used as a reference frame of the current prediction block, and inter-frame prediction is performed on the current prediction block based on that reference frame to obtain the prediction value corresponding to the current prediction block, until all candidate reference frame types have been looped through. This completes the application of the reference frame template.
In the embodiments of the present application, when the current prediction mode is the preset prediction mode, the historical prediction mode corresponding to it is determined; then the historical optimal reference frame type of the current prediction block under the historical prediction mode, the information of the adjacent blocks under the historical prediction mode, and the sub-prediction-block information of the sub-prediction blocks obtained by partitioning the current prediction block under the historical prediction mode are obtained, so that the current prediction block directly inherits the various information of the historical prediction modes already carried out before the preset prediction mode; a reference frame template is then adaptively generated for the current prediction block in the preset prediction mode. In this way, the characteristics of the preset prediction mode in video encoding are fully considered, existing information is used directly to generate the reference frame template, and the computational complexity is greatly reduced, thereby improving the efficiency of video encoding.
In some embodiments of the present application, the adjacent-block information includes the motion vectors, reference frame types, and number of the adjacent blocks, and the sub-prediction-block information includes the reference frame types and number of the sub-prediction blocks. In this case, referring to FIG. 10, which is a second optional flowchart of the inter-frame prediction method provided by an embodiment of the present application, generating the reference frame template based on the historical optimal reference frame type, the adjacent-block information, the sub-prediction-block information, and the frame type corresponding to the current prediction block, i.e., the specific implementation of S103, may include S1031-S1034, as follows:
S1031. Determine the initial template corresponding to the current prediction block based on the historical optimal reference frame type, the reference frame types of the sub-prediction blocks, and the reference frame types of the adjacent blocks.
The range of candidate reference frame types is in fact limited. The electronic device first uses the reference frame types of the sub-prediction blocks and adjacent blocks to determine the number of times each candidate reference frame type was selected, and then, based on those counts, selects from the candidate reference frame types those suitable to serve as reference frame types for the current prediction block in the preset prediction mode. Meanwhile, since the most suitable reference frame types of the current prediction block under different prediction modes may be the same, the electronic device can also directly inherit the historical optimal reference frame type, i.e., the reference frame type that best matched the current prediction block under the historical prediction mode. The electronic device determines the selected suitable reference frame types, together with the inherited historical optimal reference frame type, as the initial template of the current prediction block.
S1032. Generate the main template corresponding to the current prediction block according to the motion vectors of the adjacent blocks.
Based on the motion vectors of the adjacent blocks, the electronic device constructs the error that would arise when predicting with each candidate reference frame type, and then, according to that error, determines from the candidate reference frame types the one producing a smaller error when predicting the current prediction block, and uses it as the main template corresponding to the current prediction block.
It is understood that in some embodiments of the present application, the electronic device may also classify the candidate reference frame types, for example into a forward reference category, a backward reference category, and a long-term reference category, then select the most suitable candidate reference frame type within each category, and use the selected candidate reference frame types to generate the main template corresponding to the current prediction block.
S1033. Determine the enhancement template corresponding to the current prediction block using the frame type of the current prediction block, the number of adjacent blocks, and the number of sub-prediction blocks.
Besides determining the initial and main templates, the electronic device also needs to determine an enhancement template for the current prediction block, so that when neither the initial template nor the main template works well, the enhancement template can guarantee the quality of the finally determined reference frame template. The electronic device first determines a threshold using the frame type of the current prediction block, then computes the number of inter-predicted blocks from the number of adjacent blocks and the number of sub-prediction blocks, and compares that number with the determined threshold to decide whether some candidate reference frame types should be used as the enhancement template. This completes the generation of the enhancement template.
S1034. Generate the reference frame template corresponding to the current prediction block using the initial template, the main template, and the enhancement template.
In some embodiments, after obtaining the initial, main, and enhancement templates, the electronic device integrates them into one set, which constitutes the reference frame template of the current prediction block in the preset prediction mode.
Exemplarily, when the initial template is denoted mask_init, the main template mask_main, and the enhancement template mask_add, the reference frame template can be expressed as mask_newmv = mask_init | mask_main | mask_add (where | denotes union).
In other embodiments, the electronic device may also fuse the initial, main, and enhancement templates to obtain the reference frame template corresponding to the current prediction block.
In the embodiments of the present application, the electronic device can use the inherited information and parameters to generate the main, initial, and enhancement templates for the current prediction block, and then integrate the three templates to obtain the reference frame template corresponding to the current prediction block, so that the reference frame can subsequently be determined using the reference frame template.
In some embodiments of the present application, determining the initial template corresponding to the current prediction block based on the historical optimal reference frame type, the reference frame types of the sub-prediction blocks, and the reference frame types of the adjacent blocks, i.e., the specific implementation of S1031, may include S1031a-S1031c, as follows:
S1031a. Determine the first initial template according to the reference frame types of the sub-prediction blocks and the reference frame types of the adjacent blocks.
It should be noted that the electronic device can determine, from all candidate reference frame types and according to the reference frame types of the sub-prediction blocks and adjacent blocks, which candidate reference frame types were selected for the sub-prediction blocks and adjacent blocks, and then, according to the selection counts of the selected candidate reference frame types, determine whether the first initial template is a selected reference frame type or an empty template.
S1031b. Use the historical optimal reference frame type as the second initial template.
The electronic device directly inherits the historical optimal reference frame type as the second initial template. It is understood that when the historical prediction modes are the NEARESTMV, NEARMV, and GLOBALMV modes, the electronic device may in turn compare the value of the optimal reference frame type of each of these three modes (each optimal reference frame type belonging to the 7 reference frame types given in Table 1) with 0, and when the value for some mode is greater than 0, add the optimal reference frame type of that mode to the second initial template.
Exemplarily, when the value of the optimal reference frame type corresponding to NEARESTMV mode is greater than 0, the optimal reference frame type under NEARESTMV mode is added to the second initial template.
S1031c. Determine the initial template corresponding to the current prediction block using the first initial template and the second initial template.
In some embodiments, after obtaining the first and second initial templates, the electronic device gathers them into one set, which is the initial template of the current prediction block.
Exemplarily, when the first initial template is denoted mask_init1 and the second initial template mask_init2, the initial template can be expressed as mask_init = mask_init1 | mask_init2.
In other embodiments, the electronic device may also weight the first and second initial templates to obtain the initial template corresponding to the current prediction block.
In other embodiments of the present application, the electronic device can first determine part of the initial template using the reference frame types of the sub-prediction blocks and adjacent blocks, then use the historical optimal reference frame type as the other part, and integrate the two parts, thereby completing the determination of the initial template.
In the embodiments of the present application, the first initial template is determined according to the reference frame types of the sub-prediction blocks and of the adjacent blocks, the historical optimal reference frame type is inherited as the second initial template, and the first and second initial templates are then integrated into the initial template; in this way, the electronic device completes the determination of the initial template.
In some embodiments of the present application, determining the first initial template according to the reference frame types of the sub-prediction blocks and the adjacent blocks, i.e., the specific implementation of S1031a, may include S201-S203, as follows:
S201. Determine at least one historically selected reference frame type using the reference frame types of the adjacent blocks and the sub-prediction blocks.
Since the reference frame types of the adjacent blocks and sub-prediction blocks are in fact all determined from all candidate reference frame types, the electronic device can merge them to determine which candidate reference frame types were selected under the historical prediction modes, and determine the candidate reference frame types selected under the historical prediction modes as the historically selected reference frame types. Since there may well be more than one selected candidate reference frame type, the electronic device can obtain at least one historically selected reference frame type.
S202. Count the selection count of each of the at least one historically selected reference frame type.
After obtaining the at least one historically selected reference frame type, the electronic device counts how many times each of them was selected under the historical prediction modes, thereby obtaining the selection count of each historically selected reference frame type.
In some embodiments, when counting the selection counts, the electronic device separately counts the number of times each historically selected reference frame type served as the reference frame type of a sub-prediction block and the number of times it served as the reference frame type of an adjacent block, and then, for each historically selected reference frame type, adds the two numbers to obtain its selection count.
Exemplarily, LAST_FRAME is a historically selected reference frame type; if LAST_FRAME was selected 3 times by sub-prediction blocks and 2 times by adjacent blocks, then its selection count is 5.
In other embodiments, when counting the selection counts, the electronic device averages the number of times each historically selected reference frame type served as the reference frame type of a sub-prediction block and the number of times it served as the reference frame type of an adjacent block, obtaining the selection count of each historically selected reference frame type.
S203. Use the selection counts to filter out the first initial template from the historically selected reference frame types.
The electronic device may sort the selection counts of the historically selected reference frame types by magnitude to determine their order, and then pick out the historically selected reference frame type corresponding to the largest selection count as the first initial template. The electronic device may also compare the selection count of each historically selected reference frame type with a set threshold and use the historically selected reference frame types whose selection counts exceed the set threshold as the first initial template.
In the embodiments of the present application, the electronic device can first determine at least one historically selected reference frame type and then select the first initial template from them according to their selection counts, so that the historically selected reference frame types chosen by most of the adjacent blocks and sub-prediction blocks serve as the first initial template, making the first initial template more accurate.
In some embodiments of the present application, using the selection counts to filter out the first initial template from the historically selected reference frame types, i.e., the specific implementation of S203, may include S2031-S2034, as follows:
S2031. Filter out the largest selection count from the selection counts of the historically selected reference frame types.
S2032. Amplify the selection count of each historically selected reference frame type to obtain the amplified selection count.
It is understood that the electronic device may amplify the selection counts by a preset factor or by a random factor.
It should be noted that amplifying the selection counts by a preset factor allows as many reference frames as possible to be picked from the historically selected reference frame types; that is, as long as a historically selected reference frame type was selected enough times to possibly serve in the reference frame template, it can be selected as the first initial template through the preset-factor amplification and thus join the reference frame template.
It is understood that the preset factor may be set to 4, to 6, or to another value as needed, which is not limited herein.
S2033. Compare the amplified selection count with the largest selection count to obtain a comparison result corresponding to each historically selected reference frame type.
The comparison result indicates whether the amplified selection count is greater than or equal to the largest selection count.
S2034. Use the historically selected reference frame types whose comparison result indicates that the amplified selection count is greater than or equal to the largest selection count as the first initial template.
The electronic device compares the amplified selection counts with the picked-out largest selection count, thereby determining the relation between each historically selected reference frame type's amplified selection count and the largest selection count among all historically selected reference frame types, and uses the historically selected reference frame types whose amplified selection count is greater than or equal to the largest selection count as the first initial template.
Exemplarily, the selection count of each historically selected reference frame type may be denoted ref_num[i] and the largest selection count ref_num[0] (the first after sorting by magnitude), with a preset factor of 4; then when ref_num[i]*4 ≥ ref_num[0], mask_init1 |= (1<<i) (i.e., the frame number i of the qualifying historically selected reference frame is stored in the bit string that records the first initial template).
In the embodiments of the present application, the electronic device can amplify the selection count of each historically selected reference frame type and compare it with the largest selection count, thereby selecting the first initial template from the historically selected reference frame types for the subsequent generation of the initial template.
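The selection-count filter above can be sketched as follows. This is a hypothetical illustration (the function name and the dict representation are not from the patent): a historically selected reference frame type enters the first initial template when its selection count, amplified by the preset factor (4, as in the example), reaches the largest selection count.

```python
def first_initial_template(ref_num, factor=4):
    # ref_num maps reference frame id -> number of times it was selected
    # by adjacent blocks and sub-prediction blocks.
    mask_init1 = 0
    if not ref_num:
        return mask_init1  # nothing collected -> empty template
    max_count = max(ref_num.values())
    for ref_id, count in ref_num.items():
        if count * factor >= max_count:
            mask_init1 |= 1 << ref_id  # record frame id as a bit flag
    return mask_init1
```

With counts {1: 5, 4: 2, 7: 1}, frames 1 and 4 pass the test (20 ≥ 5 and 8 ≥ 5) while frame 7 does not (4 < 5), so only bits 1 and 4 are set.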
In some embodiments of the present application, it may happen that no sub-prediction-block information or adjacent-block information can be collected (i.e., both are empty), so that the reference frame types of the sub-prediction blocks and of the adjacent blocks are all empty and the first initial template is empty; and when no historical optimal reference frame type has been collected either (for example, NEARESTMV, NEARMV, and GLOBALMV were not predicted before NEWMV mode), the second initial template is also empty. In that case the initial template is empty, and after the initial template of the current prediction block is composed from the first and second initial templates, i.e., after S1031c, the method may further include S1031d, as follows:
S1031d. Add at least one preset reference frame type to the initial template to obtain the corrected initial template.
When the initial template is empty, the electronic device corrects and supplements it to guarantee the validity of the final reference frame template. At this point, the electronic device adds the at least one configured preset reference frame type to the empty initial template, and the initial template with the at least one preset reference frame type added is recorded as the corrected initial template. In this case, the implementation of generating the reference frame template corresponding to the current prediction block using the initial, main, and enhancement templates, i.e., S1034, becomes: generating the reference frame template corresponding to the current prediction block using the corrected initial template, the main template, and the enhancement template.
It should be noted that the at least one preset reference frame type may be picked candidate reference frame types, e.g., LAST_FRAME, BWDREF_FRAME, ALTREF_FRAME, or may be a video frame picked from the video frames, e.g., the first video frame, which is not limited herein.
It is understood that the at least one preset reference frame type may include only 1 preset reference frame type, e.g., only LAST_FRAME, or may include 3, e.g., LAST_FRAME, BWDREF_FRAME, and ALTREF_FRAME, which is not limited herein.
In the embodiments of the present application, when the initial template is empty, the electronic device can also add at least one preset reference frame type to it, thereby correcting and supplementing the initial template to guarantee the validity of the final reference frame template.
In some embodiments of the present application, the adjacent-block information includes the reference frame types of the adjacent blocks and the sub-prediction-block information includes the reference frame types of the sub-prediction blocks; obtaining the adjacent-block information of the adjacent blocks of the current prediction block and the sub-prediction-block information of the sub-prediction blocks, i.e., the specific implementation of S102, may include S1021-S1023, as follows:
S1021. Judge the optimal prediction mode of the adjacent blocks to obtain a first judgment result, and judge the optimal prediction mode of the sub-prediction blocks to obtain a second judgment result.
It should be noted that the first judgment result indicates whether the optimal prediction mode of an adjacent block is the first preset mode, and the second judgment result indicates whether the optimal prediction mode of a sub-prediction block is the second preset mode.
S1022. When the first judgment result indicates that the optimal prediction mode of the adjacent block is the first preset mode, obtain the reference frame type of the adjacent block.
S1023. When the second judgment result indicates that the optimal prediction mode of the sub-prediction block is the second preset mode, obtain the reference frame type of the sub-prediction block.
When obtaining the adjacent-block information corresponding to the adjacent blocks, the electronic device first judges the optimal prediction mode of each adjacent block, and only records the reference frame type of an adjacent block when its optimal prediction mode is judged to be the first preset mode. Likewise, the electronic device only records the reference frame type of a sub-prediction block when its optimal prediction mode is judged to be the second preset mode, thereby obtaining the adjacent-block information and sub-prediction-block information in this way. When the optimal prediction mode of an adjacent block is not the first preset mode, the adjacent block's reference frame type is empty; when the optimal prediction mode of a sub-prediction block is not the second preset mode, the sub-prediction block's reference frame type is empty.
It should be noted that the first preset mode may be the NEARMV mode or the NEARESTMV mode, and the second preset mode may be the NEARESTMV mode or the NEARMV mode, which is not limited herein.
Exemplarily, when the adjacent blocks are the 4 adjacent blocks shown in FIG. 8, and the optimal prediction mode of each adjacent block is an inter mode, the electronic device can record the reference frame types of the adjacent blocks, obtaining ref_nb[4] = {ref_A, ref_B, ref_C, ref_D}, thereby obtaining the adjacent-block information.
In the embodiments of the present application, the electronic device obtains the reference frame types of the adjacent blocks and sub-prediction blocks only when their optimal prediction modes satisfy the conditions, thereby completing the acquisition of the adjacent-block information and the sub-prediction-block information.
In some embodiments of the present application, generating the main template corresponding to the current prediction block according to the motion vectors of the adjacent blocks, i.e., the specific implementation of S1032, may include S1032a-S1032d, as follows:
S1032a. Compute a selection parameter for each candidate reference frame type in the full set of candidate reference frame types using the motion vectors of the adjacent blocks.
It should be noted that the full set of candidate reference frame types represents all reference frame types available for inter-frame prediction; in some embodiments of the present application it may be the 7 reference frame types given in Table 1, or several reference frame types picked from those 7. The selection parameter represents the difference between the input value and the prediction value of the adjacent block; this difference can be computed with SAD or SATD. SATD is more accurate, but its computational complexity is correspondingly larger.
S1032b. Divide the full set of candidate reference frame types into candidate forward reference frame types, candidate backward reference frame types, and candidate long-term reference frame types.
According to the reference direction of each candidate reference frame type, the electronic device divides the full set into three groups (candidate forward, candidate backward, and candidate long-term reference frame types), so that the selection process can subsequently be carried out on the three groups separately, i.e., the forward, backward, and long-term reference frame types are selected from the three groups respectively.
Exemplarily, when the full set is the 7 candidate reference frame types provided in Table 1, the electronic device, according to the reference direction of each candidate reference frame type, classifies LAST_FRAME, LAST2_FRAME, and LAST3_FRAME as candidate forward reference frame types, BWDREF_FRAME, ALTREF2_FRAME, and ALTREF_FRAME as candidate backward reference frame types, and GOLDEN_FRAME as the candidate long-term reference frame type.
S1032c. Pick the forward, backward, and long-term reference frame types using the selection parameters corresponding to the candidate forward, candidate backward, and candidate long-term reference frame types.
S1032d. Integrate the main template corresponding to the current prediction block using the forward, backward, and long-term reference frame types.
The electronic device compares the magnitudes of the selection parameters of the candidate reference frame types within each group and, according to the comparison results, selects the forward reference frame type from the candidate forward reference frame types, the backward reference frame type from the candidate backward reference frame types, and the long-term reference frame type from the candidate long-term reference frame types. The electronic device then collects the selected forward, backward, and long-term reference frame types into one set, which is the main template.
In other embodiments, the electronic device may also perform weighted fusion of the forward, backward, and long-term reference frame types to obtain the main template of the current prediction block.
Exemplarily, when the forward reference frame type is denoted ref_list0, the backward reference frame type ref_list1, and the long-term reference frame type is GOLDEN_FRAME, the main template mask_main = ref_list0 | ref_list1 | GOLDEN_FRAME.
In the embodiments of the present application, the electronic device can compute a selection parameter for each candidate reference frame type according to the motion vectors of the adjacent blocks, and then, according to the selection parameters, select the forward, backward, and long-term reference frame types from the candidate forward, candidate backward, and candidate long-term reference frame types obtained by dividing the full set by reference direction, thereby gathering the selected reference frame types into the main template.
In some embodiments of the present application, picking the forward, backward, and long-term reference frame types using the selection parameters corresponding to the candidate forward, candidate backward, and candidate long-term reference frame types, i.e., the specific implementation of S1032c, may include S301-S303, as follows:
S301. Use the candidate reference frame type with the smallest selection parameter among the candidate forward reference frame types as the forward reference frame type.
The electronic device compares the selection parameters of the candidate forward reference frame types with one another, picks out the smallest selection parameter, and uses the candidate reference frame type corresponding to it as the forward reference frame type.
S302. Use the candidate reference frame type with the smallest selection parameter among the candidate backward reference frame types as the backward reference frame type.
The electronic device compares the selection parameters of the candidate backward reference frame types with one another, picks out the smallest selection parameter, and uses the candidate reference frame type corresponding to the smallest selection parameter among the candidate backward reference frame types as the backward reference frame type.
It should be noted that in the present application, whether S301 or S302 is executed first does not affect the selection of the forward and backward reference frame types; therefore, in some embodiments the electronic device may execute S302 before S301, or execute S301 and S302 simultaneously.
S303. Use, among the candidate long-term reference frame types, the candidate reference frame type whose selection parameter is smaller than the sum of the selection parameter corresponding to the forward reference frame type and the selection parameter corresponding to the backward reference frame type as the long-term reference frame type.
The electronic device sums the selection parameters of the previously selected forward and backward reference frame types to obtain a sum result, then compares the selection parameter of each candidate long-term reference frame type with the sum result, and selects the candidate long-term reference frame types whose selection parameter is smaller than the sum result as the long-term reference frame type.
Exemplarily, when the selection parameter of the forward reference frame type is denoted sad_list0 and that of the backward reference frame type sad_list1, and the candidate long-term reference frame types contain only GOLDEN_FRAME, then when the selection parameter of GOLDEN_FRAME is smaller than sad_list0 + sad_list1, GOLDEN_FRAME is used as the long-term reference frame type.
In the embodiments of the present application, the electronic device can separately find the smallest selection parameters among the candidate forward and candidate backward reference frame types, thereby determining the forward and backward reference frame types, and then select the long-term reference frame type from the candidate long-term reference frame types according to the sum of the forward and backward selection parameters; in this way, the electronic device can select the forward, backward, and long-term reference frame types.
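The group-wise selection in S301-S303 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patent's implementation: the identifiers and the choice to skip the GOLDEN_FRAME test when either directional group is empty are mine; INT32_MAX marks a candidate with no matching motion vector, as described later for S402.

```python
INT32_MAX = 2**31 - 1
FORWARD = ("LAST_FRAME", "LAST2_FRAME", "LAST3_FRAME")
BACKWARD = ("BWDREF_FRAME", "ALTREF2_FRAME", "ALTREF_FRAME")

def pick_main_template(sad):
    # sad maps reference frame type -> selection parameter (SAD).
    chosen, minima = [], []
    for group in (FORWARD, BACKWARD):
        best = min(group, key=lambda r: sad.get(r, INT32_MAX))
        if sad.get(best, INT32_MAX) != INT32_MAX:
            chosen.append(best)          # smallest valid SAD in the group
            minima.append(sad[best])
    # Long-term frame joins only when its SAD beats the sum of the two minima.
    if len(minima) == 2 and sad.get("GOLDEN_FRAME", INT32_MAX) < sum(minima):
        chosen.append("GOLDEN_FRAME")
    return chosen
```

For example, with SADs {LAST_FRAME: 100, LAST2_FRAME: 80, BWDREF_FRAME: 90, GOLDEN_FRAME: 150}, the forward pick is LAST2_FRAME, the backward pick is BWDREF_FRAME, and GOLDEN_FRAME joins because 150 < 80 + 90.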
In some embodiments of the present application, computing the selection parameter for each candidate reference frame type in the full set using the motion vectors of the adjacent blocks, i.e., the specific implementation of S1032a, may include S401-S403, as follows:
S401. Match each candidate reference frame type in the full set against the motion vectors of the adjacent blocks to obtain a matching result.
The matching result indicates whether each candidate reference frame type has a matching motion vector.
There is a correspondence between the motion vectors of the adjacent blocks and their reference frame types. The electronic device can match a candidate reference frame type against the reference frame types of the adjacent blocks to judge whether an adjacent block's reference frame type is the same as that candidate reference frame type. When they are the same, the electronic device considers that the candidate reference frame type has matched the adjacent block's motion vector, i.e., a matching motion vector exists. Following this process, the electronic device assigns the motion vectors of the adjacent blocks to the candidate reference frame types, thereby obtaining a matching result for each candidate reference frame type.
It should be noted that after matching, some candidate reference frame types may have matching motion vectors while others have none; hence a matching result is needed to record these situations.
S402. When the matching result indicates that a candidate reference frame type has no matching motion vector, use a preset value as the selection parameter of that candidate reference frame type.
When the matching result shows that a candidate reference frame type has no matching motion vector, the electronic device initializes the selection parameter of that candidate reference frame type, i.e., uses the preset value as its selection parameter.
It is understood that the preset value may be INT32_MAX, i.e., the largest 32-bit number, or the binary representation of 50000, which is not limited herein.
S403. When the matching result indicates that a candidate reference frame type has matching motion vectors, compute the selection parameter of that candidate reference frame type using the prediction values and input values of the adjacent block when predicting based on the adjacent block's motion vectors.
When the matching result of a candidate reference frame type shows that it has matching motion vectors, the electronic device uses the prediction values obtained by predicting the adjacent block with its motion vectors, together with the adjacent block's own input values, to compute the selection parameter for that candidate reference frame type.
In some embodiments of the present application, there may be multiple matching motion vectors; in that case, each matching motion vector can be regarded as a sub-vector, and "matching motion vector" is used as a collective term, so the matching motion vector includes multiple sub-vectors. Moreover, each sub-vector corresponds to a prediction value, and the prediction value corresponding to each sub-vector is recorded as a sub-prediction value, so the prediction value includes multiple sub-prediction values, with the multiple sub-vectors and the multiple sub-prediction values corresponding to one another. In this case, computing the selection parameter of each candidate reference frame type using the prediction values and input values of the adjacent block when predicting based on the adjacent block's motion vectors, i.e., the specific implementation of S403, may include S4031-S4033, as follows:
S4031. Compute, for each of the multiple sub-vectors, the pixel-wise difference between the sub-prediction value corresponding to that sub-vector and the input value of the adjacent block, obtaining the pixel difference corresponding to each sub-vector.
S4032. Accumulate the absolute values of the pixel differences corresponding to each sub-vector to obtain the temporary selection parameter corresponding to each sub-vector.
Exemplarily, an embodiment of the present application provides a way to compute the temporary selection parameter, see equation (2):

sad = Σ_{i=1}^{m} Σ_{j=1}^{n} |dst(i, j) − src(i, j)|     (2)

where (i, j) is a pixel, (m, n) is the size of the adjacent block, dst(i, j) is the prediction value of the candidate reference frame type when predicting with a given sub-vector, src(i, j) is the input value of the adjacent block, and sad is the computed temporary selection parameter.
S4033. Use the smallest temporary selection parameter among the temporary selection parameters of the sub-vectors as the selection parameter corresponding to each candidate reference frame type.
For each candidate reference frame type, the electronic device selects the smallest temporary selection parameter from those of the sub-vectors and uses it as the final selection parameter; in this way, the electronic device computes the selection parameter of each candidate reference frame type.
In some embodiments of the present application, matching each candidate reference frame type in the full set against the motion vectors of the adjacent blocks to obtain a matching result, i.e., the specific implementation of S401, may include S4011, as follows:
S4011. When an adjacent block is available, its optimal prediction mode is the second preset mode, and its reference frame type is the same as a given candidate reference frame type, determine that the candidate reference frame type has a matching motion vector.
When matching each candidate reference frame type against the motion vectors of the adjacent blocks, the electronic device first judges whether the adjacent block is available; when it is, the electronic device judges whether the adjacent block's optimal prediction mode is the second preset mode and whether its reference frame type is the same as the candidate reference frame type. Since there is a correspondence between an adjacent block's motion vector and its reference frame type, when the electronic device determines that the adjacent block's optimal prediction mode is indeed the second preset mode and that its reference frame type is the same as the candidate reference frame type, it considers that the adjacent block's motion vector corresponding to that reference frame type matches the candidate reference frame type, so the candidate reference frame type has a matching motion vector.
In the embodiments of the present application, the electronic device can first match the motion vectors of the adjacent blocks against each candidate reference frame type and compute the selection parameter for each candidate reference frame type according to whether it has a matching motion vector; in this way, the electronic device can obtain the selection parameter corresponding to each candidate reference frame type.
In some embodiments of the present application, determining the enhancement template corresponding to the current prediction block using the frame type of the current prediction block, the number of adjacent blocks, and the number of sub-prediction blocks, i.e., the specific implementation of S1033, may include S1033a-S1033d, as follows:
S1033a. Determine the frame-type weight corresponding to the current prediction block according to the frame type of the current prediction block and a preset frame-type weight correspondence.
The video frame to which the current prediction block belongs is fixed, and the frame type of each video frame is determined before prediction. Different frame types have different reference relationships and hence different importance: a video frame referenced by more video frames is more important than one referenced by fewer. The electronic device can determine the importance according to the frame type of the video frame corresponding to the current prediction block, and then determine the corresponding frame-type weight according to the importance.
Exemplarily, FIG. 11 is a schematic diagram of the reference relationships among I, P, B, b, and non-reference B frames provided by an embodiment of the present application. From FIG. 11, the importance order determined for these frame types according to the reference relationships is: I frame > P frame > B frame > b frame > non-reference B frame.
In some embodiments, the importance is also related to the structure of the group of pictures (GOP). FIG. 12 is a schematic diagram of the reference relationships of GOP16 provided by an embodiment of the present application. From FIG. 12, POC16 references POC0; POC8 references POC0 and POC16; POC4 references POC0 and POC8; POC2 references POC0 and POC4; and the remaining POCs are not referenced. From this, the weight levels shown in Table 2 can be determined; see Table 2.
Table 2
[Table 2 is an image in the source; it assigns a weight level to each frame of GOP16 according to how often the frame is referenced.]
Thus, the ordering of the weights of the video frames in GOP16 is: POC0 > POC16 > POC8 > POC4 > POC2 > POC1.
Thus, the electronic device can select the frame-type weight according to the frame type.
S1033b. Generate the enhancement threshold according to the frame-type weight.
The electronic device may select a certain number of video frames and use the frame-type weight of the selected video frame as the enhancement threshold; it may also first set different threshold parameters for different frame-type weights and then use the threshold parameter corresponding to the frame-type weight of the selected video frame as the enhancement threshold.
Exemplarily, an embodiment of the present application provides a formula for generating the enhancement threshold, see equation (3):
thr = param[slice_level]     (3)
where param is the threshold parameter, whose values can be defined freely, for example param[6] = {5, 5, 5, 5, 4, 4}; thr is the generated enhancement threshold; and slice_level is the frame-type weight. In this way, the enhancement threshold can be generated by table lookup.
S1033c. Sum the number of adjacent blocks and the number of sub-prediction blocks to obtain a sum result.
S1033d. When the sum result is less than or equal to the enhancement threshold, use at least one preset reference frame type as the enhancement template corresponding to the current prediction block.
The electronic device adds the number of adjacent blocks to the number of sub-prediction blocks to obtain the sum result, and then compares the sum result with the enhancement threshold. When the electronic device finds that the sum result is less than or equal to the enhancement threshold, it obtains the at least one preset reference frame type and uses it as the enhancement template. This completes the generation of the enhancement template.
It should be noted that the at least one preset reference frame type may be LAST_FRAME, BWDREF_FRAME, and ALTREF_FRAME, or LAST_FRAME, BWDREF_FRAME, and GOLDEN_FRAME, which is not limited herein.
In the embodiments of the present application, the electronic device can first determine the frame-type weight according to the frame type of the current prediction block, then generate the enhancement threshold according to the frame-type weight, and then, according to the relation between the sum of the numbers of adjacent blocks and sub-prediction blocks and the generated enhancement threshold, choose whether to add the at least one preset reference frame type to the enhancement template, thereby guaranteeing the validity of the finally generated reference frame template.
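The enhancement-template rule above can be sketched as follows. This is an illustrative sketch: the param values come from the example for equation (3), but the numeric frame ids (LAST_FRAME=1, BWDREF_FRAME=5, ALTREF_FRAME=7, following the usual AV1 numbering) are an assumption, not stated in the patent.

```python
PARAM = [5, 5, 5, 5, 4, 4]      # thr = param[slice_level], per equation (3)
PRESET_REFS = (1, 5, 7)         # assumed ids: LAST_FRAME, BWDREF_FRAME, ALTREF_FRAME

def enhancement_template(slice_level, inter_total_num):
    thr = PARAM[slice_level]    # enhancement threshold by table lookup
    mask_add = 0
    if inter_total_num <= thr:  # few inter blocks collected -> enhance
        for ref_id in PRESET_REFS:
            mask_add |= 1 << ref_id
    return mask_add
```

When few adjacent blocks and sub-prediction blocks were inter-predicted, the collected statistics are unreliable, so the preset frames are forced into the template; otherwise the enhancement template stays empty.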
An exemplary application of the embodiments of the present application in an actual application scenario is described below.
The embodiments of the present application are implemented in the scenario where an encoder (electronic device) generates a reference frame template for NEWMV mode (the preset prediction mode). The idea of the process is as follows: before NEWMV mode is predicted, NEARESTMV, NEARMV, and GLOBALMV have already been determined, so the optimal reference frames of these modes, as well as the information of the adjacent blocks and of the different CU partitions already predicted, can be inherited. From the adjacent MVs of each reference frame, the SAD corresponding to each MV is computed and the minimum SAD is taken as that reference frame's SAD; the reference frames are divided into three groups (forward reference, backward reference, and long-term reference), and in each group the reference frame with the smallest SAD within the group is selected as the group's representative, forming the main template of NEWMV mode; combined with the collected information, this is integrated into the final NEWMV-mode reference frame template. FIG. 13 is a schematic diagram of the process of generating a reference frame template for NEWMV mode provided by an embodiment of the present application. Referring to FIG. 13, the process may include:
S501. Determine the sad (selection parameter) of each reference frame (candidate reference frame type).
S502. Obtain the reference frame information of the adjacent blocks and of the different CU partitions (yielding the sub-prediction blocks).
S503. Generate the NEWMV-mode reference frame template.
S504. Apply the reference frame template.
Each step is explained below.
In S501, the encoder initializes the SAD corresponding to each reference frame to INT32_MAX (the preset value), where INT32_MAX is the largest 32-bit number. S501 includes 3 parts:
S5011. Obtain the candidate list of adjacent-block motion vectors.
The positional relationship between the current block (current prediction block) and the adjacent blocks can be as shown in FIG. 8. The electronic device judges the position of each adjacent block in turn; if the current adjacent-block position is available, the optimal mode is inter prediction, and the reference frame is the same as the current reference frame (the given candidate reference frame type), the current MV is recorded. The current adjacent block's reference frame may have multiple MVs or none at all (when determining the matching motion vectors for each candidate reference frame type, the matching motion vector of one candidate reference frame type may include multiple sub-vectors).
S5012. Compute the SAD (temporary selection parameter) corresponding to each MV (sub-vector) in turn.
The encoder can compute the SAD according to equation (2). Alternatively, the SATD under each MV can be computed; SATD is more accurate but more computationally complex.
S5013. Select the minimum SAD (smallest temporary selection parameter) corresponding to each reference frame.
Since the current reference frame may have multiple MVs, multiple SADs are obtained, and the smallest is taken as the SAD of the current reference frame. The current reference frame may also have no MV at all, in which case its SAD is INT32_MAX (when a candidate reference frame type has no matching motion vector, the preset value serves as its selection parameter).
S502 includes three parts:
S5021. Obtain the reference frame information of the adjacent blocks.
The positional relationship between the current block and the adjacent blocks is shown in FIG. 8. If an adjacent block exists and its optimal mode is an inter mode (the first preset mode), its reference frame information is recorded (the reference frame type of the adjacent block is obtained, i.e., the adjacent-block information is obtained).
S5022. Obtain the reference frame information under the different CU partition types.
There are 10 CU partition types for the current block, as shown in FIG. 9. The encoding-prediction process proceeds in order; therefore, when the current partition type (the current sub-block partition type) is being processed, other CU partition types (preceding sub-block partition types) may already have been processed. For example, when the current type is the HORZ_A partition type, the NONE partition type has already been processed, so the partition information of the NONE partition type can be used.
The reference frame information (sub-prediction-block information) of each sub-block (sub-prediction block) under each already-determined CU partition type is obtained. The judgment method is similar to that for adjacent blocks: if the sub-block position exists and the optimal mode is an inter mode (the second preset mode), the reference frame information of each sub-block is recorded.
S5023. Data arrangement.
Count the number of inter-predicted blocks from the adjacent blocks and the different CU partitions (the sum result of the number of adjacent blocks and the number of sub-prediction blocks), recorded as inter_total_num. Count the collected reference frames (historically selected reference frame types) and the number of times each was selected (selection count), and sort the reference frames by selection count in descending order.
The reference frame template generated by S503 is recorded as mask_newmv and consists of three parts, the initialization template, the main template, and the enhancement template, i.e., mask_newmv = mask_init | mask_main | mask_add.
S503 includes the following steps:
S5031. Generate the initialization template (initial template).
Recorded as mask_init, it is generated from the collected reference frame information (i.e., the reference frame types of the adjacent blocks and sub-prediction blocks). The generation process mainly includes:
S50311. According to the reference frame information of the adjacent blocks and the different CU partition types, determine whether to add the current reference frame to the initialization template. FIG. 14 provides a first schematic diagram of the process of generating the initialization template. Referring to FIG. 14, the process includes:
S601. Start.
S602. Obtain the index of the selected reference frame.
S603. Judge whether the index is less than or equal to the number of reference frame kinds. The number of reference frame kinds refers to the number of kinds of reference frame types in the collected reference frame information.
If yes, execute S604; if no, execute S607.
S604. Judge whether 4 times the selection count (the amplified selection count) is greater than or equal to the largest selection count.
If yes, execute S605; if no, execute S606.
S605. Record the index of the selected reference frame.
S606. Increment the index by 1 and re-enter S602.
S607. End.
It should be noted that the index of a selected reference frame here is determined after the reference frames are sorted by selection count in descending order; thus, as the selection count shrinks, the reference value also shrinks, and adding all of them to the initialization template would add unnecessary reference frames and drag down the encoding speed.
This completes the process of S50311, and the process proceeds to S50312.
S50312. Generate the initialization template according to the reference frame information of the already-predicted modes.
FIG. 15 provides a second schematic diagram of the process of generating the initialization template. Referring to FIG. 15, the process includes:
S701. Start.
S702. Judge whether the optimal reference frame of NEARESTMV mode (the historical optimal reference frame type) has been collected.
If yes, execute S703; if no, execute S704.
S703. Add the optimal reference frame of NEARESTMV mode to the initialization template.
S704. Judge whether the optimal reference frame of NEARMV mode (the historical optimal reference frame type) has been collected.
If yes, execute S705; if no, execute S706.
S705. Add the optimal reference frame of NEARMV mode to the initialization template.
S706. Judge whether the optimal reference frame of GLOBALMV mode (the historical optimal reference frame type) has been collected.
If yes, execute S707; if no, execute S708.
S707. Add the optimal reference frame of GLOBALMV mode to the initialization template.
S708. End.
This completes the process of S50312, and the process proceeds to S50313.
S50313. Corrective supplement.
After S50311 and S50312, it is possible that mask_init is 0 (the initial template is empty). In this case, the values corresponding to LAST_FRAME, BWDREF_FRAME, and ALTREF_FRAME (at least one preset reference frame type) can be written into mask_init (for example, the value of LAST_FRAME can be recorded in mask_init by left-shifting by LAST_FRAME bits), thereby adding these frames to the initialization template.
It should be noted that mask_init may be 0 because no reference frame information was collected and none of the other single reference frame modes selected an optimal mode; in this case LAST_FRAME, BWDREF_FRAME, and ALTREF_FRAME must be forcibly added.
This completes the generation of the initialization template, and the process proceeds to S5032, the generation of the main template.
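The corrective supplement and the final union of the three sub-templates can be sketched as follows. This is a minimal sketch under assumptions: the numeric frame ids follow the usual AV1 numbering (LAST_FRAME=1, BWDREF_FRAME=5, ALTREF_FRAME=7), which the patent does not state explicitly.

```python
LAST_FRAME, BWDREF_FRAME, ALTREF_FRAME = 1, 5, 7  # assumed AV1-style ids

def correct_init(mask_init):
    # S50313: when nothing was collected, force the preset frames in
    # by left-shifting 1 by each frame id.
    if mask_init == 0:
        for ref in (LAST_FRAME, BWDREF_FRAME, ALTREF_FRAME):
            mask_init |= 1 << ref
    return mask_init

def newmv_template(mask_init, mask_main, mask_add):
    # mask_newmv = mask_init | mask_main | mask_add
    return correct_init(mask_init) | mask_main | mask_add
```

Representing each template as a bit mask keyed by frame id makes the union in S503 a single bitwise OR, which is why the corrective supplement can be applied with shifts alone.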
S5032. Generate the main template.
Recorded as mask_main, initialized to 0, and generated from the SADs.
The 7 reference frames in Table 1 (the full set of candidate reference frame types) are divided into three kinds: forward reference frames, backward reference frames, and long-term reference frames. The forward reference frames include LAST_FRAME, LAST2_FRAME, and LAST3_FRAME (the candidate forward reference frame types); the backward reference frames include BWDREF_FRAME, ALTREF2_FRAME, and ALTREF_FRAME (the candidate backward reference frame types); the long-term reference frames include GOLDEN_FRAME (the candidate long-term reference frame type). The reference frame with the smallest SAD in each kind is then found.
The specific procedure is as follows:
Step 1: Find the forward reference frame (forward reference frame type).
Compare the SADs corresponding to LAST_FRAME, LAST2_FRAME, and LAST3_FRAME, find the reference frame whose SAD is smallest and not INT32_MAX, record it as ref_list0, and record the forward minimum SAD as sad_list0.
Step 2: Find the backward reference frame (backward reference frame type).
Compare the SADs corresponding to BWDREF_FRAME, ALTREF2_FRAME, and ALTREF_FRAME, find the reference frame whose SAD is smallest and not INT32_MAX, record it as ref_list1, and record the backward minimum SAD as sad_list1.
Step 3: Find the long-term reference frame (long-term reference frame type).
If the SAD corresponding to the reference frame GOLDEN_FRAME is not equal to INT32_MAX and is smaller than sad_list0 + sad_list1, then the long-term reference frame is GOLDEN_FRAME.
In the above process, if the current reference frame has no qualifying MV, its corresponding SAD is INT32_MAX, indicating that the reference frame is unimportant and can simply be skipped.
This completes the generation of the main template, and the process proceeds to S5033, the generation of the enhancement template.
S5033. Generate the enhancement template.
Recorded as mask_add, initialized to 0, and related to the frame-type weight and the number of collected inter blocks.
First, the threshold thr is generated from the current frame-type weight. The current frame type is already determined before prediction, so the current frame-type weight can also be determined, recorded as slice_level; the threshold can then be generated according to equation (3).
If inter_total_num <= thr, the values of LAST_FRAME, BWDREF_FRAME, and ALTREF_FRAME are written into mask_add (accomplished by left-shifting into mask_add).
This completes the generation of the NEWMV-mode reference frame template, and the process proceeds to the application of the reference frame template.
S504. Apply the reference frame template.
Loop over all reference frames and compare whether the current reference frame is consistent with the reference frame template. For example, if bits 1 and 4 of the reference frame template mask_newmv are both 1, then when the current reference frame is 1 or 4 it can be used for prediction; otherwise, the next reference frame is judged. FIG. 16 is a schematic diagram of the process of applying the NEWMV-mode reference frame template provided by an embodiment of the present application. Referring to FIG. 16, the process may include:
S801. Start.
S802. Obtain the index of the current reference frame.
S803. Judge whether the index of the current reference frame is less than or equal to 7.
If yes, execute S804; if no, execute S807.
S804. Judge whether the current reference frame is consistent with the reference frame template.
If yes, execute S805; if no, execute S806.
S805. Predict with the current reference frame.
S806. Increment the index of the current reference frame by 1 and re-enter S802.
S807. End.
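The application loop in FIG. 16 can be sketched as follows. This is an illustrative sketch (the function name is mine): reference frame indices 1 through 7 are tested against the template's bit flags, and only the frames whose bit is set are kept for prediction.

```python
def frames_to_predict(mask_newmv):
    # Keep only reference frame indices whose bit is set in the template.
    return [ref for ref in range(1, 8) if mask_newmv & (1 << ref)]
```

With bits 1 and 4 set, as in the example in the text, only reference frames 1 and 4 are predicted; the other five are skipped without any mode-selection cost.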
In the above manner, the encoder introduces no new computation, fully considers the characteristics of NEWMV mode, and generates the reference frame template directly from the collected information. Compared with the encoding speed of the related art, encoding 65 video frames can be accelerated by 15%, a very high speedup. Moreover, the obtained reference frame template has a high adaptive capability; and since no reference frames need to be eliminated during generation, the quality of the bitstream is also guaranteed.
下面继续说明本申请实施例提供的帧间预测装置455的实施为软件模块的示例性结构,在一些实施例中,如图6所示,存储在存储器450的帧间预测装置455中的软件模块可以包括:
模式确定模块4551,配置为在当前预测块的当前预测模式为预设预测模式时,确定出所述当前预测块对应的历史预测模式;所述历史预测模式为在所述预设预测模式之前完成预测的预测模式;
信息获取模块4552,配置为获取所述当前预测块的相邻块的相邻块信息、子预测块的子预测块信息,以及所述当前预测块在所述历史预测模式时的历史最优参考帧类型;所述子预测块是利用当前的子块划分类型之前的子块划分类型,对所述当前预测块分块得到的;
模板生成模块4553,配置为基于所述历史最优参考帧类型、所述相邻块信息、所述子预测块信息,以及所述当前预测块对应的帧类型,生成参考帧模板;
信息预测模块4554,配置为利用所述参考帧模板,确定出所述预设预测模式的参考帧,利用所述参考帧对所述当前预测块进行帧间预测,得到所述当前预测块对应的预测值。
在本申请的一些实施例中,所述相邻块信息包括:所述相邻块的运动向量、所述相邻块的参考帧类型、所述相邻块的数量;所述子预测块信息包括:所述子预测块的参考帧类型、所述子预测块的数量;
所述模板生成模块4553,还配置为基于所述历史最优参考帧类型、所述子预测块的参考帧类型和所述相邻块的参考帧类型,确定出所述当前预测块对应的初始模板;根据所述相邻块的运动向量,生成所述当前预测块对应的主模板;利用所述当前预测块的帧类型、所述相邻块的数量和所述子预测块的数量,确定出所述当前预测块对应的增强模板;利用所述初始模板、所述主模板和所述增强模板,生成所述当前预测块对应的所述参考帧模板。
在本申请的一些实施例中,所述模板生成模块4553,还配置为依据所 述子预测块的参考帧类型和所述相邻块的参考帧类型,确定出第一初始模板;将所述历史最优参考帧类型,作为第二初始模板;利用所述第一初始模板和所述第二初始模板,确定出所述当前预测块对应的所述初始模板。
在本申请的一些实施例中,所述模板生成模块4553,还配置为利用所述相邻块的参考帧类型和所述子预测块的参考帧类型,确定出至少一个历史选中参考帧类型;统计出所述至少一个历史选中参考帧类型中的每个历史选中参考帧类型的选中次数;利用所述选中次数,从所述每个历史选中参考帧类型中,筛选出第一初始模板。
在本申请的一些实施例中,所述模板生成模块4553,还配置为从所述每个历史选中参考帧类型的选中次数中,筛选出最大的选中次数;将所述每个历史选中参考帧类型的选中次数进行放大,得到放大后的选中次数;将所述放大后的选中次数与所述最大的选中次数进行比较,得到所述每个历史选中参考帧类型对应的比较结果;所述比较结果表征所述放大后的选中次数是否大于等于所述最大的选中次数;将所述比较结果表征所述放大后的选中次数大于等于所述最大的选中次数的历史选中参考帧类型,确定为所述第一初始模板。
在本申请的一些实施例中,所述初始模板为空;所述模板生成模块4553,还配置为在所述初始模板中,添加至少一个预设参考帧类型,得到校正后的初始模板;
所述模板生成模块4553,还配置为利用所述校正后的初始模板、所述主模板和所述增强模板,生成所述当前预测块对应的所述参考帧模板。
在本申请的一些实施例中,所述相邻块信息包括:所述相邻块的参考帧类型,所述子预测块信息包括:所述子预测块的参考帧类型;所述模板生成模块4553,还配置为对所述相邻块的最优预测模式进行判断,得到第一判断结果,对所述子预测块的最优预测模式进行判断,得到第二判断结果;所述第一判断结果表征所述相邻块的最优预测模式是否为第一预设模式,所述第二判断结果表征所述子预测块的最优预测模式是否为第二预设模式;当所述第一判断结果表征所述相邻块的最优预测模式为所述第一预设模式时,获取所述相邻块的参考帧类型;当所述第二判断结果表征所述子预测块的最优预测模式为所述第二预设模式时,获取所述子预测块的参考帧类型。
在本申请的一些实施例中,所述模板生成模块4553,还配置为利用所述相邻块的运动向量,为全量候选参考帧类型中的每个候选参考帧类型计算出选择参数;所述全量候选参考帧类型表征帧间预测时所有可用的候选参考帧类型,所述选择参数表征了所述相邻块的输入值和预测值的差异;将所述全量候选参考帧类型分别划分为候选前向参考帧类型、候选后向参考帧类型和候选长期参考帧类型;利用所述候选前向参考帧类型对应的选择参数、所述候选后向参考帧类型的选择参数,以及所述候选长期参考帧 类型对应的选择参数,挑选出前向参考帧类型、后向参考帧类型和长期参考帧类型;利用所述前向参考帧类型、所述后向参考帧类型和所述长期参考帧类型,整合出所述当前预测块对应的所述主模板。
In some embodiments of this application, the template generating module 4553 is further configured to: use, as the forward reference frame type, the candidate reference frame type with the smallest selection parameter among the candidate forward reference frame types; use, as the backward reference frame type, the candidate reference frame type with the smallest selection parameter among the candidate backward reference frame types; and use, as the long-term reference frame type, the candidate reference frame types among the candidate long-term reference frame types whose selection parameters are smaller than the sum of the selection parameter corresponding to the forward reference frame type and the selection parameter corresponding to the backward reference frame type.
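A minimal sketch of this selection step, assuming AV1-style reference frame type names and a dictionary of precomputed selection parameters (both are assumptions; the embodiments do not fix the naming or the data structure):

```python
import math


def pick_main_template(sel_params, forward_types, backward_types, longterm_types):
    """Pick the forward and backward reference frame types with the
    smallest selection parameter, then admit any long-term type whose
    selection parameter is below the sum of those two."""
    fwd = min(forward_types, key=lambda t: sel_params.get(t, math.inf))
    bwd = min(backward_types, key=lambda t: sel_params.get(t, math.inf))
    budget = sel_params[fwd] + sel_params[bwd]
    longterm = [t for t in longterm_types
                if sel_params.get(t, math.inf) < budget]
    return [fwd, bwd] + longterm
```

A long-term reference frame therefore enters the main template only when it predicts the neighboring blocks better than the forward and backward picks combined.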
In some embodiments of this application, the template generating module 4553 is further configured to: match each candidate reference frame type among all the candidate reference frame types against the motion vectors of the neighboring blocks to obtain a matching result, the matching result indicating whether each candidate reference frame type has a matching motion vector; use a preset value as the selection parameter of a candidate reference frame type when the matching result indicates that the candidate reference frame type has no matching motion vector; and calculate the selection parameter of a candidate reference frame type by using the prediction value and the input value obtained when the neighboring block is predicted based on the motion vector of the neighboring block, when the matching result indicates that the candidate reference frame type has a matching motion vector.
In some embodiments of this application, the template generating module 4553 is further configured to: compute differences between the pixels of the sub prediction value corresponding to each of the multiple sub vectors and the pixels of the input value of the neighboring block, to obtain pixel differences corresponding to each sub vector; accumulate the absolute values of the pixel differences corresponding to each sub vector to obtain a temporary selection parameter corresponding to each sub vector; and use the smallest temporary selection parameter among the temporary selection parameters of the sub vectors as the selection parameter corresponding to each candidate reference frame type.
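The accumulation of absolute pixel differences described above is a sum of absolute differences (SAD). A minimal sketch, assuming pixels are given as flat integer lists and using a hypothetical large preset value for candidates with no matching motion vector:

```python
def selection_parameter(sub_predictions, neighbor_input, unmatched=1 << 30):
    """Return the smallest SAD between the neighbor's input pixels and
    the sub prediction value produced by each matching sub motion
    vector; fall back to a preset value when nothing matched."""
    if not sub_predictions:
        return unmatched  # preset value for candidates with no match
    return min(
        sum(abs(p - q) for p, q in zip(pred, neighbor_input))
        for pred in sub_predictions
    )
```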
In some embodiments of this application, the template generating module 4553 is further configured to: determine a frame type weight corresponding to the current prediction block according to the frame type of the current prediction block and a preset frame type weight correspondence; generate an enhancement threshold according to the frame type weight; sum the quantity of the neighboring blocks and the quantity of the sub prediction blocks to obtain a sum result; and use at least one preset reference frame type as the enhanced template corresponding to the current prediction block when the sum result is less than or equal to the enhancement threshold.
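The enhanced-template decision can be sketched as below. The weight table, scaling constant, and preset reference frame type names are all assumptions for illustration; only the shape of the rule (frame type weight, threshold, comparison against the sum) is stated above.

```python
FRAME_TYPE_WEIGHT = {"KEY": 2, "INTER": 1}   # hypothetical weight mapping
SCALE = 2                                    # hypothetical scaling constant
PRESET_REF_TYPES = ["LAST", "ALTREF"]        # hypothetical preset types


def enhanced_template(frame_type, num_neighbors, num_sub_blocks):
    """Fall back to preset reference frame types when too little
    history was collected from neighbors and sub prediction blocks."""
    threshold = FRAME_TYPE_WEIGHT.get(frame_type, 1) * SCALE
    if num_neighbors + num_sub_blocks <= threshold:
        return list(PRESET_REF_TYPES)
    return []  # enough statistics collected: no enhancement needed
```

The intent is a safety net: when few neighboring or sub prediction blocks contributed statistics, the template is padded with preset reference frames instead of trusting sparse history.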
It should be noted that the description of the inter prediction apparatus provided in the embodiments of this application is similar to the description of the inter prediction method provided in the embodiments of this application, and has similar beneficial effects.
The embodiments of this application provide a computer program product or computer program, the computer program product or computer program including computer instructions stored in a computer-readable storage medium. A processor of a computer device (an electronic device for inter prediction) reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the inter prediction method described above in the embodiments of this application.
The embodiments of this application provide a computer-readable storage medium storing executable instructions. When the executable instructions are executed by a processor, the processor is caused to perform the inter prediction method provided in the embodiments of this application, for example, the method shown in FIG. 7.
In some embodiments, the computer-readable storage medium may be a memory such as an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a flash memory, a magnetic surface memory, an optical disc, or a CD-ROM; or may be any device including one of or any combination of the foregoing memories.
In some embodiments, the executable instructions may take the form of a program, software, a software module, a script, or code, written in any form of programming language (including a compiled or interpreted language, or a declarative or procedural language), and may be deployed in any form, including being deployed as an independent program or as a module, a component, a subroutine, or another unit suitable for use in a computing environment.
As an example, the executable instructions may, but do not necessarily, correspond to a file in a file system, and may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a Hypertext Markup Language (HTML) document, in a single file dedicated to the program in question, or in multiple collaborative files (for example, files storing one or more modules, subprograms, or code portions).
As an example, the executable instructions may be deployed to be executed on one computing device, on multiple computing devices located at one site, or on multiple computing devices distributed across multiple sites and interconnected by a communication network.
The foregoing descriptions are merely embodiments of this application and are not intended to limit the protection scope of this application. Any modification, equivalent replacement, or improvement made within the spirit and scope of this application shall fall within the protection scope of this application.

Claims (16)

  1. An inter prediction method, performed by an electronic device, the method comprising:
    determining, when a current prediction mode of a current prediction block is a preset prediction mode, a historical prediction mode corresponding to the current prediction block, the historical prediction mode being a prediction mode in which prediction is completed before the preset prediction mode;
    obtaining neighboring block information of neighboring blocks of the current prediction block, sub prediction block information of sub prediction blocks, and a historical optimal reference frame type of the current prediction block in the historical prediction mode, the sub prediction blocks being obtained by partitioning the current prediction block using a sub-block partition type preceding a current sub-block partition type;
    generating a reference frame template based on the historical optimal reference frame type, the neighboring block information, the sub prediction block information, and a frame type corresponding to the current prediction block; and
    determining a reference frame of the preset prediction mode by using the reference frame template, and performing inter prediction on the current prediction block by using the reference frame to obtain a prediction value corresponding to the current prediction block.
  2. The method according to claim 1, wherein the neighboring block information comprises: motion vectors of the neighboring blocks, reference frame types of the neighboring blocks, and a quantity of the neighboring blocks; and the sub prediction block information comprises: reference frame types of the sub prediction blocks and a quantity of the sub prediction blocks;
    the generating a reference frame template based on the historical optimal reference frame type, the neighboring block information, the sub prediction block information, and the frame type corresponding to the current prediction block comprises:
    determining an initial template corresponding to the current prediction block based on the historical optimal reference frame type, the reference frame types of the sub prediction blocks, and the reference frame types of the neighboring blocks;
    generating a main template corresponding to the current prediction block according to the motion vectors of the neighboring blocks;
    determining an enhanced template corresponding to the current prediction block by using the frame type of the current prediction block, the quantity of the neighboring blocks, and the quantity of the sub prediction blocks; and
    generating the reference frame template corresponding to the current prediction block by using the initial template, the main template, and the enhanced template.
  3. The method according to claim 2, wherein the determining an initial template corresponding to the current prediction block based on the historical optimal reference frame type, the reference frame types of the sub prediction blocks, and the reference frame types of the neighboring blocks comprises:
    determining a first initial template according to the reference frame types of the sub prediction blocks and the reference frame types of the neighboring blocks;
    using the historical optimal reference frame type as a second initial template; and
    determining the initial template corresponding to the current prediction block by using the first initial template and the second initial template.
  4. The method according to claim 3, wherein the determining a first initial template according to the reference frame types of the sub prediction blocks and the reference frame types of the neighboring blocks comprises:
    determining at least one historically selected reference frame type by using the reference frame types of the neighboring blocks and the reference frame types of the sub prediction blocks;
    counting a number of selections of each historically selected reference frame type among the at least one historically selected reference frame type; and
    screening out the first initial template from each historically selected reference frame type by using the numbers of selections.
  5. The method according to claim 4, wherein the screening out the first initial template from each historically selected reference frame type by using the numbers of selections comprises:
    screening out a maximum number of selections from the numbers of selections of each historically selected reference frame type;
    amplifying the number of selections of each historically selected reference frame type to obtain an amplified number of selections;
    comparing the amplified number of selections with the maximum number of selections to obtain a comparison result corresponding to each historically selected reference frame type, the comparison result indicating whether the amplified number of selections is greater than or equal to the maximum number of selections; and
    determining, as the first initial template, the historically selected reference frame types whose comparison results indicate that the amplified number of selections is greater than or equal to the maximum number of selections.
  6. The method according to any one of claims 3 to 5, wherein the initial template is empty; and after the determining the initial template corresponding to the current prediction block by using the first initial template and the second initial template, the method further comprises:
    adding at least one preset reference frame type to the initial template to obtain a corrected initial template;
    wherein the generating the reference frame template corresponding to the current prediction block by using the initial template, the main template, and the enhanced template comprises:
    generating the reference frame template corresponding to the current prediction block by using the corrected initial template, the main template, and the enhanced template.
  7. The method according to claim 1, wherein the neighboring block information comprises: the reference frame types of the neighboring blocks, and the sub prediction block information comprises: the reference frame types of the sub prediction blocks; and the obtaining neighboring block information of neighboring blocks of the current prediction block and sub prediction block information of sub prediction blocks comprises:
    judging an optimal prediction mode of the neighboring blocks to obtain a first judgment result, and judging an optimal prediction mode of the sub prediction blocks to obtain a second judgment result, the first judgment result indicating whether the optimal prediction mode of the neighboring blocks is a first preset mode, and the second judgment result indicating whether the optimal prediction mode of the sub prediction blocks is a second preset mode;
    obtaining the reference frame types of the neighboring blocks when the first judgment result indicates that the optimal prediction mode of the neighboring blocks is the first preset mode; and
    obtaining the reference frame types of the sub prediction blocks when the second judgment result indicates that the optimal prediction mode of the sub prediction blocks is the second preset mode.
  8. The method according to claim 2, wherein the generating a main template corresponding to the current prediction block according to the motion vectors of the neighboring blocks comprises:
    calculating a selection parameter for each candidate reference frame type among all candidate reference frame types by using the motion vectors of the neighboring blocks, all the candidate reference frame types representing all candidate reference frame types available in inter prediction, and the selection parameter representing a difference between an input value and a prediction value of a neighboring block;
    classifying all the candidate reference frame types into candidate forward reference frame types, candidate backward reference frame types, and candidate long-term reference frame types;
    selecting a forward reference frame type, a backward reference frame type, and a long-term reference frame type by using the selection parameters corresponding to the candidate forward reference frame types, the selection parameters of the candidate backward reference frame types, and the selection parameters corresponding to the candidate long-term reference frame types; and
    integrating the main template corresponding to the current prediction block from the forward reference frame type, the backward reference frame type, and the long-term reference frame type.
  9. The method according to claim 8, wherein the selecting a forward reference frame type, a backward reference frame type, and a long-term reference frame type by using the selection parameters corresponding to the candidate forward reference frame types, the selection parameters of the candidate backward reference frame types, and the selection parameters corresponding to the candidate long-term reference frame types comprises:
    using, as the forward reference frame type, the candidate reference frame type with the smallest selection parameter among the candidate forward reference frame types;
    using, as the backward reference frame type, the candidate reference frame type with the smallest selection parameter among the candidate backward reference frame types; and
    using, as the long-term reference frame type, the candidate reference frame types among the candidate long-term reference frame types whose selection parameters are smaller than a sum of the selection parameter corresponding to the forward reference frame type and the selection parameter corresponding to the backward reference frame type.
  10. The method according to claim 8, wherein the calculating a selection parameter for each candidate reference frame type among all candidate reference frame types by using the motion vectors of the neighboring blocks comprises:
    matching each candidate reference frame type among all the candidate reference frame types against the motion vectors of the neighboring blocks to obtain a matching result, the matching result indicating whether each candidate reference frame type has a matching motion vector;
    using a preset value as the selection parameter of each candidate reference frame type when the matching result indicates that the candidate reference frame type has no matching motion vector; and
    calculating the selection parameter of each candidate reference frame type by using the prediction value and the input value obtained when the neighboring block is predicted based on the motion vector of the neighboring block, when the matching result indicates that the candidate reference frame type has a matching motion vector.
  11. The method according to claim 10, wherein the matching motion vector comprises multiple sub vectors, the prediction value comprises multiple sub prediction values, and the multiple sub vectors correspond to the multiple sub prediction values; and the calculating the selection parameter of each candidate reference frame type by using the prediction value and the input value obtained when the neighboring block is predicted based on the motion vector of the neighboring block comprises:
    computing differences between pixels of the sub prediction value corresponding to each of the multiple sub vectors and pixels of the input value of the neighboring block, to obtain pixel differences corresponding to each sub vector;
    accumulating absolute values of the pixel differences corresponding to each sub vector to obtain a temporary selection parameter corresponding to each sub vector; and
    using the smallest temporary selection parameter among the temporary selection parameters of the sub vectors as the selection parameter corresponding to each candidate reference frame type.
  12. The method according to claim 2, wherein the determining an enhanced template corresponding to the current prediction block by using the frame type of the current prediction block, the quantity of the neighboring blocks, and the quantity of the sub prediction blocks comprises:
    determining a frame type weight corresponding to the current prediction block according to the frame type of the current prediction block and a preset frame type weight correspondence;
    generating an enhancement threshold according to the frame type weight;
    summing the quantity of the neighboring blocks and the quantity of the sub prediction blocks to obtain a sum result; and
    using at least one preset reference frame type as the enhanced template corresponding to the current prediction block when the sum result is less than or equal to the enhancement threshold.
  13. An inter prediction apparatus, comprising:
    a mode determining module, configured to determine, when a current prediction mode of a current prediction block is a preset prediction mode, a historical prediction mode corresponding to the current prediction block, the historical prediction mode being a prediction mode in which prediction is completed before the preset prediction mode;
    an information obtaining module, configured to obtain neighboring block information of neighboring blocks of the current prediction block, sub prediction block information of sub prediction blocks, and a historical optimal reference frame type of the current prediction block in the historical prediction mode, the sub prediction blocks being obtained by partitioning the current prediction block using a sub-block partition type preceding a current sub-block partition type;
    a template generating module, configured to generate a reference frame template based on the historical optimal reference frame type, the neighboring block information, the sub prediction block information, and a frame type corresponding to the current prediction block; and
    an information predicting module, configured to determine a reference frame of the preset prediction mode by using the reference frame template, and perform inter prediction on the current prediction block by using the reference frame to obtain a prediction value corresponding to the current prediction block.
  14. An electronic device for inter prediction, comprising:
    a memory, configured to store executable instructions; and
    a processor, configured to implement the inter prediction method according to any one of claims 1 to 12 when executing the executable instructions stored in the memory.
  15. A computer-readable storage medium, storing executable instructions that, when executed by a processor, implement the inter prediction method according to any one of claims 1 to 12.
  16. A computer program product, comprising a computer program or instructions that, when executed by a processor, implement the inter prediction method according to any one of claims 1 to 12.
PCT/CN2021/139051 2020-12-31 2021-12-17 Inter prediction method and apparatus, electronic device, computer-readable storage medium, and computer program product WO2022143215A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP21913956.5A EP4246970A4 (en) 2020-12-31 2021-12-17 INTER-FRAME PREDICTION METHOD AND APPARATUS, ELECTRONIC DEVICE, COMPUTER-READABLE STORAGE MEDIUM, AND COMPUTER PROGRAM PRODUCT
JP2023518518A JP2023543200A (ja) 2020-12-31 2021-12-17 Inter prediction method and apparatus, electronic device, and computer program
US18/079,216 US20230107111A1 (en) 2020-12-31 2022-12-12 Inter prediction method and apparatus, electronic device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011629460.2 2020-12-31
CN202011629460.2A CN112312131B (zh) 2020-12-31 2020-12-31 Inter prediction method, apparatus, and device, and computer-readable storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/079,216 Continuation US20230107111A1 (en) 2020-12-31 2022-12-12 Inter prediction method and apparatus, electronic device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022143215A1 (zh)

Family

ID=74487670

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/139051 WO2022143215A1 (zh) 2020-12-31 2021-12-17 Inter prediction method and apparatus, electronic device, computer-readable storage medium, and computer program product

Country Status (5)

Country Link
US (1) US20230107111A1 (zh)
EP (1) EP4246970A4 (zh)
JP (1) JP2023543200A (zh)
CN (1) CN112312131B (zh)
WO (1) WO2022143215A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112312131B (zh) * 2020-12-31 2021-04-06 腾讯科技(深圳)有限公司 Inter prediction method, apparatus, and device, and computer-readable storage medium
CN116684610A (zh) * 2023-05-17 2023-09-01 北京百度网讯科技有限公司 Method, apparatus, and electronic device for determining the reference state of a long-term reference frame

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109905702A (zh) * 2017-12-11 2019-06-18 腾讯科技(深圳)有限公司 Method, apparatus, and storage medium for determining reference information in video coding
CN110662074A (zh) * 2018-06-28 2020-01-07 杭州海康威视数字技术股份有限公司 Motion vector determination method and device
US20200314446A1 (en) * 2018-01-16 2020-10-01 Samsung Electronics Co., Ltd. Method and device for video decoding, and method and device for video encoding
CN111818342A (zh) * 2020-08-28 2020-10-23 浙江大华技术股份有限公司 Inter prediction method and prediction apparatus
CN112312131A (zh) * 2020-12-31 2021-02-02 腾讯科技(深圳)有限公司 Inter prediction method, apparatus, and device, and computer-readable storage medium

Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1312927C (zh) * 2002-07-15 2007-04-25 株式会社日立制作所 Moving picture encoding method and decoding method
CN101820547A (zh) * 2009-02-27 2010-09-01 源见科技(苏州)有限公司 Inter mode selection method
CN102843554A (zh) * 2011-06-21 2012-12-26 乐金电子(中国)研究开发中心有限公司 Inter picture prediction encoding/decoding method and video codec
CN103581685B (zh) * 2013-10-09 2015-06-10 合一网络技术(北京)有限公司 H264 reference frame selection method and apparatus
KR101789954B1 (ko) * 2013-12-27 2017-10-25 인텔 코포레이션 Content adaptive gain compensated prediction for next generation video coding
CN103813166B (zh) * 2014-01-28 2017-01-25 浙江大学 Low-complexity method for selecting multiple reference frames in HEVC encoding
CN104038768B (zh) * 2014-04-30 2017-07-18 中国科学技术大学 Fast multi-reference-field motion estimation method and system for field coding mode
CN106034236B (zh) * 2015-03-19 2019-07-19 阿里巴巴集团控股有限公司 Method, apparatus, and encoder for selecting an optimal reference frame in HEVC encoding
US11750832B2 (en) * 2017-11-02 2023-09-05 Hfi Innovation Inc. Method and apparatus for video coding
KR102075208B1 (ko) * 2017-12-14 2020-02-10 전자부품연구원 Video encoding method and apparatus for adaptively restricting reference frames
CN117319650A (zh) * 2018-08-28 2023-12-29 华为技术有限公司 Encoding method, decoding method, encoding apparatus, and decoding apparatus
CN111263151B (zh) * 2020-04-26 2020-08-25 腾讯科技(深圳)有限公司 Video encoding method and apparatus, electronic device, and computer-readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4246970A4

Also Published As

Publication number Publication date
EP4246970A1 (en) 2023-09-20
CN112312131A (zh) 2021-02-02
US20230107111A1 (en) 2023-04-06
CN112312131B (zh) 2021-04-06
EP4246970A4 (en) 2024-04-24
JP2023543200A (ja) 2023-10-13


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21913956

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2023518518

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 2021913956

Country of ref document: EP

Effective date: 20230615

NENP Non-entry into the national phase

Ref country code: DE