WO2024109816A1 - Video data processing method and apparatus, display device, and storage medium - Google Patents

Video data processing method and apparatus, display device, and storage medium

Info

Publication number
WO2024109816A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
area
current video
motion vector
display
Prior art date
Application number
PCT/CN2023/133303
Other languages
English (en)
French (fr)
Inventor
崔腾鹤
舒龙
张乾
Original Assignee
京东方科技集团股份有限公司
Priority date
Filing date
Publication date
Application filed by 京东方科技集团股份有限公司
Publication of WO2024109816A1 publication Critical patent/WO2024109816A1/zh

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51Motion estimation or motion compensation
    • H04N19/513Processing of motion vectors

Definitions

  • Embodiments of the present disclosure relate to a video data processing method, a video data processing device, a display device, and a computer-readable storage medium.
  • Digital video capabilities may be incorporated into a wide variety of devices, including digital televisions, digital live broadcast systems, wireless broadcast systems, laptop or desktop computers, tablet computers, e-readers, digital cameras, digital recording devices, digital media players, video gaming devices, video game consoles, smart phones, video teleconferencing devices, and video streaming devices.
  • Digital video devices may implement video codec technologies, such as those described in the standards defined by MPEG-2, MPEG-4, ITU-T H.263, ITU-T H.264/MPEG-4 Part 10 Advanced Video Coding (AVC), ITU-T H.265/High Efficiency Video Coding (HEVC), and extensions of such standards.
  • By implementing such video codec technologies, video devices may more efficiently send, receive, encode, decode, and/or store digital video information.
  • At least one embodiment of the present disclosure provides a method for processing video data.
  • the method comprises: for a current video block of a video, determining to use a first inter-frame prediction mode for encoding and decoding, and based on the determination, performing conversion between the current video block and a bit stream of the video.
  • In the first inter-frame prediction mode, derivation of a motion vector of the current video block is based on a base area in the video corresponding to a first display mode.
  • The first display area along the expansion direction, starting from the video expansion start position defined by the first display mode, is used as the base area.
  • In response to the current video block being located in the first display area, the motion vector is within a first motion vector prediction range.
  • the first motion vector prediction range is determined based on the position of the current video block, the motion vector prediction accuracy, and the boundary of the first display area.
  • When the current video frame includes the first display area and at least one display sub-area sequentially arranged adjacent to each other along the expansion direction, and the expansion direction is from left to right, the motion vector is within a second motion vector prediction range.
  • the second motion vector prediction range is determined based on the position of the current video block, the motion vector prediction accuracy, the boundary of the first display area, and the width of the first display sub-area.
  • the first right side boundary of the first motion vector prediction range is different from the second right side boundary of the second motion vector prediction range.
  • When the current video frame includes the first display area and at least one display sub-area sequentially arranged adjacent to each other along the expansion direction, and the expansion direction is from top to bottom, the motion vector is within a third motion vector prediction range.
  • the third motion vector prediction range is determined based on the position of the current video block, the motion vector prediction accuracy, the boundary of the first display area, and the height of the second display sub-area.
  • the third motion vector prediction range is equal to the first motion vector prediction range.
  • the first lower boundary of the first motion vector prediction range is different from the third lower boundary of the third motion vector prediction range.
  • the temporal candidate motion vector prediction value in the motion vector prediction candidate list of the current video block is calculated based on the spatial candidate motion vector prediction value.
  • All reference pixels used by the current video block are within the base area.
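The following is a minimal Python sketch, under simplified assumptions (integer-pel motion, a rectangular base area, and hypothetical function and parameter names), of how a motion vector could be clamped so that every reference pixel of the current video block stays inside the base area; the embodiments described above additionally account for motion vector prediction accuracy and the widths or heights of display sub-areas.

```python
def clamp_mv_to_base_area(mv_x, mv_y, block_x, block_y, block_w, block_h,
                          base_x0, base_y0, base_x1, base_y1):
    """Clamp an integer-pel motion vector so the whole reference block lies
    inside the base area [base_x0, base_x1) x [base_y0, base_y1).
    Hypothetical illustration, not the claimed derivation itself."""
    # Allowed displacement range for the block's top-left corner.
    min_mv_x = base_x0 - block_x
    max_mv_x = base_x1 - block_w - block_x
    min_mv_y = base_y0 - block_y
    max_mv_y = base_y1 - block_h - block_y
    clamped_x = max(min_mv_x, min(mv_x, max_mv_x))
    clamped_y = max(min_mv_y, min(mv_y, max_mv_y))
    return clamped_x, clamped_y

# Example: a 16x16 block at (48, 32) in a 128x64 base area.
print(clamp_mv_to_base_area(100, -50, 48, 32, 16, 16, 0, 0, 128, 64))
# -> (64, -32): the reference block is pulled back inside the base area.
```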
  • the first inter-frame prediction mode includes a Merge prediction mode, an advanced motion vector prediction AMVP mode, a Merge mode with motion vector difference, a bidirectional weighted prediction mode or an affine prediction mode.
  • At least one embodiment of the present disclosure further provides a method for processing video data.
  • the method for processing video data includes: receiving a bit stream of a video; determining that a current video block of the video is encoded and decoded using a first inter-frame prediction mode; and decoding the bit stream based on the determination.
  • the derivation of the motion vector of the current video block is based on a base area in the video corresponding to a first display mode.
  • Decoding the bit stream includes: determining a to-be-decoded area of a current video frame of the video, the to-be-decoded area including at least a first display area corresponding to the base area.
  • determining the area to be decoded includes: determining the area to be decoded based on at least one of the number of pixels and the number of coding units to be displayed in the current video frame, the number of coding units in the first display area, the number of displayed pixels of the previous video frame, and the number of coding units in the decoded area of the previous video frame.
  • determining the area to be decoded includes: in response to the number of coding units to be displayed in the current video frame being greater than the number of coding units in the decoded area of the previous video frame, or in response to the number of coding units to be displayed in the current video frame being equal to the number of coding units in the decoded area of the previous video frame, and the number of pixels to be displayed in the current video frame being greater than the number of pixels displayed in the previous video frame, determining that the area to be decoded includes the decoded area of the previous video frame and a new display sub-area.
  • determining the area to be decoded includes: in response to the number of coding units to be displayed in the current video frame being greater than the number of coding units in the first display area, and the number of coding units to be displayed in the current video frame being less than the number of coding units in the decoded area of the previous video frame, or in response to the number of coding units to be displayed in the current video frame being greater than the number of coding units in the first display area, the number of coding units to be displayed in the current video frame is equal to the number of coding units in the decoded area of the previous video frame, and the number of pixels to be displayed in the current video frame is not greater than the number of displayed pixels in the previous video frame, determining that the area to be decoded in the current video frame includes the area to be displayed in the current video frame.
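A hedged Python sketch of the decode-area decision described in the two items above; the variable names and the fallback branch are hypothetical, and a real implementation would operate on the actual coding-unit and pixel counts reported for the display area.

```python
def area_to_decode(cu_display_cur, cu_decoded_prev, cu_base,
                   px_display_cur, px_displayed_prev):
    """Illustrative decode-area decision for an expanding or retracting
    scroll screen.  Returns a descriptive label only."""
    expanding = (cu_display_cur > cu_decoded_prev or
                 (cu_display_cur == cu_decoded_prev and
                  px_display_cur > px_displayed_prev))
    retracting = (cu_display_cur > cu_base and
                  (cu_display_cur < cu_decoded_prev or
                   (cu_display_cur == cu_decoded_prev and
                    px_display_cur <= px_displayed_prev)))
    if expanding:
        return "previous decoded area + new display sub-area"
    if retracting:
        return "area to be displayed in the current frame"
    return "first display area (base area) only"   # assumed fallback

print(area_to_decode(cu_display_cur=12, cu_decoded_prev=10, cu_base=8,
                     px_display_cur=1920, px_displayed_prev=1600))
# -> "previous decoded area + new display sub-area"
```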
  • At least one embodiment of the present disclosure further provides a video data processing device, including a determination module and an execution module.
  • the determination module is configured to determine to use a first inter-frame prediction mode for encoding and decoding for a current video block of the video.
  • the execution module is configured to perform conversion between the current video block and a bit stream of the video based on the determination.
  • the derivation of the motion vector of the current video block is based on a base area in the video corresponding to a first display mode.
  • At least one embodiment of the present disclosure further provides a display device, including a video data processing device and a scrolling screen.
  • The video data processing device is configured to decode a received bit stream according to the method of any one of claims 1 to 20, and send the decoded pixel values to the scrolling screen for display.
  • the video data processing device in response to the scrolling screen including a display area and a non-display area during operation, decodes the bit stream based on the size of the display area at the current moment and the previous frame moment.
  • the display device provided by at least one embodiment of the present disclosure further includes a curling state judgment device.
  • the curling state judgment device is configured to detect the size of the display area of the scrolling screen, and send the size of the display area to the video data processing device, so that the video data processing device decodes the bit stream based on the size of the display area at the current moment and the previous frame moment.
  • At least one embodiment of the present disclosure further provides a video data processing device, comprising: a processor and a memory including one or more computer program modules.
  • the one or more computer program modules are stored in the memory and configured to be executed by the processor, and the one or more computer program modules include instructions for executing the video data processing method provided in any of the above embodiments.
  • At least one embodiment of the present disclosure further provides a computer-readable storage medium on which computer instructions are stored.
  • the instructions are executed by a processor, the steps of the video data processing method provided in any of the above embodiments are implemented.
  • FIG. 1 is a schematic structural diagram of a scroll screen provided by at least one embodiment of the present disclosure.
  • FIG. 2 is a block diagram of an example video encoding and decoding system provided by at least one embodiment of the present disclosure.
  • FIG. 3 is a block diagram of an example video encoder provided by at least one embodiment of the present disclosure.
  • FIG. 4 is a block diagram of an example video decoder provided by at least one embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of a coding structure of a full-frame intra configuration provided by at least one embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a coding structure of a low-latency configuration provided by at least one embodiment of the present disclosure.
  • FIG. 7A is a schematic diagram of inter-frame prediction coding provided by at least one embodiment of the present disclosure.
  • FIG. 7B is a schematic flow chart of an inter-frame prediction technique provided by at least one embodiment of the present disclosure.
  • FIG. 8A is a schematic diagram of affine motion compensation provided by at least one embodiment of the present disclosure.
  • FIG. 8B is a schematic diagram of another affine motion compensation provided by at least one embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a video data processing method provided by at least one embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of a scrolling video codec provided by at least one embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram of a method for dividing a current video frame provided by at least one embodiment of the present disclosure.
  • FIG. 12A is a schematic diagram of a rotation axis direction of a scroll screen provided by at least one embodiment of the present disclosure.
  • FIG. 12B is a schematic diagram of the rotation axis direction of another scroll screen provided by at least one embodiment of the present disclosure.
  • FIG. 13 is a schematic diagram of encoding a current video block located within a first display area provided by at least one embodiment of the present disclosure.
  • FIG. 14 is a schematic diagram of encoding a current video block located at a boundary of a first display area provided by at least one embodiment of the present disclosure.
  • FIG. 15 is a schematic diagram of another video data processing method provided by at least one embodiment of the present disclosure.
  • FIG. 16 is a schematic block diagram of a video encoding and decoding system in a low-latency configuration according to at least one embodiment of the present disclosure.
  • FIG. 17 is a schematic flow chart of a method for processing video data in a low-latency configuration according to at least one embodiment of the present disclosure.
  • FIG. 18 is a schematic flow chart of a video data processing apparatus according to at least one embodiment of the present disclosure.
  • FIG. 19 is a schematic block diagram of another video data processing device according to at least one embodiment of the present disclosure.
  • FIG. 20 is a schematic block diagram of yet another video data processing device provided by at least one embodiment of the present disclosure.
  • FIG. 21 is a schematic block diagram of a non-transitory readable storage medium provided by at least one embodiment of the present disclosure.
  • Video coding methods and techniques are ubiquitous in modern technology due to the increasing demand for high-resolution video.
  • Video codecs generally include electronic circuits or software that compress or decompress digital video and are constantly being improved to provide greater coding efficiency.
  • Video codecs convert uncompressed video into a compressed format and vice versa. There is a complex relationship between video quality, the amount of data used to represent the video (determined by the bit rate), the complexity of the encoding and decoding algorithms, sensitivity to data loss and errors, ease of editing, random access, and end-to-end latency (delay time).
  • The compression format typically conforms to a standard video compression specification, such as the High Efficiency Video Coding (HEVC) standard (also known as H.265), the Versatile Video Coding (VVC) standard (also known as H.266), or other current and/or future video codec standards.
  • embodiments of the technology involved in the present disclosure can be applied to existing video codec standards (e.g., AVC, HEVC, and VVC) and future standards to improve compression performance.
  • the description of the coding and decoding operations herein can refer to the existing video coding and decoding standards, and it is understood that the methods provided in the present disclosure are not limited to the described video coding and decoding standards.
  • FIG. 1 is a schematic structural diagram of a scroll screen provided by at least one embodiment of the present disclosure.
  • The scroll screen generally has a fully expanded state and a partially expanded state.
  • The unrolled portion of the scroll screen is regarded as the display area, and the rolled-up portion is regarded as the non-display area.
  • The scroll screen can be any type of display screen with a variable display area, including but not limited to the scroll screen structure shown in FIG. 1.
  • The video picture corresponding to the rolled-up part of the scroll screen does not need to be displayed, but that part of the picture is still decoded, resulting in wasted decoding resources.
  • At least one embodiment of the present disclosure provides a video data processing method, the method comprising: for a current video block of a video, determining to use a first inter-frame prediction mode for encoding and decoding; based on the determination, performing conversion between the current video block and a bit stream of the video.
  • In the first inter-frame prediction mode, derivation of a motion vector of the current video block is based on a base area in the video corresponding to a first display mode.
  • At least one embodiment of the present disclosure further provides a video data processing device, a display device, and a computer-readable storage medium corresponding to the above-mentioned video data processing method.
  • the bit stream of the video can be partially decoded according to the display area actually displayed, thereby reducing the decoding resource consumption of the undisplayed part, effectively improving the efficiency of video encoding and decoding, and further improving the user's product experience.
  • the words used to describe the position of the adjacent blocks or reference pixels relative to the current video block such as “above”, “below”, “left”, “right”, etc., have the same meaning as defined in the video coding standards (e.g., AVC, HEVC and VVC).
  • “left” and “right” respectively refer to the two sides in the horizontal direction
  • “above” and “below” respectively refer to the two sides in the vertical direction.
  • At least one embodiment of the present disclosure provides a coding and decoding system. It is understandable that in the present disclosure, the coding end and the decoding end may be implemented using a codec with the same structure.
  • FIG. 2 is a block diagram illustrating an example video codec system 1000 that may perform some embodiments of the present disclosure.
  • the technology of the present disclosure generally relates to encoding and decoding (encoding and/or decoding) video data.
  • Video data includes any data used to process video; therefore, video data may include unencoded original video, encoded video, decoded (e.g., reconstructed) video, and video metadata such as syntax data.
  • a video may include one or more pictures, or a picture sequence.
  • The system 1000 includes a source device 102 for providing encoded video data to be decoded by a destination device 116 for display; the encoded video data is used to form a bitstream for transmission to the decoding end, and the bitstream may also be referred to as a code stream.
  • the source device 102 provides the encoded video data to the destination device 116 via a computer-readable medium 110.
  • the source device 102 and the destination device 116 may be implemented as a variety of devices, such as desktop computers, notebook (i.e., portable) computers, tablet computers, mobile devices, set-top boxes, smart phones, handheld phones, televisions, cameras, display devices, digital media players, video game consoles, video streaming devices, etc.
  • the source device 102 and the destination device 116 may be equipped for wireless communication, and therefore may also be referred to as wireless communication devices.
  • source device 102 comprises video source 104, memory 106, video encoder 200 and output interface 108.
  • Destination device 116 comprises input interface 122, video decoder 300, memory 120 and display device 118.
  • the video encoder 200 of source device 102 and the video decoder 300 of destination device 116 can be configured to be used for implementing the coding method and decoding method according to some embodiments of the present disclosure. Therefore, source device 102 represents an example of video coding device, and destination device 116 represents an example of video decoding device.
  • source device 102 and destination device 116 may include other components or configurations.
  • source device 102 can receive video data from external video sources such as external cameras.
  • destination device 116 can be connected to an external display device without built-in integrated display device 118.
  • the system 1000 shown in FIG. 2 is only an example.
  • any digital video encoding and/or decoding device can perform the encoding method and decoding method according to some embodiments of the present disclosure.
  • the source device 102 and the destination device 116 are only examples of such codec devices, where the source device 102 generates a bit stream for transmission to the destination device 116.
  • the present disclosure refers to a "codec" device as a device that performs data encoding and decoding (encoding and/or decoding). Therefore, the video encoder 200 and the video decoder 300 represent examples of codec devices, respectively.
  • devices 102 and 116 operate in a substantially symmetrical manner, such that both devices 102 and 116 include video encoding and decoding components, i.e., both devices 102 and 116 can implement video encoding and decoding processes. Therefore, system 1000 can support one-way or two-way video transmission between video devices 102 and 116, such as for video streaming, video playback, video broadcasting, or video telephony communication.
  • the video source 104 represents a source of video data (i.e., raw video data that has not been encoded) and provides a continuous series of pictures (also referred to as "frames") of video data to the video encoder 200, which encodes the data of the pictures.
  • the video source 104 of the source device 102 may include a video capture device, such as a video camera, a video archive containing previously captured raw video, and/or a video feed interface for receiving video from a video content provider.
  • the video source 104 may generate computer graphics-based data as a source video or a combination of live video, archived video, and computer-generated video.
  • the video encoder 200 encodes the captured, pre-captured, or computer-generated video data.
  • the video encoder 200 may rearrange the pictures from the order in which they were received (sometimes referred to as "display order") into an encoding order for encoding.
  • the video encoder 200 may generate a bitstream comprising encoded video data.
  • the source device 102 may then output the generated bitstream onto the computer-readable medium 110 via the output interface 108 for receipt and/or retrieval by, for example, the input interface 122 of the destination device 116 .
  • the memory 106 of the source device 102 and the memory 120 of the destination device 116 represent general purpose memories.
  • the memory 106 and the memory 120 may store raw video data, such as raw video data from the video source 104 and decoded video data from the video decoder 300.
  • the memory 106 and the memory 120 may store software instructions that can be executed by the video encoder 200 and the video decoder 300, etc., respectively.
  • the video encoder 200 and the video decoder 300 may also include internal memory to achieve functionally similar or equivalent purposes.
  • the memory 106 and the memory 120 may store encoded video data output from the video encoder 200 and input to the video decoder 300, etc. In some examples, some portions of the memory 106 and the memory 120 may be allocated as one or more video buffers, such as to store decoded raw video data and/or encoded raw video data.
  • the computer-readable medium 110 may represent any type of medium or device capable of transmitting the encoded video data from the source device 102 to the destination device 116.
  • the computer-readable medium 110 represents a communication medium so that the source device 102 can directly transmit the bit stream to the destination device 116 in real time via a radio frequency network or a computer network, etc.
  • the output interface 108 can modulate a transmission signal including the encoded video data
  • the input interface 122 can demodulate the received transmission signal.
  • the communication medium may include a wireless or wired communication medium, or both, such as a radio frequency (RF) spectrum or one or more physical transmission lines.
  • RF radio frequency
  • the communication medium may form part of a packet-based network, such as a local area network, a wide area network, or a global network such as the Internet.
  • the communication medium may include a router, a switch, a base station, or any other device that can be used to facilitate communication from the source device 102 to the destination device 116.
  • source device 102 may output the encoded data from output interface 108 to storage device 112.
  • destination device 116 may access the encoded data from storage device 112 via input interface 122.
  • Storage device 112 may include various distributed data storage media or locally accessed data storage media, such as a hard drive, a Blu-ray disc, a digital video disc (DVD), a compact disc read-only memory (CD-ROM), flash memory, volatile or non-volatile memory, or any other suitable digital storage medium for storing encoded video data.
  • source device 102 may output the encoded data to file server 114 or another intermediate storage device that may store the encoded video generated by source device 102.
  • Destination device 116 may access the stored video data from file server 114 via streaming or download.
  • File server 114 may be any type of server device capable of storing the encoded data and transmitting the encoded data to destination device 116.
  • File server 114 may represent a network server (such as for a website), a file transfer protocol (FTP) server, a content distribution network device, or a network attached storage (NAS) device.
  • Destination device 116 may access the encoded data from file server 114 via any standard data connection including an Internet connection.
  • This may include a wireless channel such as a Wi-Fi connection, a wired connection such as a digital subscriber line (DSL) and a cable modem, or a combination of a wireless channel and a wired connection suitable for accessing the encoded video data stored on file server 114.
  • File server 114 and input interface 122 may be configured to operate according to a streaming protocol, a downloading transmission protocol, or a combination thereof.
  • The output interface 108 and the input interface 122 may represent wireless transmitters/receivers, modems, wired networking components (such as Ethernet cards), wireless communication components that operate according to any of the various IEEE 802.11 standards, or other physical components.
  • the output interface 108 and the input interface 122 may be configured to transmit data such as encoded data according to the fourth generation mobile communication technology (4G), 4G long-term evolution (4G-LTE), advanced LTE (LTE Advanced), the fifth generation mobile communication technology (5G) or other cellular communication standards.
  • the output interface 108 and the input interface 122 may be configured to transmit data such as encoded data according to other wireless standards such as IEEE 802.11 specifications, IEEE 802.15 specifications (e.g., ZigBee TM ), Bluetooth standards, etc.
  • the source device 102 and/or the destination device 116 may include corresponding system-on-chip (SoC) devices.
  • the source device 102 may include a SoC device to perform functions such as the video encoder 200 and/or the output interface 108
  • the destination device 116 may include a SoC device to perform functions such as the video decoder 300 and/or the input interface 122 .
  • The technology disclosed in the present disclosure can be applied to video encoding and decoding in support of a variety of multimedia applications, such as wireless television broadcasting, cable television transmission, satellite television transmission, Internet streaming video transmission such as HTTP-based dynamic adaptive streaming, encoding of digital video onto data storage media, decoding of digital video stored on data storage media, or other applications.
  • the input interface 122 of the destination device 116 receives a bitstream from the computer-readable medium 110 (such as the storage device 112 and the file server 114).
  • the bitstream may include signaling information defined by the video encoder 200 and used by the video decoder 300, such as syntax elements with values describing properties and/or processing of video blocks or other coding units (such as slices, pictures, groups of pictures, sequences, etc.).
  • the display device 118 displays the decoded pictures of the decoded video data to the user.
  • the display device 118 can be various types of display devices, such as cathode ray tube (CRT) based devices, liquid crystal displays (LCD), plasma displays, organic light emitting diode (OLED) displays, or other types of display devices.
  • the video encoder 200 and the video decoder 300 may each be integrated with an audio encoder and/or an audio decoder, and may include an appropriate multiplexing-demultiplexing (MUX-DEMUX) unit or other hardware and/or software to handle multiplexed streams including both audio and video in a common data stream.
  • the MUX-DEMUX unit may conform to the ITU H.223 multiplexer protocol or other protocols such as the User Datagram Protocol (UDP).
  • the video encoder 200 and the video decoder 300 may be implemented as any suitable codec circuit, such as a microprocessor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA), discrete logic elements, software, hardware, firmware, or any combination thereof.
  • the device may store instructions for the software in a suitable non-transitory computer-readable medium, and use one or more processors to execute the instructions in hardware to perform the technology of the present disclosure.
  • the video encoder 200 and the video decoder 300 may be included in one or more encoders or decoders, either of which may be integrated as part of a combined encoder/decoder (CODEC) in the corresponding device.
  • the device including the video encoder 200 and/or the video decoder 300 may be an integrated circuit, a microprocessor, and/or a wireless communication device such as a cellular phone.
  • The video encoder 200 and the video decoder 300 may operate according to a video codec standard, for example, according to a video codec standard such as ITU-T H.265 (also known as High Efficiency Video Coding (HEVC)), or according to an extension of HEVC such as the multi-view and/or scalable video codec extensions.
  • The video encoder 200 and the video decoder 300 may operate according to other proprietary or industry standards, such as the Joint Exploration Test Model (JEM) or the Versatile Video Coding (VVC) standard currently under development.
  • the video encoder 200 and the video decoder 300 can encode and decode video data represented in a YUV (e.g., Y, Cb, Cr) format. That is, the video encoder 200 and the video decoder 300 can encode and decode luminance and chrominance components instead of encoding and decoding red, green, and blue (RGB) data of picture samples, where the chrominance components may include chrominance components of red and blue tones.
  • the video encoder 200 converts the received RGB formatted data into a YUV format before encoding
  • the video decoder 300 converts the YUV format into an RGB format.
  • a pre-processing unit and a post-processing unit can perform these conversions.
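As an illustration of the pre-processing conversion mentioned above, the following Python sketch converts one 8-bit RGB sample to YCbCr using BT.601 full-range coefficients; the actual conversion matrix and range depend on the color space signaled for the video, so this is only one common choice.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB sample to YCbCr (BT.601 full-range), an
    illustrative pre-processing step before encoding."""
    y  =  0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b + 128
    cr =  0.5 * r - 0.418688 * g - 0.081312 * b + 128
    clip = lambda v: min(255, max(0, int(round(v))))  # keep samples in 8-bit range
    return clip(y), clip(cb), clip(cr)

print(rgb_to_ycbcr(255, 0, 0))   # pure red -> (76, 85, 255)
```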
  • the video encoder 200 and the video decoder 300 may perform a block-based encoding and decoding process for a picture.
  • block or “video block” generally refers to a structure that includes data to be processed (such as encoded, decoded, or otherwise used in an encoding and/or decoding process).
  • a block may include a two-dimensional matrix of luminance and/or chrominance data samples.
  • a picture may first be divided into a plurality of blocks for encoding and decoding processing, and a block in the picture that is being encoded and decoded may be referred to as a "current block” or "current video block.”
  • embodiments of the present disclosure may also involve encoding and decoding a picture to include a process of encoding or decoding picture data.
  • the present disclosure may involve encoding a block of a picture to include a process of encoding or decoding data of the block, such as prediction and/or residual coding.
  • the bitstream obtained by the encoding process typically includes a series of values for syntax elements, which represent coding decisions (such as coding modes) and information on dividing the picture into blocks. Therefore, encoding a picture or a block can generally be understood as encoding the values of the syntax elements that form the picture or the block.
  • HEVC defines various blocks, including coding units (CUs), prediction units (PUs), and transform units (TUs).
  • A video encoder such as the video encoder 200 partitions a coding tree unit (CTU) into CUs according to a quadtree structure; that is, the video encoder partitions the CTU, and in turn each CU, into four equal non-overlapping blocks, and each node of the quadtree has either zero or four child nodes.
  • a node without child nodes may be referred to as a "leaf node", and the CU of such a leaf node may include one or more PUs and/or one or more TUs.
  • the video encoder may further partition the PU and TU.
  • the residual quadtree represents the partitioning of the TU.
  • a PU represents inter-frame prediction data
  • a TU represents residual data.
  • An intra-predicted CU includes intra-frame prediction information such as an intra-frame mode indication.
  • A quadtree with nested multi-type trees replaces the concept of multiple partition unit types; that is, it removes the separation of the CU, PU, and TU concepts (except when a CU is too large for the maximum transform length) and supports greater flexibility in the shapes of CU partitions.
  • the CU can have a square or rectangular shape.
  • the CTU is divided by a quadtree structure.
  • the quadtree leaf nodes can be further divided by a multi-type tree structure.
  • The multi-type tree leaf nodes are called coding units (CUs), and unless a CU is too large for the maximum transform length, this partition is used for prediction and transform processing without any further partitioning. This means that, in most cases, the CU, PU, and TU have the same block size in the quadtree with nested multi-type tree coding block structure.
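A minimal Python sketch of HEVC-style quadtree partitioning of a CTU into CUs; the `should_split` callback is a hypothetical stand-in for the encoder's rate-distortion decision, and multi-type (binary/ternary) splits are omitted.

```python
def quadtree_partition(x, y, size, min_cu, should_split):
    """Recursively split a square block at (x, y) into four equal
    non-overlapping sub-blocks, quadtree style.  `should_split(x, y, size)`
    stands in for the encoder's rate-distortion decision."""
    if size > min_cu and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus += quadtree_partition(x + dx, y + dy, half, min_cu, should_split)
        return cus
    return [(x, y, size)]          # leaf node: one CU

# Example: split 64x64 and 32x32 blocks, keep 16x16 leaves.
leaves = quadtree_partition(0, 0, 64, 8, lambda x, y, s: s > 16)
print(len(leaves), leaves[0])      # 16 CUs of size 16x16, first at (0, 0)
```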
  • the video encoder 200 and the video decoder 300 may be configured to use quadtree segmentation according to HEVC, perform quadtree binary tree (QTBT) segmentation according to JEM, or use other segmentation structures. It should be understood that the technology of the present disclosure may also be applied to video encoders configured to use quadtree segmentation or other segmentation types.
  • the video encoder 200 encodes the video data of the CU for representing prediction information and/or residual information and other information.
  • the prediction information indicates how to predict the CU to form a prediction block of the CU.
  • the residual information generally represents the sample-by-sample difference between the sample of the CU before encoding and the sample of the prediction block.
  • the video encoder 200 may further generate syntax data for the video decoder 300, such as block-based syntax data, picture-based syntax data, and sequence-based syntax data, for example, in a picture header, a block header, a slice header, etc., or generate other syntax data such as a sequence parameter set (SPS), a picture parameter set (PPS), or a video parameter set (VPS).
  • the video decoder 300 may also decode such syntax data to determine how to decode the corresponding video data.
  • the syntax data may include various syntax elements, flags, parameters, etc., for indicating the encoding and decoding information of the video.
  • the video encoder 200 can generate a bitstream including encoded video data, such as syntax elements describing the partitioning of a picture into blocks (such as CUs) and prediction information and/or residual information of the blocks.
  • the video decoder 300 can receive the bitstream and decode the encoded video data.
  • the video decoder 300 performs a process that is inverse to the process performed by the video encoder 200 to decode the encoded video data in the bitstream.
  • the video decoder 300 can decode the values of the syntax elements of the bitstream in a manner substantially similar to the video encoder 200.
  • the syntax elements can define the picture CTU according to the partition information, and each CTU is partitioned according to the corresponding partition structure such as the QTBT structure to define the CU of the CTU.
  • the syntax elements can further define the prediction information and residual information of the block (such as the CU) of the video data.
  • the residual information can be represented by, for example, quantized transform coefficients.
  • the video decoder 300 can inverse quantize and inverse transform the quantized transform coefficients of the block to reproduce the residual block of the block.
  • the video decoder 300 uses the prediction mode (intra-frame or inter-frame prediction) and the related prediction information (such as motion information for inter-frame prediction) signaled in the bitstream to form a prediction block of the block.
  • the video decoder 300 can then combine the prediction block and the residual block (on a sample-by-sample basis) to reproduce the original block.
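A small Python sketch of the sample-by-sample reconstruction step described above: the decoded residual is added to the prediction block and clipped to the valid sample range. The function names are illustrative, and the residual is assumed to be already inverse-quantized and inverse-transformed.

```python
def reconstruct_block(pred, resid, bit_depth=8):
    """Sample-by-sample reconstruction: add the decoded residual to the
    prediction block and clip to the valid sample range (illustration)."""
    max_val = (1 << bit_depth) - 1
    return [[min(max_val, max(0, p + r)) for p, r in zip(prow, rrow)]
            for prow, rrow in zip(pred, resid)]

pred  = [[120, 130], [140, 150]]
resid = [[  5,  -4], [200, -160]]
print(reconstruct_block(pred, resid))   # [[125, 126], [255, 0]]
```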
  • the video decoder 300 may also perform additional processing, such as performing a deblocking process to reduce visual artifacts along block boundaries.
  • FIG3 is a block diagram showing an example video encoder according to some embodiments of the present disclosure
  • FIG4 is a block diagram showing an example video decoder according to some embodiments of the present disclosure
  • the encoder shown in FIG3 may be implemented as the video encoder 200 in FIG2
  • the decoder shown in FIG4 may be implemented as the video decoder 300 in FIG2.
  • the codec according to some embodiments of the present disclosure will be described in detail below in conjunction with FIG3 and FIG4.
  • FIG. 3 and FIG. 4 are provided for the purpose of explanation and should not be considered as limiting the techniques widely illustrated and described in the present disclosure.
  • the present disclosure describes the video encoder 200 and the video decoder 300 in the context of a developing video codec standard (such as the HEVC video codec standard or the H.266 video codec standard), but the techniques of the present disclosure are not limited to these video codec standards.
  • the units (or modules) in FIG. 3 are shown to help understand the operations performed by the video encoder 200. These units can be implemented as fixed-function circuits, programmable circuits, or a combination of the two. Fixed-function circuits refer to circuits that provide specific functions and are pre-set on executable operations. Programmable circuits refer to circuits that can be programmed to perform a variety of tasks and provide flexible functions in executable operations. For example, a programmable circuit can execute software or firmware that enables the programmable circuit to operate in a manner defined by the instructions of the software or firmware. Fixed-function circuits can execute software instructions (to receive parameters or output parameters, etc.), but the type of operation performed by the fixed-function circuit is generally fixed. In some examples, one or more units may be different circuit blocks (fixed-function circuit blocks or programmable circuit blocks), and in some examples, one or more units may be integrated circuits.
  • the video encoder 200 shown in Figure 3 may include an arithmetic logic unit (ALU), an elementary function unit (EFU), a digital circuit, an analog circuit, and/or a programmable core formed by a programmable circuit.
  • The memory 106 (FIG. 2) may store the object code of the software received and executed by the video encoder 200, or another memory (not shown) within the video encoder 200 may be used to store such instructions.
  • the video encoder 200 may receive an input video, for example, from a video data memory, or directly from a video acquisition device.
  • the video data memory may store video data to be encoded by the video encoder 200 component.
  • the video encoder 200 may receive video data stored in the video data memory from, for example, a video source 104 (as shown in FIG. 2 ).
  • the decode cache may be used as a reference picture memory to store reference video data for use by the video encoder 200 when predicting subsequent video data.
  • the video data memory and the decode cache may be formed by a variety of memory devices, such as a dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
  • the video data memory and the decode cache may be provided by the same storage device or different storage devices.
  • the video data memory may be located on the same chip as other components of the video encoder 200 as shown in FIG. 3 , or may not be located on the same chip as other components.
  • references to video data storage should not be interpreted as limited to storage within the video encoder 200 (unless specifically described as such) or to storage outside the video encoder 200 (unless specifically described as such). Rather, references to video data storage should be understood as reference storage that stores video data received by the video encoder 200 for encoding (such as video data of a current block to be encoded).
  • the memory 106 in FIG. 2 can also provide temporary storage for the output of various units in the video encoder 200.
  • the mode selection unit usually coordinates multiple encoding channels to test the combinations of encoding parameters and the rate-distortion values obtained by these combinations.
  • the encoding parameters may include the partitioning of CTU to CU, the prediction mode of CU, the transform type of CU residual data, the quantization parameter of CU residual data, etc.
  • the mode selection unit may ultimately select the encoding parameter combination with a better rate-distortion value than other tested combinations.
  • the video encoder 200 may partition a picture retrieved from a video memory into a series of CTUs and encapsulate one or more CTUs into a slice.
  • the mode selection unit may partition the CTUs of the picture according to a tree structure (such as the QTBT structure described above or the quadtree structure of HEVC).
  • the video encoder 200 may form one or more CUs by partitioning the CTUs according to the tree structure.
  • Such CUs may also be generally referred to as "blocks" or "video blocks”.
  • the mode selection unit also controls its components (such as a motion estimation unit, a motion compensation unit, and an intra prediction unit) to generate a prediction block for a current block (such as an overlapping portion of a current CU or a PU and a TU in HEVC).
  • the motion estimation unit may perform a motion search to identify one or more closely matching reference blocks in one or more reference pictures (such as one or more decoded pictures stored in a decoding cache).
  • the motion estimation unit may calculate a value representing the degree of similarity between a potential reference block and the current block based on, for example, the sum of absolute differences (SAD), the sum of squared differences (SSD), the mean absolute difference (MAD), the mean squared difference (MSD), etc.
  • the motion estimation unit may typically perform these calculations using a sample-by-sample difference between the current block and the reference block under consideration.
  • the motion estimation unit may identify the reference block having the lowest value generated from these calculations, thereby indicating the reference block that matches the current block most closely.
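A simple Python sketch of block matching with the sum of absolute differences (SAD) named above; this brute-force full search over a square window is only an illustration, since practical encoders use much faster search strategies and other cost measures (SSD, MAD, MSD).

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
                          for a, b in zip(row_a, row_b))

def full_search(cur_block, ref_frame, x0, y0, search_range, block_size):
    """Brute-force motion search: try every integer displacement in a
    square window and keep the one with the lowest SAD (illustration)."""
    best_mv, best_cost = (0, 0), float("inf")
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = x0 + dx, y0 + dy
            if (rx < 0 or ry < 0 or
                    ry + block_size > len(ref_frame) or
                    rx + block_size > len(ref_frame[0])):
                continue       # candidate block falls outside the reference picture
            cand = [row[rx:rx + block_size]
                    for row in ref_frame[ry:ry + block_size]]
            cost = sad(cur_block, cand)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```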
  • the motion estimation unit may form one or more motion vectors (MVs) that define the position of a reference block in a reference picture relative to the position of a current block in a current picture.
  • the motion estimation unit may then provide the motion vectors to the motion compensation unit.
  • For unidirectional inter-frame prediction, the motion estimation unit may provide a single motion vector, while for bidirectional inter-frame prediction, the motion estimation unit may provide two motion vectors.
  • the motion compensation unit may then use the motion vectors to generate a prediction block.
  • the motion compensation unit may use the motion vectors to retrieve data for the reference block.
  • the motion compensation unit may interpolate the prediction block according to one or more interpolation filters.
  • the motion compensation unit may retrieve data for two reference blocks identified by corresponding motion vectors and combine the retrieved data by sample-by-sample averaging or weighted averaging, etc.
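A short Python sketch of combining two retrieved reference blocks by sample-by-sample averaging or weighted averaging, as described above; the weights shown are arbitrary examples.

```python
def bi_predict(block0, block1, w0=0.5, w1=0.5):
    """Combine two reference blocks sample by sample; equal weights give a
    plain average, unequal weights give weighted bi-prediction (illustration)."""
    return [[int(round(w0 * a + w1 * b)) for a, b in zip(r0, r1)]
            for r0, r1 in zip(block0, block1)]

print(bi_predict([[100, 120]], [[110, 80]]))            # [[105, 100]]
print(bi_predict([[100, 120]], [[110, 80]], 0.8, 0.2))  # [[102, 112]]
```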
  • the intra prediction unit may generate a prediction block from samples adjacent to the current block.
  • the intra prediction unit may generally mathematically combine the values of the adjacent samples and fill these calculated values along a defined direction on the current block to generate a prediction block.
  • the intra prediction unit may calculate an average value of samples adjacent to the current block and generate a prediction block to include the resulting average value of each sample of the prediction block.
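A Python sketch of the DC-style intra prediction described above: the prediction block is filled with the average of the reconstructed neighboring samples (handling of unavailable neighbors at picture boundaries is omitted).

```python
def dc_intra_prediction(left_neighbors, top_neighbors, block_size):
    """DC-style intra prediction: fill the whole prediction block with the
    average of the reconstructed neighboring samples (illustration)."""
    neighbors = list(left_neighbors) + list(top_neighbors)
    dc = int(round(sum(neighbors) / len(neighbors)))
    return [[dc] * block_size for _ in range(block_size)]

# 4x4 block with reconstructed neighbors to the left and above.
print(dc_intra_prediction([100, 102, 98, 100], [101, 99, 103, 97], 4)[0])
# -> [100, 100, 100, 100]
```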
  • the mode selection unit may generate a prediction block for the current block being encoded via a corresponding unit associated with the coding technique.
  • the mode selection unit may not generate a prediction block, but instead generate syntax elements indicating how to reconstruct the block according to the selected palette. In such modes, the mode selection unit may provide these syntax elements to the entropy coding unit for encoding.
  • the residual unit receives a current block and a corresponding prediction block.
  • the residual unit then generates a residual block for the current block.
  • the residual unit calculates the sample-by-sample difference between the prediction block and the current block.
  • the transform unit (“Transform & Sample & Quantize” shown in FIG. 3 ) applies one or more transforms to the residual block to produce a block of transform coefficients (e.g., referred to as a “transform coefficient block”).
  • the transform unit may apply various transforms to the residual block to form a transform coefficient block.
  • The transform unit may apply a discrete cosine transform (DCT), a directional transform, a Karhunen-Loève transform (KLT), or a conceptually similar transform to the residual block.
  • the transform unit may perform multiple transforms on the residual block, e.g., a primary transform and a secondary transform, such as a rotation transform.
  • the transform unit may not apply a transform to the residual block.
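For the DCT mentioned above, the following Python sketch applies an orthonormal 2-D DCT-II separably (rows, then columns); real codecs use scaled integer approximations of this transform rather than floating point.

```python
import math

def dct_1d(samples):
    """Orthonormal 1-D DCT-II (floating point illustration)."""
    n = len(samples)
    out = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * sum(s * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                               for i, s in enumerate(samples)))
    return out

def dct_2d(block):
    """Separable 2-D DCT: 1-D transform over rows, then over columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

flat = [[10] * 4 for _ in range(4)]
print(round(dct_2d(flat)[0][0], 3))   # 40.0: a flat block has only a DC coefficient
```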
  • the transform unit may quantize the transform coefficients in the transform coefficient block to produce a quantized transform coefficient block.
  • the transform unit may quantize the transform coefficients of the transform coefficient block according to a quantization parameter (QP) value associated with the current block.
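A hedged Python sketch of the relationship between the QP value and the quantization step: in HEVC the step size roughly doubles every 6 QP values (Qstep ≈ 2^((QP − 4)/6)), and the simple division below stands in for the integer scaling and rounding offsets used in a real codec.

```python
def quantization_step(qp):
    """Approximate HEVC-style quantization step: doubles every 6 QP values."""
    return 2 ** ((qp - 4) / 6.0)

def quantize(coefficients, qp):
    """Simplified scalar quantization of transform coefficients (a real
    codec uses integer arithmetic and rounding offsets)."""
    step = quantization_step(qp)
    return [int(c / step) for c in coefficients]

print(round(quantization_step(22), 2))     # ~8.0
print(quantize([160.0, -45.0, 7.0], 22))   # [20, -5, 0]
```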
  • The video encoder 200 (e.g., via the mode selection unit) may adjust the degree of quantization applied to the transform coefficient block by adjusting the QP value associated with the current block.
  • the encoder 200 may further include an encoding control unit for generating control information for operations in the encoding process.
  • the inverse quantization and inverse transform unit (“Inverse Quantization & Inverse Transform" shown in Figure 3) may apply inverse quantization and inverse transform to the quantized transform coefficient block, respectively, to obtain a reconstructed residual block from the transform coefficient block.
  • the reconstruction unit may generate a reconstructed block corresponding to the current block based on the reconstructed residual block and the prediction block generated by the mode selection unit (although there may be some degree of distortion). For example, the reconstruction unit may add the samples of the reconstructed residual block to the corresponding samples of the prediction block generated from the mode selection unit to generate a reconstructed block.
  • the reconstructed block may be subjected to a filtering process, such as the loop filtering unit shown in FIG3 , to perform one or more filtering operations.
  • the filtering process may include a deblocking operation to reduce blocking artifacts along the CU edge.
  • the filtering process operation may be skipped.
  • the video encoder 200 may store the reconstructed block in a decoding cache.
  • the reconstruction unit may store the reconstructed block in a decoding cache.
  • the filtered reconstructed block may be stored in a decoding cache.
  • the motion estimation unit and the motion compensation unit may retrieve a reference picture formed by the reconstructed (and possibly filtered) block from the decoding cache to perform inter-frame prediction on blocks of subsequently encoded pictures.
  • the intra-frame prediction unit may use the reconstructed block in the decoding cache of the current picture to perform intra-frame prediction on other blocks in the current picture.
  • the operations described above are about blocks. This description should be understood as operations for luma coding blocks and/or chroma coding blocks.
  • the luma coding blocks and chroma coding blocks are the luma components and chroma components of a CU.
  • the luma coding blocks and chroma coding blocks are the luma components and chroma components of a PU.
  • the entropy coding unit may entropy encode syntax elements received from other functional components of the video encoder 200.
  • the entropy coding unit may entropy encode a block of quantized transform coefficients from a transform unit.
  • the entropy coding unit may entropy encode a prediction syntax element from a mode selection unit (such as motion information for inter-frame prediction or intra-frame mode information for intra-frame prediction) to generate entropy coded data.
  • the entropy coding unit may perform a context adaptive variable length coding (CAVLC) operation, a context adaptive binary arithmetic coding (CABAC) operation, a variable length coding operation, a syntax-based context adaptive binary arithmetic coding (SBAC) operation, a probability interval partitioning entropy (PIPE) coding operation, an exponential Golomb coding operation, or other types of entropy coding operations on the data.
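As an example of one of the entropy coding operations listed above, the following Python sketch produces zeroth-order unsigned exponential Golomb codes.

```python
def exp_golomb_encode(value):
    """Zeroth-order unsigned exponential Golomb code: M leading zeros,
    followed by the (M + 1)-bit binary representation of value + 1."""
    code = bin(value + 1)[2:]          # binary of value + 1
    return "0" * (len(code) - 1) + code

for v in range(5):
    print(v, exp_golomb_encode(v))
# 0 -> 1, 1 -> 010, 2 -> 011, 3 -> 00100, 4 -> 00101
```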
  • the entropy coding unit may operate in a bypass mode where the syntax elements are not entropy encoded.
  • the video encoder 200 may output a bitstream including entropy coded syntax elements required to reconstruct a block of a slice or picture.
  • FIG. 4 is a block diagram illustrating an example video decoder according to some embodiments of the present disclosure; for example, the decoder shown in FIG. 4 may be the video decoder 300 in FIG. 2. It is understood that FIG. 4 is provided for explanation, not for limitation of the techniques widely exemplified and described in the present disclosure. For the purpose of explanation, the video decoder 300 is described according to the HEVC technique; however, the disclosed techniques may be performed by video decoding devices configured for other video codec standards.
  • the basic structure of the video decoder 300 can be similar to the video encoder shown in FIG. 3, so that the encoder 200 and the decoder 300 both include video encoding and decoding components, that is, the encoder 200 and the decoder 300 can both implement video encoding and decoding processes.
  • the encoder 200 and the decoder 300 can be collectively referred to as a codec. Therefore, the system consisting of the encoder 200 and the decoder 300 can support one-way or two-way video transmission between devices, such as video streaming, video playback, video broadcasting or video phone communication.
  • the video decoder 300 can include more, fewer or different functional components than the components shown in FIG. 4.
  • FIG. 4 shows components related to the decoding conversion process according to some embodiments of the present disclosure.
  • the video decoder 300 includes a memory, an entropy decoding unit, a prediction processing unit, an inverse quantization and inverse transform unit (the "inverse quantization & inverse transform unit" shown in FIG. 4 ), a reconstruction unit, a filter unit, a decoding cache, and a bit depth inverse transform unit.
  • the prediction processing unit may include a motion compensation unit and an intra-frame prediction unit.
  • the prediction processing unit may also include additional units to perform prediction according to other prediction modes.
  • the prediction processing unit may include a palette unit, an intra-frame block copy unit (which may form part of the motion compensation unit), an affine unit, a linear model (LM) unit, and the like.
  • the video decoder 300 may include more, fewer, or different functional components.
  • the decoder 300 may receive a bitstream including encoded video data.
  • The memory in FIG. 4 may be referred to as a coded picture buffer (CPB), which stores a bitstream including the encoded video data waiting to be decoded by the components of the video decoder 300.
  • the video data stored in the CPB may be obtained, for example, from a computer-readable medium 110 ( FIG. 2 ) or the like.
  • the CPB may also store temporary data such as outputs of various units of the video decoder 300.
  • the decoding buffer typically stores decoded pictures, and the video decoder 300 may output and/or use the decoded pictures as reference video data when decoding subsequent data or pictures of the bitstream.
  • the CPB memory and the decoding buffer may be formed by a variety of memory devices, such as a dynamic random access memory (DRAM) including synchronous DRAM (SDRAM), magnetoresistive RAM (MRAM), resistive RAM (RRAM), or other types of memory devices.
  • the CPB memory and the decoding buffer may be provided by the same memory device or different memory devices.
  • the CPB memory may be located on the same chip as other components of the video decoder 300, as shown in the figure, or may not be located on the same chip as other components.
  • a fixed-function circuit refers to a circuit that provides a specific function and whose executable operations are preset.
  • a programmable circuit refers to a circuit that can be programmed to perform a variety of tasks and that provides flexible functionality in the operations it can execute.
  • a programmable circuit may execute software or firmware that causes the programmable circuit to operate in a manner defined by the instructions of the software or firmware.
  • a fixed-function circuit may execute software instructions (to receive parameters or output parameters, etc.), but the type of operation performed by the fixed-function circuit is generally fixed.
  • one or more units may be different circuit blocks (fixed-function circuit blocks or programmable circuit blocks), and in some examples, one or more units may be integrated circuits.
  • the video decoder 300 may include an ALU, an EFU, a digital circuit, an analog circuit, and/or a programmable core formed by a programmable circuit.
  • an on-chip or off-chip memory may store instructions (such as object code) of the software received and executed by the video decoder 300.
  • the entropy decoding unit may perform entropy decoding on the received bit stream to parse out coding information corresponding to the picture therefrom.
  • the decoder 300 may perform a decoding conversion process according to the parsed encoding information to generate display video data.
  • the operations that may be performed by the decoder 300 at the decoding end may refer to the decoding conversion process shown in FIG. 4, which may be understood to include a general decoding process to generate a display picture for display by a display device.
  • the entropy decoding unit may receive a bitstream including an encoded video from, for example, the memory 120, and entropy decode it to reproduce syntax elements.
  • the inverse quantization and inverse transform unit (“inverse quantization & inverse transform” shown in FIG4 ), the reconstruction unit, and the filter unit may generate a decoded video based on the syntax elements extracted from the bitstream, for example, generate a decoded picture.
  • the video decoder 300 reconstructs a picture block by block.
  • the video decoder 300 may perform a reconstruction operation on each block separately, where the block currently being reconstructed (i.e., decoded) may be referred to as a "current block."
  • the entropy decoding unit may entropy decode syntax elements defining the quantized transform coefficients of a quantized transform coefficient block, as well as transform information such as the quantization parameter (QP) and/or a transform mode indication.
  • the inverse quantization and inverse transform unit may use the QP associated with the quantized transform coefficient block to determine the degree of quantization, and may also determine the degree of inverse quantization to be applied.
  • the inverse quantization and inverse transform unit may perform a bit-by-bit left shift operation to inverse quantize the quantized transform coefficients.
  • the inverse quantization and inverse transform unit may thereby form a transform coefficient block including the transform coefficients.
  • the inverse quantization and inverse transform unit may apply one or more inverse transforms to the transform coefficient block to generate a residual block associated with the current block.
  • the inverse quantization and inverse transform unit may apply an inverse DCT, an inverse integer transform, an inverse Karhunen-Loève transform (KLT), an inverse rotational transform, an inverse directional transform, or other inverse transforms to the coefficient block.
  • the prediction processing unit generates a prediction block based on the prediction information syntax element entropy decoded by the entropy decoding unit. For example, if the prediction information syntax element indicates that the current block is inter-frame predicted, the motion compensation unit may generate the prediction block. In this case, the prediction information syntax element may indicate a reference picture in the decoding cache (from which the reference block is retrieved), and a motion vector identifying the reference block in the reference picture relative to the position of the current block in the current picture.
  • the motion compensation unit may generally perform an inter-frame prediction process in a manner substantially similar to that described with respect to the motion compensation unit in FIG. 3.
  • the intra prediction unit may generate a prediction block according to the intra prediction mode indicated by the prediction information syntax element.
  • the intra prediction unit may generally perform an intra prediction process in a manner substantially similar to that described with respect to the intra prediction unit in FIG. 3.
  • the intra prediction unit may retrieve data of neighboring samples of the current block from a decoding buffer.
  • the reconstruction unit may reconstruct the current block using the prediction block and the residual block. For example, the reconstruction unit may add samples of the residual block to corresponding samples of the prediction block to reconstruct the current block.
  • the filter unit may perform one or more filter operations on the reconstructed block.
  • the filter unit may perform a deblocking operation to reduce blockiness artifacts along the edges of the reconstructed block. It will be appreciated that the filtering operation need not be performed in all examples, i.e., the filtering operation may be skipped in some cases.
  • the video decoder 300 may store the reconstructed block in a decode buffer.
  • the decode buffer may provide reference information to units such as motion compensation and motion estimation, such as samples of the current picture for intra-frame prediction and samples of previously decoded pictures for subsequent motion compensation.
  • the video decoder 300 may output decoded pictures from the decode buffer for subsequent presentation on a display device (such as the display device 118 of FIG. 2 ).
  • Figure 5 is a schematic diagram of the coding structure of the All Intra (AI) configuration provided by at least one embodiment of the present disclosure.
  • FIG6 is a schematic diagram of a coding structure of a low-delay (LD) configuration provided by at least one embodiment of the present disclosure.
  • the low-latency LD configuration is mainly suitable for real-time communication environments with low latency requirements.
  • all P frames or B frames use generalized P and B (GPB) frame prediction, and the encoding order count (EOC) of all frames remains consistent with the picture order count (POC).
  • a "1+x" solution is proposed, where "1" is a nearest neighbor reference frame and "x" is x high-quality reference frames.
  • Fig. 7A is a schematic diagram of an inter-frame prediction coding provided by at least one embodiment of the present disclosure.
  • Fig. 7B is a schematic flow chart of an inter-frame prediction technology provided by at least one embodiment of the present disclosure.
  • video prediction coding technology is mainly divided into two categories: (1) intra-frame prediction, that is, using the coded pixels in the current image (current video frame) to generate prediction values; (2) inter-frame prediction, that is, using the reconstructed pixels of the coded image before the current image (current video frame) to generate prediction values.
  • Inter-frame prediction coding refers to using the correlation in the video time domain to use the pixels of the adjacent coded images to predict the pixels of the current image in order to effectively remove the redundancy in the video time domain.
  • the inter-frame prediction coding algorithm in the HEVC/H265 standard obtains the motion information of each block of the current image in the reference image by using the coded image as the reference image of the current image.
  • the motion information is usually represented by a motion vector and a reference frame index.
  • the reference image can be forward, backward or bidirectional.
  • the motion information can be directly inherited from the adjacent blocks, or the corresponding motion information can be obtained by searching for matching blocks in the reference image through motion estimation. Then the prediction value of the current block is obtained through the motion compensation process.
  • Each pixel block of the current image searches for a best matching block in the previously encoded image. This process is called motion estimation.
  • the image used for prediction is called the reference image, the displacement from the reference block to the current block (i.e., the current pixel block) is called the motion vector, and the difference between the current block and the reference block is called the prediction residual.
  • the video codec standard defines three types of images: I-frame images, P-frame images, and B-frame images.
  • I-frame images can only use intra-frame coding, while P-frame images and B-frame images can use inter-frame prediction coding.
  • the prediction method for P-frame images is to predict the current image from the previous frame image, which is called "forward prediction". That is, to find the matching block (reference block) of the current block in the forward reference image.
  • B-frame images can use three prediction methods: forward prediction, backward prediction, and bidirectional prediction.
  • Inter-frame prediction coding relies on the correlation between frames, including processes such as motion estimation and motion compensation.
  • the main process of inter-frame prediction includes the following steps:
  • Step 1: Create a motion vector (MV) candidate list, perform Lagrangian rate-distortion optimization (RDO) calculation, and select the MV with the smallest distortion as the initial MV;
  • Step 2: Use the point with the smallest matching error found in step 1 as the starting point for the next search;
  • Step 3: Starting with a step size of 1 and increasing exponentially by a factor of 2, perform an 8-point diamond search; in this step, the maximum number of searches can be set (one search with a certain step size counts as one search);
  • Step 4: If the optimal step size obtained by the search in step 3 is 1, a two-point diamond search is performed with the optimal point as the starting point, because two of the 8 neighboring points of the optimal point were not searched during the previous 8-point search;
  • Step 5: If the optimal step size obtained in step 3 is greater than a certain threshold (iRaster), the point obtained in step 2 is used as the starting point, and a raster scan with a step size of iRaster is performed (i.e., all points within the motion search range are traversed);
  • Step 6: After steps 1-5, take the best point found so far as the starting point and repeat steps 3 and 4;
  • Step 7: Save the MV corresponding to the best matching point as the final MV, together with the sum of absolute differences (SAD) (see the sketch below).
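  • as a rough illustration of steps 1-7 above, the following Python sketch shows a simplified expanding 8-point diamond search around an initial MV chosen from a candidate list; it is an assumption-laden simplification (SAD-only cost, no Lagrangian rate term, abbreviated two-point and raster refinements), not the HEVC reference search.

```python
import numpy as np

def sad(cur_block, ref_frame, x, y, mv):
    """Sum of absolute differences between the current block at (x, y) and the
    reference block displaced by mv = (mv_x, mv_y); boundary handling is crude."""
    h, w = cur_block.shape
    rx, ry = x + mv[0], y + mv[1]
    if rx < 0 or ry < 0 or rx + w > ref_frame.shape[1] or ry + h > ref_frame.shape[0]:
        return float("inf")                      # out-of-bounds reference: unusable
    ref_block = ref_frame[ry:ry + h, rx:rx + w]
    return int(np.abs(cur_block.astype(np.int64) - ref_block.astype(np.int64)).sum())

# 8-point diamond pattern, scaled by the current step size
DIAMOND = [(-1, 0), (1, 0), (0, -1), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]

def diamond_search(cur_block, ref_frame, x, y, candidate_mvs, max_rounds=16):
    # Steps 1-2: pick the candidate MV with the smallest matching error as the start point.
    best_mv = min(candidate_mvs, key=lambda mv: sad(cur_block, ref_frame, x, y, mv))
    best_cost = sad(cur_block, ref_frame, x, y, best_mv)
    # Step 3: 8-point diamond search with step sizes 1, 2, 4, 8, ...
    step = 1
    for _ in range(max_rounds):
        improved = False
        for dx, dy in DIAMOND:
            mv = (best_mv[0] + dx * step, best_mv[1] + dy * step)
            cost = sad(cur_block, ref_frame, x, y, mv)
            if cost < best_cost:
                best_cost, best_mv, improved = cost, mv, True
        # Steps 4-6 (simplified): restart the fine search around any better point found,
        # otherwise widen the step; a raster scan would replace very large steps.
        step = 1 if improved else step * 2
        if step > 64:
            break
    # Step 7: the final MV and its SAD.
    return best_mv, best_cost
```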
  • inter-frame prediction technology mainly includes Merge and advanced motion vector prediction AMVP technologies that use the idea of temporal and spatial motion video prediction.
  • the core idea of these two technologies is to establish a list of candidate motion vector prediction MVs and select the MV with the best performance as the predicted MV of the current coding block. For example, in Merge mode, an MV candidate list is established for the current prediction unit (PU), and there are 5 candidate MVs (and their corresponding reference images) in the list. By traversing these 5 candidate MVs and calculating the rate-distortion cost, the one with the lowest rate-distortion cost is finally selected as the optimal MV for the Merge mode. If the encoder/decoder establishes the candidate list in the same way, the encoder only needs to transmit the index of the optimal MV in the candidate list, which greatly saves the number of coding bits for motion information.
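  • as a minimal sketch of this candidate-selection idea (not the normative Merge process; the bit-cost model and lambda value below are illustrative assumptions), the encoder can evaluate each candidate with a Lagrangian rate-distortion cost and signal only the winning index:

```python
def rd_cost(distortion, bits, lam):
    # Lagrangian rate-distortion cost J = D + lambda * R
    return distortion + lam * bits

def select_merge_candidate(candidates, distortion_of, lam=10.0):
    """candidates: list of (mv, ref_idx) pairs; distortion_of: callable returning the
    prediction distortion (e.g. SAD or SSE) obtained with one candidate."""
    best_idx, best_cost = 0, float("inf")
    for idx, cand in enumerate(candidates):
        bits = idx + 1                      # crude stand-in for the cost of coding the merge index
        cost = rd_cost(distortion_of(cand), bits, lam)
        if cost < best_cost:
            best_idx, best_cost = idx, cost
    return best_idx                         # only this index needs to be written to the bitstream
```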
  • the video codec standard VVC uses the motion vector prediction technology in HEVC, but has made some optimizations, such as extending the length of the Merge motion vector candidate list, modifying the candidate list construction process, and adding some new prediction technologies, such as affine transformation technology and adaptive motion vector accuracy technology.
  • FIG8A and FIG8B show schematic diagrams of affine motion compensation provided by at least one embodiment of the present disclosure.
  • FIG8A shows an affine transformation of two control points, i.e., the affine motion vector of the current block is generated by two control points (four parameters).
  • FIG8B shows an affine transformation of three control points, i.e., the affine motion vector of the current block is generated by three control points (six parameters).
  • the reference point vectors a and b can be expressed as (a_h, a_v) and (b_h, b_v) in two-dimensional space, respectively, and the prediction vector (MV_h, MV_v) of the current center pixel under the 2-control-point (4-parameter) affine motion model can be expressed in terms of the a and b vectors, where w and h represent the width and height of the current block, respectively.
  • the 6-parameter affine motion model adds a reference point c, whose motion vector is expressed as (c_h, c_v); a sketch of both models is given below.
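  • the equation itself is not reproduced in this text, so the following is a sketch of the commonly used control-point formulation of the 4-parameter and 6-parameter affine models (a = (a_h, a_v) at the top-left corner, b = (b_h, b_v) at the top-right corner, c = (c_h, c_v) at the bottom-left corner); the exact form used by the embodiment may differ:

```python
def affine_mv_4param(a, b, w, x, y):
    """4-parameter (2 control point) affine MV at sample (x, y) of a block of width w."""
    a_h, a_v = a
    b_h, b_v = b
    mv_h = (b_h - a_h) / w * x - (b_v - a_v) / w * y + a_h
    mv_v = (b_v - a_v) / w * x + (b_h - a_h) / w * y + a_v
    return mv_h, mv_v

def affine_mv_6param(a, b, c, w, h, x, y):
    """6-parameter (3 control point) affine MV; h is the block height."""
    a_h, a_v = a
    b_h, b_v = b
    c_h, c_v = c
    mv_h = (b_h - a_h) / w * x + (c_h - a_h) / h * y + a_h
    mv_v = (b_v - a_v) / w * x + (c_v - a_v) / h * y + a_v
    return mv_h, mv_v
```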
  • FIG. 9 is a schematic diagram of a video data processing method provided by at least one embodiment of the present disclosure.
  • a video data processing method 10 is provided.
  • the video data processing method 10 can be applied to various application scenarios related to video encoding and decoding, for example, it can be applied to terminals such as mobile phones and computers, and for example, it can be applied to video websites/video platforms, etc., and the embodiments of the present disclosure do not specifically limit this.
  • the video data processing method 10 includes the following operations S101 to S102.
  • Step S101 for a current video block of a video, determine to use a first inter-frame prediction mode for encoding and decoding.
  • Step S102 Based on the determination, performing conversion between the current video block and the bitstream of the video.
  • derivation of the motion vector of the current video block is based on a base region in the video corresponding to the first display mode.
  • the video may be a captured video, a video downloaded from the Internet, or a locally stored video, etc. It may also be an LDR video, an SDR video, etc., and the embodiments of the present disclosure do not impose any restrictions on this.
  • the first display mode of the video can define that the video screen gradually increases from a certain expansion starting position along a certain expansion direction (for example, horizontally or vertically).
  • for example, in some examples, the video screen expands from left to right (i.e., horizontally); in other examples, the video screen expands from top to bottom (i.e., vertically), or vice versa.
  • the first display mode of the video defines the size, position and other information of the basic area. It should be noted that the "first display mode" is not limited to a specific one or some display modes, nor is it limited to a specific order.
  • any inter-frame prediction mode can be selected for the current video block of the video.
  • the current video block can select the Merge mode and the AMVP mode for inter-frame prediction coding.
  • in the Merge mode, a motion vector (MV) candidate list is constructed for the current block, and the candidate list includes 5 candidate MVs. These 5 candidate MVs generally come from two sources: the spatial domain and the temporal domain. The spatial domain provides up to 4 candidate MVs, and the temporal domain provides up to 1 candidate MV. If the number of candidate MVs in the current MV candidate list does not reach 5, a zero vector (0,0) is used to fill the list up to the specified number.
  • the MV candidate list constructed in the AMVP mode also includes two cases: spatial domain and time domain. The difference is that the length of the AMVP list is only 2.
  • the H.266/VVC standard expands the size of the candidate list of the Merge mode, and there can be up to 6 candidate MVs.
  • the VVC standard also introduces new inter-frame prediction technologies, such as affine prediction mode, combined intra-frame and inter-frame prediction (CIIP) mode, geometric partition prediction mode (TPM), bidirectional optical flow (BIO) method, bidirectional weighted prediction (BCW), Merge mode with motion vector difference, etc.
  • the available inter-frame prediction mode may include any one of the above-mentioned inter-frame prediction modes, and the embodiments of the present disclosure are not limited to this.
  • rate distortion optimization is used to select the best inter-frame prediction mode and motion vector.
  • first inter-frame prediction mode is used to indicate the inter-frame prediction mode for the current video block. It should be noted that the "first inter-frame prediction mode" is not limited to a specific one or some inter-frame prediction modes, nor is it limited to a specific order.
  • the first inter-frame prediction mode can be any one of the above-mentioned available inter-frame prediction modes, such as Merge mode, advanced motion vector prediction AMVP mode, Merge mode with motion vector difference, bidirectional weighted prediction mode or affine prediction mode, etc., and the embodiments of the present disclosure are not limited to this.
  • the conversion between the current video block and the bitstream may include encoding the current video block into the bitstream, or may include decoding the current video block from the bitstream.
  • the conversion process may include an encoding process, or may include a decoding process, which is not limited in the embodiments of the present disclosure.
  • the derivation of the motion vector of the current video block is based on the base area in the video corresponding to the first display mode.
  • the motion vector of the current video block is related to the position and/or size of the base area.
  • the reference pixels used by the current video block are restricted to a specific area.
  • the bit stream of the video can be partially decoded according to the display area actually displayed, thereby reducing the decoding resource consumption of the undisplayed part, effectively improving the efficiency of video encoding and decoding, and thus improving the user's product experience.
  • a base area refers to an area that is always displayed in a video, which can be determined by the first display mode of the video.
  • a fixed area with a certain length along the expansion direction from the video expansion start position defined by the first display mode is used as a base area (also referred to as the first display area herein).
  • the position of the base area may vary depending on the first display mode of the video.
  • the first display mode of the video defines that the video screen gradually increases from the leftmost dotted line along the direction from left to right.
  • the fixed area is an area extending a certain length from left to right starting from the leftmost side of the video screen.
  • the first display mode of the video defines that the video screen starts from the top and gradually increases from top to bottom.
  • the fixed area is an area extending a certain length from top to bottom starting from the top of the video screen.
  • the length ratio of the base area to the entire display area is 1:1.
  • the length ratio of the base area to the entire display area is 1:2.
  • the embodiment of the present disclosure does not limit the position/size of the base area, which can be set according to actual conditions.
  • one or more syntax elements related to the first display mode can be used to define the application of the first display mode, the expansion direction of the video, the size of the basic area, etc.
  • the embodiments of the present disclosure are not limited to this and can be set according to actual conditions.
  • FIG. 10 is a schematic diagram of a scrolling video codec provided by at least one embodiment of the present disclosure.
  • the video content corresponding to the rolled-up area may not be displayed, so that the bit stream corresponding to the rolled-up area may not be decoded.
  • the oblique line area represents the area not involved in the decoding.
  • the size of the area not involved in the decoding also gradually increases. Therefore, according to the actual display area of the sliding scroll screen, the received bit stream is partially decoded to achieve the technical effect of saving resource consumption in the decoding process, thereby improving the battery life of products with sliding scroll screens and enhancing the user's product experience.
  • partial decoding of the P frame image needs to satisfy the requirement that the MV prediction range of the current video frame to be displayed should be a subset of the display area of the reference frame.
  • the tile coding designed in existing coding standards can divide a picture into regions, and only the coding tree units (CTUs) contained in a tile are encoded in scanning order, which improves parallel encoding and decoding capabilities. However, a tile only limits the range of intra-frame prediction, that is, the intra-frame prediction mode will not use pixel information beyond the tile range; the inter-frame coding and loop filtering modules may still reference data beyond the tile boundary.
  • At least one embodiment of the present disclosure provides a progressive coding structure.
  • FIG. 11 is a schematic diagram of a method for dividing a current video frame provided by at least one embodiment of the present disclosure.
  • the current video frame includes a first display area and at least one display sub-area arranged in sequence adjacent to each other along the expansion direction (from left to right).
  • the first display area is represented as a basic_tile area on the left
  • the at least one display sub-area is represented as at least one enhanced_tile area on the right.
  • the first display area may be a fixed display area, such as a fixed display area of a conventional display screen.
  • the first display area is a display area that is always displayed during the rolling process of the sliding scroll screen.
  • the first display area is half of the video screen, such as a number of CTUs that occupies half of the video screen.
  • the first display area is a third of the video screen, such as a number of CTUs that occupies one third of the video screen.
  • the first display area is the entire area of the video screen.
  • the "first display area” is used to indicate a display area (i.e., a basic area) of a fixed display, and is not limited to a specific display area or a specific order.
  • other display areas except the first display area in the current video frame can be evenly divided into at least one display sub-area, such as at least one enhanced_tile shown in FIG11.
  • the number of display sub-areas varies with the scrolling state of the scroll screen.
  • each of the at least one display sub-area is the same size, and the width or height of each display sub-area is greater than one CTU.
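  • as a concrete, purely illustrative sketch of this division (the 1:1 ratio and the 2-CTU enhanced_tile width below are assumptions chosen only for the example), one CTU row can be split into a basic_tile and equally sized enhanced_tiles as follows:

```python
def split_frame_ctus(ctus_per_row, basic_ratio=0.5, enh_tile_ctus=2):
    """Split one CTU row into a basic_tile and equally sized enhanced_tiles.

    ctus_per_row : total number of CTUs in a row of the frame
    basic_ratio  : fraction of the row assigned to basic_tile (illustrative value)
    enh_tile_ctus: width of each enhanced_tile in CTUs (at least one CTU)
    """
    n_basic = max(1, int(ctus_per_row * basic_ratio))
    basic_tile = list(range(0, n_basic))
    enhanced_tiles, start = [], n_basic
    while start < ctus_per_row:
        width = min(enh_tile_ctus, ctus_per_row - start)
        enhanced_tiles.append(list(range(start, start + width)))
        start += width
    return basic_tile, enhanced_tiles

# Example: a 12-CTU-wide frame split 1:1 into a 6-CTU basic_tile and three 2-CTU enhanced_tiles.
basic, enhanced = split_frame_ctus(12)
```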
  • FIG12A is a schematic diagram of a rotation axis direction of a scrolling screen provided by at least one embodiment of the present disclosure
  • FIG12B is a schematic diagram of another rotation axis direction of a scrolling screen provided by at least one embodiment of the present disclosure.
  • the encoding direction of the video is generally from left to right in the horizontal direction and then from top to bottom in the vertical direction.
  • the rotation axis direction of the sliding scroll screen is considered to be perpendicular to the encoding direction of the video.
  • the rotation axis moves from left to right, and the area to be displayed becomes larger and larger.
  • the rotation axis direction of the sliding scroll screen is considered to be parallel to the encoding direction of the video.
  • the rotation axis moves from top to bottom, and the area to be displayed becomes larger and larger.
  • the current video frame may include the leftmost first display area (basic_tile) and at least one display sub-area (enhanced_tile) arranged adjacently in sequence along the expansion direction (from left to right).
  • the current video frame may include the topmost first display area (basic_tile) and at least one display sub-area (enhanced_tile) below the first display area arranged adjacently in sequence along the expansion direction (from top to bottom).
  • first display sub-area is used to indicate the display sub-area (enhanced_tile) on the right side of the basic area/first display area (basic_tile)
  • second display sub-area is used to indicate the display sub-area (enhanced_tile) below the basic area/first display area (basic_tile).
  • first display sub-area and the “second display sub-area” are not limited to a specific one or some display sub-areas, nor are they limited to a specific order.
  • in response to the current video block being located in a first display area of the current video frame, the motion vector of the current video block is within a first motion vector prediction range.
  • an independent encoding method is used for the basic area (basic_tile).
  • since the basic area basic_tile always needs to be decoded, a valid MV prediction range (i.e., the first MV prediction range) is defined, and the MV associated with the current video block is within this MV prediction range.
  • FIG. 13 shows a schematic diagram of encoding a video block located in a first display area provided by at least one embodiment of the present disclosure.
  • the first motion vector prediction range is determined based on the position of the current video block, the motion vector prediction accuracy, and the boundary of the first display area.
  • the four boundaries of the first display area basic_tile are respectively represented as: left boundary basic_tile_l, right boundary basic_tile_r, upper boundary basic_tile_t, and lower boundary basic_tile_b, and the coordinates of the current video block (e.g., the PU shown in FIG. 13) are (x, y).
  • the MV prediction accuracy can be expressed as 2^(-n), where n is an integer.
  • B_left represents the left boundary of the first MV prediction range, B_right represents the right boundary, B_top represents the upper boundary, and B_bottom represents the lower boundary of the first MV prediction range.
  • the MV prediction accuracy may be indicated by a syntax element, or may be a default accuracy, etc., and the embodiments of the present disclosure are not limited to this.
  • the first MV prediction range defined by equations (1)-(4) is both a limiting range of the initial MV (corresponding to the search start point) of the current video block and a limiting range of the final MV (corresponding to the best matching point) of the current video block. In this way, it is ensured that the final MV of the current video block is within the corresponding MV prediction range (the first MV prediction range).
  • all reference pixels used by the current video block are in the base region.
  • the base region can also be defined by the above equations (1)-(4).
  • mv_x and mv_y represent the horizontal and vertical components of the MV, respectively.
  • for each MV in the candidate list of the current video block, it can be determined whether the MV is within the corresponding MV prediction range. For example, in some examples, if a certain MV is determined not to be within the MV prediction range, the MV is removed from the candidate list and will not be selected as the initial MV, ensuring that the MV search starting point is within the corresponding MV prediction range; a sketch of this check is given below.
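  • a minimal sketch of this pruning step, assuming the four boundaries B_left, B_right, B_top, B_bottom have already been computed for the current block and that a candidate is valid when B_left <= mv_x <= B_right and B_top <= mv_y <= B_bottom (an assumed reading of the boundary definitions above):

```python
def mv_in_range(mv, bounds):
    """bounds = (B_left, B_right, B_top, B_bottom), all in MV-precision units."""
    b_left, b_right, b_top, b_bottom = bounds
    mv_x, mv_y = mv
    return b_left <= mv_x <= b_right and b_top <= mv_y <= b_bottom

def prune_candidate_list(candidates, bounds):
    # keep only candidate MVs whose search starting point lies inside the prediction range
    return [mv for mv in candidates if mv_in_range(mv, bounds)]
```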
  • FIG. 14 is a schematic diagram showing that a current video block is located at a boundary of a first display area according to at least one embodiment of the present disclosure.
  • the motion information of the reference block H is usually used. If the reference block H is not available, it is replaced by the reference block C.
  • when the current video block is located at the boundary of the first display area, the motion information of the reference block at position H cannot be obtained (because it is not decoded). Therefore, during the encoding process, a large error value may be assigned to the motion information of the reference block H so that this motion information will not become the optimal candidate MV; that is, the motion information of the reference block C is selected into the temporal candidate list.
  • the current video frame includes a first display area and at least one display sub-area sequentially arranged adjacent to each other along an expansion direction, and the expansion direction is from left to right.
  • in response to the current video block being located in a first display sub-area (enhanced_tile) of the current video frame, the motion vector of the current video block is limited to the second motion vector prediction range.
  • the second motion vector prediction range is determined based on the position of the current video block, the motion vector prediction accuracy, the boundary of the first display area, and the width of the first display sub-area.
  • the current video frame includes a first display area basic_tile on the left and at least one display sub-area enhanced_tile on the right.
  • the width of each display sub-area enhanced_tile is not less than the width of one CTU.
  • the left boundary of the MV prediction range (second MV prediction range) of the display sub-region enhanced_tile is the left boundary of the basic_tile of the current video frame
  • the right boundary of the second MV prediction range is the right boundary of the enhanced_tile adjacent to the left side of the enhanced_tile where the current video block is located.
  • for example, when the current video block is located in the first enhanced_tile adjacent to the basic_tile, the right boundary of the second MV prediction range is the right boundary of the basic_tile.
  • the MV prediction range of the current video block in the first display sub-region enhanced_tile located on the right side of the first display region basic_tile is determined by the following equations (7) to (12).
  • left_k = (basic_tile_l - x + 2^n) << n    Equation (7)
  • right_k = (basic_tile_r + (k-1)*s_en - x - 2^n) << n    Equation (8)
  • top_k = (basic_tile_t - y + 2^n) << n    Equation (9)
  • bottom_k = (basic_tile_b - y - 2^n) << n    Equation (10)
  • x, y represent the position of the current video block (e.g., PU);
  • the four boundaries of the first display area basic_tile are represented as basic_tile_l, basic_tile_r, basic_tile_t, and basic_tile_b;
  • s_en represents the width of the first display sub-area enhanced_tile;
  • the prediction accuracy of the MV is 2^(-n), where n is an integer.
  • the left boundary left_k, the right boundary right_k, the upper boundary top_k, and the lower boundary bottom_k represent the MV boundaries of the kth enhanced_tile to the right of the basic_tile, counted from left to right, where k is an integer not less than 0. That is, these four MV boundaries constitute the MV prediction range (the second MV prediction range) of a current video block located in a display sub-area enhanced_tile on the right side of the first display area basic_tile.
  • mv_x and mv_y represent the horizontal and vertical components of the MV, respectively.
  • for example, in response to the current video block being located in the 1st first display sub-area (k = 1) on the right side of the first display area, the second MV prediction range is equal to the first MV prediction range.
  • in response to the current video block being located in the kth first display sub-area on the right side of the first display area, with k being an integer greater than 1, the first right boundary of the first MV prediction range is different from the second right boundary of the second MV prediction range.
  • first right boundary is used to indicate the right boundary of the first MV prediction range
  • second right boundary is used to indicate the right boundary of the second MV prediction range.
  • first right boundary and the second right boundary are not limited to a specific one or some boundaries, nor are they limited to a specific order.
  • the current video frame includes a first display area and at least one display sub-area arranged in sequence along an expansion direction, and the expansion direction is from top to bottom.
  • in response to the current video block being located in a second display sub-area (enhanced_tile) of the current video frame, the motion vector of the current video block is limited to a third motion vector prediction range.
  • the third motion vector prediction range is determined based on the position of the current video block, the motion vector prediction accuracy, the boundary of the first display area, and the height of the second display sub-area.
  • the current video frame includes a first display area basic_tile at the top and at least one display sub-area enhanced_tile at the bottom.
  • the height of each display sub-area enhanced_tile is not less than the height of one CTU.
  • the MV prediction range of the current video block in the second display sub-area enhanced_tile located below the first display area basic_tile is determined by the following equations (13) to (16).
  • left_m = (basic_tile_l - x + 2^n) << n    Equation (13)
  • right_m = (basic_tile_r - x - 2^n) << n    Equation (14)
  • top_m = (basic_tile_t - y + 2^n) << n    Equation (15)
  • bottom_m = (basic_tile_b + (m-1)*s_en - y - 2^n) << n    Equation (16)
  • the MV prediction range of the current video block of the mth enhanced_tile located below the basic_tile, counted from top to bottom, is the third MV prediction range;
  • the third MV prediction range is defined by the left boundary left_m, the right boundary right_m, the upper boundary top_m, and the lower boundary bottom_m defined in equations (13) to (16).
  • x, y represent the position of the current video block (e.g., PU);
  • the four boundaries of the first display area basic_tile are respectively represented as basic_tile_l, basic_tile_r, basic_tile_t, and basic_tile_b;
  • s_en represents the height of the second display sub-area enhanced_tile;
  • the prediction accuracy of the MV is 2^(-n), where n is an integer, and m is an integer not less than 0.
  • mv_x and mv_y represent the horizontal and vertical components of the MV, respectively. For example, based on equations (17) to (18), it is determined whether the MV associated with the current video block overflows the boundary.
  • the third motion vector prediction range is equal to the first motion vector prediction range.
  • in response to the current video block being located in the mth second display sub-area below the first display area, with m being an integer greater than 1, the first lower boundary of the first motion vector prediction range is different from the third lower boundary of the third motion vector prediction range.
  • first lower boundary is used to indicate the lower boundary of the first MV prediction range
  • third lower boundary is used to indicate the lower boundary of the third MV prediction range.
  • first lower boundary and the “third lower boundary” are not limited to a specific one or some boundaries, nor are they limited to a specific order.
  • for example, in response to the current video block being located in the 1st second display sub-area (m = 1) below the first display area, the lower boundary bottom_m of the third MV prediction range is equal to the lower boundary B_bottom of the first MV prediction range, and the first MV prediction range is equal to the third MV prediction range.
  • when m is an integer greater than 1, bottom_m is not equal to B_bottom.
  • comparing equations (1) to (4) with equations (13) to (16), except for the lower boundary, the other three boundaries of the first MV prediction range are respectively equal to the other three boundaries of the third MV prediction range; the sketch below illustrates how these ranges can be computed.
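  • putting equations (7) to (16) together, and assuming the first display area itself uses the same four-boundary form (which, per the text above, coincides with the range of the first sub-area), the range computation can be sketched as follows; positions and tile boundaries are in pixels, and the left shift by n converts them to units of the 2^(-n) MV precision:

```python
def mv_prediction_range(x, y, tile, n, s_en=0, k=0, m=0):
    """(left, right, top, bottom) of the MV prediction range in 1/2**n-pel units.

    x, y  : position of the current video block (e.g. PU)
    tile  : (basic_tile_l, basic_tile_r, basic_tile_t, basic_tile_b) in pixels
    n     : MV precision exponent (precision is 2**-n)
    s_en  : width (or height) of one enhanced_tile in pixels
    k     : index of the enhanced_tile to the right of basic_tile (0 for the basic area)
    m     : index of the enhanced_tile below basic_tile (0 for the basic area)
    """
    tl, tr, tt, tb = tile
    margin = 1 << n                                           # the 2**n term in the equations
    left   = (tl - x + margin) << n
    right  = (tr + max(k - 1, 0) * s_en - x - margin) << n    # widened for the kth right-side tile
    top    = (tt - y + margin) << n
    bottom = (tb + max(m - 1, 0) * s_en - y - margin) << n    # deepened for the mth lower tile
    return left, right, top, bottom

def mv_overflows(mv, bounds):
    # True if the MV falls outside the prediction range (cf. the mv_x / mv_y checks above)
    left, right, top, bottom = bounds
    return not (left <= mv[0] <= right and top <= mv[1] <= bottom)
```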
  • the temporal candidate motion vector prediction value in the motion vector prediction candidate list is calculated using the spatial candidate motion vector prediction value.
  • the motion information of the reference block H and the reference block C cannot be obtained in the process of constructing the temporal candidate list.
  • in this case, the motion vector of the first available reference block, selected in the order of the spatial candidate list, is scaled and added to the temporal candidate list, as shown in the following equation (19).
  • td and tb represent the distances from the current video block and from the X reference block, respectively, to their respective reference images.
  • the embodiments of the present disclosure do not limit which spatial candidate motion vector prediction value is used to replace the temporal candidate motion vector prediction value, and can be set according to actual needs.
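  • equation (19) is not reproduced in this text; the following sketch assumes the usual distance-based scaling in which the selected spatial candidate MV is multiplied by tb/td before being used in the temporal-candidate position, which is an assumed reading of the description above:

```python
def scale_mv(mv, tb, td):
    """Scale a spatial candidate MV by the ratio of temporal distances tb/td."""
    if td == 0:
        return mv
    return (mv[0] * tb / td, mv[1] * tb / td)

def substitute_temporal_candidate(spatial_candidates, tb, td):
    """Pick the first available MV in spatial-candidate order and scale it, per the text above;
    fall back to the zero vector when no spatial candidate is available."""
    for cand in spatial_candidates:
        if cand is not None:
            return scale_mv(cand, tb, td)
    return (0, 0)
```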
  • performing conversion between the current video block of the current video frame and the bit stream of the video may include a decoding process.
  • the received bit stream is fully decoded for display.
  • the display terminal when the display terminal only partially displays, for example, the display terminal has a scrolling screen as shown in FIG. 1, only a portion of the received bit stream needs to be decoded, thereby reducing the use of decoding resources and improving the encoding and decoding efficiency of the video.
  • FIG. 15 is a schematic diagram of another video data processing method provided by at least one embodiment of the present disclosure.
  • another video data processing method 30 is provided.
  • the video data processing method 30 can be applied to various application scenarios related to video decoding (i.e., applied to the decoding end).
  • the video data processing method 30 includes the following operations S301 to S303.
  • Step S301 Receive a video bit stream.
  • Step S302 Determine whether a current video block of a video is encoded or decoded using a first inter-frame prediction mode.
  • Step S303 Based on the determination, the bitstream is decoded, and in the first inter-frame prediction mode, the derivation of the motion vector of the current video block is based on a basic area in the video corresponding to the first display mode.
  • the decoding side based on the bit stream of the received video, it can be determined whether the video applies the first display mode and the corresponding video expansion direction. For example, in some examples, when the received bit stream includes the syntax element "enhanced_tile_enabled_hor" (or the value of the syntax element is 1), the current video applies the first display mode, and the expansion direction is horizontal (for example, from left to right). For another example, in other examples, when the received bit stream includes the syntax element "enhanced_tile_enabled_ver” (or the value of the syntax element is 1), the current video applies the first display mode, and the expansion direction is vertical (for example, from top to bottom).
  • the application of the first display mode is not only determined based on the relevant syntax elements in the received bitstream, but also takes into account the actual situation of the display terminal. For example, in some examples, when the video display mode of the display terminal does not match the first display mode identified in the bitstream, the first display mode is not applied. For example, when the relevant syntax elements in the bitstream indicate that the first display mode is applied to the current video and the expansion direction is the horizontal direction, and at the same time, the video display mode of the display terminal is rolled up in the vertical direction, it is determined that the first display mode is not applied to the current video.
  • for example, when the relevant syntax elements in the bitstream indicate that the first display mode is applied to the current video and the expansion direction is the horizontal direction, while the video display mode of the display terminal is conventional display (e.g., full-screen display) with no rolling required, it is determined that the first display mode is not applied to the current video.
  • the embodiment of the present disclosure does not specifically limit this, and can be set according to actual conditions.
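  • a hypothetical sketch of this decision, using the syntax-element names mentioned above (enhanced_tile_enabled_hor / enhanced_tile_enabled_ver); how the decoder learns the terminal's actual rolling direction is an assumption of the example:

```python
def first_display_mode_applies(syntax, terminal_scroll_direction):
    """syntax: dict of decoded flag values; terminal_scroll_direction: 'hor', 'ver' or None
    (None meaning a conventional full-screen display that never rolls up)."""
    if syntax.get("enhanced_tile_enabled_hor", 0) == 1:
        bitstream_direction = "hor"
    elif syntax.get("enhanced_tile_enabled_ver", 0) == 1:
        bitstream_direction = "ver"
    else:
        return False, None          # the bitstream does not signal the first display mode
    if terminal_scroll_direction != bitstream_direction:
        return False, None          # terminal state does not match the signalled mode
    return True, bitstream_direction
```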
  • decoding the bitstream includes: determining a to-be-decoded region of the current video frame; and decoding the bitstream based on the to-be-decoded region.
  • the to-be-decoded region includes at least a first display region corresponding to the base region.
  • the encoding of the current video frame mainly includes the encoding of the basic area/first display area basic_tile and the encoding of at least one display sub-area enhanced_tile arranged adjacently from left to right.
  • the basic_tile area of each video frame in the video needs to be decoded, and the number of decoded enhanced_tile areas can be determined according to the size of the display area and the number of decoded enhanced_tiles of the previous frame.
  • the current video frame can only decode one more enhanced_tile than the corresponding reference frame at most.
  • the area to be decoded is bounded by the boundary of the coding unit, or the coding unit is used as the unit of the area to be decoded.
  • the coding unit is described as a coding tree unit CTU as an example.
  • the area to be decoded (dec_t) may be determined based on at least one of: the number of pixels to be displayed (l_t) and the number of video coding units CTUs (n_t) of the current video frame, the number of CTUs (n_basic_tile) of the first display area (basic_tile), the number of displayed pixels of the previous video frame (l_{t-1}), and the number of CTUs of the decoded area of the previous video frame (b_{t-1}).
  • the area to be decoded (dec t ) includes the decoded area of the previous video frame (dec t-1 ) and a new display sub-area (enhanced_tile).
  • the current video frame decodes at most one more display sub-region enhanced_tile than the previous video frame, that is, one more new display sub-region enhanced_tile is decoded.
  • in response to the number of CTUs to be displayed (n_t) of the current video frame being greater than the number of CTUs (n_basic_tile) of the first display area and less than the number of CTUs (b_{t-1}) of the decoded area of the previous video frame, or in response to n_t being greater than n_basic_tile, n_t being equal to b_{t-1}, and the number of pixels to be displayed (l_t) of the current video frame being not greater than the number of displayed pixels (l_{t-1}) of the previous video frame, it is determined that the area to be decoded (dec_t) is the decoded area of the previous video frame (dec_{t-1}).
  • the area to be decoded (dec t ) may be determined according to the following equation, as shown below:
  • dec_t = basic_tile
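  • the following sketch ties together the quantities l_t, l_{t-1}, n_t, n_basic_tile and b_{t-1} discussed above; the decision rules are only partially reproduced in this text, so this is an assumed reading in which the decoded area never grows by more than one unit per frame and never shrinks below basic_tile (for simplicity, each enhanced_tile is assumed to be one CTU column wide):

```python
import math

def ctus_needed(pixels_to_display, ctu_size=64):
    # number of CTU columns needed to cover the displayed width (n_t)
    return math.ceil(pixels_to_display / ctu_size)

def area_to_decode(l_t, l_prev, b_prev, n_basic_tile, ctu_size=64):
    """Return the number of CTU columns to decode for the current frame (dec_t).

    l_t / l_prev : displayed pixels of the current / previous frame
    b_prev       : CTU columns decoded for the previous frame (b_{t-1})
    n_basic_tile : CTU columns of the first display area (always decoded)
    """
    n_t = ctus_needed(l_t, ctu_size)
    if n_t <= n_basic_tile:
        return n_basic_tile              # dec_t = basic_tile
    if n_t > b_prev or (n_t == b_prev and l_t > l_prev):
        return b_prev + 1                # decode one additional enhanced_tile (one column here)
    return b_prev                        # display shrank or unchanged: reuse the previous area
```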
  • the basic_tile and multiple enhanced_tiles can be decoded in parallel.
  • CTUs in areas that do not need to be displayed may not be decoded and may be directly filled with 0 pixels.
  • the encoding and decoding efficiency of the video may be improved, the encoding and decoding process may be simplified, and product energy may be saved.
  • in other examples, other pixel values may be used to fill the CTUs in the areas that do not need to be displayed; the fill pixels are not necessarily 0 and can be set according to actual needs.
  • FIG16 is a schematic block diagram of a video encoding and decoding system under an LDP configuration according to at least one embodiment of the present disclosure.
  • the general description of the video encoding and decoding system in FIG. 16 can refer to the relevant description of FIG. 2-4, which will not be repeated here.
  • the range of the motion vector associated with the current video block is limited to avoid using invalid reference pixel information.
  • the bit stream can be partially decoded according to the actual displayed area, thereby reducing the decoding resource consumption of the undisplayed part and improving the encoding and decoding efficiency.
  • FIG. 17 is a schematic flowchart of a method for processing video data under an LDP configuration according to at least one embodiment of the present disclosure.
  • a video data processing method is provided, as shown in Figure 17.
  • the video data processing method includes the following steps S201-S208.
  • Step S201 At the encoding end, the current video frame is divided into basic_tile and enhanced_tile.
  • the encoder divides a frame of image into two tiles, the left side is basic_tile, and the right side is at least one enhanced_tile.
  • a 1:1 allocation is performed according to the number of CTUs in a row.
  • a 1:2 allocation is performed according to the number of CTUs in a row.
  • the total number of CTUs of multiple enhanced_tiles in each row is one more than the number of CTUs in a row of basic_tile.
  • the embodiments of the present disclosure do not limit the specific division method and can be set according to actual needs.
  • Step S202 For the basic_tile on the left, an independent tile encoding method is used without any dependence on the right side; for the enhanced_tile on the right, an encoding method that depends on the basic_tile on the left is used, and the encoding method and range of the MV are limited.
  • this involves the selection process of the initial MV in the inter-frame AMVP and Merge coding processes and a modification of the motion search algorithm, so that the bitstream can adapt to the requirements of expansion and contraction during decoding.
  • Step S203 Obtain the number of pixels l t of the scrolling screen that are not scrolled at the current moment and the number of pixels l t-1 of the scrolling screen that are not scrolled at the previous frame moment, and pass them to the decoder.
  • Step S204 The decoding end receives the video bit stream (not limited to H.264/H.265/H.266).
  • Step S205 Obtain the number of CTUs b t-1 in the decoding area of the previous frame.
  • Step S206 Determine the area to be decoded according to the number of unscrolled pixels l t and l t-1 of the scroll screen at the current moment and the previous frame moment, and the number of CTUs b t-1 of the decoding area of the previous frame.
  • the area to be decoded includes basic_tile and corresponding multiple enhanced_tiles.
  • Step S207 Decode the area to be decoded and fill the undecoded area.
  • Step S208 Send the content of the area to be decoded to the display terminal for display.
  • the bit stream of the video can be partially decoded according to the area to be displayed, thereby reducing the decoding resource consumption of the undisplayed part and improving the encoding and decoding efficiency.
  • the execution order of the various steps of the video data processing method 10 is not limited. Although the execution process of each step is described in a specific order above, this does not constitute a limitation on the embodiments of the present disclosure.
  • the various steps in the video data processing method 10 can be executed serially or in parallel, which can be determined according to actual needs.
  • the video data processing method 10 can also include more or fewer steps, and the embodiments of the present disclosure are not limited to this.
  • FIG. 18 is a schematic block diagram of a video data processing apparatus according to at least one embodiment of the present disclosure.
  • the video data processing device 40 includes a determination module 401 and an execution module 402.
  • the determination module 401 is configured to determine to use a first inter-frame prediction mode for encoding and decoding for a current video block of a video.
  • the determination module 401 can implement step S101, and its specific implementation method can refer to the relevant description of step S101, which will not be repeated here.
  • the execution module 402 is configured to perform conversion between the current video block and the bit stream of the video based on the determination, and in the first inter-frame prediction mode, the derivation of the motion vector of the current video block is based on the basic area in the video corresponding to the first display mode.
  • the execution module 402 can implement step S102, and its specific implementation method can refer to the relevant description of step S102, which will not be repeated here.
  • the determination module 401 and the execution module 402 may be implemented by software, hardware, firmware, or any combination thereof; for example, they may be implemented as a determination circuit 401 and an execution circuit 402, respectively, and the embodiments of the present disclosure do not limit their specific implementations.
  • the video data processing device 40 provided in at least one embodiment of the present disclosure can implement technical effects similar to the aforementioned video data processing method 10.
  • the bit stream can be partially decoded according to the area actually required to be displayed, thereby reducing the decoding resource consumption of the undisplayed part and improving the encoding and decoding efficiency.
  • the video data processing device 40 may include more or fewer circuits or units, and the connection relationship between the various circuits or units is not limited and can be determined according to actual needs.
  • the specific configuration of each circuit is not limited and can be composed of analog devices according to circuit principles, or can be composed of digital chips, or can be composed in other applicable ways.
  • At least one embodiment of the present disclosure further provides a display device, including a video data processing device and a scrolling screen.
  • the video data processing device is configured to decode the received bit stream according to the method provided by at least one of the above embodiments, and send the decoded pixel values to the scrolling screen for display.
  • for example, when the scrolling screen is fully expanded and there is no non-display area, the video data processing device fully decodes the received bit stream.
  • for example, when the scrolling screen includes a rolled-up part and an expanded part, that is, there are a display area and a non-display area, the video data processing device partially decodes the received bit stream.
  • the video data processing device in response to the scrolling screen including a display area and a non-display area during operation, decodes the bit stream based on the size of the display area at the current moment and the previous frame moment. For example, as shown in FIG1 , when the scrolling screen is in a partially expanded state, the video data processing device only needs to decode the content corresponding to the display area. For example, the video data processing device can determine the area to be decoded according to the size of the display area at the current moment and the previous frame moment.
  • the video data processing device can determine the number of pixels to be displayed l t and l t-1 of the video frame at the current moment and the previous frame moment according to the size of the display area at the current moment and the previous frame moment.
  • the operation of determining the area to be decoded has been described in detail in the previous text and will not be repeated here.
  • the display device includes a curling state judgment device in addition to the video data processing device and the scrolling screen.
  • the curling state judgment device is configured to detect the size of the display area of the scrolling screen and send the size of the display area to the video data processing device, so that the video data processing device decodes the bit stream based on the size of the display area at the current moment and the previous frame moment.
  • the curling state judgment device can be implemented by software, hardware, firmware or any combination thereof, for example, it can be implemented as a curling state judgment circuit, and the embodiments of the present disclosure do not limit the specific implementation of the curling state judgment device.
  • the embodiments of the present disclosure do not limit the type of display device.
  • the display device may be a mobile terminal, a computer, a tablet computer, a phone watch, a television, etc., and the embodiments of the present disclosure do not limit this.
  • the embodiments of the present disclosure do not limit the type of sliding screen.
  • the sliding screen may be any type of display screen with a variable display area, including but not limited to the type of sliding screen shown in FIG. 1.
  • the video data processing device included in the display device may be implemented as the video data processing device 40/90/600 mentioned in the present disclosure, etc., and the embodiments of the present disclosure do not limit the specific implementation of the video data processing device.
  • the display device may include more or fewer circuits or units, and the connection relationship between the various circuits or units is not limited and can be determined according to actual needs.
  • the specific configuration of each circuit is not limited and can be composed of analog devices according to circuit principles, or can be composed of digital chips, or in other applicable ways.
  • FIG. 19 is a schematic block diagram of another video data processing apparatus provided by at least one embodiment of the present disclosure.
  • the video data processing device 90 includes a processor 910 and a memory 920.
  • the memory 920 includes one or more computer program modules 921.
  • the one or more computer program modules 921 are stored in the memory 920 and are configured to be executed by the processor 910.
  • the one or more computer program modules 921 include instructions for executing the video data processing method 10 provided by at least one embodiment of the present disclosure.
  • the instructions are executed by the processor 910, one or more steps in the video data processing method 10 provided by at least one embodiment of the present disclosure can be executed.
  • the memory 920 and the processor 910 can be interconnected through a bus system and/or other forms of connection mechanisms (not shown).
  • the processor 910 may be a central processing unit (CPU), a digital signal processor (DSP), or other forms of processing units with data processing capabilities and/or program execution capabilities, such as a field programmable gate array (FPGA), etc.; for example, the central processing unit (CPU) may be an X86 or ARM architecture, etc.
  • the processor 910 may be a general-purpose processor or a dedicated processor, and may control other components in the video data processing device 90 to perform desired functions.
  • the memory 920 may include any combination of one or more computer program products, and the computer program product may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • Volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache), etc.
  • Non-volatile memory may include, for example, read-only memory (ROM), hard disk, erasable programmable read-only memory (EPROM), portable compact disk read-only memory (CD-ROM), USB memory, flash memory, etc.
  • One or more computer program modules 921 may be stored on the computer-readable storage medium, and the processor 910 may run one or more computer program modules 921 to implement various functions of the video data processing device 90.
  • for the specific functions and technical effects of the video data processing device 90, reference may be made to the description of the video data processing methods 10/30 above, which will not be repeated here.
  • FIG. 20 is a schematic block diagram of yet another video data processing device provided by at least one embodiment of the present disclosure.
  • the terminal devices in the embodiments of the present disclosure may include but are not limited to mobile terminals such as mobile phones, laptop computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), vehicle-mounted terminals (such as vehicle-mounted navigation terminals), etc., and fixed terminals such as digital TVs, desktop computers, etc.
  • the video data processing device 600 shown in FIG. 20 is merely an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • the video data processing device 600 includes a processing device (e.g., a central processing unit, a graphics processor, etc.) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage device 608 to a random access memory (RAM) 603.
  • in the RAM 603, various programs and data required for the operation of the computer system are also stored.
  • the processing device 601, ROM 602, and RAM 603 are connected to each other via a bus 604.
  • An input/output (I/O) interface 605 is also connected to the bus 604.
  • the following components may be connected to the I/O interface 605: an input device 606 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.
  • an output device 607 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.
  • a storage device 608 including, for example, a magnetic tape, a hard disk, etc.
  • a communication device 609 including a network interface card such as a LAN card, a modem, etc.
  • the communication device 609 may allow the video data processing device 600 to communicate with other devices wirelessly or by wire to exchange data, and perform communication processing via a network such as the Internet.
  • a drive 610 is also connected to the I/O interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is installed on the drive 610 as needed, so that a computer program read therefrom is installed into the storage device 608 as needed.
  • although FIG. 20 shows a video data processing device 600 including various devices, it should be understood that it is not required to implement or include all the devices shown; more or fewer devices may alternatively be implemented or included.
  • the video data processing device 600 may further include a peripheral interface (not shown in the figure), etc.
  • the peripheral interface may be various types of interfaces, such as a USB interface, a lightning interface, etc.
  • the communication device 609 may communicate with a network and other devices through wireless communication, such as the Internet, an intranet and/or a wireless network such as a cellular phone network, a wireless local area network (LAN) and/or a metropolitan area network (MAN).
  • Wireless communication may use any of a variety of communication standards, protocols, and technologies, including, but not limited to, Global System for Mobile Communications (GSM), Enhanced Data GSM Environment (EDGE), Wideband Code Division Multiple Access (W-CDMA), Code Division Multiple Access (CDMA), Time Division Multiple Access (TDMA), Bluetooth, Wi-Fi (e.g., based on IEEE 802.11a, IEEE 802.11b, IEEE 802.11g and/or IEEE 802.11n standards), Voice over Internet Protocol (VoIP), Wi-MAX, protocols for email, instant messaging and/or Short Message Service (SMS), or any other suitable communication protocol.
  • the video data processing device 600 may be any device such as a mobile phone, a tablet computer, a laptop computer, an e-book, a television, or the like, or may be a combination of any data processing device and hardware, which is not limited in the embodiments of the present disclosure.
  • an embodiment of the present disclosure includes a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, and the computer program contains program code for executing the method shown in the flowchart.
  • the computer program can be downloaded and installed from a network through a communication device 609, or installed from a storage device 608, or installed from a ROM 602.
  • when the computer program is executed by the processing device 601, the video data processing method 10 disclosed in the embodiments of the present disclosure is performed.
  • the computer-readable medium disclosed above may be a computer-readable signal medium or a computer-readable storage medium or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above.
  • Computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium containing or storing a program that may be used by or in combination with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, in which a computer-readable program code is carried. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the above.
  • the computer readable signal medium may also be any computer readable medium other than a computer readable storage medium, which may send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device.
  • the program code contained on the computer readable medium may be transmitted using any suitable medium, including but not limited to: wires, optical cables, RF (radio frequency), etc., or any suitable combination of the above.
  • the computer-readable medium may be included in the video data processing device 600 ; or may exist independently without being incorporated into the video data processing device 600 .
  • FIG. 21 is a schematic block diagram of a non-transitory readable storage medium provided by at least one embodiment of the present disclosure.
  • a non-transitory readable storage medium 70 stores computer instructions 111 which, when executed by a processor, perform one or more steps of the video data processing method 10 as described above.
  • the non-transitory readable storage medium 70 may be any combination of one or more computer-readable storage media. For example, one computer-readable storage medium includes computer-readable program code for obtaining a first display mode of a video; another computer-readable storage medium includes computer-readable program code for determining, for a current video block of the video, a first subset of the currently available inter-frame prediction modes; and yet another computer-readable storage medium includes computer-readable program code for selecting an available first inter-frame prediction mode from the first subset and performing a conversion between the current video block of the current video frame and the bit stream of the video, where the available effective range of the motion vector used by each member of the first subset is determined according to the basic area corresponding to the first display mode in the video.
  • the above-mentioned various program codes may also be stored in the same computer-readable medium, and the embodiments of the present disclosure are not limited to this.
  • when the program code is read by a computer, the computer may execute the program code stored in the computer storage medium to perform, for example, the video data processing method 10 provided in any one of the embodiments of the present disclosure.
  • the storage medium may include a memory card of a smart phone, a storage component of a tablet computer, a hard disk of a personal computer, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), a portable compact disk read-only memory (CD-ROM), a flash memory, or any combination of the above storage media, or other applicable storage media.
  • the readable storage medium may also be the memory 920 in FIG. 19 , and the related description may refer to the aforementioned content, which will not be repeated here.
  • the term “plurality” refers to two or more than two, unless clearly defined otherwise.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

一种视频数据处理方法及视频数据处理装置、显示装置和存储介质。该方法包括:对于视频的当前视频块,确定使用第一帧间预测模式进行编解码(S101),基于确定,执行当前视频块与视频的比特流之间的转换,在第一帧间预测模式中,当前视频块的运动矢量的推导基于视频中的与第一显示模式对应的基础区域(S102)。通过上述方法,可以根据实际需要显示的区域来对比特流进行部分解码,从而减少未显示部分的解码资源消耗,提高编解码效率。

Description

视频数据处理方法及装置、显示装置和存储介质 技术领域
本公开的实施例涉及一种视频数据处理方法及视频数据处理装置、显示装置和计算机可读存储介质。
背景技术
数字视频功能可以结合在各种各样的设备中,包括数字电视、数字直播系统、无线广播系统、便携式电脑或台式电脑、平板电脑、电子阅读器、数码相机、数字记录设备、数字媒体播放器、视频游戏设备、视频游戏机、智能手机、视频电话会议设备和视频流设备等。数字视频设备可以实施视频编解码技术,诸如由MPEG-2、MPEG-4、ITU-T H.263、ITU-T H.264/MPEG-4、Part 10、高级视频编解码(AVC)、高效视频编解码(HEVC)、ITU-T H.265/高效视频编解码定义的标准以及此类标准的扩展中所描述的那些视频编解码技术。通过实施以上视频编解码技术,视频设备可以更有效地发送、接收、编码、解码和/或存储数字视频信息。
发明内容
本公开至少一个实施例提供一种视频数据处理方法。视频数据处理方法包括:对于视频的当前视频块,确定使用第一帧间预测模式进行编解码,基于所述确定,执行所述当前视频块与所述视频的比特流之间的转换。在所述第一帧间预测模式中,所述当前视频块的运动矢量的推导基于所述视频中的与第一显示模式对应的基础区域。
例如,在本公开至少一个实施例提供的方法中,对于所述视频的当前视频帧,从所述第一显示模式定义的视频展开起始位置沿着展开方向的第一显示区域作为所述基础区域。
例如,在本公开至少一个实施例提供的方法中,响应于所述当前视频块位于所述第一显示区域内,所述运动矢量在第一运动矢量预测范围内。
例如,在本公开至少一个实施例提供的方法中,所述第一运动矢量预测范围基于所述当前视频块的位置、运动矢量预测精度、所述第一显示区域的边界来确定。
例如,在本公开至少一个实施例提供的方法中,所述当前视频帧包括所述第一显示区域和沿着所述展开方向依次相邻布置的至少一个显示子区域,并且所述展开方向为从左到右,
响应于所述当前视频块位于所述当前视频帧中的所述第一显示区域右侧的第一显示子区域内,所述运动矢量在第二运动矢量预测范围内。
例如,在本公开至少一个实施例提供的方法中,所述第二运动矢量预测范围基于所述当前视频块的位置、运动矢量预测精度、所述第一显示区域的边界、所述第一显示子区域的宽度来确定。
例如,在本公开至少一个实施例提供的方法中,响应于所述当前视频块位于所述第一显示区域右侧的第k个第一显示子区域内并且k=1,所述第二运动矢量预测范围等于所述第一运动矢量预测范围。
例如,在本公开至少一个实施例提供的方法中,响应于所述当前视频块位于所述第一显示区域右侧的第k个第一显示子区域内并且k为大于1的整数,所述第一运动矢量预测范围的第一右侧边界与所述第二运动矢量预测范围的第二右侧边界不同。
例如,在本公开至少一个实施例提供的方法中,所述当前视频帧包括所述第一显示区域和沿着所述展开方向依次相邻布置的至少一个显示子区域,并且所述展开方向为从上到下,
响应于所述当前视频块位于所述第一显示区域下方的第二显示子区域内,所述运动矢量在第三运动矢量预测范围内。
例如,在本公开至少一个实施例提供的方法中,所述第三运动矢量预测范围基于所述当前视频块的位置、运动矢量预测精度、所述第一显示区域的边界、所述第二显示子区域的高度来确定。
例如,在本公开至少一个实施例提供的方法中,响应于所述当前视频块位于所述第一显示区域下方的第m个第二显示子区域内并且m=1,所述第三运动矢量预测范围等于所述第一运动矢量预测范围。
例如,在本公开至少一个实施例提供的方法中,响应于所述当前视频块位于所述第一显示区域下方的第m个第二显示子区域内并且m为大于1的整数,所述第一运动矢量预测范围的第一下方边界与所述第三运动矢量预测范围的第三下方边界不同。
例如,在本公开至少一个实施例提供的方法中,响应于所述当前视频块位于所述基础区域外,所述当前视频块的运动矢量预测候选列表中的时域候选运动矢量预测值基于空域候选运动矢量预测值计算。
例如,在本公开至少一个实施例提供的方法中,响应于所述当前视频块位于所述基础区域内,所述当前视频块使用的所有参考像素在所述基础区域内。
例如,在本公开至少一个实施例提供的方法中,所述第一帧间预测模式包括Merge预测模式、高级运动矢量预测AMVP模式、带运动矢量差的Merge模式、双向加权预测模式或者仿射预测模式。
本公开至少一个实施例还提供了一种视频数据处理方法。视频数据处理方法,包括:接收视频的比特流;确定所述视频的当前视频块使用第一帧间预测模式进行编解码;基于所述确定,对所述比特流进行解码。在所述第一帧间预测模式中,所述当前视频块的运动矢量的推导基于所述视频中的与第一显示模式对应的基础区域。
例如,在本公开至少一个实施例提供的方法中,对所述比特流进行解码,包括:确定所述视频的当前视频帧的待解码区域,所述待解码区域至少包括所述基础区域对应的第一显示区域。
例如,在本公开至少一个实施例提供的方法中,确定所述待解码区域,包括:基于所述当前视频帧待显示的像素数量和编码单元数量、所述第一显示区域的编码单元数量、前一视频帧的已显示的像素数量和所述前一视频帧的已解码区域的编码单元数量中的至少一个确定所述待解码区域。
例如,在本公开至少一个实施例提供的方法中,确定所述待解码区域,包括:响应于所述当前视频帧待显示的编码单元数量大于前一视频帧的已解码区域的编码单元数量,或者响应于所述当前视频帧待显示的编码单元数量等于前一视频帧的已解码区域的编码单元数量,并且所述当前视频帧待显示的像素数量大于所述前一视频帧的已显示的像素数量,确定所述待解码区域包括所述前一视频帧的已解码区域和一个新的显示子区域。
例如,在本公开至少一个实施例提供的方法中,确定所述待解码区域,包括:响应于所述当前视频帧待显示的编码单元数量大于所述第一显示区域的编码单元数量,并且所述当前视频帧待显示的编码单元数量小于前一视频帧的已解码区域的编码单元数量,或者响应于当前视频帧待显示的编码单元数量大于所述第一显示区域的编码单元数量,所述当前视频帧待显示的编码单元数量等于前一视频帧的已解码区域的编码单元数量,并且所述当前视频帧待显示的像素数量不大于所述前一视频帧的已显示的像素数量,确定所述当前视频帧的待解码区域包括所述当前视频帧待显示的区域。
本公开至少一个实施例还提供了一种视频数据处理装置,包括确定模块和执行模块。确定模块被配置为对于所述视频的当前视频块,确定使用第一帧间预测模式进行编解码。执行模块被配置为基于所述确定,执行所述当前视频块与所述视频的比特流之间的转换。在所述第一帧间预测模式中,所述当前视频块的运动矢量的推导基于所述视频中的与第一显示模式对应的基础区域。
本公开至少一个实施例还提供了一种显示装置,包括视频数据处理装置和滑卷屏。所述视频数据处理装置被配置为根据权利要求1-20中任一项所述的方法,对接收到的比特流进行解码,并将解码后的像素值发送至所述滑卷屏以供显示。
例如,在本公开至少一个实施例提供的显示装置中,响应于所述滑卷屏在工作中包括显示区域和非显示区域,所述视频数据处理装置基于当前时刻和前一帧时刻的所述显示区域的尺寸对所述比特流进行解码。
例如,本公开至少一个实施例提供的显示装置还包括卷曲状态判断装置。所述卷曲状态判断装置配置为检测所述滑卷屏的显示区域的尺寸,并将所述显示区域的尺寸发送至所述视频数据处理装置,以使得所述视频数据处理装置基于当前时刻和前一帧时刻的所述显示区域的尺寸对所述比特流进行解码。
例如,本公开至少一个实施例还提供一种视频数据处理装置,包括:处理器和包括一个或多个计算机程序模块的存储器。所述一个或多个计算机程序模块被存储在所述存储器中并被配置为由所述处理器执行,所述一个或多个计算机程序模块包括用于执行上述任一实施例提供的视频数据处理方法的指令。
例如,本公开至少一个实施例还提供一种计算机可读存储介质,其上存储有计算机指令。该指令被处理器执行时实现上述任一实施例提供的视频数据处理方法的步骤。
附图说明
为了更清楚地说明本公开实施例的技术方案,下面将对实施例的附图作简单地介绍,显而易见地,下面描述的附图仅仅涉及本公开的一些实施例,而非对本公开的限制。
图1为本公开至少一个实施例提供的一种滑卷屏的结构示意图;
图2为本公开至少一个实施例提供的一种示例视频编解码系统的框图;
图3为本公开至少一个实施例提供的一种示例视频编码器的框图;
图4为本公开至少一个实施例提供的一种示例视频解码器的框图;
图5为本公开至少一个实施例提供的全帧内配置的编码结构示意图;
图6为本公开至少一个实施例提供的低延迟配置的编码结构示意图;
图7A为本公开至少一个实施例提供的一种帧间预测编码的示意图;
图7B为本公开至少一个实施例提供的一种帧间预测技术的示意流程图;
图8A为本公开至少一个实施例提供的一种仿射运动补偿的示意图;
图8B为本公开至少一个实施例提供的另一种仿射运动补偿的示意图;
图9为本公开至少一个实施例提供的一种视频数据处理方法的示意图;
图10为本公开至少一个实施例提供的一种滑卷屏的视频编解码的示意图;
图11为本公开至少一个实施例提供的一种当前视频帧的划分方式的示意图;
图12A为本公开至少一个实施例提供的一种滑卷屏的转轴方向的示意图;
图12B为本公开至少一个实施例提供的另一种滑卷屏的转轴方向的示意图;
图13为本公开至少一个实施例提供的当前视频块位于第一显示区域内的编码示意图;
图14为本公开至少一个实施例提供的当前视频块位于第一显示区域的边界的编码示意图;
图15为本公开至少一个实施例提供的另一种视频数据处理方法的示意图;
图16为根据本公开至少一个实施例的一种在低延迟配置下视频编解码系统的示意框图;
图17为根据本公开至少一个实施例的一种在低延迟配置下视频数据处理方法的示意流程图;
图18为根据本公开至少一个实施例的一种视频数据处理装置的示意流程图;
图19为根据本公开至少一个实施例的另一种视频数据处理装置的示意框图;
图20为本公开至少一个实施例提供的又一种视频数据处理装置的示意框图;以及
图21为本公开至少一个实施例提供的一种非瞬时可读存储介质的示意框图。
具体实施方式
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合附图,对本公开实施例的技术方案进行清楚、完整地描述。显然,所描述的实施例是本公开的一部分实施例,而不是全部的实施例。基于所描述的本公开的实施例,本领域普通技术人员在无需创造性劳动的前提下所获得的所有其他实施例,都属于本公开保护的范围。
本公开中使用了流程图来说明根据本申请的实施例的系统所执行的操作。应当理解的是,前面或下面操作不一定按照顺序来精确地执行。相反,根据需要,可以按照倒序或同时处理各种步骤。同时,也可以将其他操作添加到这些过程中,或从这些过程移除某一步或数步操作。
除非另外定义,本公开使用的技术术语或者科学术语应当为本公开所属领域内具有一般技能的人士所理解的通常意义。本公开中使用的“第一”、“第二”以及类似的词语并不表示任何顺序、数量或者重要性,而只是用来区分不同的组成部分。同样,“一个”、“一”或者“该”等类似词语也不表示数量限制,而是表示存在至少一个。“包括”或者“包含”等类似的词语意指出现该词前面的元件或者物件涵盖出现在该词后面列举的元件或者物件及其等同,而不排除其他元件或者物件。“连接”或者“相连”等类似的词语并非限定于物理的或者机械的连接,而是可以包括电性的连接,不管是直接的还是间接的。“上”、“下”、“左”、“右”等仅用于表示相对位置关系,当被描述对象的绝对位置改变后,则该相对位置关系也可能相应地改变。
由于对高分辨率视频的需求的增加,视频编解码方法和技术在现代技术中普遍存在。视频编解码器通常包括压缩或解压缩数字视频的电子电路或软件,并且不断改进以提供更高的编码效率。视频编解码器将未压缩视频转换成压缩格式,反之亦然。视频质量、用于表示视频的数据量(由比特率确定)、编码和解码算法的复杂性、对数据丢失和错误的敏感性、编辑的简易性、随机存取、端到端延迟(延迟时间)之间存在复杂的关系。压缩格式通常符合标准视频压缩规范,例如,高效视频编解码(HEVC)标准(也称为H.265)、待最终确定的通用视频编解码(VVC)标准(也称为H.266)或其他当前和/或未来的视频编解码标准。
可以理解的是,本公开所涉及的技术的实施例可以应用于现有视频编解码标准(例如,AVC、HEVC和VVC)和未来的标准以改进压缩性能。在本文中对于编解码操作的描述可以参照现有的视频编解码标准,可以理解的是,本公开中提供的方法并不限于所描述的视频编解码标准。
目前,随着折叠屏手机、折叠屏平板等终端产品的出现,人们越来越关注对柔性显示屏的研究,例如柔性滑卷屏等。图1为本公开至少一个实施例提供的一种滑卷屏的结构示意图。如图1所示,滑卷屏通常包括全展开状态和部分展开状态。滑卷屏的未卷部分视为显示区域,卷起部分视为未显示区域。需要说明的是,在本公开的各个实施例中,滑卷屏可以是任意类型的显示区域可变的显示屏,包括且不限于图1所示的滑卷屏结构。通常,在滑卷屏的实际使用过程,例如在卷收过程中,滑卷屏的被卷部分的视频画面无需显示,但仍然会解码该部分画面,造成解码资源的浪费。
至少为了解决上述技术问题,本公开至少一个实施例提供了一种视频数据处理方法,该方法包括:对于视频的当前视频块,确定使用第一帧间预测模式进行编解码;基于确定,执行当前视频块与视频的比特流之间的转换。在第一帧间预测模式中,当前视频块的运动矢量的推导基于视频中的与第一显示模式对应的基础区域。
相应地,本公开至少一个实施例还提供了一种对应于上述视频数据处理方法的视频数据处理装置、显示装置和计算机可读存储介质。
通过本公开至少一个实施例提供的视频数据处理方法,通过基于视频中的与第一显示模式对应的基础区域来推导当前视频块的运动矢量,使得可以根据实际显示的显示区域,对视频的比特流进行部分解码,从而减少未显示部分的解码资源消耗,有效提升视频编解码的效率,进而提升用户的产品使用体验。
需要说明的是,在本公开的实施例中,用于描述相邻块或者参考像素相对于当前视频块的位置的词语,例如“上方”、“下方”、“左侧”、“右侧”等,其含义与视频编解码标准(例如,AVC、HEVC和VVC)中所定义的保持一致。例如,在一些示例中,“左侧”和“右侧”分别表示在水平方向上的两侧,上方”和“下方”分别表示在垂直方向上的两侧。
下面通过多个示例或实施例及其示例对根据本公开提供的布局设计方法进行非限制性的说明,如下面所描述的,在不相互抵触的情况下这些具体示例或实施例中不同特征可以相互组合,从而得到新的示例或实施例,这些新的示例或实施例也都属于本公开保护的范围。
本公开的至少一个实施例提供一种编解码系统。可以理解的是,在本公开中,对于编码端和解码端可以采用相同结构的编解码器来实现。
图2是示出可执行根据本公开一些实施例的示例视频编解码系统1000的框图。本公开的技术一般地涉及对视频数据进行编解码(编码和/或解码)。一般来说,视频数据包括用于处理视频的任何数据,因此,视频数据可包括未编码的原始视频、编码视频、解码(如重构)视频和诸如语法数据的视频元数据。视频中可以包括一个或多个图片,或者称为图片序列。
如图2所示,在此示例中,系统1000包括源设备102,用于提供待由目的设备116解码以用于显示的经编码的视频数据,经编码的视频数据用于形成比特流(bitstream),以传输至解码端,其中,比特流也可以称为位流。具体而言,源设备102经由计算机可读介质110向目的设备116提供经编码的视频数据。源设备102和目的设备116可以实施为多种设备,如台式电脑、笔记本(即便携式)电脑、平板电脑、移动设备、机顶盒、智能手机、手持电话、电视、相机、显示设备、数字媒体播放器,视频游戏机、视频流设备等。在一些情况下,源设备102和目的设备116可被配备用于无线通信,因此也可被称为无线通信设备。
在图2的示例中,源设备102包括视频源104、存储器106、视频编码器200和输出接口108。目的设备116包括输入接口122、视频解码器300、存储器120和显示设备118。根据本公开一些实施例,源设备102的视频编码器200和目的设备116的视频解码器300可以被配置成用于实施根据本公开一些实施例的编码方法和解码方法。因此,源设备102表示视频编码设备的示例,而目的设备116表示视频解码设备的示例。在其他示例中,源设备102和目的设备116可包括其他组件或配置。例如,源设备102可以从外部像机等外部视频源接收视频数据。同样,目的设备116可以与外部显示设备连接,而无需内置集成显示设备118。
图2所示的系统1000仅为一个示例。一般来说,任何数字视频编码和/或解码设备都可执行根据本公开一些实施例的编码方法和解码方法。源设备102和目的设备116仅为此类编解码设备的示例,其中源设备102生成比特流以传输到目的设备116。本公开将“编解码”设备称为执行数据编解码(编码和/或解码)的设备。因此,视频编码器200和视频解码器300分别表示编解码设备的示例。
在一些示例中,设备102、116实质上是以对称的方式操作,这样,设备102、116均包括视频编码及解码组件,即设备102、116均可以实现视频编码和解码过程。因此,系统1000可支持视频设备102和116之间的单向或双向视频传输,如可用于视频流、视频回放、视频广播或视频电话通信。
一般来说,视频源104表示视频数据源(即未经编码的原始视频数据)且将视频数据的连续系列图片(也称为“帧”)提供至视频编码器200,视频编码器200对图片的数据进行编码。源设备102的视频源104可包括视频捕获设备,如视频摄像机、包含先前捕获的原始视频的视频档案库和/或用于从视频内容提供者接收视频的视频馈送接口。作为另一可选方案,视频源104可产生基于计算机图形的数据作为源视频或实况视频、存档视频和计算机生成视频的组合。在各种情形下,视频编码器200对捕获的、预捕获的或计算机生成的视频数据进行编码处理。视频编码器200可将图片从接收时的次序(有时称为“显示次序”)重新排列成用于编码的编码次序。视频编码器200可产生包括经编码的视频数据的比特流。然后,源设备102可经输出接口108将生成的比特流输出至计算机可读介质110上,用于如目的设备116的输入接口122等的接收和/或检索。
源设备102的存储器106和目的设备116的存储器120表示通用存储器。在一些示例中,存储器106和存储器120可存储原始视频数据,例如来自视频源104的原始视频数据和来自视频解码器300的解码视频数据。另外或可选地,存储器106和存储器120可分别存储可由视频编码器200和视频解码器300等分别执行的软件指令。尽管在此示例中与视频编码器200和视频解码器300分开展示,但应理解,视频编码器200和视频解码器300还可包括内部存储器,以实现功能上相似或等效的目的。此外,存储器106和存储器120可存储从视频编码器200输出且输入到视频解码器300等的经编码的视频数据。在一些示例中,存储器106和存储器120的一些部分可被分配作为一个或多个视频缓冲器,如为了存储解码的原始视频数据和/或经编码的原始视频数据。
计算机可读介质110可表示能够将经编码的视频数据从源设备102传输到目的设备116的任何类型的介质或设备。在一些示例中,计算机可读介质110表示通信介质,以使源设备102能够经由射频网络或计算机网络等将比特流直接实时地传输到目的设备116。根据无线通信协议等通信标准,输出接口108可调制包括编码的视频数据在内的传输信号,且输入接口122可以调制接收到的传输信号。该通信介质可包括无线或有线通信介质,或两者都包括,例如射频(RF)频谱或一条或多条物理传输线。通信介质可以形成基于分组的网络的一部分,如局域网、广域网或因特网等全球网络。通信介质可包括路由器、交换机、基站或可用于促进从源设备102到目的设备116的通信的任何其他设备。
在一些示例中,源设备102可将经编码的数据从输出接口108输出到存储设备112。类似地,目的设备116可经由输入接口122从存储设备112访问经编码的数据。存储设备112可包括各种分布式数据存储介质或本地访问的数据存储介质,如硬盘驱动器、蓝光光盘、数字视频盘(DVD)、只读光盘驱动器(CD-ROM)、闪存、易失性或非易失性存储器或用于存储编码视频数据的任何其他合适的数字存储介质。
在一些示例中,源设备102可将经编码的数据输出至文件服务器114或可存储由源设备102生成的编码视频的另一中间存储设备。目的设备116可经由在线或下载方式从文件服务器114访问所存储的视频数据。文件服务器114可以为能够存储经编码的数据并将经编码的数据传输至目的设备116的任何类型的服务器设备。文件服务器114可以表示网络服务器(如用于网站)、文件传输协议(FTP)服务器、内容分发网络设备或网络附加存储(NAS)设备。目的设备116可以通过包括因特网连接在内的任何标准数据连接从文件服务器114访问经编码的数据。这可以包括适用于访问存储在文件服务器114上的编码视频数据的Wi-Fi连接等无线信道、数字用户线路(DSL)和电缆调制解调器等有线连接或无线信道和有线连接的组合。文件服务器114和输入接口122可被配置成根据流式传输协议、下载传输协议或其组合来操作。
输出接口108和输入接口122可以表示无线发射器/接收器、调制解调器、以太网卡等有线连网组件、根据各种IEEE 802.11标准中任一项进行操作的无线通信组件或其他物理组件。在输出接口108和输入接口122包括无线组件的示例中,输出接口108和输入接口122可被配置成根据第四代移动通信技术(4G)、4G长期演进(4G-LTE)、先进LTE(LTE Advanced)、第五代移动通信技术(5G)或其他蜂窝通信标准来传送经编码的数据等数据。在输出接口108包括无线发射器的一些示例中,输出接口108和输入接口122可被配置成根据诸如IEEE 802.11规范、IEEE 802.15规范(例如,ZigBeeTM)、蓝牙标准等其他无线标准来传送经编码的数据等数据。在一些示例中,源设备102和/或目的设备116可以包括相应的片上系统(SoC)设备。例如,源设备102可包括SoC设备以执行视频编码器200和/或输出接口108的功能,目的设备116可包括SoC设备以执行诸如视频解码器300和/或输入接口122的功能。
本公开的技术可应用于支持多种多媒体应用的视频编码,如无线电视广播、有线电视传输、卫星电视传输、基于HTTP的动态自适应流等因特网流视频传输、编码到数据存储介质上的数字视频、存储于数据存储介质的数字视频的解码或其他应用。
目的设备116的输入接口122从计算机可读介质110(如存储设备112和文件服务器114等)接收比特流。比特流可以包括由视频编码器200限定的信令信息,这些信令信息也由视频解码器300使用,如具有描述视频块或其他编码单元(如条带、图片、图片组和序列等)的性质和/或处理过程的值的语法元素。
显示设备118向用户显示解码视频数据的解码图片。显示设备118可以是各种类型的显示设备,诸如基于阴极射线管(CRT)的设备、液晶显示器(LCD)、等离子显示器、有机发光二极管(OLED)显示器或其他类型的显示设备等。
尽管图2中未示出,但在一些示例中,视频编码器200和视频解码器300可各自与音频编码器和/或音频解码器集成,且可包括适当的复用-解复用(MUX-DEMUX)单元或其他硬件和/或软件,以处理公共数据流中既包括音频也包括视频的多路复用流。如果适用,则MUX-DEMUX单元可以符合ITUH.223多路复用器协议或诸如用户数据报协议(UDP)等其他协议。
视频编码器200和视频解码器300都可被实现为任何合适的编解码器电路,诸如微处理器、数字信号处理器(DSP)、专用集成电路(ASIC)、现场可编程门阵列(FPGA)、离散逻辑元件、软件、硬件、固件或其任何组合。当技术部分地以软件实现时,设备可将用于软件的指令存储在合适的非暂时性计算机可读介质中,且使用一个或多个以上处理器在硬件中执行指令以执行本公开的技术。视频编码器200和视频解码器300都可包含在一个或多个编码器或解码器中,编码器或解码器中的任一者可集成为相应设备中的组合编码器/解码器(CODEC)的一部分。包括视频编码器200和/或视频解码器300的设备可以是集成电路、微处理器和/或蜂窝式电话等无线通信设备。
视频编码器200和视频解码器300可以根据视频编解码标准来进行操作,例如,根据ITU-T H.265(也称为高效视频编解码(HEVC))等视频编解码标准来操作,或根据多视图和/或可伸缩视频编解码扩展等HEVC的扩展来操作。可选地,视频编码器200和视频解码器300可根据其他专有或工业标准(如目前正在开发的联合探索测试模型(JEM)或通用视频编解码(VVC)标准)来操作。本公开所涉及的技术不限于任何特定的编解码标准。
一般来说,视频编码器200和视频解码器300可以对以YUV(如Y、Cb、Cr)格式表示的视频数据进行编解码。即,视频编码器200和视频解码器300可以对亮度和色度分量进行编解码,而非对图片样点的红绿蓝(RGB)数据进行编解码,其中色度分量可以包括红色色调和蓝色色调的色度分量。在一些示例中,视频编码器200在进行编码之前将接收的RGB格式化数据转换为YUV格式,且视频解码器300将YUV格式转换为RGB格式。可选地,前处理单元和后处理单元(未示出)可以执行这些转换。
一般来说,视频编码器200和视频解码器300可执行图片的基于块的编解码过程。术语“块”或者“视频块”通常是指包括待处理的(如编码的、解码的或其他在编码和/或解码过程中使用的)数据的结构。例如,块可以包括亮度和/或色度数据样点的二维矩阵。一般地,可以首先将图片划分成多个块以进行编解码处理,图片中正在进行编解码处理的块可以称为“当前块”或者“当前视频块”。
此外,本公开的实施例还可以涉及对图片进行编解码以包括对图片数据进行编码或解码的过程。类似地,本公开可涉及对图片的块进行编码以包括对块的数据进行编码或解码的过程,如预测和/或残差编码。通过编码处理得到的比特流通常包括一系列用于语法元素的值,语法元素表示编码决策(如编码模式)以及将图片分割成块的信息。因此,对图片或块进行编码通常可以理解为对形成图片或块的语法元素的值进行编码。
HEVC界定各种块,包含编码单元(CU)、预测单元(PU)和变换单元(TU)。根据HEVC,视频编码器(如视频编码器200)根据四叉树结构将编码树单元(CTU)分割成CU。即,视频编码器将CTU和CU分割为四个相等的非重叠方块,且四叉树的每一节点具有零个或四个子节点。没有子节点的节点可被称为“叶节点”,且此类叶节点的CU可包括一个或多个PU和/或一个或多个TU。视频编码器可以进一步分割PU和TU。例如在HEVC中,残差四叉树(RQT)表示对TU的分割。在HEVC中,PU表示帧间预测数据,而TU表示残差数据。帧内预测的CU包括帧内模式指示等帧内预测信息。
在VVC中,带有嵌套的多类型树(使用二叉树和三叉树划分)的四叉树取代了多个分区单元类型的概念,即,它去除了CU、PU和TU概念的分离,除非对于最大变换长度而言尺寸过大的CU是需要的,并且支持CU分区形状的更大灵活性。在编码树结构中,CU可以具有正方形或矩形形状。首先,CTU由四叉树结构划分。然后,可以通过多类型树结构进一步划分四叉树叶节点。多类型树叶节点被称为编码单元CU,并且除非CU对于最大变换长度来说太大,否则该分割被用于预测和变换处理,而无需任何进一步的分割。这意味着,在大多数情况下,CU、PU和TU在具有嵌套多类型树编码块结构的四叉树中具有相同的块大小。
视频编码器200和视频解码器300可被配置成在按照HEVC使用四叉树分割,根据JEM进行四叉树二叉树(QTBT)分割或使用其他分割结构。应该了解的是,本公开的技术还可应用于被配置成使用四叉树分割或其他分割类型的视频编码器。视频编码器200对用于表示预测信息和/或残差信息及其他信息的CU的视频数据进行编码。预测信息指示如何预测CU以形成CU的预测块。残差信息通常表示编码前的CU的样点与预测块样点之间的逐样点差值。
视频编码器200可例如在图片标头、块标头、条带标头等中进一步产生用于视频解码器300的语法数据,例如基于块的语法数据、基于图片的语法数据和基于序列的语法数据,或者产生诸如序列参数集(SPS)、图片参数集(PPS)或视频参数集(VPS)的其他语法数据。视频解码器300可以同样解码此类语法数据以确定如何解码对应的视频数据。例如,语法数据可以包括各种语法元素、标志、参数等,用于表示视频的编解码信息。
以此方式,视频编码器200可以产生比特流,比特流包括经编码的视频数据,如描述将图片分割为块(如CU)的语法元素和块的预测信息和/或残差信息。最终,视频解码器300可接收比特流并解码经编码的视频数据。
一般来说,视频解码器300执行与视频编码器200所执行过程互逆的过程,以解码比特流中的经编码的视频数据。例如,视频解码器300可以以实质上类似于视频编码器200的方式来解码比特流的语法元素的值。语法元素可以根据分割信息来界定图片CTU,并根据QTBT结构等相应的分割结构对每个CTU进行分割,以界定CTU的CU。语法元素可进一步界定视频数据的块(如CU)的预测信息和残差信息。残差信息可由例如量化变换系数表示。视频解码器300可对块的量化变换系数进行逆量化和逆变换以再现块的残差块。视频解码器300使用在比特流中信令传输的预测模式(帧内或帧间预测)和相关预测信息(如用于帧间预测的运动信息)来形成块的预测块。视频解码器300可接着(在逐样点的基础上)组合预测块和残差块以再现原始块。此外,视频解码器300还可以执行附加处理,例如执行去方块过程以减少沿着块边界的视觉伪像。
图3是示出根据本公开一些实施例的示例视频编码器的框图,对应地,图4是示出根据本公开一些实施例的示例视频解码器的框图,例如,图3中示出的编码器可以实施为图2中的视频编码器200,图4中示出的解码器可以实施为图2中的视频解码器300。以下将结合图3和图4对根据本公开一些实施例的编解码器进行详细描述。
可以理解的是,提供图3和图4是为了解释的目的,不应将其视为对本公开中广泛例示和描述的技术的限制。为了解释,本公开在开发中的视频编解码标准(如HEVC视频编解码标准或H.266视频编解码标准)的上下文中描述视频编码器200和视频解码器300,但本公开的技术不限于这些视频编解码标准。
图3中的各单元(或称为模块)经示出以助理解由视频编码器200执行的操作。这些单元可实现为固定功能电路、可编程电路或两者的组合。固定功能电路是指提供特定功能并预先设置在可执行操作上的电路。可编程电路是指可经编程以执行多种任务并在可执行操作中提供灵活功能的电路。例如,可编程电路可执行使可编程电路以软件或固件的指令所界定的方式操作的软件或固件。固定功能电路可执行软件指令(来接收参数或输出参数等),但固定功能电路执行的操作类型通常是固定的。在一些示例中,一个或多个单元可以为不同的电路块(固定功能电路块或可编程电路块),且在一些示例中,一个或多个单元可以为集成电路。
图3中示出的视频编码器200可以包括算术逻辑单元(ALU)、基本功能单元(EFU)、数字电路、模拟电路和/或由可编程电路形成的可编程核心。在使用由可编程电路执行的软件来执行视频编码器200的操作的示例中,存储器106(图2)可存储视频编码器200接收并执行的软件的目标代码,或视频编码器200内的其他存储器(图中未示出)以用于存储此类指令。
在图3的示例中,视频编码器200可以接收输入视频,例如,可以从诸如视频数据存储器中接收输入视频,或者,也可以直接地从视频采集设备接收输入视频。视频数据存储器可以存储待由视频编码器200组件进行编码处理的视频数据。视频编码器200可从诸如视频源104(如图2所示)等接收存储在视频数据存储器中的视频数据。解码缓存可以用作参考图片存储器来存储参考视频数据,以供视频编码器200预测后续的视频数据时使用。视频数据存储器和解码缓存可以由多种存储器设备形成,如包括同步DRAM(SDRAM)、磁阻RAM(MRAM)、电阻RAM(RRAM)的动态随机存取存储器(DRAM)或其他类型的存储器设备。视频数据存储器和解码缓存可以由同一存储设备或不同的存储设备提供。在各种示例中,视频数据存储器可以如图3所示的与视频编码器200的其他组件位于同一芯片,也可与其他组件不位于同一芯片。
在本公开中,对视频数据存储器的参考不应被解释为限于视频编码器200内部的存储器(除非如此具体描述)或限于视频编码器200外部的存储器(除非如此具体描述)。更确切地说,对视频数据存储器的参考应理解为存储视频编码器200接收的以用于编码的视频数据(如待编码的当前块的视频数据)的参考存储器。此外,图2中的存储器106还可为视频编码器200中各单元的输出提供临时存储。
模式选择单元通常协调多个编码通道以测试编码参数的组合以及由这些组合得到的速率失真值。编码参数可包括CTU到CU的分割、CU的预测模式、CU残差数据的变换类型、CU的残差数据的量化参数等。模式选择单元可最终选择速率失真值比其他被测试组合更好的编码参数组合。
视频编码器200可将从视频存储器检索的图片分割成一系列CTU并将一个或多个CTU封装至条带内。模式选择单元可以根据树结构(如上述的QTBT结构或HEVC的四叉树结构)分割图片的CTU。如上所述,视频编码器200可通过根据树结构分割CTU来形成一个或多个CU。这样的CU通常也可称为“块”或者“视频块”。
一般来说,模式选择单元还控制其组件(诸如运动估计单元、运动补偿单元和帧内预测单元)以产生当前块(如当前CU或HEVC中PU和TU的重叠部分)的预测块。对于当前块的帧间预测,运动估计单元可执行运动搜索以识别一个或多个参考图片(如解码缓存中存储的一个或多个解码图片)中的一个或多个紧密匹配的参考块。具体来说,运动估计单元可根据诸如绝对差值和(SAD)、差平方值和(SSD)、平均绝对差值(MAD)、平均平方差值(MSD)等来计算表示潜在参考块与当前块相似程度的值,运动估计单元通常可使用当前块与所考虑的参考块之间的逐样点差值来执行这些计算。运动估计单元可识别具有从这些计算产生的最低值的参考块,从而指示与当前块匹配最紧密的参考块。
运动估计单元可形成一个或多个运动矢量(MV),这些运动矢量界定参考图片中的参考块相对于当前图片中当前块位置的位置。运动估计单元可接着将运动矢量提供至运动补偿单元。例如,对于单向帧间预测,运动估计单元可提供单个运动矢量,而对于双向帧间预测,运动估计单元可提供两个运动矢量。运动补偿单元可接着使用运动矢量产生预测块。例如,运动补偿单元可使用运动矢量来检索参考块的数据。作为另一示例,如果运动矢量具有分数样点精度,那么运动补偿单元可根据一个或多个内插滤波器对预测块进行插值。此外,对于双向帧间预测,运动补偿单元可检索由相应运动矢量识别的两个参考块的数据,并通过逐样点平均或加权平均等来组合检索的数据。
作为另一示例,对于帧内预测,帧内预测单元可从与当前块相邻的样点产生预测块。例如,对于方向模式,帧内预测单元通常可数学地组合相邻样点的值,且在当前块上沿界定的方向填充这些计算值,以产生预测块。作为另一示例,对于DC模式,帧内预测单元可计算与当前块相邻样点的平均值,且产生预测块以包括预测块每一样点的所得平均值。
对于帧内块复制模式编码、仿射模式编码和线性模型(LM)模式编码等其他视频编解码技术,举例来说,模式选择单元可经由与编解码技术相关联的相应单元产生正被编码的当前块的预测块。在一些示例中,如调色板模式编码,模式选择单元可不产生预测块,而是产生指示根据选定调色板重构块的方式的语法元素。在这类模式中,模式选择单元可将这些语法元素提供至熵编码单元以进行编码。
如上所述,残差单元接收当前块和对应的预测块。残差单元随后生成当前块的残差块。为产生残差块,残差单元计算预测块和当前块之间的逐样点差值。
变换单元(图3中示出的“变换&采样&量化”)将一个或多个变换应用于残差块以产生变换系数的块(例如称为“变换系数块”)。变换单元可将各种变换应用于残差块以形成变换系数块。例如,变换单元可将离散余弦变换(DCT)、方向变换、卡洛变换(KLT)或概念上的相似变换应用于残差块。在一些示例中,变换单元可对残差块执行多个变换,例如,初级变换和次级变换,诸如旋转变换。在一些示例中,变换单元可以不将变换应用于残差块。
接着,变换单元可以量化变换系数块中的变换系数,以产生量化变换系数块。变换单元可根据与当前块相关联的量化参数(QP)值来量化变换系数块的变换系数。视频编码器200(如经由模式选择单元)可通过调整与CU相关联的QP值来调整应用于与当前块相关联的系数块的量化程度。量化可能会导致信息丢失,因此量化后的变换系数其精度可能比原始变换系数的精度低。
此外,编码器200还可以包括编码控制单元,以用于对编码过程中的操作产生控制信息。接着,逆量化和逆变换单元(图3中示出的“逆量化&逆变换”)可分别将逆量化和逆变换应用于量化变换系数块,以从变换系数块获得重构残差块。重构单元可基于重构残差块和由模式选择单元产生的预测块生成对应于当前块的重构块(尽管可能有某种程度的失真)。例如,重构单元可将重构残差块的样点添加至来自模式选择单元产生的预测块的相应样点,以生成重构块。
重构块可以经过滤波处理,例如图3中示出的环路滤波单元,以执行一个或多个滤波操作。例如,滤波处理可以包括去方块操作以减少沿CU边缘的块效应伪像。在一些示例中,可以跳过滤波处理的操作。
接着,经过诸如环路滤波之后,视频编码器200可以将重构块存储于解码缓存。在跳过滤波处理的示例中,重构单元可将重构块存储于解码缓存。在需要滤波处理的示例中,可以将经过滤波的重构块存储于解码缓存。运动估计单元和运动补偿单元可从解码缓存中检索由重构(且可能为滤波的)块形成的参考图片,以对随后编码的图片的块进行帧间预测。此外,帧内预测单元可使用当前图片的解码缓存中的重构块来对当前图片中的其他块进行帧内预测。
以上描述的操作是关于块的。此描述应被理解为用于亮度编码块和/或色度编码块的操作。如上所述,在一些示例中,亮度编码块和色度编码块是CU的亮度分量和色度分量。在一些示例中,亮度编码块和色度编码块是PU的亮度分量和色度分量。
一般来说,熵编码单元可对从视频编码器200的其他功能组件接收的语法元素进行熵编码。例如,熵编码单元可对来自变换单元的量化变换系数块进行熵编码。例如,熵编码单元可以对来自模式选择单元的预测语法元素(如帧间预测的运动信息或帧内预测的帧内模式信息)进行熵编码,以生成熵编码数据。例如,熵编码单元可以对数据执行上下文自适应可变长度编码(CAVLC)操作、上下文自适应二进制算术编码(CABAC)操作、可变长度编码操作、基于语法的上下文自适应二进制算术编码(SBAC)操作、概率区间分割熵(PIPE)编码操作、指数哥伦布编码操作或其他类型的熵编码操作。在一些示例中,熵编码单元可以在语法元素未被熵编码的旁路模式中操作。视频编码器200可输出包括重构条带或图片的块所需的熵编码语法元素的比特流。
图4是示出根据本公开一些实施例的示例视频解码器的框图,例如,图4中示出的解码器可以是图2中的视频解码器300。可以理解的是,提供图4是为了解释,而非是对在本公开中广泛示例和描述的技术的限制。为了解释的目的,根据HEVC技术来描述视频解码器300。然而,本公开技术可由配置成其他视频编解码标准的视频解码设备执行。
可以理解的是,在实际应用中视频解码器300的基本结构可以与图3中示出的视频编码器类似,从而使得编码器200、解码器300均包括视频编码及解码组件,即编码器200、解码器300均可以实现视频编码和解码过程。在此种情形下,编码器200和解码器300可以统称为编解码器。因此,由编码器200和解码器300组成的系统可以支持设备之间的单向或双向视频传输,如可用于视频流、视频回放、视频广播或视频电话通信。可以理解的是,视频解码器300可以包括相比于图4中示出的组件更多、更少或不同的功能组件。为解释的目的,图4中示出了与根据本公开一些实施例的解码转换过程相关的组件。
在图4的示例中,视频解码器300包括存储器、熵解码单元、预测处理单元、逆量化和逆变换单元(图4中示出的“逆量化&逆变换单元”)、重构单元、滤波器单元、解码缓存以及比特深度逆变换单元。其中,预测处理单元可以包括运动补偿单元和帧内预测单元。预测处理单元例如还可以包括加法单元以根据其他预测模式执行预测。作为示例,预测处理单元可以包括调色板单元、帧内块复制单元(其可形成运动补偿单元的一部分)、仿射单元、线性模型(LM)单元等。在其他示例中,视频解码器300可包括更多、更少或不同的功能组件。
如图4所示,首先,解码器300可以接收包括经编码的视频数据的比特流。例如,图4中的存储器可以称为编解码图片缓冲器(CPB),以存储包括该经编码的视频数据的比特流,该比特流用于等待由视频解码器300的组件进行解码。存储在CPB中的视频数据例如可从计算机可读介质110(图2)等处获得。此外,CPB还可以存储例如视频解码器300各个单元的输出的临时数据。解码缓存通常存储解码图片,当对比特流的后续数据或图片进行解码时,视频解码器300可将解码图片输出和/或用作参考视频数据。CPB存储器和解码缓存可由多种存储器设备形成,如包括同步DRAM(SDRAM)、磁阻RAM(MRAM)、电阻RAM(RRAM)的动态随机存取存储器(DRAM)或其他类型的存储器设备。CPB存储器和解码缓存可以由同一存储设备或不同的存储设备提供。在各种示例中,CPB存储器可与视频解码器300的其他组件位于同一芯片,如图所示,也可与其他组件不位于同一芯片。
图4中所示的各种单元经示出以帮助理解由视频解码器300执行的操作。这些单元可实现为固定功能电路、可编程电路或两者的组合。与图3类似,固定功能电路是指提供特定功能并预先设置在可执行操作上的电路。可编程电路是指可经编程以执行多种任务并在可执行操作中提供灵活功能的电路。例如,可编程电路可执行使可编程电路以软件或固件的指令所界定的方式操作的软件或固件。固定功能电路可执行软件指令(来接收参数或输出参数等),但固定功能电路执行的操作类型通常是固定的。在一些示例中,一个或多个单元可以为不同的电路块(固定功能电路块或可编程电路块),且在一些示例中,一个或多个单元可以为集成电路。
视频解码器300可包括ALU、EFU、数字电路、模拟电路和/或由可编程电路形成的可编程核心。在视频解码器300的操作由在可编程电路上执行的软件执行的示例中,片上或片外存储器可以存储视频解码器300接收和执行的软件的指令(如目标代码)。
接着,熵解码单元可以对接收的比特流进行熵解码,以从中解析出对应于图片的编码信息。
接着,解码器300可以根据解析的编码信息进行解码转换处理,用于生成显示视频数据。根据本公开的一些实施例,位于解码端的解码器300可执行的操作可以参考如图4中示出的解码转换处理,该解码转换处理可以理解为包括一般的解码处理,以生成用于由显示设备进行显示的显示图片。
在如图4所示的解码器300中,熵解码单元可以从诸如存储器120接收包括经编码的视频的比特流,并对其进行熵解码以再现语法元素。逆量化和逆变换单元(图4中示出的“逆量化&逆变换”)、重构单元和滤波器单元可以基于从比特流提取的语法元素产生经解码的视频,例如,生成解码图片。
一般来说，视频解码器300逐块地重构图片。视频解码器300可对每一个块单独执行重构操作，其中当前正被重构(即解码)的块可被称作“当前块”。
具体的,熵解码单元可对限定量化变换系数块的量化变换系数的语法元素以及量化参数(QP)和/或变换模式指示等变换信息进行熵解码。逆量化和逆变换单元可使用与量化变换系数块相关联的QP来确定量化程度,且同样可确定要应用的逆量化程度。例如,逆量化和逆变换单元可执行逐比特左移操作以对量化变换系数进行逆量化。逆量化和逆变换单元由此可形成包括变换系数的变换系数块。在形成变换系数块之后,逆量化和逆变换单元可将一个或多个逆变换应用于变换系数块以产生与当前块相关联的残差块。例如,逆量化和逆变换单元可将逆DCT、逆整数变换、逆卡洛变换(KLT)、逆旋转变换、逆方向变换或其他逆变换应用于系数块。
此外,预测处理单元根据由熵解码单元进行熵解码的预测信息语法元素产生预测块。例如,如果预测信息语法元素指示当前块是帧间预测的,则运动补偿单元可产生预测块。在此情况下,预测信息语法元素可以指示解码缓存中的参考图片(从此参考图片中检索参考块),以及指示识别参考图片中的参考块相对于当前图片中当前块位置的运动矢量。运动补偿单元通常可以以基本上类似于关于图3中的运动补偿单元所描述的方式来执行帧间预测过程。
作为另一示例,如果预测信息语法元素指示对当前块进行帧内预测,则帧内预测单元可根据由预测信息语法元素指示的帧内预测模式来产生预测块。同样,帧内预测单元通常可以以基本上类似于关于图3中的帧内预测单元所描述的方式来执行帧内预测过程。帧内预测单元可从解码缓存中检索当前块相邻样点的数据。
重构单元可使用预测块和残差块重构当前块。例如,重构单元可将残差块的样点添加至预测块的对应样点以重构当前块。
接着,滤波器单元可对重构块执行一个或多个滤波器操作。例如,滤波器单元可执行去块操作以减少沿重构块边缘的块效应伪像。可以理解的是,滤波操作不必在所有示例中执行,即,在一些情况下可以跳过滤波操作。
视频解码器300可将重构块存储在解码缓存中。如上所述,解码缓存可向诸如运动补偿、运动估计单元提供参考信息,如用于帧内预测的当前图片的样点和用于后续运动补偿的先前解码图片的样点。此外,视频解码器300可输出来自解码缓存的解码图片以供后续呈现在显示设备(如图2的显示设备118)上。
图5为本公开至少一个实施例提供的全帧内配置(All Intra,AI)的编码结构示意图。
例如,如图5所示,在全帧内AI配置中,视频中的所有帧在编码过程均按I帧编码,即编解码过程完全独立,并且对其他帧不存在依赖关系。同时,编码过程的量化参数(QP)不随编码位置波动,均与首帧的QP值(QPI)相等。如图5所示,在全帧内AI配置中,视频中的所有帧的播放顺序和编码顺序相同,即视频帧的播放顺序计数(POC)与编码顺序计数(EOC)相同。
图6为本公开至少一个实施例提供的低延迟(LD)配置的编码结构示意图。
通常,在实际应用中,低延迟LD配置主要适用于低延时需求的实时通信环境。如图6所示,在LD配置下,所有P帧或B帧均采用广义P/B帧预测,并且所有帧的EOC仍与POC一致。对于低延时配置提出了“1+x”方案,其中“1”为一幅最近邻参考帧,“x”为x幅高质量的参考帧。
图7A为本公开至少一个实施例提供的一种帧间预测编码的示意图。图7B为本公开至少一个实施例提供的一种帧间预测技术的示意流程图。
视频预测编码的主要思想是通过预测来消除像素间的相关性。根据参考像素位置的不同,视频预测编码技术主要分为两大类:(1)帧内预测,即利用当前图像(当前视频帧)内已编码像素生成预测值;(2)帧间预测,即利用当前图像(当前视频帧)之前已编码图像的重建像素生成预测值。帧间预测编码是指利用视频时间域的相关性,使用邻近已编码图像像素预测当前图像的像素,以达到有效去除视频时域冗余的目的。如图7A和图7B所示,HEVC/H265标准中帧间预测编码算法是通过将已编码的图像作为当前图像的参考图像,来获得当前图像的各个块在参考图像中的运动信息,运动信息通常用运动矢量和参考帧索引表示。参考图像可以是前向、后向或者双向的。使用帧间编码技术得到当前块的信息时,可以从邻近块直接继承运动信息,也可以通过运动估计在参考图像中搜索匹配块得到对应的运动信息。接着通过运动补偿过程得到当前块的预测值。
当前图像的每个像素块在之前已编码图像中寻找一个最佳匹配块,该过程称为运动估计。用于预测的图像称为参考图像,参考块到当前块(即当前像素块)的位移称为运动矢量,当前块与参考块的差值称为预测残差。
视频编解码标准中定义了三种类型的图像:I帧图像、P帧图像和B帧图像。I帧图像仅能使用帧内编码,P帧图像和B帧图像可以使用帧间预测编码。P帧图像的预测方式是由前一帧图像预测当前图像,这种方式称为“前向预测”。也就是在前向参考图像中寻找当前块的匹配块(参考块)。B帧图像可以使用3种预测方式:前向预测、后向预测以及双向预测。
帧间预测编码依赖于帧间的关联性,包括运动估计、运动补偿等过程。例如,一些示例中,帧间预测的主要过程包括如下步骤:
步骤1:创建运动矢量(MV)候选列表,进行拉格朗日率失真优化(RDO)计算,选取失真最小的MV作为初始MV;
步骤2:在步骤1中找到匹配误差最小的点作为接下来搜索的起始点;
步骤3:步长从1开始,以2的指数递增,进行8点钻石搜索,该步骤中可以设置搜索的最大次数(以某个步长遍历一遍就算1次);
步骤4:如果通过步骤3搜索得到的最佳步长为1,则需要以该最佳点为起始点做1次两点钻石搜索,因为前面8点搜索的时候,这个最佳点的8个邻点会有两个没有搜索到;
步骤5:如果步骤3搜索得到的最佳步长大于某个阈值(iRaster),则以步骤2得到的点作为起始点,做步长为iRaster的光栅扫描(即在运动搜索的范围内遍历所有点);
步骤6:在经过前面步骤1-5之后,以得到的最佳点为起始点,再次重复步骤3和4;
步骤7:保存与最佳匹配点对应的MV作为最终MV和绝对误差和(SAD)。
在视频编解码标准H.265/HEVC中,帧间预测技术主要包括采用了时域和空域运动视频预测思想的Merge和高级运动矢量预测AMVP技术。这两种技术的核心思想都是通过建立一个候选运动矢量预测MV列表,并选取性能最优的一个MV作为当前编码块的预测MV。例如,在Merge模式中,为当前预测单元(PU)建立一个MV候选列表,列表中存在5个候选MV(及其对应的参考图像)。通过遍历这5个候选MV,并进行率失真代价的计算,最终选取率失真代价最小的一个作为该Merge模式的最优MV。若编/解码端依照相同的方式建立该候选列表,则编码器只需要传输最优MV在候选列表中的索引即可,这样大幅节省了运动信息的编码比特数。
视频编解码标准VVC沿用了HEVC中运动矢量预测技术,但又进行了一些优化。例如扩展Merge运动矢量候选列表的长度,修改候选列表构造过程等,同时也增加了一些新的预测技术。例如仿射变换技术,自适应运动矢量精度技术等。
图8A和图8B示出了本公开至少一个实施例提供的仿射运动补偿的示意图。图8A示出的是2个控制点的仿射变换,即当前块的仿射运动矢量由2个控制点(4个参数)生成。图8B示出的是3个控制点的仿射变换,即当前块的仿射运动矢量由3个控制点(6个参数)生成。
对于4参数仿射运动模型,中心像素为(x,y)的子块运动矢量的计算方法如下:
MV_h = (b_h - a_h)/W * x + (b_v - a_v)/W * y + a_h
MV_v = (b_v - a_v)/W * x + (b_h - a_h)/W * y + a_v
其中，参考点矢量a、b在二维空间可以分别表示为(a_h, a_v)、(b_h, b_v)，中心像素为(x,y)的子块在2点4参数仿射运动模型下的预测矢量为(MV_h, MV_v)，可以用a、b矢量表示为上式，其中W和H分别代表当前块的宽度和高度。
如果使用a、b、c三个控制点，即6参数仿射运动模型，中心像素为(x,y)的子块的运动矢量计算方法如下：
MV_h = (b_h - a_h)/W * x + (c_h - a_h)/H * y + a_h
MV_v = (b_v - a_v)/W * x + (c_v - a_v)/H * y + a_v
6参数仿射运动模型相比于4参数仿射运动模型增加了一个参考点c，其运动矢量表示为(c_h, c_v)。
图9为本公开至少一个实施例提供的一种视频数据处理方法的示意图。
例如,在本公开至少一个实施例中,提供了一种视频数据处理方法10。视频数据处理方法10可以应用于与视频编解码相关的各种应用场景,例如可以应用于手机、计算机等终端,又例如应用于视频网站/视频平台等,本公开的实施例对此不作具体限制。例如,如图9所示,视频数据处理方法10包括以下操作S101至S102。
步骤S101:对于视频的当前视频块,确定使用第一帧间预测模式进行编解码。
步骤S102:基于确定,执行当前视频块与视频的比特流之间的转换。在第一帧间预测模式中,当前视频块的运动矢量的推导基于视频中的与第一显示模式对应的基础区域。
例如,在本公开至少一个实施例中,视频可以是拍摄的摄像作品、从网络下载的视频、或者本地存储的视频等,也可以是LDR视频、SDR视频等,本公开的实施例对此不作任何限制。
例如,在本公开至少一个实施例中,视频的第一显示模式可以定义视频画面从某一展开起始位置,沿着某一展开方向逐渐增大(例如,水平方向或者垂直方向)。例如,在一些示例中,在视频的比特流中存在与第一显示模式相关联的一个或多个语法元素。例如,在一些示例中,视频画面从左到右展开(即水平方向),又例如,在一些示例中,视频画面从上到下展开(即垂直方向),反之亦然。例如,在一个示例中,视频的第一显示模式定义了基础区域的尺寸、位置等信息。需要说明的是,“第一显示模式”并不受限于特定的某一个或一些显示模式,也不受限于特定的顺序。
例如,在本公开的至少一个实施例中,对于视频的当前视频块可以选择任意帧间预测模式。例如,在H.265/HEVC标准中,当前视频块可以选择Merge模式、AMVP模式来进行帧间预测编码。例如,对于Merge模式,为当前块构建运动矢量MV候选列表,该候选列表包括5个候选MV。这5个候选MV通常包括空域和时域两种类型。空域最多提供4个候选MV,时域最多提供1个候选MV。若当前MV候选列表中候选MV的个数达不到5个,则需要使用零矢量(0,0)进行填补已达到规定的数目。类似于Merge模式,AMVP模式下构建的MV候选列表也包含空域和时域两种情形,不同的是AMVP列表长度仅为2。
H.266/VVC标准对Merge模式的候选列表的大小进行了扩展,最多可以有6个候选MV。VVC标准还引入了新的帧间预测技术,例如,仿射预测模式、帧内帧间组合预测(CIIP)模式、几何划分预测模式(TPM)、双向光流(BIO)方法、双向加权预测(BCW)、带运动矢量差的Merge模式等。在本公开的实施例中,对于当前视频块,可用的帧间预测模式可以包括上述帧间预测模式中的任意一个,本公开的实施例对此不作限制。
例如,在本公开至少一个实施例中,为了获得较好的压缩性能同时保持图像的质量,使用率失真优化(RDO)来选择最佳的帧间预测模式和运动矢量。例如,在本公开的实施例中,“第一帧间预测模式”用于指示用于当前视频块的帧间预测模式。需要说明的是,“第一帧间预测模式”并不受限于特定的某一个或一些帧间预测模式,也不受限于特定的顺序。
例如,在本公开至少一个实施例中,第一帧间预测模式可以是上述可用的帧间预测模式中的任一种,例如Merge模式、高级运动矢量预测AMVP模式、带运动矢量差的Merge模式、双向加权预测模式或者仿射预测模式等,本公开的实施例对此不作限制。
例如,在本公开的至少一个实施例中,对于步骤S102,当前视频块与比特流之间的转换可以包括将当前视频块编码到比特流,也可以包括从比特流解码当前视频块。例如,该转换过程可以包括编码过程,也可以包括解码过程,本公开的实施例对此不作限制。
例如,在本公开的至少一个实施例中,在第一帧间预测模式中,当前视频块的运动矢量的推导基于视频中的与第一显示模式对应的基础区域。例如,在一些示例中,当前视频块的运动矢量与基础区域的位置和/或尺寸有关。例如,在一些示例中,当前视频块所使用的参考像素被限制在特定区域内。如此,通过限制当前视频块的帧间预测模式只能采用有效区域内的参考像素,使得可以根据实际显示的显示区域,对视频的比特流进行部分解码,从而减少未显示部分的解码资源消耗,有效提升视频编解码的效率,进而提升用户的产品使用体验。
例如,在本公开的实施例中,基础区域是指视频中始终会显示的区域,可以通过视频的第一显示模式来确定。例如,在本公开的实施例中,从第一显示模式定义的视频展开起始位置沿着展开方向的具有一定长度的固定区域作为基础区域(本文中也称为第一显示区域)。例如,随视频的第一显示模式不同,基础区域的位置可能会变化。例如,在图1所示的示例中,视频的第一显示模式定义视频画面从最左侧虚线处,沿着从左到右的方向逐渐增大。在这种情况下,固定区域是从视频画面的最左侧开始,从左到右延伸一定长度的区域。例如,在一些示例中,视频的第一显示模式定义视频画面从最上方开始,沿着从上到下的方向逐渐增大。在这种情况下,固定区域是从视频画面的最上方开始,从上到下延伸一定长度的区域。例如,在一些示例中,基础区域与整个显示区域的长度比为1:1。又例如,在一些示例中,基础区域与整个显示区域的长度比为1:2。本公开的实施例对基础区域的位置/尺寸不作限制,可以根据实际情况来设置。
例如,在本公开至少一个实施例中,可以通过与第一显示模式相关的一个或多个语法元素来定义第一显示模式的应用、视频的展开方向、基础区域的尺寸等,本公开的实施例对此不作限制,可以根据实际情况来设置。
图10为本公开至少一个实施例提供的一种滑卷屏的视频编解码的示意图。
例如,如图10所示,在本公开至少一个实施例中,当上述视频数据处理方法10应用于带有滑卷屏的显示装置中时,在滑卷屏的卷收过程中,对应于卷起区域的视频内容可以不显示,从而与卷起区域对应的比特流可以不用解码。在图10中,斜线区域表示未参与解码的区域。当滑卷屏从全展开状态逐渐卷起时,未参与解码的区域的尺寸也逐渐变大。因此,根据滑卷屏的实际显示区域,对接收到的比特流进行部分解码,达到节省解码过程资源消耗的技术效果,从而提升带有滑卷屏的产品的续航能力,提升用户的产品体验感。
例如,在本公开至少一个实施例中,如图10所示,滑卷屏在卷收过程中,P帧图像的部分解码需要满足待显示的当前视频帧的MV预测范围应为参考帧的显示区域的子集。在现有编码标准设计的片(Tile)编码虽然能对画面进行区域划分,但只对Tile中包含的编码树单元CTU按扫描顺序编码以提升并行编解码的能力。由于Tile只对帧内预测的范围进行了限制,即帧内预测模式不会利用超出Tile范围的像素信息,然而,对于帧间编码和环路滤波模块,可能会超出Tile边界。这也使得Tile需要解码参考帧整帧的图像,不能实现有限范围的独立解码,因此,现有Tile编码不能适配于滑卷屏等产品的编码需求。基于上述技术问题,本公开的至少一个实施例提供了一种渐进式的编码结构。
图11为本公开至少一个实施例提供的一种当前视频帧的划分方式的示意图。
例如,在本公开至少一个实施例中,如图11所示,当前视频帧包括第一显示区域和沿着展开方向(从左到右)依次相邻布置的至少一个显示子区域。例如,在图11所示的示例中,第一显示区域表示为左侧的basic_tile区域,至少一个显示子区域表示为右侧的至少一个enhanced_tile区域。
例如,在一些示例中,第一显示区域可以为固定的显示区域,例如常规显示屏的固定显示区域。例如,在滑卷屏的应用场景中,如图10所示,第一显示区域是在滑卷屏的卷收过程中始终会显示的显示区域。例如,在一些示例中,第一显示区域为视频画面的一半区域,例如占据视频画面的二分之一的CTU数量。又例如,在一些示例中,第一显示区域为视频画面的三分之一区域,例如占据视频画面的三分之一的CTU数量。又例如,在一些示例中,第一显示区域为视频画面的整个区域,需要说明的是,本公开的实施例对此不作具体限制,可以根据实际需求来设置。还需要说明的是,在本公开的实施例中,“第一显示区域”用于指示固定显示的显示区域(即基础区域),并不受限于特定的某一个显示区域,也不受限于特定的顺序。
例如,在本公开至少一个实施例中,当前视频帧中除了第一显示区域以外的其他显示区域可以被平均划分成至少一个显示子区域,如图11所示的至少一个enhanced_tile。例如,随着滑卷屏的卷收状态不同,显示子区域的个数不同。需要说明的是,在本公开的实施例中,至少一个显示子区域的每一个大小相同,并且每一个显示子区域的宽度或者高度大于一个CTU。
图12A为本公开至少一个实施例提供的一种滑卷屏的转轴方向的示意图,图12B为本公开至少一个实施例提供的另一种滑卷屏的转轴方向的示意图。
例如,在本公开至少一个实施例中,视频的编码方向通常是在水平方向上从左到右,然后在垂直方向上从上到下。在图12A所示的示例中,滑卷屏的转轴方向视为与视频的编码方向垂直。例如,在本公开至少一个实施例中,当滑卷屏在水平方向卷展时,转轴从左到右移动,待显示区域越来越大。在图12B所示的示例中,滑卷屏的转轴方向视为与视频的编码方向平行。例如,在本公开至少一个实施例中,当滑卷屏在垂直方向卷展时,转轴从上到下移动,待显示区域越来越大。
例如,在图12A所示的示例中,当前视频帧可以包括最左侧的第一显示区域(basic_tile)和沿着展开方向(从左到右)依次相邻布置的至少一个显示子区域(enhanced_tile)。例如,在图12B所示的示例中,当前视频帧可以包括最上方的第一显示区域(basic_tile)和沿着展开方向(从上到下)依次相邻布置的第一显示区域下方的至少一个显示子区域(enhanced_tile)。
需要说明的是,在本公开的实施例中,“第一显示子区域”用于指示在基础区域/第一显示区域(basic_tile)右侧的显示子区域(enhanced_tile),“第二显示子区域”用于指示在基础区域/第一显示区域(basic_tile)下方的显示子区域(enhanced_tile)。“第一显示子区域”和“第二显示子区域”并不受限于特定的某一个或一些显示子区域,也不受限于特定的顺序。
例如,在本公开至少一个实施例中,响应于当前视频块位于当前视频帧的第一显示区域内,当前视频块的运动矢量在第一运动矢量预测范围内。
例如,在本公开至少一个实施例中,如图11所示,对于基础区域(basic_tile)采用独立的编码方式。例如,对于视频的每一帧视频帧,都需要解码基础区域basic_tile。对于位于basic_tile区域内的当前视频块,定义一个有效的MV预测范围(即第一MV预测范围),使得与当前视频块相关联的MV都在该MV预测范围内。
图13示出了本公开至少一个实施例提供的位于第一显示区域内的视频块的编码示意图。
例如,在本公开至少一个实施例中,第一运动矢量预测范围基于当前视频块的位置、运动矢量预测精度、第一显示区域的边界来确定。
例如，在一些示例中，如图13所示，第一显示区域basic_tile的四个边界分别表示为：左侧边界basic_tile_l、右侧边界basic_tile_r、上方边界basic_tile_t和下方边界basic_tile_b，当前视频块(例如，图13中所示的PU)的坐标为(x,y)。考虑到在不同编码器中MV预测精度不一样，MV预测精度可以表示为2^(-n)，其中n为整数。例如，在本公开至少一个实施例中，可以通过如下等式(1)-(4)来确定第一MV预测范围：
B_left = (basic_tile_l - x + 2^n) << n      等式(1)
B_right = (basic_tile_r - x - 2^n) << n      等式(2)
B_top = (basic_tile_t - y + 2^n) << n      等式(3)
B_bottom = (basic_tile_b - y - 2^n) << n     等式(4)
在上述等式(1)-(4)中，B_left表示第一MV预测范围的左侧边界、B_right表示第一MV预测范围的右侧边界，B_top表示第一MV预测范围的上方边界、B_bottom表示第一MV预测范围的下方边界。例如，在本公开的一些实施例中，MV预测精度可以由语法元素来指示，也可以是默认精度等，本公开的实施例对此不作限制。
例如,在本公开至少一个实施例中,通过等式(1)-(4)限定的第一MV预测范围既是当前视频块的初始MV(对应于搜索起始点)的限制范围,也是当前视频块的最终MV(对应最佳匹配点)的限制范围。如此,确保当前视频块的最终MV在相应的MV预测范围(第一MV预测范围)内。
例如,在本公开至少一个实施例中,响应于当前视频块位于基础区域内,当前视频块使用的所有参考像素在基础区域内。该基础区域同样可以由上述等式(1)-(4)来限定。
例如，在本公开至少一个实施例中，对于位置为(x,y)并且位于basic_tile内的当前视频块，相关联的所有运动矢量都应满足以下等式(5)和(6)。
B_left ≤ mv_x ≤ B_right      等式(5)
B_top ≤ mv_y ≤ B_bottom      等式(6)
在等式(5)和等式(6)中，mv_x表示MV的水平分量，mv_y表示MV的垂直分量。例如，基于上述等式(1)至(6)，获取位于basic_tile内的当前视频块的第一MV预测范围，从而判断与当前视频块相关的MV是否溢出边界。
例如,在本公开至少一个实施例中,对于当前视频块的MV候选列表,可以判断候选列表中每一个MV是否在相应的MV预测范围内。例如,在一些示例中,如果判断某一个MV不在该MV预测范围内,从候选列表中移除该MV,不会选择该MV作为初始MV,以保证MV的搜索起始点在相应的MV预测范围内。
图14为本公开至少一个实施例提供的当前视频块位于第一显示区域的边界的示意图。
例如,在本公开至少一个实施例中,在选择初始MV确定搜索起始点的过程中,无论是以Merge方式还是以AMVP方式进行帧间预测,当创建时域候选列表时,需要使用时域上相邻的编码帧的相应位置处的参考块的运动信息。如图14所示,在时域候选列表构建中,通常使用参考块H的运动信息,若参考块H不可用,则用参考块C进行替换。
例如,在图14所示的示例中,在当前视频块处于第一显示区域basic_tile的右侧边界时,位于位置H的参考块的运动信息无法获取(因为未解码)。因此,在编码过程中,可以赋给参考块H的运动信息一个很大的误差值,使得该运动信息不会成为最优的候选MV,即选择参考块C的运动信息作为时域候选列表。
例如,在本公开至少一个实施例中,当前视频帧包括第一显示区域和沿着展开方向依次相邻布置的至少一个显示子区域,并且展开方向为从左到右。响应于当前视频块位于当前视频帧中的第一显示区域右侧的第一显示子区域内,当前视频块的运动矢量被限制在第二运动矢量预测范围内。
例如,在本公开至少一个实施例中,第二运动矢量预测范围基于当前视频块的位置、运动矢量预测精度、第一显示区域的边界和第一显示子区域的宽度来确定。
例如,在本公开至少一个实施例中,如图12A所示,当前视频帧包括在左侧的第一显示区域basic_tile和在右侧的至少一个显示子区域enhanced_tile。每个显示子区域enhanced_tile的宽度不小于一个CTU的宽度。
例如,在一些示例中,显示子区域enhanced_tile的MV预测范围(第二MV预测范围)的左侧边界为当前视频帧的basic_tile的左侧边界,第二MV预测范围的右侧边界为当前视频块所在的enhanced_tile的左侧邻近enhanced_tile的右边界。例如,在当前视频块所在的enhanced_tile的左侧不存在另一个enhanced_tile的情况下,第二MV预测范围的右侧边界为basic_tile的右侧边界。
例如,在本公开至少一个实施例中,通过以下等式(7)至(12)来确定位于第一显示区域basic_tile右侧的第一显示子区域enhanced_tile内的当前视频块的MV预测范围,即第二MV预测范围。
left_k = (basic_tile_l - x + 2^n) << n     等式(7)
right_k = (basic_tile_r + (k-1)*s_en - x - 2^n) << n    等式(8)
top_k = (basic_tile_t - y + 2^n) << n      等式(9)
bottom_k = (basic_tile_b - y - 2^n) << n       等式(10)
在等式(7)至(12)中，x、y表示当前视频块(例如PU)的位置，第一显示区域basic_tile的四个边界分别表示为basic_tile_l、basic_tile_r、basic_tile_t和basic_tile_b，s_en表示第一显示子区域enhanced_tile的宽度，MV的预测精度为2^(-n)，n为整数。左侧边界left_k、右侧边界right_k、上方边界top_k、下方边界bottom_k表示在basic_tile右侧的、从左到右方向上的第k个enhanced_tile的MV边界，k为不小于0的整数。即该四个MV边界构成位于第一显示区域basic_tile右侧的显示子区域enhanced_tile内的当前视频块的MV预测范围，即第二MV预测范围。
与等式(5)和(6)类似，对于位于第一显示区域basic_tile右侧的、从左到右方向上的第k个enhanced_tile内的当前视频块，与当前视频块相关联的所有MV都应满足等式(11)和(12)的限制。
left_k ≤ mv_x ≤ right_k      等式(11)
top_k ≤ mv_y ≤ bottom_k      等式(12)
在等式(11)和等式(12)中，mv_x表示MV的水平分量，mv_y表示MV的垂直分量。例如，基于上述等式(11)至(12)，可以判断与当前视频块相关的MV是否溢出边界。
例如,在本公开至少一个实施例中,响应于当前视频块位于第一显示区域右侧的第k个显示子区域内并且k=1,第二MV预测范围等于第一MV预测范围。
例如,在本公开至少一个实施例中,响应于当前视频块位于第一显示区域右侧的第k个显示子区域内并且k为大于1的整数,第一MV预测范围的第一右侧边界与第二MV预测范围的第二右侧边界不同。
需要说明的是,在本公开的实施例中,“第一右侧边界”用于指示第一MV预测范围的右侧边界,“第二右侧边界”用于指示第二MV预测范围的右侧边界。“第一右侧边界”和“第二右侧边界”并不受限于特定的某一个或一些边界,也不受限于特定的顺序。
例如，基于上述等式(8)和等式(2)可知，当k=1时，第二MV预测范围的右侧边界right_k等于第一MV预测范围的右侧边界B_right，第一MV预测范围等于第二MV预测范围。当k为大于1的整数时，right_k不等于B_right。基于等式(1)至等式(4)和等式(7)至等式(10)可知，除了右侧边界以外，第一MV预测范围的其他三条边界分别等于第二MV预测范围的其他三条边界。
例如,在本公开至少一个实施例中,当前视频帧包括第一显示区域和沿着展开方向依次相邻布置的至少一个显示子区域,并且展开方向为从上到下。响应于当前视频块位于第一显示区域下方的第二显示子区域内,当前视频块的运动矢量被限制在第三运动矢量预测范围内。
例如,在本公开至少一个实施例中,第三运动矢量预测范围基于当前视频块的位置、运动矢量预测精度、第一显示区域的边界、第二显示子区域的高度来确定。
例如,在本公开至少一个实施例中,如图12B所示,当前视频帧包括在上方的第一显示区域basic_tile和在下方的至少一个显示子区域enhanced_tile。每个显示子区域enhanced_tile的高度不小于一个CTU的高度。
例如,在本公开至少一个实施例中,通过以下等式(13)至(16)来确定位于第一显示区域basic_tile下方的第二显示子区域enhanced_tile内的当前视频块的MV预测范围,即第三MV预测范围。
left_m = (basic_tile_l - x + 2^n) << n        等式(13)
right_m = (basic_tile_r - x - 2^n) << n      等式(14)
top_m = (basic_tile_t - y + 2^n) << n       等式(15)
bottom_m = (basic_tile_b + (m-1)*s_en - y - 2^n) << n    等式(16)
例如，位于basic_tile下方的、从上到下方向上的第m个enhanced_tile内的当前视频块的MV预测范围，即第三MV预测范围，由等式(13)-(16)定义的左侧边界left_m、右侧边界right_m、上方边界top_m、下方边界bottom_m来限定。在等式(13)至(16)中，x、y表示当前视频块(例如PU)的位置，第一显示区域basic_tile的四个边界分别表示为basic_tile_l、basic_tile_r、basic_tile_t和basic_tile_b，s_en表示第二显示子区域enhanced_tile的高度，MV的预测精度为2^(-n)，n为整数，m为不小于0的整数。
与等式(11)和(12)类似，在当前视频块位于第一显示区域basic_tile下方的第m个enhanced_tile时，与当前视频块相关联的所有MV都应满足等式(17)和(18)的限制。
left_m ≤ mv_x ≤ right_m      等式(17)
top_m ≤ mv_y ≤ bottom_m      等式(18)
在等式(17)和等式(18)中，mv_x表示MV的水平分量，mv_y表示MV的垂直分量。例如，基于上述等式(17)至(18)，可以判断与当前视频块相关的MV是否溢出边界。
例如,在本公开至少一个实施例中,响应于当前视频块位于第一显示区域下方的第m个第二显示子区域内并且m=1,第三运动矢量预测范围等于第一运动矢量预测范围。
例如,在本公开至少一个实施例中,响应于当前视频块位于第一显示区域下方的第m个第二显示子区域内并且m为大于1的整数,第一运动矢量预测范围的第一下方边界与第三运动矢量预测范围的第三下方边界不同。
需要说明的是,在本公开的实施例中,“第一下方边界”用于指示第一MV预测范围的下方边界,“第三下方边界”用于指示第三MV预测范围的下方边界。“第一下方边界”和“第三下方边界”并不受限于特定的某一个或一些边界,也不受限于特定的顺序。
例如，基于上述等式(4)和等式(16)可知，当m=1时，第三MV预测范围的下方边界bottom_m等于第一MV预测范围的下方边界B_bottom，第一MV预测范围等于第三MV预测范围。当m为大于1的整数时，bottom_m不等于B_bottom。基于等式(1)至等式(4)和等式(13)至等式(16)可知，除了下方边界以外，第一MV预测范围的其他三条边界分别等于第三MV预测范围的其他三条边界。
例如,在本公开至少一个实施例中,响应于当前视频块位于当前视频帧的第一显示区域或者基础区域以外,运动矢量预测候选列表中的时域候选运动矢量预测值使用空域候选运动矢量预测值来计算。
例如,在一些示例中,对于当前视频块位于第一显示区域/基础区域以外的情况(例如,位于第一显示区域的右侧或者下方),无论是采用Merge还是AMVP模式,在构建时域候选列表的过程中无法获取参考块H和参考块C的运动信息。例如,在一些示例中,按照空域候选列表顺序选取第一个参考块的运动矢量比例伸缩MV加入到时域候选列表中,如以下等式(19)所示。
其中MVx_ref表示在位置X处的参考块的运动矢量,其中x=A0、A1、B0、B1、B2。td与tb分别表示当前视频块与X参考块分别到各自的参考图像之间的距离。
需要说明的是,本公开的实施例并不限制具体采用哪一个空域候选运动矢量预测值来取代时域候选运动矢量预测值,可以根据实际需求来设置。
例如,在本公开至少一个实施例中,执行当前视频帧的当前视频块与视频的比特流之间的转换可以包括解码过程。例如,在一些示例中,将接收到的比特流进行全部解码,以供显示。又例如,在一些示例中,在显示终端只进行部分显示的情况下,例如,显示终端具有图1所示的滑卷屏,仅需要对接收到的比特流进行部分解码,从而减少解码资源的使用,提高视频的编解码效率。
图15为本公开至少一个实施例提供的另一种视频数据处理方法的示意图。
例如,在本公开至少一个实施例中,提供了另一种视频数据处理方法30。视频数据处理方法30可以应用于与视频解码相关的各种应用场景(即,应用于解码端)。例如,如图15所示,视频数据处理方法30包括以下操作S301至S303。
步骤S301:接收视频的比特流。
步骤S302:确定视频的当前视频块使用第一帧间预测模式进行编解码。
步骤S303:基于确定,对比特流进行解码,在第一帧间预测模式中,当前视频块的运动矢量的推导基于视频中的与第一显示模式对应的基础区域。
例如,在本公开至少一个实施例中,对于解码侧,基于接收到的视频的比特流,可以确定视频是否应用第一显示模式以及相应的视频展开方向。例如,在一些示例中,当接收到的比特流中包括语法元素“enhanced_tile_enabled_hor”(或者该语法元素的值为1)时,则当前视频应用了第一显示模式,并且展开方向为水平方向(例如从左到右)。又例如,在另一些示例中,当接收到的比特流中包括语法元素“enhanced_tile_enabled_ver”(或者该语法元素的值为1)时,则当前视频应用了第一显示模式,并且展开方向为垂直方向(例如从上到下)。
需要说明的是,在本公开的实施例中,对于解码过程,第一显示模式的应用不仅仅基于接收到的比特流中的相关语法元素来确定,还会考虑显示终端的实际情况。例如,在一些示例中,当显示终端的视频显示方式与比特流中标识的第一显示模式不匹配时,则不应用第一显示模式。例如,当比特流中的相关语法元素指示当前视频应用第一显示模式并且展开方向为水平方向,同时,显示终端的视频显示方式为在垂直方向上卷展时,则确定对于当前视频不应用第一显示模式。又例如,当比特流中的相关语法元素指示对于当前视频应用第一显示模式并且展开方向为水平方向,同时,显示终端的视频显示方式常规显示(例如全屏显示),无需卷展时,则确定对于当前视频不应用第一显示模式。本公开的实施例对此不作具体限制,可以根据实际情况来设置。
例如,在本公开至少一个实施例中,对于步骤S303,对比特流进行解码包括:确定当前视频帧的待解码区域;基于待解码区域,对比特流进行解码。待解码区域至少包括与基础区域对应的第一显示区域。
例如,在本公开至少一个实施例中,如图11所示,当前视频帧的编码主要包括基础区域/第一显示区域basic_tile的编码和从左到右依次相邻布置的至少一个显示子区域enhanced_tile的编码。例如,视频中的每一帧视频帧的basic_tile区域都需要解码,enhanced_tile区域的解码数量可以根据显示区域的尺寸和上一帧解码enhanced_tile数量来确定。例如,在一些示例中,当前视频帧最多只能比相应的参考帧多解码一个enhanced_tile。
例如,在本公开至少一个实施例中,只需要对待显示的显示区域进行解码,因此需要限定解码过程中的待解码区域。例如,在本公开的实施例中,待解码区域以编码单元边界为界限,或者以编码单元作为待解码区域的单位。需要说明的是,在本公开的实施例中,以编码单元为编码树单元CTU作为示例来描述。
例如,在本公开至少一个实施例中,可以基于当前视频帧的待显示的像素数量(lt)和视频编码单元CTU数量(nt)、第一显示区域(basic_tile)的CTU数量(nbasic_tile)、前一视频帧的已显示的像素数量(lt-1)和前一视频帧的已解码区域的CTU数量(bt-1)中的至少一个确定待解码区域(dect)。
例如,在本公开至少一个实施例中,响应于当前视频帧待显示的CTU数量(nt)大于前一视频帧的已解码区域的CTU数量(bt-1),或者响应于当前视频帧待显示的CTU数量(nt)等于前一视频帧的已解码区域的CTU数量(bt-1),并且当前视频帧待显示的像素数量(lt)大于前一视频帧的已显示的像素数量(lt-1),确定待解码区域(dect)包括前一视频帧的已解码区域(dect-1)和一个新的显示子区域(enhanced_tile)。
例如，在本公开至少一个实施例中，当前视频帧最多只比前一视频帧多解码一个显示子区域enhanced_tile，即多解码一个新的显示子区域enhanced_tile。
例如,在本公开至少一个实施例中,响应于当前视频帧待显示的CTU数量(nt)大于第一显示区域的CTU数量(nbasic_tile),并且当前视频帧待显示的CTU数量(nt)小于前一视频帧的已解码区域的CTU数量(bt-1),或者响应于当前视频帧待显示的CTU数量(nt)大于第一显示区域的CTU数量(nbasic_tile),当前视频帧待显示的CTU数量(nt)等于前一视频帧的已解码区域的CTU数量(bt-1),并且当前视频帧待显示的像素数量(lt)不大于前一视频帧的已显示的像素数量(lt-1),确定当前视频帧的待解码区域(dect)包括当前视频帧的待显示区域。例如,该当前视频帧的待显示区域包括当前视频帧待显示的像素数量(lt)。
例如，在本公开至少一个实施例中，可以根据以下等式来确定待解码区域dec_t，如下所示：
dec_t = dec_{t-1} + 一个新的enhanced_tile，若 n_t > b_{t-1}，或者 n_t = b_{t-1} 且 l_t > l_{t-1}；
dec_t = 当前视频帧的待显示区域，若 n_basic_tile < n_t < b_{t-1}，或者 n_t = b_{t-1} > n_basic_tile 且 l_t ≤ l_{t-1}。
例如，若n_basic_tile < n_t < b_{t-1}或者n_t = b_{t-1} > n_basic_tile且l_t ≤ l_{t-1}，表示视频的显示内容在变小或者保持不变，则解码当前视频帧的待显示区域的内容。若n_t > b_{t-1}(n_t > n_basic_tile)或者n_t = b_{t-1} > n_basic_tile且l_t > l_{t-1}，表示视频的显示内容在变大或者有变大的趋势，则相比于前一帧的已解码区域，当前视频帧需要多解码一个enhanced_tile对应的视频内容。
例如,在本公开至少一个实施例中,若当前视频帧待解码区域仅包括固定显示的显示区域(例如,第一显示区域),即dect=basic_tile时,单独解码basic_tile即可。例如,若当前视频帧待解码区域包括固定显示的显示区域(例如,第一显示区域)和至少一个显示子区域(enhanced_tile),即dect=basic_tile+αenhanced_tile时,其中α为额外需要解码的enhanced_tile数量。basic_tile和多个enhanced_tile可以并行解码。
例如,在本公开至少一个实施例中,对于无需显示的区域的CTU可以不用解码,直接用0像素填充。这样,可以提高视频的编解码效率,简化编解码过程,也可以节约产品能源。
需要说明的是,在本公开实施例中,可以用其他像素来对无需显示的区域的CTU进行填充,不一定是0像素,可以根据实际需求来设置。
例如，在本公开至少一个实施例中，当显示终端为全部卷起状态，即不存在需要显示的区域时，根据上述确定待解码区域dec_t的等式可知，仍然需要解码基础区域basic_tile。
图16为根据本公开至少一个实施例的一种在LDP配置下视频编解码系统的示意框图。
例如,在本公开至少一个实施例中,关于图16中视频编解码系统的一般描述可以参考图2-4的相关说明,在此不再赘述。在本公开的实施例中,对于帧间预测模式下的编码过程,限制了与当前视频块相关联的运动矢量的范围,避免使用无效的参考像素信息。对于解码过程,可以根据实际显示的区域,对比特流进行部分解码,从而减少未显示部分的解码资源消耗,提高编解码效率。
图17为根据本公开至少一个实施例的一种在LDP配置下视频数据处理方法的示意流程图。
例如，在本公开至少一个实施例中，提供了一种视频数据处理方法，如图17所示。该视频数据处理方法包括步骤S201-S208。
步骤S201:在编码端,将当前视频帧划分为basic_tile和enhanced_tile。例如,在图12A所示的示例中,编码器将一帧图像分为左右两个Tile,左侧为basic_tile,右侧为至少一个enhanced_tile。例如,在一些示例中,按照一行CTU数量进行1:1分配。例如,另在一些示例中,按照一行CTU数量进行1:2分配。又例如,当一行CTU数量为奇数时,则每行多个enhanced_tile的CTU总数量比basic_tile的一行CTU数量多一个。本公开的实施例对具体划分方式不作限制,可以根据实际需求来设置。
步骤S202：对于左侧的basic_tile采用去掉右侧耦合的独立tile编码方式，对于右侧的enhanced_tile采用依赖左侧basic_tile的编码方式，并且限制MV的编码方式和范围。例如，在此步骤中，涉及帧间编码AMVP和Merge过程中初始MV的选取过程以及运动搜索算法的修正，以实现比特流能够适应解码时卷展和卷收的需求。
步骤S203：获取当前时刻的滑卷屏未卷的像素数l_t和前一帧时刻滑卷屏未卷的像素数l_{t-1}，并传入解码器中。
步骤S204：解码端接收到视频的比特流(不限于H.264/H.265/H.266)。
步骤S205：获取前一帧解码区域的CTU数量b_{t-1}。
步骤S206：根据当前时刻和前一帧时刻的滑卷屏未卷的像素数l_t和l_{t-1}、前一帧解码区域的CTU数量b_{t-1}，确定待解码区域。例如，待解码区域包括basic_tile和相应的多个enhanced_tile。
步骤S207:对待解码区域进行解码,对未解码区域进行填充。
步骤S208:将待解码区域的内容送至显示终端显示。
需要说明的是，关于图17中所示的各个步骤S201-S208的具体操作都在上文中详细描述，在此不再赘述。
因此,通过本公开至少一个实施例提供的视频数据处理方法,可以根据待显示的区域来对视频的比特流进行部分解码,从而减少未显示部分的解码资源消耗,提高编解码效率。
需要说明的是,在本公开的各个实施例中,视频数据处理方法10的各个步骤的执行顺序不受限制,虽然上文以特定顺序描述了各个步骤的执行过程,但这并不构成对本公开实施例的限制。视频数据处理方法10中的各个步骤可以串行执行或并行执行,这可以根据实际需求而定。例如,视频数据处理方法10还可以包括更多或更少的步骤,本公开的实施例对此不作限制。
图18为根据本公开至少一个实施例的一种视频数据处理装置的示意框图。
例如,本公开至少一个实施例提供了一种视频数据处理装置40,如图18所示。视频数据处理装置40包括确定模块401和执行模块402。确定模块401被配置为对于视频的当前视频块,确定使用第一帧间预测模式进行编解码。例如,该确定模块401可以实现步骤S101,其具体实现方法可以参考步骤S101的相关描述,在此不再赘述。执行模块402被配置为基于确定,执行当前视频块与视频的比特流之间的转换,在第一帧间预测模式中,当前视频块的运动矢量的推导基于视频中的与第一显示模式对应的基础区域。例如,该执行模块402可以实现步骤S102,其具体实现方法可以参考步骤S102的相关描述,在此不再赘述。
需要说明的是,这些确定模块401和执行模块402可以通过软件、硬件、固件或它们的任意组合实现,例如,可以分别实现为确定电路401和执行电路402,本公开的实施例对它们的具体实施方式不作限制。
应当理解的是,本公开至少一个实施例提供的视频数据处理装置40可以实施前述视频数据处理方法10相似的技术效果。例如,在本公开至少一个实施例提供的视频数据处理装置40,通过上述方法,可以根据实际需要显示的区域来对比特流进行部分解码,从而减少未显示部分的解码资源消耗,提高编解码效率。
需要注意的是,在本公开的实施例中,该视频数据处理装置40可以包括更多或更少的电路或单元,并且各个电路或单元之间的连接关系不受限制,可以根据实际需求而定。各个电路的具体构成方式不受限制,可以根据电路原理由模拟器件构成,也可以由数字芯片构成,或者以其他适用的方式构成。
例如,本公开至少一个实施例还提供了一种显示装置,包括视频数据处理装置和滑卷屏。视频数据处理装置被配置为根据上述至少一个实施例提供的方法对接收到的比特流进行解码,并将解码后的像素值发送至滑卷屏以供显示。例如,在一些示例中,滑卷屏完全展开,不存在非显示区域,则视频数据处理装置对接收到的比特流进行完全解码。又例如,在另一些示例中,如图1所示,滑卷屏包括卷起部分和展开部分,即存在显示区域和非显示区域,则视频数据处理装置对接收到的比特流进行部分解码。
例如,在本公开至少一个实施例中,响应于滑卷屏在工作中包括显示区域和非显示区域,视频数据处理装置基于当前时刻和前一帧时刻的显示区域的尺寸对比特流进行解码。例如,如图1所示,在滑卷屏处于部分展开的状态时,视频数据处理装置只需要解码对应于显示区域的内容。例如,视频数据处理装置可以根据当前时刻和前一帧时刻的显示区域的尺寸确定待解码区域。例如,视频数据处理装置可以根据当前时刻和前一帧时刻的显示区域的尺寸来确定当前时刻和前一帧时刻的视频帧的待显示的像素数lt和lt-1。关于确定待解码区域的操作在前文中已详细描述,在此不再赘述。
例如,在本公开至少一个实施例中,显示装置除了视频数据处理装置和滑卷屏之外,还包括卷曲状态判断装置。例如,卷曲状态判断装置配置为检测滑卷屏的显示区域的尺寸,并将显示区域的尺寸发送至视频数据处理装置,以使得视频数据处理装置基于当前时刻和前一帧时刻的显示区域的尺寸对比特流进行解码。需要说明的是,卷曲状态判断装置可以通过软件、硬件、固件或它们的任意组合实现,例如,可以实现为卷曲状态判断电路,本公开的实施例对卷曲状态判断装置的具体实施方式不作限制。
需要说明的是,本公开的实施例不限制显示装置的类型。例如,该显示装置可以是移动终端、计算机、平板电脑、电话手表、电视机等,本公开的实施例对此不作限制。同样,本公开的实施例不限制滑卷屏的类型。例如,在本公开的实施例中,滑卷屏可以是任意类型的显示区域可变的显示屏,包括且不限于图1所示的滑卷屏类型。例如,在本公开的实施例中,显示装置所包括的视频数据处理装置可以实施为本公开中提及的视频数据处理装置40/90/600等,本公开的实施例对视频数据处理装置的具体实施方式不作限制。
需要注意的是,在本公开的实施例中,该显示装置可以包括更多或更少的电路或单元,并且各个电路或单元之间的连接关系不受限制,可以根据实际需求而定。各个电路的具体构成方式不受限制,可以根据电路原理由模拟器件构成,也可以由数字芯片构成,或者以其他适用的方式构成。
图19是本公开至少一个实施例提供另一种视频数据处理装置的示意框图。
本公开至少一个实施例还提供了一种视频数据处理装置90。如图19所示,视频数据处理装置90包括处理器910和存储器920。存储器920包括一个或多个计算机程序模块921。一个或多个计算机程序模块921被存储在存储器920中并被配置为由处理器910执行,该一个或多个计算机程序模块921包括用于执行本公开的至少一个实施例提供的视频数据处理方法10的指令,其被处理器910执行时,可以执行本公开的至少一个实施例提供的视频数据处理方法10中的一个或多个步骤。存储器920和处理器910可以通过总线系统和/或其它形式的连接机构(未示出)互连。
例如,处理器910可以是中央处理单元(CPU)、数字信号处理器(DSP)或者具有数据处理能力和/或程序执行能力的其它形式的处理单元,例如现场可编程门阵列(FPGA)等;例如,中央处理单元(CPU)可以为X86或ARM架构等。处理器910可以为通用处理器或专用处理器,可以控制视频数据处理装置90中的其它组件以执行期望的功能。
例如,存储器920可以包括一个或多个计算机程序产品的任意组合,计算机程序产品可以包括各种形式的计算机可读存储介质,例如易失性存储器和/或非易失性存储器。易失性存储器例如可以包括随机存取存储器(RAM)和/或高速缓冲存储器(cache)等。非易失性存储器例如可以包括只读存储器(ROM)、硬盘、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、USB存储器、闪存等。在计算机可读存储介质上可以存储一个或多个计算机程序模块921,处理器910可以运行一个或多个计算机程序模块921,以实现视频数据处理装置90的各种功能。在计算机可读存储介质中还可以存储各种应用程序和各种数据以及应用程序使用和/或产生的各种数据等。视频数据处理装置90的具体功能和技术效果可以参考上文中关于视频数据处理方法10/30的描述,此处不再赘述。
图20为本公开至少一个实施例提供的又一种视频数据处理装置的示意框图。
本公开实施例中的终端设备可以包括但不限于诸如移动电话、笔记本电脑、数字广播接收器、PDA(个人数字助理)、PAD(平板电脑)、PMP(便携式多媒体播放器)、车载终端(例如车载导航终端)等等的移动终端以及诸如数字TV、台式计算机等等的固定终端。图20示出的视频数据处理装置600仅仅是一个示例,不应对本公开实施例的功能和使用范围带来任何限制。
例如，如图20所示，在一些示例中，视频数据处理装置600包括处理装置(例如中央处理器、图形处理器等)601，其可以根据存储在只读存储器(ROM)602中的程序或者从存储装置608加载到随机访问存储器(RAM)603中的程序而执行各种适当的动作和处理。在RAM 603中，还存储有计算机系统操作所需的各种程序和数据。处理装置601、ROM 602以及RAM 603通过总线604彼此相连。输入/输出(I/O)接口605也连接至总线604。
例如,以下部件可以连接至I/O接口605:包括例如触摸屏、触摸板、键盘、鼠标、摄像头、麦克风、加速度计、陀螺仪等的输入装置606;包括诸如液晶显示器(LCD)、扬声器、振动器等的输出装置607;包括例如磁带、硬盘等的存储装置608;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信装置609。通信装置609可以允许视频数据处理装置600与其他设备进行无线或有线通信以交换数据,经由诸如因特网的网络执行通信处理。驱动器610也根据需要连接至I/O接口605。可拆卸介质611,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器610上,以便于从其上读出的计算机程序根据需要被安装入存储装置608。虽然图20示出了包括各种装置的视频数据处理装置600,但是应理解的是,并不要求实施或包括所有示出的装置。可以替代地实施或包括更多或更少的装置。
例如,该视频数据处理装置600还可以进一步包括外设接口(图中未示 出)等。该外设接口可以为各种类型的接口,例如为USB接口、闪电(lighting)接口等。该通信装置609可以通过无线通信来与网络和其他设备进行通信,该网络例如为因特网、内部网和/或诸如蜂窝电话网络之类的无线网络、无线局域网(LAN)和/或城域网(MAN)。无线通信可以使用多种通信标准、协议和技术中的任何一种,包括但不局限于全球移动通信系统(GSM)、增强型数据GSM环境(EDGE)、宽带码分多址(W-CDMA)、码分多址(CDMA)、时分多址(TDMA)、蓝牙、Wi-Fi(例如基于IEEE 802.11a、IEEE 802.11b、IEEE 802.11g和/或IEEE 802.11n标准)、基于因特网协议的语音传输(VoIP)、Wi-MAX,用于电子邮件、即时消息传递和/或短消息服务(SMS)的协议,或任何其他合适的通信协议。
例如,视频数据处理装置600可以为手机、平板电脑、笔记本电脑、电子书、电视机等任何设备,也可以为任意的数据处理装置及硬件的组合,本公开的实施例对此不作限制。
例如,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括承载在非暂态计算机可读介质上的计算机程序,该计算机程序包含用于执行流程图所示的方法的程序代码。在这样的实施例中,该计算机程序可以通过通信装置609从网络上被下载和安装,或者从存储装置608被安装,或者从ROM602被安装。在该计算机程序被处理装置601执行时,执行本公开实施例所公开的视频数据处理方法10。
需要说明的是,本公开上述的计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质或者是上述两者的任意组合。计算机可读存储介质例如可以是——但不限于——电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子可以包括但不限于:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机访问存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本公开的实施例中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。而在本公开的实施例中,计算机可读信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读信号介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于:电线、光缆、RF(射频)等等,或者上述的任意合适的组合。
上述计算机可读介质可以是上述视频数据处理装置600中所包含的;也可以是单独存在,而未装配入该视频数据处理装置600中。
图21为本公开至少一个实施例提供的一种非瞬时可读存储介质的示意框图。
本公开的实施例还提供一种非瞬时可读存储介质。图21是根据本公开至少一个实施例的一种非瞬时可读存储介质的示意框图。如图21所示,非瞬时可读存储介质70上存储有计算机指令111,该计算机指令111被处理器执行时执行如上所述的视频数据处理方法10中的一个或多个步骤。
例如,该非瞬时可读存储介质70可以是一个或多个计算机可读存储介质的任意组合,例如,一个计算机可读存储介质包含用于获取视频的第一显示模式的计算机可读的程序代码,另一个计算机可读存储介质包含用于对于视频的当前视频块,确定当前多个可用的帧间预测模式的第一子集合的计算机可读的程序代码,又一个计算机可读存储介质包含用于从第一子集合中选择可用的第一帧间预测模式,执行当前视频帧的当前视频块与视频的比特流之间的转换的计算机可读的程序代码,该第一子集合中每个成员所使用的运动矢量的可用有效范围根据视频中与第一显示模式对应的基础区域确定。当然,上述各个程序代码也可以存储在同一个计算机可读介质中,本公开的实施例对此不作限制。
例如,当该程序代码由计算机读取时,计算机可以执行该计算机存储介质中存储的程序代码,执行例如本公开任一个实施例提供的视频数据处理方法10。
例如,存储介质可以包括智能电话的存储卡、平板电脑的存储部件、个人计算机的硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦除可编程只读存储器(EPROM)、便携式紧致盘只读存储器(CD-ROM)、闪存、或者上述存储介质的任意组合,也可以为其他适用的存储介质。例如,该可读存储介质也可以为图19中的存储器920,相关描述可以参考前述内容,此处不再赘述。
在本公开中,术语“多个”指两个或两个以上,除非另有明确的限定。
本领域技术人员在考虑说明书及实践这里公开的公开后,将容易想到本公开的其它实施方案。本公开旨在涵盖本公开的任何变型、用途或者适应性变化,这些变型、用途或者适应性变化遵循本公开的一般性原理并包括本公开未公开的本技术领域中的公知常识或惯用技术手段。说明书和实施例仅被视为示例性的,本公开的真正范围和精神由下面的权利要求指出。
应当理解的是,本公开并不局限于上面已经描述并在附图中示出的精确结构,并且可以在不脱离其范围进行各种修改和改变。本公开的范围仅由所附的权利要求来限制。

Claims (26)

  1. 一种视频数据处理方法,包括:
    对于视频的当前视频块,确定使用第一帧间预测模式进行编解码,
    基于所述确定,执行所述当前视频块与所述视频的比特流之间的转换,
    其中,在所述第一帧间预测模式中,所述当前视频块的运动矢量的推导基于所述视频中的与第一显示模式对应的基础区域。
  2. 根据权利要求1所述的方法,其中,对于所述视频的当前视频帧,从所述第一显示模式定义的视频展开起始位置沿着展开方向的第一显示区域作为所述基础区域。
  3. 根据权利要求2所述的方法,其中,响应于所述当前视频块位于所述第一显示区域内,所述当前视频块的运动矢量在第一运动矢量预测范围内。
  4. 根据权利要求3所述的方法,其中,所述第一运动矢量预测范围基于所述当前视频块的位置、运动矢量预测精度、所述第一显示区域的边界来确定。
  5. 根据权利要求3或4所述的方法,其中,所述当前视频帧包括所述第一显示区域和沿着所述展开方向依次相邻布置的至少一个显示子区域,并且所述展开方向为从左到右,
    响应于所述当前视频块位于所述当前视频帧中的所述第一显示区域右侧的第一显示子区域内,所述当前视频块的运动矢量在第二运动矢量预测范围内。
  6. 根据权利要求5所述的方法,其中,所述第二运动矢量预测范围基于所述当前视频块的位置、运动矢量预测精度、所述第一显示区域的边界、所述第一显示子区域的宽度来确定。
  7. 根据权利要求5或6所述的方法,响应于所述当前视频块位于所述第一显示区域右侧的第k个第一显示子区域内并且k=1,所述第二运动矢量预测范围等于所述第一运动矢量预测范围。
  8. 根据权利要求5-7中任一项所述的方法,响应于所述当前视频块位于所述第一显示区域右侧的第k个第一显示子区域内并且k为大于1的整数,所述第一运动矢量预测范围的第一右侧边界与所述第二运动矢量预测范围的第二右侧边界不同。
  9. 根据权利要求3-8中任一项所述的方法,其中,所述当前视频帧包括所述第一显示区域和沿着所述展开方向依次相邻布置的至少一个显示子区域,并且所述展开方向为从上到下,
    响应于所述当前视频块位于所述第一显示区域下方的第二显示子区域内,所述当前视频块的运动矢量在第三运动矢量预测范围内。
  10. 根据权利要求9所述的方法,其中,所述第三运动矢量预测范围基于所述当前视频块的位置、运动矢量预测精度、所述第一显示区域的边界、所述第二显示子区域的高度来确定。
  11. 根据权利要求9或10所述的方法,其中,响应于所述当前视频块位于所述第一显示区域下方的第m个第二显示子区域内并且m=1,所述第三运动矢量预测范围等于所述第一运动矢量预测范围。
  12. 根据权利要求9-11中任一项所述的方法,响应于所述当前视频块位于所述第一显示区域下方的第m个第二显示子区域内并且m为大于1的整数,所述第一运动矢量预测范围的第一下方边界与所述第三运动矢量预测范围的第三下方边界不同。
  13. 根据权利要求2-12中任一项所述的方法,其中,响应于所述当前视频块位于所述基础区域外,所述当前视频块的运动矢量预测候选列表中的时域候选运动矢量预测值基于空域候选运动矢量预测值计算。
  14. 根据权利要求2-13中任一项所述的方法,其中,响应于所述当前视频块位于所述基础区域内,所述当前视频块使用的所有参考像素在所述基础区域内。
  15. 根据权利要求1-14中任一项所述的方法,其中,所述第一帧间预测模式包括Merge预测模式、高级运动矢量预测AMVP模式、带运动矢量差的Merge模式、双向加权预测模式或者仿射预测模式。
  16. 一种视频数据处理方法,包括:
    接收视频的比特流;
    确定所述视频的当前视频块使用第一帧间预测模式进行编解码,
    基于所述确定,对所述比特流进行解码,
    其中,在所述第一帧间预测模式中,所述当前视频块的运动矢量的推导基于所述视频中的与第一显示模式对应的基础区域。
  17. 根据权利要求16所述的方法,其中,对所述比特流进行解码,包括:
    确定所述视频的当前视频帧的待解码区域,所述待解码区域至少包括所述基础区域对应的第一显示区域;
    基于所述待解码区域,对所述比特流进行解码。
  18. 根据权利要求17所述的方法,其中,确定所述待解码区域,包括:
    基于所述当前视频帧待显示的像素数量和编码单元数量、所述第一显示区域的编码单元数量、前一视频帧的已显示的像素数量和所述前一视频帧的已解码区域的编码单元数量中的至少一个确定所述待解码区域。
  19. 根据权利要求18所述的方法,其中,确定所述待解码区域,包括:
    响应于所述当前视频帧待显示的编码单元数量大于所述前一视频帧的已解码区域的编码单元数量,或者
    响应于所述当前视频帧待显示的编码单元数量等于前一视频帧的已解码区域的编码单元数量,并且所述当前视频帧待显示的像素数量大于所述前一视频帧的已显示的像素数量,
    确定所述待解码区域包括所述前一视频帧的已解码区域和一个新的显示子区域。
  20. 根据权利要求18或19所述的方法,其中,确定所述待解码区域,包括:
    响应于所述当前视频帧待显示的编码单元数量大于所述第一显示区域的编码单元数量,并且所述当前视频帧待显示的编码单元数量小于前一视频帧的已解码区域的编码单元数量,或者
    响应于所述当前视频帧待显示的编码单元数量大于所述第一显示区域的编码单元数量,所述当前视频帧待显示的编码单元数量等于所述前一视频帧的已解码区域的编码单元数量,并且所述当前视频帧待显示的像素数量不大于所述前一视频帧的已显示的像素数量,
    确定所述当前视频帧的待解码区域包括所述当前视频帧待显示的区域。
  21. 一种视频数据处理装置,包括:
    确定模块,被配置为对于所述视频的当前视频块,确定使用第一帧间预测模式进行编解码,
    执行模块,被配置为基于所述确定,执行所述当前视频块与所述视频的比特流之间的转换,
    其中,在所述第一帧间预测模式中,所述当前视频块的运动矢量的推导基于所述视频中的与第一显示模式对应的基础区域。
  22. 一种显示装置,包括:视频数据处理装置和滑卷屏,
    其中,所述视频数据处理装置被配置为根据权利要求1-20中任一项所述的方法对接收到的比特流进行解码,并将解码后的像素值发送至所述滑卷屏以供显示。
  23. 根据权利要求22所述的显示装置,其中,响应于所述滑卷屏在工作中包括显示区域和非显示区域,所述视频数据处理装置基于当前时刻和前一帧时刻的所述显示区域的尺寸对所述比特流进行解码。
  24. 根据权利要求22或23所述的显示装置,还包括卷曲状态判断装置,其中,所述卷曲状态判断装置配置为检测所述滑卷屏的显示区域的尺寸,并将所述显示区域的尺寸发送至所述视频数据处理装置,以使得所述视频数据处理装置基于当前时刻和前一帧时刻的所述显示区域的尺寸对所述比特流进行解码。
  25. 一种视频数据处理装置,包括:
    处理器;
    存储器,包括一个或多个计算机程序模块;
    其中,所述一个或多个计算机程序模块被存储在所述存储器中并被配置为由所述处理器执行,所述一个或多个计算机程序模块包括用于执行权利要求1-20中任一项所述的视频数据处理方法的指令。
  26. 一种计算机可读存储介质,其中存储有计算机指令,该指令被处理器执行时实现权利要求1-20中任一项所述视频数据处理方法的步骤。
PCT/CN2023/133303 2022-11-25 2023-11-22 视频数据处理方法及装置、显示装置和存储介质 WO2024109816A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211493517.XA CN118101964A (zh) 2022-11-25 2022-11-25 视频数据处理方法及装置、显示装置和存储介质
CN202211493517.X 2022-11-25

Publications (1)

Publication Number Publication Date
WO2024109816A1 true WO2024109816A1 (zh) 2024-05-30

Family

ID=91140903

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/133303 WO2024109816A1 (zh) 2022-11-25 2023-11-22 视频数据处理方法及装置、显示装置和存储介质

Country Status (2)

Country Link
CN (1) CN118101964A (zh)
WO (1) WO2024109816A1 (zh)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105519117A (zh) * 2013-09-06 2016-04-20 三菱电机株式会社 动态图像编码装置、动态图像转码装置、动态图像编码方法、动态图像转码方法以及动态图像流传输系统
CN105681805A (zh) * 2016-01-19 2016-06-15 北京大学深圳研究生院 视频编码、解码方法及其帧间预测方法和装置
CN108351706A (zh) * 2015-11-18 2018-07-31 三星电子株式会社 具有可卷动的显示器的电子设备及其控制方法
CN108735100A (zh) * 2018-05-21 2018-11-02 上海创功通讯技术有限公司 一种柔性显示装置及其控制方法、控制装置
CN108766235A (zh) * 2018-03-31 2018-11-06 上海创功通讯技术有限公司 卷轴式柔性屏、显示控制方法及存储介质
CN111708506A (zh) * 2020-06-09 2020-09-25 上海卷视科技有限公司 卷轴式显示屏的显示方法、系统、电子设备及存储介质
CN112262580A (zh) * 2018-04-10 2021-01-22 高通股份有限公司 用于视频编码的解码器侧运动矢量推导
WO2021194308A1 (ko) * 2020-03-26 2021-09-30 엘지전자 주식회사 랩-어라운드 움직임 보상에 기반하는 영상 부호화/복호화 방법, 장치 및 비트스트림을 저장한 기록 매체
WO2021194307A1 (ko) * 2020-03-26 2021-09-30 엘지전자 주식회사 랩-어라운드 움직임 보상에 기반하는 영상 부호화/복호화 방법, 장치 및 비트스트림을 저장한 기록 매체
CN114168051A (zh) * 2021-12-03 2022-03-11 深圳传音控股股份有限公司 显示方法、智能终端及存储介质
US20220248028A1 (en) * 2019-10-31 2022-08-04 Samsung Electronics Co., Ltd. Video decoding method and apparatus, and video encoding method and apparatus for performing inter prediction according to affine model


Also Published As

Publication number Publication date
CN118101964A (zh) 2024-05-28

Similar Documents

Publication Publication Date Title
TWI812694B (zh) 以角度模式延伸之位置相關框內預測組合
JP7239697B2 (ja) エンコーダ、デコーダ、インター予測のための対応する方法
JP7271683B2 (ja) エンコーダ、デコーダ、および対応するイントラ予測方法
US11863779B2 (en) Cross-component adaptive loop filter in video coding
US20200280736A1 (en) Constraints on decoder-side motion vector refinement
TW202025752A (zh) 用於仿射模式之以歷史為基礎之運動向量預測
CN112119636A (zh) 视频编码中高精度运动矢量的存储
JP2017511620A (ja) オーバーラップエリア内の再構成されたサンプル値のブロックベクトル予測及び推定におけるイノベーション
JP2022537064A (ja) エンコーダ、デコーダ、および対応する方法
TW202025767A (zh) 具有適應性方向性資訊集合之最終動作向量表示
CN114128261A (zh) 用于视频译码的组合的帧间和帧内预测模式
WO2020220884A1 (zh) 视频序列的帧内预测方法及装置
TW202101993A (zh) 用於視訊寫碼之可切換內插濾波
JP2022521809A (ja) ビデオコーディングにおける係数領域ブロック差分パルスコード変調
TWI826487B (zh) 用於視訊寫碼中之適應性運動向量差解析度及增加的運動向量儲存精確度的運動向量捨位
TW202034695A (zh) 用於視訊寫碼之限制仿射運動繼承
WO2020103593A1 (zh) 一种帧间预测的方法及装置
US12010325B2 (en) Intra block copy scratch frame buffer
KR20210058856A (ko) 저장된 파라미터들을 사용하는 비디오 인코딩 및 디코딩을 위한 로컬 조명 보상
TW202101996A (zh) 用於視訊寫碼之以梯度為基礎之預測精細化
WO2020114356A1 (zh) 帧间预测方法和相关装置
CN118101967A (zh) 用于视频编解码的位置相关空间变化变换
WO2024109816A1 (zh) 视频数据处理方法及装置、显示装置和存储介质
WO2024109790A1 (zh) 视频数据处理方法及装置、显示装置和存储介质
CN116250240A (zh) 图像编码方法、图像解码方法及相关装置