CN116980620A - Image frame prediction method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116980620A
CN116980620A (application CN202310730347.0A)
Authority
CN
China
Prior art keywords
frame
image frame
preset
coding unit
mode
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310730347.0A
Other languages
Chinese (zh)
Inventor
薛毅 (Xue Yi)
黄跃 (Huang Yue)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd
Priority to CN202310730347.0A
Publication of CN116980620A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136: Incoming video signal characteristics or properties
    • H04N19/14: Coding unit complexity, e.g. amount of activity or edge presence estimation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/177: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding, the unit being a group of pictures [GOP]
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/42: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51: Motion estimation or motion compensation
    • H04N19/577: Motion compensation with bidirectional frame interpolation, i.e. using B-pictures
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/50: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/593: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving spatial prediction techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The disclosure relates to an image frame prediction method and device, an electronic device, and a storage medium. The method includes: determining a currently processed coding unit in a decoded image frame; if the decoding mode of the coding unit is a preset prediction mode, determining an execution flag from a preset position of the decoded image frame; and if the execution flag is the skip flag, skipping the preset-prediction-mode decoding process for the coding unit. By means of the set skip flag, the embodiments of the application can skip the preset-prediction-mode decoding of a coding unit, thereby reducing the computational complexity at the encoding and decoding ends.

Description

Image frame prediction method and device, electronic equipment and storage medium
Technical Field
The disclosure relates to the technical field of the Internet, and in particular to an image frame prediction method and device, an electronic device, and a storage medium.
Background
In video coding standards, a video sequence is divided into groups of image frames, and a group is generally divided into three frame types according to function: intra-coded image frames, forward predictive-coded image frames, and bidirectional predictive-coded image frames. Except for intra-coded image frames, which are coded using only their own data, the other frame types are predictively coded with reference to other frames to improve the coding efficiency of the encoder.
The affine motion prediction mode (the Affine tool) in the existing VVC/H.266 standard uses the 4x4 sub-block as the basic unit of motion search and motion compensation, which causes high computational complexity at the encoding and decoding ends. Taking the Affine tool as an example, in some embodiments its BD-Rate gain on the encoder test sequence Test100 is -2.4%, while it accounts for 39% of CPU cluster encoding time and 20% of mobile-end decoding time. It is worth noting that mobile-end decoding complexity directly affects handset power consumption, and relates to QoS and QoE metrics such as user viewing experience, as well as gear coverage and bandwidth cost.
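The per-sub-block cost can be illustrated with a short arithmetic sketch (the helper name is mine, not the patent's): an affine-coded CU performs one motion-compensation call per 4x4 sub-block, versus a single call for ordinary whole-block translational prediction.

```python
def subblock_mc_calls(cu_width: int, cu_height: int, sub: int = 4) -> int:
    """Motion-compensation invocations for a CU when every sub x sub
    sub-block carries its own motion vector, as in VVC affine mode."""
    return (cu_width // sub) * (cu_height // sub)

# A 16x16 CU needs 16 sub-block MC calls under affine prediction,
# and a 64x64 CU needs 256, versus one call for whole-block prediction.
print(subblock_mc_calls(16, 16))
print(subblock_mc_calls(64, 64))
```

The quadratic growth in calls with CU size is what makes skipping this mode attractive on constrained decoders.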
Disclosure of Invention
The disclosure provides an image frame prediction method, an image frame prediction device, electronic equipment and a storage medium, and the technical scheme of the disclosure is as follows:
according to a first aspect of an embodiment of the present disclosure, there is provided an image frame prediction method, including:
determining a currently processed coding unit in a decoded image frame;
if the decoding mode of the coding unit is a preset prediction mode, determining an execution flag from a preset position of the decoded image frame;
if the execution flag is the skip flag, skipping the preset-prediction-mode decoding process for the coding unit.
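A rough sketch of the three decode-side steps above (the flag values and type names are illustrative, not taken from the patent):

```python
from dataclasses import dataclass

PRESET_MODE = "affine"   # the preset prediction mode in this sketch
SKIP_FLAG = 1            # illustrative value of the skip flag

@dataclass
class CodingUnit:
    decode_mode: str
    exec_flag: int = 0   # execution flag read from the preset position

def process_cu(cu: CodingUnit) -> str:
    """Skip the preset-prediction-mode decoding when the execution
    flag at the preset position equals the skip flag."""
    if cu.decode_mode == PRESET_MODE and cu.exec_flag == SKIP_FLAG:
        return "skipped"          # bypass the costly affine decoding
    return "decoded"

print(process_cu(CodingUnit("affine", SKIP_FLAG)))   # skipped
print(process_cu(CodingUnit("affine", 0)))           # decoded
```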
In some possible embodiments, if the decoding mode of the coding unit is the preset prediction mode, determining the execution flag from a preset position of the decoded image frame includes:
if the decoding mode of the coding unit is the preset prediction mode, determining the execution flag from a preset prediction position of the coding unit.
In some possible embodiments, if the decoding mode of the coding unit is the preset prediction mode, determining the execution flag from a preset position of the decoded image frame includes:
if the decoding mode of the coding unit is the preset prediction mode, determining the execution flag from the frame header of the decoded image frame.
If the execution flag is the skip flag, then after skipping the preset-prediction-mode decoding process for the coding unit, the method further includes:
setting the execution flag at the preset prediction position of each coding unit other than the currently processed coding unit in the decoded image frame to the skip flag.
In some possible embodiments, before determining the currently processed coding unit in the decoded image frame, the method further includes:
determining a currently encoded coding unit in an encoded image frame;
determining level information from the frame header of the encoded image frame;
if the level information satisfies preset level information, determining the coding mode of the currently encoded coding unit;
if the coding mode is the preset prediction mode, setting the execution flag at the preset prediction position of the currently encoded coding unit to the skip flag.
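A minimal sketch of the encoder-side steps above, assuming dictionary-shaped frame headers and coding units and an illustrative level threshold (none of these field names come from the patent):

```python
def mark_skip_flags(frame_header: dict, coding_units: list) -> int:
    """If the frame's level information satisfies the preset level,
    set the skip flag at the preset prediction position of every
    affine-coded CU. Returns the number of CUs marked."""
    marked = 0
    if frame_header["layer"] >= frame_header["preset_level"]:
        for cu in coding_units:
            if cu["mode"] == "affine":
                cu["exec_flag"] = 1   # skip flag
                marked += 1
    return marked

cus = [{"mode": "affine", "exec_flag": 0}, {"mode": "intra", "exec_flag": 0}]
print(mark_skip_flags({"layer": 4, "preset_level": 3}, cus))  # 1
print(cus[0]["exec_flag"])                                    # 1
```

Restricting the marking to high-layer frames fits the hierarchy described later, where higher layers are referenced by fewer frames and so tolerate cheaper prediction.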
In some possible embodiments, after determining the coding mode of the currently encoded coding unit when the level information satisfies the preset level information, the method further includes:
if the coding mode is the preset prediction mode, adding an execution flag to the frame header of the encoded image frame, where the execution flag includes the skip flag, and the skip flag indicates that the preset-prediction-mode processing for the coding unit is skipped during encoding and decoding.
In some possible embodiments, the preset prediction mode includes an affine motion prediction mode.
According to a second aspect of embodiments of the present disclosure, there is provided an image frame prediction apparatus including:
a coding unit determination module configured to determine a currently processed coding unit in a decoded image frame;
an identification determination module configured to determine an execution flag from a preset position of the decoded image frame if the decoding mode of the coding unit is a preset prediction mode;
and a processing module configured to skip the preset-prediction-mode decoding process for the coding unit if the execution flag is the skip flag.
In some possible embodiments, the identification determination module is configured to perform:
if the decoding mode of the coding unit is the preset prediction mode, determining the execution flag from a preset prediction position of the coding unit.
In some possible embodiments, the identification determination module is configured to perform:
if the decoding mode of the coding unit is the preset prediction mode, determining the execution flag from the frame header of the decoded image frame.
If the execution flag is the skip flag, then after skipping the preset-prediction-mode decoding process for the coding unit, the apparatus further performs:
setting the execution flag at the preset prediction position of each coding unit other than the currently processed coding unit in the decoded image frame to the skip flag.
In some possible embodiments, the apparatus further comprises:
a first determination module configured to determine a currently encoded coding unit in an encoded image frame;
a second determination module configured to determine level information from the frame header of the encoded image frame;
a third determination module configured to determine the coding mode of the currently encoded coding unit if the level information satisfies preset level information;
and a setting module configured to set the execution flag at the preset prediction position of the currently encoded coding unit to the skip flag if the coding mode is the preset prediction mode.
In some possible embodiments, the setting module is configured to perform:
if the coding mode is the preset prediction mode, adding an execution flag to the frame header of the encoded image frame, where the execution flag includes the skip flag, and the skip flag indicates that the preset-prediction-mode processing for the coding unit is skipped during encoding and decoding.
In some possible embodiments, the preset prediction mode includes an affine motion prediction mode.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a processor; a memory for storing processor-executable instructions; wherein the processor is configured to execute instructions to implement the method as in any of the first aspects above.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is caused to perform the method of any one of the first aspects of the embodiments of the present disclosure.
According to a fifth aspect of embodiments of the present disclosure, there is provided a computer program product comprising a computer program stored in a readable storage medium, the computer program being read from the readable storage medium by at least one processor of the computer device and executed, such that the computer device performs the method of any one of the first aspects of embodiments of the present disclosure.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
determining a currently processed coding unit in a decoded image frame; if the decoding mode of the coding unit is a preset prediction mode, determining an execution flag from a preset position of the decoded image frame; and if the execution flag is the skip flag, skipping the preset-prediction-mode decoding process for the coding unit. By means of the set skip flag, the embodiments of the application can skip the preset-prediction-mode decoding of a coding unit, thereby reducing the computational complexity at the encoding and decoding ends.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; for a person skilled in the art, other drawings may be obtained from these drawings without inventive effort.
FIG. 1 is a schematic diagram of an application environment shown in accordance with an exemplary embodiment;
FIG. 2 is a flowchart illustrating an image frame prediction method according to an exemplary embodiment;
fig. 3 is a schematic diagram one of a GOP shown in accordance with an exemplary embodiment;
fig. 4 is a schematic diagram two of a GOP shown according to an exemplary embodiment;
FIG. 5 is a schematic diagram illustrating a codec process according to an example embodiment;
FIG. 6 is a schematic diagram illustrating a codec process according to an example embodiment;
FIG. 7 is a flowchart illustrating a method of image frame prediction during encoding according to an exemplary embodiment;
FIG. 8 is a schematic diagram of a prediction unit, according to an example embodiment;
FIG. 9 is a block diagram of an image frame prediction device according to an exemplary embodiment;
fig. 10 is a block diagram illustrating an electronic device for image frame prediction according to an exemplary embodiment.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the present invention.
It should be noted that the terms "first", "second", and the like in the description, claims, and drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as recited in the appended claims.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by the user or sufficiently authorized by each party.
Referring to fig. 1, fig. 1 is a schematic diagram illustrating an application environment of an image frame prediction method according to an exemplary embodiment, and as shown in fig. 1, the application environment may include a server 01 and a client 02.
In some possible embodiments, the server 01 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud audio recognition model training, middleware services, domain name services, security services, CDN (Content Delivery Network), big data, and artificial intelligence platforms. Operating systems running on the server may include, but are not limited to, Android, iOS, Linux, Windows, Unix, and the like.
In some possible embodiments, the client 02 may include, but is not limited to, a smartphone, a desktop computer, a tablet computer, a notebook computer, a smart speaker, a digital assistant, an augmented reality (AR)/virtual reality (VR) device, a smart wearable device, and the like, or software running on such a device, such as an application or applet. Optionally, the operating system running on the client may include, but is not limited to, Android, iOS, Linux, Windows, Unix, and the like.
In some possible embodiments, the client 02 may determine a currently processed coding unit in a decoded image frame; if the decoding mode of the coding unit is a preset prediction mode, determine an execution flag from a preset position of the decoded image frame; and if the execution flag is the skip flag, skip the preset-prediction-mode decoding process for the coding unit. By means of the set skip flag, the embodiments of the application can skip the preset-prediction-mode decoding of a coding unit, thereby reducing the computational complexity at the encoding and decoding ends.
In some possible embodiments, the client 02 and the server 01 may be connected through a wired link, or may be connected through a wireless link.
In an exemplary embodiment, the client, the server, and the database corresponding to the server may be node devices in a blockchain system, and may share the acquired and generated information with other node devices in the system, realizing information sharing among multiple node devices. The node devices in the blockchain system may be configured with the same blockchain, which consists of multiple blocks; adjacent blocks are linked, so tampering with the data in any block can be detected through the next block. This prevents the data in the blockchain from being tampered with and ensures the security and reliability of the data in the blockchain.
Fig. 2 is a flowchart of an image frame prediction method according to an exemplary embodiment. As shown in fig. 2, the image frame prediction method may be applied to a server, or to other node devices such as a client; the method is described below taking the server as an example, and includes at least the following steps S201 to S205:
in step S201, the currently processed encoding unit in the decoded image frame is determined.
In an embodiment of the present application, the decoded image frames may include intra-coded image frames (Intra-coded frames, I frames), forward predictive-coded image frames (Predictive-coded frames, P frames), and bidirectional predictive-coded image frames (Bi-directional predicted frames, B frames).
In an embodiment of the application, I frames exploit the spatial correlation within a single frame image and do not exploit temporal correlation. Since an I frame does not need to reference other frames, it can be decoded into a complete image using only its own data; it is therefore both a random access entry point and a decoding reference frame, i.e., an I frame may serve as the reference image of a P frame or a B frame. The quality of an I frame may affect the quality of the frames that follow it in the same group. I frames are mainly used to initialize video playback, and their compression ratio is relatively low. I frames appear periodically in the image sequence, with a frequency selectable by the encoder. During video playback, if an I frame is lost, the subsequent P frames cannot be decoded, and the video may show a black screen or stutter.
In the embodiment of the application, P frames use inter-frame coding, i.e., they exploit both spatial and temporal correlation: by fully compressing the temporal redundancy relative to previously coded frames in the image sequence, they transmit less data than I frames. A P frame uses only forward temporal prediction, i.e., it may reference an I frame or a P frame that precedes it. Taking an I frame as the reference image of a P frame as an example, during decoding the P frame sums the prediction values from the I frame and the prediction error to reconstruct the complete P frame image. Similarly, a P frame may serve as the reference picture of a subsequent P frame, or of a preceding or subsequent B frame.
Optionally, the use of forward temporal prediction in P frames can improve compression efficiency and image quality. If a P frame is lost, the video may show corruption such as mosaic artifacts.
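The summation of prediction and prediction error described above can be sketched as a toy 1-D example with 8-bit clipping (an illustration, not the standard's exact reconstruction path):

```python
def reconstruct_p_block(prediction, residual):
    """Reconstruct P-frame samples: predicted samples taken from the
    reference frame plus the transmitted prediction error, clipped
    to the 8-bit sample range."""
    return [max(0, min(255, p + r)) for p, r in zip(prediction, residual)]

# Predicted samples from the I-frame reference plus decoded residuals.
print(reconstruct_p_block([100, 128, 250], [5, -8, 10]))  # [105, 120, 255]
```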
In the embodiment of the application, B frames use bidirectional temporal prediction, which can greatly increase the compression ratio. Notably, since a B frame uses a future frame as a reference, the transmission order and the display order of image frames in the encoded bitstream differ.
In an alternative embodiment, the I-frame may be the first frame of a group of picture frames (Group of Pictures, GOP) and there is only one I-frame in a GOP. The encoder encodes a plurality of images to generate a segment-by-segment GOP, and the decoder decodes the segment-by-segment GOP and reads the picture for rendering and displaying when playing.
Optionally, fig. 3 is a first schematic diagram of a GOP according to an exemplary embodiment, in which one GOP spans the interval between two I frames. As shown in fig. 3, GOP1 may include the image frames (B frames) between the first I frame and the second I frame, as well as the first I frame itself. Optionally, P frames may also appear between the two I frames in addition to B frames.
Optionally, fig. 4 is a second schematic diagram of a GOP according to an exemplary embodiment, in which one GOP spans the interval from an I frame to a P frame. As shown in fig. 4, GOP1 may include the image frames (B frames) between the I frame and the P frame, as well as the I frame itself.
Optionally, the two GOP structures shown in fig. 3 and 4 are exemplary; in practice, a GOP may be organized in other ways depending on the application environment.
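The fig. 3 convention, where each GOP starts at an I frame and contains exactly one I frame, can be sketched as a simple splitting routine (frame types given as a plain list; illustrative only):

```python
def split_gops(frame_types):
    """Split a display-order frame-type sequence into GOPs, each
    beginning at an I frame (one I frame per GOP, as in fig. 3)."""
    gops, current = [], []
    for t in frame_types:
        if t == "I" and current:
            gops.append(current)   # an I frame opens the next GOP
            current = []
        current.append(t)
    if current:
        gops.append(current)
    return gops

print(split_gops(["I", "B", "B", "P", "B", "B", "I", "B", "B"]))
```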
In an alternative embodiment, since P frames and B frames use inter-frame coding, which relies on reference pictures, a reference picture list is needed to manage previously generated reference pictures so that the current frame can be encoded conveniently. Fig. 5 is a schematic diagram of a coding and decoding process according to an exemplary embodiment. As shown in fig. 5, as image encoding proceeds, decoded images are continuously generated in the decoding stage; a decoded image may be placed in the decoded picture buffer (Decoded Picture Buffer, DPB) or directly output (not shown in fig. 5). As images continue to be decoded, new decoded images are moved into the DPB and decoded images beyond the DPB window are moved out in first-in-first-out order. The reference pictures in the reference picture list come from the decoded pictures in the DPB. Optionally, the DPB may contain both reference and non-reference pictures; the reference picture list gathers the reference pictures in the DPB into a list (array) to facilitate subsequent sorting and other operations.
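The first-in-first-out DPB window described above can be sketched as a minimal model (the real DPB additionally tracks reference/non-reference status and output order):

```python
from collections import deque

class DecodedPictureBuffer:
    """Minimal FIFO model of a DPB: new decoded pictures are appended
    and the oldest picture is evicted once the window is exceeded."""

    def __init__(self, size: int):
        self.size = size
        self.pocs = deque()

    def insert(self, poc: int) -> None:
        self.pocs.append(poc)
        if len(self.pocs) > self.size:
            self.pocs.popleft()    # move out the oldest decoded picture

    def reference_list(self):
        # Gather DPB pictures into a list (array) that can be sorted.
        return sorted(self.pocs)

dpb = DecodedPictureBuffer(size=3)
for poc in (0, 16, 8, 4):
    dpb.insert(poc)
print(dpb.reference_list())   # [4, 8, 16] after POC 0 is evicted
```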
In the embodiment of the application, in a layered frame structure, a higher-layer frame may reference lower-layer frames during encoding and decoding. The reference relationships are described below in conjunction with the GOP structure of fig. 4. Fig. 6 is a schematic diagram of a coding and decoding process according to an exemplary embodiment. As shown in fig. 6, in GOP1 with a GOP size of 16, the POC above each frame (including the I frame and B frames) is the display sequence number: after the client finishes decoding each frame, it renders the image pictures to the display in increasing POC order.
Specifically, the client decodes the I frame with POC=0 to obtain the I-frame-decoded image. In the embodiment of the application, the B frames between the I frame and the P frame need to reference both the I frame and the P frame, so the I-frame-decoded image is first used as the reference image of the P frame to decode the P frame and obtain the P-frame-decoded image. Optionally, the level information of the I frame and the P frame may be set to "layer 0".
In an alternative embodiment, the B frame with poc=8 is taken as the next frame to be decoded. Alternatively, since the B frame is a bi-directionally predicted encoded image frame, the image after I frame decoding and the image after P frame decoding may be used as reference images of the B frame with poc=8 to decode the B frame with poc=8, thereby obtaining the image after B frame decoding with poc=8. Alternatively, the level information of the B frame of poc=8 may be set to "layer 1".
Alternatively, a B frame of poc=4 and a B frame of poc=12 may be used as frames to be decoded next. Alternatively, the client may decode the B frame with poc=4 and the B frame with poc=12 at the same time, or may decode the B frame with poc=4 and the B frame with poc=12 sequentially.
Specifically, the image after I-frame decoding and the image after poc=8B-frame decoding may be used as the reference image of poc=4B-frame to decode poc=4B-frame, thereby obtaining the image after poc=4B-frame decoding. Similarly, a B frame of poc=12 is decoded using a B frame decoded image of poc=8 and a P frame decoded image as reference images of a B frame of poc=12, and a B frame decoded image of poc=12 is obtained. Alternatively, the level information of the B frame of poc=4 and the B frame of poc=12 may be set to "layer 2".
Optionally, B frames of poc=2, poc=6, poc=10 and poc=14 need to be decoded. Alternatively, the client may decode the B frame with poc=2, the B frame with poc=6, the B frame with poc=10, and the B frame with poc=14 at the same time, or may decode the B frame with poc=2, the B frame with poc=6, the B frame with poc=10, and the B frame with poc=14 sequentially.
Specifically, the image after I-frame decoding and the image after poc=4B-frame decoding may be used as the reference image of poc=2B-frames to decode poc=2B-frames, thereby obtaining the image after poc=2B-frame decoding.
Similarly, a B frame with poc=6 may be decoded using a B frame decoded with poc=4 and a B frame decoded with poc=8 as reference images of a B frame with poc=6, to obtain a B frame decoded with poc=6.
Similarly, a B frame with poc=10 may be decoded using a B frame decoded with poc=8 and a B frame decoded with poc=12 as reference images of a B frame with poc=10, to obtain a B frame decoded with poc=10.
Similarly, a B frame of poc=14 may be decoded using a B frame decoded image of poc=12 and a P frame decoded image as reference images of a B frame of poc=14, to obtain a B frame decoded image of poc=14. Alternatively, the level information of the B frame of poc=2, the B frame of poc=6, the B frame of poc=10, and the B frame of poc=14 may be set to "layer 3".
Optionally, a B frame of poc=1, a B frame of poc=3, a B frame of poc=5, a B frame of poc=7, a B frame of poc=9, a B frame of poc=11, a B frame of poc=13, and a B frame of poc=15 need to be decoded. Alternatively, the client may decode the B frame of poc=1, the B frame of poc=3, the B frame of poc=5, the B frame of poc=7, the B frame of poc=9, the B frame of poc=11, the B frame of poc=13, and the B frame of poc=15 at the same time, or may decode the B frame of poc=1, the B frame of poc=3, the B frame of poc=5, the B frame of poc=7, the B frame of poc=9, the B frame of poc=11, the B frame of poc=13, and the B frame of poc=15 in order.
Specifically, the decoded I frame and the decoded B frame with poc=2 may be used as the reference images of the B frame with poc=1 to decode it, thereby obtaining the decoded image of the B frame with poc=1.
Similarly, the B frame with poc=3 may be decoded using the decoded B frames with poc=2 and poc=4 as its reference images; the B frame with poc=5 using the decoded B frames with poc=4 and poc=6; the B frame with poc=7 using the decoded B frames with poc=6 and poc=8; the B frame with poc=9 using the decoded B frames with poc=8 and poc=10; the B frame with poc=11 using the decoded B frames with poc=10 and poc=12; and the B frame with poc=13 using the decoded B frames with poc=12 and poc=14.
Similarly, the B frame with poc=15 may be decoded using the decoded B frame with poc=14 and the decoded P frame as its reference images, obtaining the decoded image of the B frame with poc=15. Optionally, the level information of the B frames with poc=1, 3, 5, 7, 9, 11, 13 and 15 may be set to "layer 4".
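The dyadic reference structure walked through above can be summarized compactly: inside a GOP, a B frame's two reference frames are the nearest already-decoded frames on either side, and its layer is determined by its distance to them. A minimal sketch (the function names and the GOP size of 16 are assumptions for illustration, not part of the patent text):

```python
import math

def layer_of(poc: int, gop: int = 16) -> int:
    """Hierarchy layer of a frame inside a dyadic GOP (layer 0 = I/P frame)."""
    if poc % gop == 0:
        return 0                        # I frame (poc=0) or P frame (poc=16)
    step = poc & -poc                   # distance to both reference frames
    return int(math.log2(gop)) - int(math.log2(step))

def refs_of(poc: int) -> tuple:
    """The two reference frames of a B frame: its nearest lower-layer neighbours."""
    step = poc & -poc
    return (poc - step, poc + step)
```

With these rules, poc=8 is layer 1, poc=4 and poc=12 are layer 2, the remaining even poc values are layer 3, and all odd poc values are layer 4, matching the walkthrough above.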
In the embodiment of the present application, the currently processed Coding Unit (CU) may be obtained by quadtree partitioning of a Coding Tree Unit (CTU) root node in the decoded image frame down to its leaf nodes, where the CU is the basic unit of predictive coding.
In the embodiment of the present application, before executing the scheme of steps S201 to S205, which reduces codec complexity by skipping the preset-prediction-mode decoding process for a coding unit, the corresponding encoding process of the coding unit needs to be configured.
Fig. 7 is a flowchart illustrating a method of image frame prediction in an encoding process according to an exemplary embodiment, as shown in fig. 7, including the following steps S701 to S707:
in step S701, a currently encoded encoding unit in an encoded image frame is determined.
In the embodiment of the application, the encoded image frame may be any one of an I frame, a P frame and a B frame in the encoding process. The currently encoded coding unit (CU) in the encoded image frame is the basic unit of predictive coding.
In step S703, hierarchy information is determined from the header of the encoded image frame.
Optionally, the encoded image frame to which the currently encoded coding unit belongs may be determined, and the hierarchy information may be read from its frame header. For example, the hierarchy information of the frame with poc=1 shown in fig. 6 is "layer 4".
In step S705, if the level information satisfies the preset level information, the coding mode of the currently coded coding unit is determined.
Alternatively, the preset hierarchy information may be determined according to the size of the group of image frames GOP and a preset prediction mode.
In an alternative embodiment, the preset prediction mode includes an affine motion prediction mode (Affine Prediction). The affine motion prediction mode can model non-translational motion such as rotation and scaling, thereby greatly improving the compression efficiency of non-translational motion content in video. Fig. 8 is a schematic diagram of a prediction unit according to an exemplary embodiment. As shown in fig. 8, the Affine prediction technique uses 4x4 sub-blocks as prediction units and introduces the concept of the control point motion vector (CPMV), which comprises the motion vectors v0 and v1. v0 and v1 are each obtained, by RDCost screening, from candidate lists formed by the MVs of the coding units adjacent to the upper-left and upper-right corners of the current coding unit. After obtaining the CPMVs, the decoding end calculates the MV of each 4x4 sub-block through the affine transformation formula in formula 1, and performs motion compensation on the sub-blocks in turn to obtain the predicted value of the whole coding unit. At the encoding end, the Affine mode performs sub-block-based motion search, which is also the reason why the complexity of the Affine mode is higher.
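The per-sub-block MV derivation described above can be sketched with the standard 4-parameter affine model evaluated at sub-block centres. This is an illustrative assumption about formula 1, not the patent's exact formula; the function name and tuple-based MV representation are invented for the sketch:

```python
def affine_subblock_mvs(v0, v1, width, height, sb=4):
    """Derive one MV per 4x4 sub-block from the two CPMVs v0 (top-left)
    and v1 (top-right), using the 4-parameter affine model."""
    ax = (v1[0] - v0[0]) / width        # horizontal gradient of mv_x
    ay = (v1[1] - v0[1]) / width        # horizontal gradient of mv_y
    mvs = {}
    for y in range(sb // 2, height, sb):        # sub-block centre rows
        for x in range(sb // 2, width, sb):     # sub-block centre cols
            mvs[(x, y)] = (ax * x - ay * y + v0[0],
                           ay * x + ax * y + v0[1])
    return mvs
```

When v0 equals v1 the model degenerates to pure translation and every sub-block receives the same MV; differing CPMVs encode rotation and scaling.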
In step S707, if the encoding mode is a preset prediction mode, the execution flag of the preset prediction position of the currently encoded encoding unit is set as the skip flag.
Optionally, the encoding end may determine the encoding mode of the coding unit by calculating RDCost. If the encoding mode is a preset prediction mode (such as the affine motion prediction mode), the execution flag at the preset prediction position of the currently encoded coding unit (CU) may be set to the skip flag. The skip flag indicates that, during encoding and decoding, the preset-prediction-mode processing of the coding unit is skipped, that is, the encoding process of the affine motion prediction mode of the coding unit is skipped. The preset prediction position may be the Affine flag bit in the CU.
This is because high-layer image frames tend to be temporally closer to their reference frames, so their inter-frame prediction accuracy is higher than that of low-layer frames. The affine motion prediction mode (Affine mode) can therefore be restricted in high-layer image frames, achieving the purpose of reducing complexity at both the encoding end and the decoding end.
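The encoder-side restriction of steps S701 to S707 can be sketched as follows; the dictionary-based CU and frame-header representation is a hypothetical stand-in for the real encoder data structures, and the threshold value is an assumption:

```python
def mark_affine_skip(cu: dict, frame_header: dict, preset_layer: int = 3) -> dict:
    """Steps S705-S707 (sketched): if the frame's hierarchy layer meets the
    preset level information and RD cost selected affine mode for this CU,
    set the execution flag at the CU's Affine flag bit to the skip flag."""
    if frame_header["layer"] >= preset_layer and cu["mode"] == "affine":
        cu["affine_flag"] = "skip"
    return cu
```

Low-layer frames (layers below the threshold) leave the flag untouched, so affine prediction remains available where inter-frame prediction accuracy is lower.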
In another alternative embodiment, the encoding end may determine the encoding mode of the coding unit by calculating RDCost. If the encoding mode is a preset prediction mode (such as the affine motion prediction mode), the encoding end adds an execution flag to the frame header of the encoded image frame; the execution flag indicates the skip flag. The skip flag indicates that, during encoding and decoding, the preset-prediction-mode processing of the coding unit is skipped, that is, the encoding process of the affine motion prediction mode of the coding unit is skipped.
Compared with setting the skip flag in the Affine flag bit of each CU, setting the skip flag once in the frame header of the image frame can reduce the complexity of the encoding end and the decoding end and reduce the fields used for flagging, since each image frame may contain many CUs. Although this may slightly reduce prediction accuracy, it yields a net saving in flag-field overhead.
In the embodiment of the application, after the skip flag is set in the Affine flag bit of the CUs of an image frame whose level information satisfies the preset level information, or in the frame header of that image frame, the image frame is transmitted to the decoding end along with the code stream. The decoding end can then determine the currently processed coding unit in the decoded image frame.
In step S203, if the decoding mode of the encoding unit is the preset prediction mode, the execution flag is determined from the preset position of the decoded image frame.
In an alternative embodiment, the preset prediction mode may be the affine motion prediction mode. If the decoding mode of the coding unit is the affine motion prediction mode, the execution flag may be determined from a preset position of the decoded image frame. Optionally, the preset position of the decoded image frame is the preset prediction position of the coding unit, such as the Affine flag bit in the CU. Optionally, the preset position of the decoded image frame is the frame header of the decoded image frame.
In step S205, if the execution flag is the skip flag, the decoding process of the preset prediction mode for the coding unit is skipped.
In an alternative embodiment, if the execution flag is the skip flag, the decoding process of the preset prediction mode for the coding unit is skipped, and the next coding unit is processed.
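The decoder-side decision of steps S203 to S205 can be sketched with the same hypothetical dictionary-based structures (the function name and return strings are invented for illustration):

```python
def decode_mode_for_cu(cu: dict) -> str:
    """Steps S203-S205 (sketched): when the CU's decoding mode is the preset
    prediction mode and its Affine flag bit carries the skip flag, the affine
    decoding branch is skipped and the next CU is processed."""
    if cu["mode"] == "affine" and cu.get("affine_flag") == "skip":
        return "skip_affine_decoding"
    return "decode_normally"
```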
When the preset position of the decoded image frame is the frame header of the decoded image frame, while skipping the preset-prediction-mode decoding of the currently processed coding unit, the execution flag at the preset prediction position (the Affine flag bit) of every other coding unit in the decoded image frame may also be set to the skip flag. In this way, when the decoding end reaches a coding unit other than the currently processed one and determines that its decoding mode is the preset prediction mode, it can read the flag directly from the Affine flag bit of that CU without returning to the frame header of the image frame.
Optionally, if the encoding end sets the skip flag in the frame header of an image frame whose level information satisfies the preset level information, then after the image frame is transmitted to the decoding end along with the code stream, the decoding end may directly read the frame header information of the image frame. If the execution flag for the affine motion prediction mode contained in the frame header information is the skip flag, the skip flag may be written into the Affine flag bit of each CU. Thus, when the decoding end processes the inter-frame prediction of each CU and the prediction is the affine motion prediction mode, it can read the skip flag directly from the Affine flag bit and omit the sub-block motion search operation.
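The frame-header variant described above, where the skip flag is read once from the header and written into the Affine flag bit of every CU so later CUs never return to the header, can be sketched as (hypothetical structures again):

```python
def propagate_header_skip(frame: dict) -> dict:
    """If the frame header carries the skip flag for affine prediction,
    copy it into the Affine flag bit of every CU in the frame."""
    if frame["header"].get("affine_exec") == "skip":
        for cu in frame["cus"]:
            cu["affine_flag"] = "skip"
    return frame
```

After this single pass over the frame, each CU's skip decision is a local field read, which matches the motivation of trading a small amount of per-CU state for fewer frame-header accesses.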
Fig. 9 is a block diagram of an image frame prediction apparatus according to an exemplary embodiment. The device has the function of realizing the data processing method in the method embodiment, and the function can be realized by hardware or can be realized by executing corresponding software by hardware. Referring to fig. 9, the apparatus includes an encoding unit determination module 901, an identification determination module 902, and a processing module 903.
An encoding unit determination module 901 configured to perform determination of the currently processed encoding unit in a decoded image frame;
an identification determining module 902 configured to determine an execution identification from a preset position of the decoded image frame if the decoding mode of the encoding unit is a preset prediction mode;
the processing module 903 is configured to skip the decoding process of the preset prediction mode for the coding unit if the execution flag is the skip flag.
In some possible embodiments, the identification determination module is configured to perform:
if the decoding mode of the coding unit is a preset prediction mode, determining an execution identifier from a preset prediction position of the coding unit.
In some possible embodiments, the identification determination module is configured to perform:
if the decoding mode of the coding unit is a preset prediction mode, determining an execution identifier from a frame header of the decoded image frame;
If the execution mark is a skip mark; after skipping the decoding process of the preset prediction mode on the coding unit, the method further comprises:
an execution flag of a preset prediction position of a coding unit other than the currently processed coding unit in the decoded image frame is set as a skip flag.
In some possible embodiments, the apparatus further comprises:
a first determination module configured to perform determining an encoding unit that encodes a current encoding in the image frame;
a second determination module configured to perform determination of hierarchy information from a header of the encoded image frame;
the third determining module is configured to execute the determination of the coding mode of the currently coded coding unit if the level information meets the preset level information;
the setting module is configured to execute setting the execution identification of the preset prediction position of the currently encoded encoding unit as the skip identification if the encoding mode is the preset prediction mode.
In some possible embodiments, the setting module is configured to perform:
if the coding mode is a preset prediction mode, adding an execution identifier to the frame header of the coded image frame; the execution identifier comprises a skip identifier; the skip identifier indicates that the preset-prediction-mode processing for the coding unit is skipped during encoding and decoding.
In some possible embodiments, the preset prediction mode includes an affine motion prediction mode.
It should be noted that, in the apparatus provided in the foregoing embodiment, when implementing the functions thereof, only the division of the foregoing functional modules is used as an example, in practical application, the foregoing functional allocation may be implemented by different functional modules, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the apparatus and the method embodiments provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the apparatus and the method embodiments are detailed in the method embodiments and are not repeated herein.
Fig. 10 is a block diagram illustrating an apparatus 3000 for image frame prediction according to an exemplary embodiment. For example, apparatus 3000 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, or the like.
Referring to fig. 10, the apparatus 3000 may include one or more of the following components: a processing component 3002, a memory 3004, a power component 3006, a multimedia component 3008, an audio component 3010, an input/output (I/O) interface 3012, a sensor component 3014, and a communications component 3016.
The processing component 3002 generally controls overall operations of the device 3000, such as operations associated with display, phone calls, data communications, camera operations, and recording operations. The processing assembly 3002 may include one or more processors 3020 to execute instructions to perform all or part of the steps of the methods described above. Further, the processing component 3002 may include one or more modules to facilitate interactions between the processing component 3002 and other components. For example, the processing component 3002 may include a multimedia module to facilitate interaction between the multimedia component 3008 and the processing component 3002.
The memory 3004 is configured to store various types of data to support operations at the device 3000. Examples of such data include instructions for any application or method operating on device 3000, contact data, phonebook data, messages, pictures, videos, and the like. The memory 3004 may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk.
The power supply assembly 3006 provides power to the various components of the device 3000. The power supply components 3006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device 3000.
The multimedia component 3008 includes a screen that provides an output interface between the device 3000 and the user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or slide action, but also the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia assembly 3008 includes a front camera and/or a rear camera. The front camera and/or the rear camera may receive external multimedia data when the device 3000 is in an operational mode, such as a photographing mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have focal length and optical zoom capabilities.
The audio component 3010 is configured to output and/or input audio signals. For example, audio component 3010 includes a Microphone (MIC) configured to receive external audio signals when device 3000 is in an operational mode, such as a call mode, a recording mode, and a speech recognition mode. The received audio signals may be further stored in the memory 3004 or transmitted via the communication component 3016. In some embodiments, the audio component 3010 further comprises a speaker for outputting audio signals.
The I/O interface 3012 provides an interface between the processing component 3002 and a peripheral interface module, which may be a keyboard, click wheel, button, or the like. These buttons may include, but are not limited to: homepage button, volume button, start button, and lock button.
The sensor assembly 3014 includes one or more sensors for providing status assessment of various aspects of the device 3000. For example, the sensor assembly 3014 may detect the on/off state of the device 3000 and the relative positioning of components, such as the display and keypad of the device 3000. The sensor assembly 3014 may also detect a change in position of the device 3000 or a component of the device 3000, the presence or absence of user contact with the device 3000, the orientation or acceleration/deceleration of the device 3000, and a change in temperature of the device 3000. The sensor assembly 3014 may include a proximity sensor configured to detect the presence of nearby objects in the absence of any physical contact. The sensor assembly 3014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 3014 may also include an acceleration sensor, a gyroscopic sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 3016 is configured to facilitate wired or wireless communication between the apparatus 3000 and other devices. The device 3000 may access a wireless network based on a communication standard, such as WiFi, an operator network (e.g., 2G, 3G, 4G, or 5G), or a combination thereof. In one exemplary embodiment, the communication component 3016 receives broadcast signals or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 3016 further includes a Near Field Communication (NFC) module to facilitate short range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, ultra Wideband (UWB) technology, bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 3000 may be implemented by one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), digital Signal Processing Devices (DSPDs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements for executing the methods described above.
Embodiments of the present invention also provide a computer readable storage medium that may be disposed in an electronic device to hold at least one instruction or at least one program related to implementing an image frame prediction method, the at least one instruction or the at least one program being loaded and executed by the processor to implement the image frame prediction method provided by the above method embodiments.
Embodiments of the present invention also provide a computer program product comprising a computer program stored in a readable storage medium, from which at least one processor of the computer device reads and executes the computer program, causing the computer device to perform the method of any of the first aspects of the embodiments of the present disclosure.
It should be noted that the order of the above embodiments is merely for description and does not imply any ranking of their merits. The foregoing description has been directed to specific embodiments of this specification. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments in part.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description covers only preferred embodiments of the invention and is not intended to limit it; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the invention are intended to be included within its scope.

Claims (10)

1. An image frame prediction method, comprising:
determining a currently processed coding unit in the decoded image frame;
if the decoding mode of the encoding unit is a preset prediction mode, determining an execution identifier from a preset position of the decoded image frame;
and if the execution identifier is a skip identifier, skipping the decoding processing of the preset prediction mode of the coding unit.
2. The image frame prediction method according to claim 1, wherein determining the execution flag from the preset position of the decoded image frame if the decoding mode of the encoding unit is a preset prediction mode, comprises:
And if the decoding mode of the coding unit is a preset prediction mode, determining an execution identifier from a preset prediction position of the coding unit.
3. The image frame prediction method according to claim 1, wherein determining the execution flag from the preset position of the decoded image frame if the decoding mode of the encoding unit is a preset prediction mode, comprises:
if the decoding mode of the encoding unit is a preset prediction mode, determining an execution identifier from a frame header of the decoded image frame;
if the execution identifier is a skip identifier; after skipping the decoding process of the preset prediction mode for the coding unit, the method further includes:
and setting an execution identification of a preset prediction position of a coding unit other than the currently processed coding unit in the decoded image frame as the skip identification.
4. A method of image frame prediction according to any one of claims 1 to 3, wherein prior to said determining the currently processed coding unit in the decoded image frame, further comprising:
determining a coding unit that codes a current code in an image frame;
determining hierarchy information from a header of the encoded image frame;
If the level information meets preset level information, determining the coding mode of the currently coded coding unit;
and if the coding mode is a preset prediction mode, setting an execution mark of a preset prediction position of the currently coded coding unit as the skip mark.
5. The method according to claim 4, wherein after determining the coding mode of the currently coded coding unit if the level information satisfies a preset level information, further comprising:
if the coding mode is a preset prediction mode, adding an execution identifier to the frame head of the coded image frame; the execution identification comprises a skip identification; the skip identifier indicates that the preset prediction mode processing of the coding unit is skipped in the coding and decoding process.
6. A method of image frame prediction according to any one of claims 1 to 3, wherein the predetermined prediction mode comprises an affine motion prediction mode.
7. An image frame prediction apparatus, comprising:
an encoding unit determination module configured to perform determination of an encoding unit currently processing in the decoded image frame;
an identification determining module configured to perform an execution identification from a preset position of the decoded image frame if a decoding mode of the encoding unit is a preset prediction mode;
And the processing module is configured to execute the decoding processing of skipping the preset prediction mode of the coding unit if the execution identifier is a skipping identifier.
8. The image frame prediction apparatus according to claim 7, wherein the identification determination module is configured to perform:
and if the decoding mode of the coding unit is a preset prediction mode, determining an execution identifier from a preset prediction position of the coding unit.
9. An electronic device, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the image frame prediction method of any one of claims 1 to 6.
10. A computer readable storage medium, characterized in that instructions in the computer readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the image frame prediction method of any one of claims 1 to 6.
CN202310730347.0A 2023-06-19 2023-06-19 Image frame prediction method and device, electronic equipment and storage medium Pending CN116980620A (en)
Publications (1)

Publication Number Publication Date
CN116980620A true CN116980620A (en) 2023-10-31

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination