CN112738529A - Inter-frame prediction method, device, equipment, storage medium and program product - Google Patents

Inter-frame prediction method, device, equipment, storage medium and program product

Info

Publication number
CN112738529A
Authority
CN
China
Prior art keywords
reference frame
current
optimal
target
motion vector
Prior art date
Legal status
Granted
Application number
CN202011557753.4A
Other languages
Chinese (zh)
Other versions
CN112738529B (en)
Inventor
邹箭
丁文鹏
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011557753.4A
Publication of CN112738529A
Application granted
Publication of CN112738529B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, the region being a block, e.g. a macroblock
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/56 Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
    • H04N19/567 Motion estimation based on rate distortion criteria

Abstract

The disclosure provides an inter-frame prediction method, device, equipment, storage medium and program product, and relates to technical fields such as computer vision and cloud computing. One embodiment of the method comprises: acquiring an encoded reference frame sequence adjacent to a pixel block to be encoded; determining a current search starting point, starting from the reference frame in the reference frame sequence that is closest to the pixel block to be encoded; performing a motion estimation process: searching for a motion estimation matching block from the current search starting point, obtaining a current optimal matching block, a current optimal motion vector and a current optimal distortion degree, and recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree; and if the difference between the current optimal distortion degree and the target optimal distortion degree is greater than a preset threshold, ending the motion estimation process and outputting the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree. This embodiment reduces the amount of computation of the motion estimation process.

Description

Inter-frame prediction method, device, equipment, storage medium and program product
Technical Field
The present disclosure relates to the field of computer technology, in particular to fields such as computer vision and cloud computing, and more specifically to an inter-frame prediction method, apparatus, device, storage medium, and program product.
Background
HEVC (High Efficiency Video Coding) is a new-generation video coding compression standard. Compared with the previous-generation H.264/AVC (Advanced Video Coding) standard, it can save nearly 50% of the bit rate at the same definition. HEVC can therefore be widely applied in fields related to video compression, such as live broadcasting and video on demand. HEVC mainly consists of techniques such as prediction, transform, quantization, loop filtering and entropy coding. Among these, prediction is an important module of an encoder and is divided into intra-frame prediction and inter-frame prediction. Intra-frame prediction predicts a pixel block to be encoded from the reconstructed pixel values of image blocks already encoded in the same frame. Inter-frame prediction uses pixel blocks in already encoded forward or backward reference frames to predict the pixel block to be encoded. Currently, inter-frame prediction uses block-by-block matching to obtain the best matching block in a reference frame, and this process is called motion estimation.
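For illustration, block matching as described above can be grounded in a concrete distortion measure. The sketch below computes a sum of absolute differences (SAD) between the pixel block to be encoded and one candidate block in a reference frame; SAD is only one common choice, and the function and parameter names are assumptions of this sketch rather than anything specified by the present disclosure.

```cpp
#include <cstdint>
#include <cstdlib>

// Sum of absolute differences (SAD) between the pixel block to be encoded and a
// candidate block in a reference frame. A lower SAD means a better match; the
// candidate with the lowest distortion is taken as the best matching block.
// Block size, strides and names are illustrative assumptions of this sketch.
int blockSAD(const uint8_t* cur, int curStride,
             const uint8_t* ref, int refStride,
             int blockWidth, int blockHeight) {
    int distortion = 0;
    for (int y = 0; y < blockHeight; ++y) {
        for (int x = 0; x < blockWidth; ++x) {
            distortion += std::abs(static_cast<int>(cur[y * curStride + x]) -
                                   static_cast<int>(ref[y * refStride + x]));
        }
    }
    return distortion;
}
```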
Disclosure of Invention
The present disclosure provides an inter-frame prediction method, apparatus, device, storage medium, and program product.
According to a first aspect of the present disclosure, there is provided an inter-frame prediction method, including: acquiring an encoded reference frame sequence adjacent to a pixel block to be encoded; determining a current search starting point, starting from the reference frame in the reference frame sequence that is closest to the pixel block to be encoded; performing a motion estimation process: searching for a motion estimation matching block from the current search starting point, obtaining a current optimal matching block, a current optimal motion vector and a current optimal distortion degree, and recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree; and if the difference between the current optimal distortion degree and the target optimal distortion degree is greater than a preset threshold, ending the motion estimation process and outputting the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree.
According to a second aspect of the present disclosure, there is provided an inter-frame prediction apparatus, including: an acquisition module configured to acquire an encoded reference frame sequence adjacent to a pixel block to be encoded; a determining module configured to determine a current search starting point, starting from the reference frame in the reference frame sequence that is closest to the pixel block to be encoded; an estimation module configured to perform a motion estimation process: searching for a motion estimation matching block from the current search starting point, obtaining a current optimal matching block, a current optimal motion vector and a current optimal distortion degree, and recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree; and an output module configured to end the motion estimation process and output the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree if the difference between the current optimal distortion degree and the target optimal distortion degree is greater than a preset threshold.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method as described in any one of the implementations of the first aspect.
According to a fifth aspect of the present disclosure, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the method as described according to any of the implementations of the first aspect.
With the inter-frame prediction method, apparatus, device, storage medium and program product provided by the present disclosure, a current search starting point is first determined, starting from the reference frame that is closest to the pixel block to be encoded in the encoded reference frame sequence adjacent to that pixel block; a motion estimation process is then performed: a motion estimation matching block is searched for from the current search starting point, a current optimal matching block, a current optimal motion vector and a current optimal distortion degree are obtained, and a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree are recorded; finally, if the difference between the current optimal distortion degree and the target optimal distortion degree is greater than a preset threshold, the motion estimation process is ended and the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree are output. Because most target optimal matching blocks fall on the few reference frames nearest or next nearest to the pixel block to be encoded, this provides an early-exit mechanism for multi-reference-frame motion estimation, reduces the amount of computation of the motion estimation process in the multi-reference-frame case, and thereby improves encoding and transcoding efficiency.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to constitute a limitation on the present disclosure. Wherein:
FIG. 1 is a flow diagram of one embodiment of an inter-prediction method according to the present disclosure;
FIG. 2 is a flow diagram of yet another embodiment of an inter-prediction method according to the present disclosure;
FIG. 3 is a scene diagram of an inter-frame prediction method in which embodiments of the present disclosure may be implemented;
FIG. 4 is a schematic block diagram illustrating an embodiment of an inter-prediction apparatus according to the present disclosure;
fig. 5 is a block diagram of an electronic device for implementing an inter prediction method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 shows a flow diagram of one embodiment of an inter prediction method according to the present disclosure. The inter-frame prediction method comprises the following steps:
step 101, obtaining an encoded reference frame sequence adjacent to a pixel block to be encoded.
In the present embodiment, the execution subject of the inter prediction method may acquire a sequence of encoded reference frames adjacent to a pixel block to be encoded.
Generally, video compression is required for video in services such as live broadcasting and on-demand broadcasting. HEVC is a new generation of video coding compression standard, and mainly consists of main technologies such as prediction, transformation, quantization, loop filtering, entropy coding, and the like. Among them, prediction is an important module of an encoder, and is divided into intra prediction and inter prediction. The intra-frame prediction refers to a method for predicting a pixel block to be encoded in a certain manner by using a reconstructed pixel value of an image block encoded in the same frame image. Inter-frame prediction refers to a method of using a pixel block in an already encoded forward or backward reference frame for prediction of a pixel block to be encoded. A video frame in a video may be divided into a plurality of pixel blocks, and the pixel blocks in the video frame to be encoded are the pixel blocks to be encoded. The forward or backward reference frame of the video frame to be encoded is the reference frame adjacent to the pixel block to be encoded, and here, the encoded reference frame sequence adjacent to the pixel block to be encoded is obtained. The pixel block in the encoded reference frame may also be obtained by encoding after inter-frame prediction is performed by using the inter-frame prediction method provided by the present disclosure.
In inter-frame prediction, the process of obtaining the best matching block in the adjacent encoded reference frame sequence is called motion estimation. The motion estimation process outputs the displacement of the matching block relative to the pixel block to be encoded, which is called the motion vector. The motion vector is represented by two components, one in the x direction and one in the y direction. Practice shows that the motion estimation process accounts for a huge amount of computation and is the most time-consuming part of encoding and transcoding; therefore, improving the motion estimation process to reduce its amount of computation can improve encoding and transcoding efficiency. A matching block is a pixel block in the reference frame that matches the pixel block to be encoded. The best matching block is the matching block with the lowest distortion.
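Purely as an illustration of the quantities named above, the following sketch represents a motion vector as an (x, y) displacement and a matching-block record as the pair of that displacement and its measured distortion; the best matching block is then simply the record with the lowest distortion. The type and field names are assumptions of this sketch.

```cpp
#include <limits>

// A motion vector is the displacement of a matching block in the reference
// frame relative to the position of the pixel block to be encoded, expressed
// as two components along the x and y directions.
struct MotionVector {
    int x = 0;
    int y = 0;
};

// One candidate match: the motion vector pointing at a candidate block and the
// distortion measured between that block and the pixel block to be encoded.
// The best matching block is the candidate with the lowest distortion.
struct MatchResult {
    MotionVector mv;
    long long distortion = std::numeric_limits<long long>::max();
};
```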
Step 102, starting from a reference frame closest to a pixel block to be encoded in a reference frame sequence, determining a current search starting point.
In this embodiment, the execution body may determine the current search starting point starting from a reference frame closest to a pixel block to be encoded in the sequence of reference frames. Here, the current search starting point is a search starting point on a reference frame closest to the pixel block to be encoded.
In the prior art, x265 is the best open-source HEVC encoder. Its multi-reference-frame motion estimation sequentially traverses the reference frames in the adjacent encoded reference frame sequence and determines the best matching reference frame and best matching block using rate-distortion weighting. The best matching reference frame is the reference frame in which the best matching block is located. Testing shows that although x265 performs motion estimation over multiple reference frames, most best matching blocks fall on the reference frames nearest or next nearest to the pixel block to be encoded, and the motion search over the other, more distant reference frames contributes little to the encoding process. Therefore, the embodiments of the present application start the motion search from the reference frame closest to the pixel block to be encoded.
Step 103, executing a motion estimation process: searching for a motion estimation matching block from the current search starting point, obtaining a current optimal matching block, a current optimal motion vector and a current optimal distortion degree, and recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree.
In this embodiment, the execution subject may execute a motion estimation process. For any reference frame in the adjacent encoded reference frame sequence, the execution body may perform a matching block search for motion estimation from a current search starting point of the reference frame, obtain a current best matching block, a current best motion vector, and a current best distortion, and record a target best reference frame index number, a target best motion vector, and a target best distortion.
Specifically, starting from the current search starting point, the executing body may match pixel blocks in the reference frame one by one, calculate the motion vector and distortion degree corresponding to each pixel block, and select the pixel block with the smallest distortion degree as the current optimal matching block. The motion vector corresponding to the current optimal matching block is the current optimal motion vector, and its distortion degree is the current optimal distortion degree.
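A minimal sketch of this per-reference-frame search follows, under the assumption of an exhaustive window around the current search starting point and a SAD distortion measure (a real encoder such as x265 uses faster search patterns); all names and the window size are illustrative.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>

struct MotionVector { int x = 0, y = 0; };

struct FrameBest {                     // current optimal result for one reference frame
    MotionVector mv;
    long long distortion = std::numeric_limits<long long>::max();
};

// Exhaustively search one reference frame around the current search starting
// point. cur/ref are luma planes of width*height samples; (blockX, blockY)
// locates the pixel block to be encoded; searchRange bounds the window.
FrameBest searchOneFrame(const uint8_t* cur, const uint8_t* ref,
                         int width, int height,
                         int blockX, int blockY, int blockSize,
                         MotionVector start, int searchRange) {
    FrameBest best;
    for (int dy = -searchRange; dy <= searchRange; ++dy) {
        for (int dx = -searchRange; dx <= searchRange; ++dx) {
            const int rx = blockX + start.x + dx;
            const int ry = blockY + start.y + dy;
            if (rx < 0 || ry < 0 || rx + blockSize > width || ry + blockSize > height)
                continue;                          // candidate block must lie inside the frame
            long long sad = 0;                     // distortion of this candidate block
            for (int y = 0; y < blockSize; ++y)
                for (int x = 0; x < blockSize; ++x)
                    sad += std::abs(int(cur[(blockY + y) * width + blockX + x]) -
                                    int(ref[(ry + y) * width + rx + x]));
            if (sad < best.distortion) {           // keep the minimum-distortion candidate
                best.distortion = sad;
                best.mv = { start.x + dx, start.y + dy };
            }
        }
    }
    return best;
}
```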
In addition, the executing body records a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree. This record is updated promptly during the motion estimation process to ensure its accuracy.
And step 104, if the difference value between the current optimal distortion degree and the target optimal distortion degree is larger than a preset threshold value, ending the motion estimation process, and outputting the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree.
In this embodiment, the executing body may calculate the difference between the current optimal distortion degree and the target optimal distortion degree and compare it with a preset threshold. If the difference is greater than the preset threshold, the motion estimation process is ended, and the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree are output.
Here, the threshold serves as the basis for exiting the motion estimation process early, reducing its amount of computation. When the difference between the current optimal distortion degree and the target optimal distortion degree is greater than the preset threshold, the subsequent reference frames are far from the pixel block to be encoded and can be treated as invalid reference frames. Exiting the motion estimation process at this point does not affect the matching block search result and saves a huge amount of computation. The threshold can be determined by statistical testing and is generally applicable.
With the inter-frame prediction method provided by the embodiments of the present disclosure, a current search starting point is first determined, starting from the reference frame that is closest to the pixel block to be encoded in the encoded reference frame sequence adjacent to that pixel block; a motion estimation process is then performed: a motion estimation matching block is searched for from the current search starting point, a current optimal matching block, a current optimal motion vector and a current optimal distortion degree are obtained, and a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree are recorded; finally, if the difference between the current optimal distortion degree and the target optimal distortion degree is greater than a preset threshold, the motion estimation process is ended and the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree are output. Because most target optimal matching blocks fall on the few reference frames nearest or next nearest to the pixel block to be encoded, this provides an early-exit mechanism for multi-reference-frame motion estimation, reduces the amount of computation of the motion estimation process in the multi-reference-frame case, and thereby improves encoding and transcoding efficiency.
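Putting the above together, the sketch below shows one possible reading of the early-exit loop over the reference frame sequence: frames are visited from nearest to farthest, the target optimal record is updated whenever a frame yields a lower distortion, and the loop exits as soon as the current frame's optimal distortion exceeds the target optimal distortion by more than the preset threshold. The per-frame search is left to a caller-supplied callback, and the structure and names are assumptions of this sketch, not the patented implementation itself.

```cpp
#include <functional>
#include <limits>
#include <vector>

struct MotionVector { int x = 0, y = 0; };

struct FrameResult {                   // current optimal result of one reference frame
    MotionVector mv;
    long long distortion = std::numeric_limits<long long>::max();
};

struct TargetBest {                    // recorded target optimal result
    int refIdx = -1;                   // target optimal reference frame index number
    MotionVector mv;                   // target optimal motion vector
    long long distortion = std::numeric_limits<long long>::max();  // target optimal distortion
};

// refIndices: encoded reference frames ordered nearest-first relative to the
// pixel block to be encoded; searchFrame: per-frame motion search (step 103)
// supplied by the caller; threshold: preset early-exit threshold.
TargetBest motionEstimateMultiRef(const std::vector<int>& refIndices,
                                  const std::function<FrameResult(int)>& searchFrame,
                                  long long threshold) {
    TargetBest target;
    for (int refIdx : refIndices) {
        FrameResult cur = searchFrame(refIdx);        // current optimal for this frame
        if (cur.distortion < target.distortion) {     // record as the new target optimal
            target.refIdx = refIdx;
            target.mv = cur.mv;
            target.distortion = cur.distortion;
        }
        // Early exit: when the current optimal distortion exceeds the target
        // optimal distortion by more than the threshold, the remaining, farther
        // reference frames are treated as invalid and are not searched.
        if (cur.distortion - target.distortion > threshold)
            break;
    }
    return target;
}
```

Keeping the per-frame search behind a callback reflects that the early-exit decision only needs the current optimal distortion of each frame, regardless of which search pattern produced it.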
With continued reference to fig. 2, a flow 200 of yet another embodiment of an inter prediction method according to the present disclosure is shown. The inter-frame prediction method comprises the following steps:
step 201, obtaining an encoded reference frame sequence adjacent to a pixel block to be encoded.
In the present embodiment, the execution subject of the inter prediction method may acquire a sequence of encoded reference frames adjacent to a pixel block to be encoded.
Generally, video compression is required for video in services such as live broadcasting and on-demand broadcasting. HEVC is a new generation of video coding compression standard, and mainly consists of main technologies such as prediction, transformation, quantization, loop filtering, entropy coding, and the like. Among them, prediction is an important module of an encoder, and is divided into intra prediction and inter prediction. The intra-frame prediction refers to a method for predicting a pixel block to be encoded in a certain manner by using a reconstructed pixel value of an image block encoded in the same frame image. Inter-frame prediction refers to a method of using a pixel block in an already encoded forward or backward reference frame for prediction of a pixel block to be encoded. A video frame in a video may be divided into a plurality of pixel blocks, and the pixel blocks in the video frame to be encoded are the pixel blocks to be encoded. The forward or backward reference frame of the video frame to be encoded is the reference frame adjacent to the pixel block to be encoded, and here, the encoded reference frame sequence adjacent to the pixel block to be encoded is obtained. The pixel block in the encoded reference frame may also be obtained by encoding after inter-frame prediction is performed by using the inter-frame prediction method provided by the present disclosure.
In inter-frame prediction, the process of obtaining the best matching block in the adjacent encoded reference frame sequence is called motion estimation. The motion estimation process outputs the displacement of the matching block relative to the pixel block to be encoded, which is called the motion vector. The motion vector is represented by two components, one in the x direction and one in the y direction. Practice shows that the motion estimation process accounts for a huge amount of computation and is the most time-consuming part of encoding and transcoding; therefore, improving the motion estimation process to reduce its amount of computation can improve encoding and transcoding efficiency. A matching block is a pixel block in the reference frame that matches the pixel block to be encoded. The best matching block is the matching block with the lowest distortion.
Step 202, starting from the reference frame closest to the pixel block to be encoded in the reference frame sequence, determining the current search starting point.
In this embodiment, the execution body may determine the current search starting point starting from a reference frame closest to a pixel block to be encoded in the sequence of reference frames. Here, the current search starting point is a search starting point on a reference frame closest to the pixel block to be encoded.
In the prior art, x265 is the best open-source HEVC encoder. Its multi-reference-frame motion estimation sequentially traverses the reference frames in the adjacent encoded reference frame sequence and determines the best matching reference frame and best matching block using rate-distortion weighting. The best matching reference frame is the reference frame in which the best matching block is located. Testing shows that although x265 performs motion estimation over multiple reference frames, most best matching blocks fall on the reference frames nearest or next nearest to the pixel block to be encoded, and the motion search over the other, more distant reference frames contributes little to the encoding process. Therefore, the embodiments of the present application start the motion search from the reference frame closest to the pixel block to be encoded.
In this embodiment, the executing body may calculate the search starting point of motion estimation by obtaining a motion vector prediction (MVP).
In some optional implementations of this embodiment, the executing body may first divide the reference frame into coding tree units (CTUs) to obtain a CTU set; then divide each CTU in the CTU set into coding units (CUs) to obtain a CU set; and finally perform the motion search over the CUs in the CU set, at CU level and in a preset scanning order, so that the search starting point on the reference frame can be determined. Common CU-level motion vector prediction modes include, but are not limited to, ATMVP (alternative temporal motion vector prediction) and STMVP (spatial-temporal motion vector prediction).
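As an illustration of this optional implementation, the sketch below partitions a frame into CTUs, partitions each CTU into fixed-size CUs, and visits the CUs in raster-scan order; real encoders split CUs recursively via a quadtree and may use other scan orders, so the fixed sizes, scan order and names here are assumptions of the sketch.

```cpp
#include <vector>

struct CU { int x, y, size; };       // a coding unit: top-left position and size

// Divide a frame of width x height into CTUs of ctuSize, divide each CTU into
// CUs of cuSize, and return the CUs in a preset (raster) scanning order.
std::vector<CU> partitionIntoCUs(int width, int height, int ctuSize, int cuSize) {
    std::vector<CU> cus;
    for (int ctuY = 0; ctuY < height; ctuY += ctuSize) {        // CTU rows
        for (int ctuX = 0; ctuX < width; ctuX += ctuSize) {     // CTU columns
            for (int cuY = ctuY; cuY < ctuY + ctuSize && cuY < height; cuY += cuSize) {
                for (int cuX = ctuX; cuX < ctuX + ctuSize && cuX < width; cuX += cuSize) {
                    cus.push_back({cuX, cuY, cuSize});          // CU-level scan unit
                }
            }
        }
    }
    return cus;
}
```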
In some optional implementations of this embodiment, the executing body may first determine a plurality of candidate predicted motion vectors in the reference frame based on the advanced motion vector prediction (AMVP) technique; then select, from the plurality of candidate predicted motion vectors, the candidate with the minimum rate-distortion cost as the predicted motion vector, and use the position pointed to by that predicted motion vector as the current search starting point.
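The following sketch illustrates the AMVP-style choice of a search starting point: among the candidate predicted motion vectors, the one with the minimum rate-distortion cost is selected as the predictor, and the position it points to becomes the current search starting point. How the candidates and their costs are produced is outside the sketch, and the cost callback and names are assumptions.

```cpp
#include <functional>
#include <limits>
#include <vector>

struct MotionVector { int x = 0, y = 0; };

// Choose the candidate predicted motion vector with the minimum rate-distortion
// cost; the position it points to is used as the current search starting point.
// rdCost is assumed to return the RD cost of taking a candidate as the predictor.
MotionVector chooseSearchStart(const std::vector<MotionVector>& candidates,
                               const std::function<double(const MotionVector&)>& rdCost) {
    MotionVector best{};
    double bestCost = std::numeric_limits<double>::max();
    for (const MotionVector& cand : candidates) {
        double cost = rdCost(cand);
        if (cost < bestCost) {         // keep the minimum-RD-cost candidate
            bestCost = cost;
            best = cand;
        }
    }
    return best;                       // search starts at the position this predictor points to
}
```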
Step 203, executing a motion estimation process: searching for a motion estimation matching block from the current search starting point, obtaining a current optimal matching block, a current optimal motion vector and a current optimal distortion degree, and recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree.
In this embodiment, the execution subject may execute a motion estimation process. For any reference frame in the adjacent encoded reference frame sequence, the execution body may perform a matching block search for motion estimation from a current search starting point of the reference frame, obtain a current best matching block, a current best motion vector, and a current best distortion, and record a target best reference frame index number, a target best motion vector, and a target best distortion.
Specifically, starting from the current search starting point, the executing body may match pixel blocks in the reference frame one by one, calculate the motion vector and distortion degree corresponding to each pixel block, and select the pixel block with the smallest distortion degree as the current optimal matching block. The motion vector corresponding to the current optimal matching block is the current optimal motion vector, and its distortion degree is the current optimal distortion degree.
In addition, the executing body records a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree, and updates this record promptly during the motion estimation process to ensure its accuracy. Specifically, the executing body may compare the current optimal distortion degree with the previously recorded target optimal distortion degree. If the current optimal distortion degree is smaller than the previously recorded target optimal distortion degree, the executing body records the index number of the current reference frame as the target optimal reference frame index number, the current optimal motion vector as the target optimal motion vector, and the current optimal distortion degree as the target optimal distortion degree. If the current optimal distortion degree is not smaller than the previously recorded target optimal distortion degree, the executing body retains the previously recorded target optimal reference frame index number, target optimal motion vector and target optimal distortion degree.
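A small sketch of this record-update rule: the target optimal record is replaced only when the current reference frame yields a strictly lower distortion; otherwise the previously recorded values are retained. Names are illustrative.

```cpp
#include <limits>

struct MotionVector { int x = 0, y = 0; };

struct TargetBest {
    int          refIdx = -1;                                        // target optimal reference frame index number
    MotionVector mv;                                                 // target optimal motion vector
    long long    distortion = std::numeric_limits<long long>::max(); // target optimal distortion
};

// If the current optimal distortion is smaller than the recorded target optimal
// distortion, the current reference frame index, motion vector and distortion
// are recorded as the new target optimal values; otherwise the record is kept.
void updateTargetBest(TargetBest& target, int curRefIdx,
                      const MotionVector& curMv, long long curDistortion) {
    if (curDistortion < target.distortion) {
        target.refIdx = curRefIdx;
        target.mv = curMv;
        target.distortion = curDistortion;
    }
}
```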
Step 204, determining whether the difference between the current optimal distortion and the target optimal distortion is greater than a preset threshold.
In this embodiment, the executing body may calculate the difference between the current optimal distortion degree and the target optimal distortion degree and compare it with a preset threshold. If the difference is greater than the preset threshold, step 205 is executed; otherwise, step 206 is executed. The threshold can be determined by statistical testing and is generally applicable. For example, it may be determined from the reference sequences given by the relevant HEVC standard.
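The disclosure leaves the threshold to offline test statistics on standard sequences. Purely as an assumption of this sketch, one plausible calibration is to collect, over such sequences, the distortion gaps observed at reference frames after which a farther frame still improved the result, and to set the threshold at a high percentile of those gaps so that early exit rarely discards a reference frame that would have helped.

```cpp
#include <algorithm>
#include <vector>

// Offline calibration sketch (an assumption of this example, not the patent's
// procedure). 'gapsBeforeImprovement' holds, over HEVC common test sequences,
// the gap (current optimal distortion - target optimal distortion) observed at
// reference frames after which some farther frame still improved the result.
// A high percentile of these gaps gives a threshold at which early exit is
// unlikely to discard a reference frame that would have helped.
long long calibrateThreshold(std::vector<long long> gapsBeforeImprovement,
                             double percentile = 0.99) {
    if (gapsBeforeImprovement.empty()) return 0;
    std::sort(gapsBeforeImprovement.begin(), gapsBeforeImprovement.end());
    size_t idx = static_cast<size_t>(percentile * (gapsBeforeImprovement.size() - 1));
    return gapsBeforeImprovement[idx];
}
```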
Step 205, ending the motion estimation process, and outputting the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree.
In this embodiment, if the difference between the current optimal distortion degree and the target optimal distortion degree is greater than the preset threshold, the executing body may end the motion estimation process and output the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree. In that case, the subsequent reference frames are far from the pixel block to be encoded and can be treated as invalid reference frames; exiting the motion estimation process at this point does not affect the matching block search result and saves a huge amount of computation.
At step 206, a current search starting point is determined from the next reference frame in the sequence of reference frames.
In this embodiment, if the difference between the current optimal distortion degree and the target optimal distortion degree is not greater than the preset threshold, the executing body may determine a current search starting point from the next reference frame in the reference frame sequence and return to step 203 to continue the motion estimation process. In this case, a subsequent reference frame that is close enough to the pixel block to be encoded, i.e. a valid reference frame, may still exist. Entering the motion estimation process for the next reference frame therefore ensures the accuracy of the matching block search result.
Generally, after the pixel block prediction of all video frames in the video is completed, the operations of transformation, quantization, loop filtering, entropy coding, and the like can be continued to complete the coding and transcoding of the video.
As can be seen from FIG. 2, compared with the embodiment corresponding to FIG. 1, the flow 200 of the inter-frame prediction method in this embodiment adds the step of continuing the motion estimation process. The scheme described in this embodiment therefore ensures the accuracy of the matching block search result by entering the motion estimation process for the next reference frame whenever a valid reference frame may still exist.
With further reference to FIG. 3, a scene diagram of an inter-frame prediction method in which embodiments of the present disclosure may be implemented is shown. As shown in FIG. 3, for the video frame to be encoded with index number 5 in a video, the encoded reference frame sequence with index numbers 1-4 is obtained. For a pixel block to be encoded in that video frame, the current search starting point of the reference frame with index number 4 is determined first, starting from that reference frame. A motion estimation process is performed: a motion estimation matching block is searched for from the current search starting point of the reference frame with index number 4, the current optimal matching block, current optimal motion vector and current optimal distortion degree of that reference frame are obtained, the index number 4 is recorded as the target optimal reference frame index number, the current optimal motion vector of that reference frame is recorded as the target optimal motion vector, and its current optimal distortion degree is recorded as the target optimal distortion degree. Next, the current search starting point of the reference frame with index number 3 is determined and the motion estimation process continues: if the current optimal distortion degree of the reference frame with index number 3 is not smaller than the target optimal distortion degree, the previously recorded target optimal reference frame index number, target optimal motion vector and target optimal distortion degree are retained. If the difference between the current optimal distortion degree of the reference frame with index number 3 and the target optimal distortion degree is greater than the preset threshold, the motion estimation process is ended, and the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree are output.
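To make this scene concrete, the self-contained sketch below replays it with made-up distortion values (purely illustrative, not taken from the disclosure): reference frames are visited in the order 4, 3, 2, 1; frame 4 establishes the target optimal record; and as soon as a frame's current optimal distortion exceeds the target optimal distortion by more than the threshold, the remaining frames are skipped.

```cpp
#include <iostream>
#include <limits>
#include <vector>

int main() {
    // Illustrative per-frame "current optimal distortion" values only; a real
    // encoder would obtain these from the per-frame motion search.
    std::vector<std::pair<int, long long>> frames = {
        {4, 1200}, {3, 2100}, {2, 1900}, {1, 2500}};
    const long long threshold = 800;   // preset threshold (illustrative)

    int bestRef = -1;
    long long bestDist = std::numeric_limits<long long>::max();
    for (const auto& [refIdx, curDist] : frames) {
        if (curDist < bestDist) { bestDist = curDist; bestRef = refIdx; }
        std::cout << "frame " << refIdx << ": current=" << curDist
                  << " target=" << bestDist << '\n';
        if (curDist - bestDist > threshold) {          // early exit condition
            std::cout << "early exit after frame " << refIdx << '\n';
            break;                                     // frames 2 and 1 are never searched
        }
    }
    std::cout << "target optimal reference frame index: " << bestRef
              << ", target optimal distortion: " << bestDist << '\n';
    return 0;
}
```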
With further reference to fig. 4, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of an inter-frame prediction apparatus, which corresponds to the embodiment of the method shown in fig. 1, and which can be applied in various electronic devices.
As shown in fig. 4, the inter-frame prediction apparatus 400 of this embodiment may include: an acquisition module 401, a determining module 402, an estimation module 403 and an output module 404. The acquisition module 401 is configured to acquire an encoded reference frame sequence adjacent to a pixel block to be encoded; the determining module 402 is configured to determine a current search starting point, starting from the reference frame in the reference frame sequence that is closest to the pixel block to be encoded; the estimation module 403 is configured to perform a motion estimation process: searching for a motion estimation matching block from the current search starting point, obtaining a current optimal matching block, a current optimal motion vector and a current optimal distortion degree, and recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree; and the output module 404 is configured to end the motion estimation process and output the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree if the difference between the current optimal distortion degree and the target optimal distortion degree is greater than a preset threshold.
In this embodiment, in the inter-frame prediction apparatus 400, the specific processing of the acquisition module 401, the determining module 402, the estimation module 403 and the output module 404, and the technical effects thereof, may refer to the related descriptions of steps 101 to 104 in the embodiment corresponding to fig. 1, and are not described here again.
In some optional implementations of this embodiment, the inter-frame prediction apparatus 400 further includes: and the execution module is configured to determine a current search starting point from a next reference frame in the reference frame sequence and continue to execute the motion estimation process if the difference value between the current optimal distortion degree and the target optimal distortion degree is not greater than a preset threshold value.
In some optional implementations of this embodiment, the estimation module 403 is further configured to: if the current optimal distortion degree is smaller than the target optimal distortion degree, record the index number of the current reference frame as the target optimal reference frame index number, the current optimal motion vector as the target optimal motion vector, and the current optimal distortion degree as the target optimal distortion degree.
In some optional implementations of this embodiment, the determining module 402 is further configured to: dividing a reference frame into a CTU set; carrying out coding unit CU division on CTUs in the CTU set to obtain a CU set; and performing motion search on the CUs in the CU set according to a preset scanning sequence, and determining a current search starting point.
In some optional implementations of this embodiment, the determining module 402 is further configured to: determining a plurality of candidate predicted motion vectors in a reference frame based on an advanced motion vector prediction technique; selecting a candidate prediction motion vector with the minimum rate distortion cost from the plurality of candidate prediction motion vectors as a prediction motion vector, and taking the position pointed by the prediction motion vector as a current search starting point.
In some optional implementations of the present embodiment, the threshold value is determined according to a reference sequence given by the relevant standard of high efficiency video coding HEVC.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 5 illustrates a schematic block diagram of an example electronic device 500 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the device 500 comprises a computing unit 501, which may perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
A number of components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, or the like; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508, such as a magnetic disk, optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 501 performs the respective methods and processes described above, such as the inter-frame prediction method. For example, in some embodiments, the inter-frame prediction method may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the inter-frame prediction method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the inter-frame prediction method in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. An inter prediction method comprising:
acquiring a coded reference frame sequence adjacent to a pixel block to be coded;
determining a current search starting point from a reference frame which is closest to the pixel block to be coded in the reference frame sequence;
performing a motion estimation process: searching for a motion estimation matching block from the current search starting point, obtaining a current optimal matching block, a current optimal motion vector and a current optimal distortion degree, and recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree;
if the difference between the current optimal distortion degree and the target optimal distortion degree is greater than a preset threshold, ending the motion estimation process, and outputting the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree.
2. The method of claim 1, wherein the method further comprises:
and if the difference between the current optimal distortion degree and the target optimal distortion degree is not greater than the preset threshold, determining a current search starting point from a next reference frame in the reference frame sequence, and continuing to execute the motion estimation process.
3. The method of claim 1 or 2, wherein said recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree comprises:
if the current optimal distortion degree is smaller than the target optimal distortion degree, recording the index number of the current reference frame as the target optimal reference frame index number, the current optimal motion vector as the target optimal motion vector, and the current optimal distortion degree as the target optimal distortion degree.
4. The method of claim 1 or 2, wherein the determining a current search starting point comprises:
dividing a reference frame into a CTU set;
carrying out Coding Unit (CU) division on the CTUs in the CTU set to obtain a CU set;
and performing motion search on the CUs in the CU set according to a preset scanning sequence, and determining a current search starting point.
5. The method of claim 1 or 2, wherein the determining a current search starting point comprises:
determining a plurality of candidate predicted motion vectors in a reference frame based on an advanced motion vector prediction technique;
selecting a candidate prediction motion vector with the minimum rate distortion cost from the plurality of candidate prediction motion vectors as a prediction motion vector, and taking the position pointed by the prediction motion vector as a current search starting point.
6. A method according to claim 1, wherein the threshold value is determined from a reference sequence given by the relevant standard for high efficiency video coding, HEVC.
7. An inter prediction apparatus comprising:
an acquisition module configured to acquire a sequence of encoded reference frames adjacent to a block of pixels to be encoded;
a determining module configured to determine a current search starting point starting from a reference frame closest to the pixel block to be encoded in the sequence of reference frames;
an estimation module configured to perform a motion estimation process: searching for a motion estimation matching block from the current search starting point, obtaining a current optimal matching block, a current optimal motion vector and a current optimal distortion degree, and recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree;
and the output module is configured to end the motion estimation process and output the target optimal reference frame index number, the target optimal motion vector and the target optimal distortion degree if the difference value between the current optimal distortion degree and the target optimal distortion degree is larger than a preset threshold value.
8. The apparatus of claim 7, wherein the apparatus further comprises:
and the execution module is configured to determine a current search starting point from a next reference frame in the reference frame sequence if the difference value between the current optimal distortion and the target optimal distortion is not greater than a preset threshold value, and continue to execute the motion estimation process.
9. The apparatus of claim 7 or 8, wherein the estimation module is further configured to:
if the current optimal distortion degree is smaller than the target optimal distortion degree, record the index number of the current reference frame as the target optimal reference frame index number, the current optimal motion vector as the target optimal motion vector, and the current optimal distortion degree as the target optimal distortion degree.
10. The apparatus of claim 7 or 8, wherein the determination module is further configured to:
dividing a reference frame into a CTU set;
carrying out Coding Unit (CU) division on the CTUs in the CTU set to obtain a CU set;
and performing motion search on the CUs in the CU set according to a preset scanning sequence, and determining a current search starting point.
11. The apparatus of claim 7 or 8, wherein the determination module is further configured to:
determining a plurality of candidate predicted motion vectors in a reference frame based on an advanced motion vector prediction technique;
selecting a candidate prediction motion vector with the minimum rate distortion cost from the plurality of candidate prediction motion vectors as a prediction motion vector, and taking the position pointed by the prediction motion vector as a current search starting point.
12. The apparatus of claim 7, wherein the threshold value is determined according to a reference sequence given by a related standard for High Efficiency Video Coding (HEVC).
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
CN202011557753.4A 2020-12-23 2020-12-23 Inter prediction method, device, apparatus, storage medium, and program product Active CN112738529B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011557753.4A CN112738529B (en) 2020-12-23 2020-12-23 Inter prediction method, device, apparatus, storage medium, and program product

Publications (2)

Publication Number Publication Date
CN112738529A (en) 2021-04-30
CN112738529B (en) 2023-07-07

Family

ID=75615735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011557753.4A Active CN112738529B (en) 2020-12-23 2020-12-23 Inter prediction method, device, apparatus, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN112738529B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022267667A1 (en) * 2021-06-24 2022-12-29 Zhejiang Dahua Technology Co., Ltd. Systems and methods for inter frame prediction of a video

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6418166B1 (en) * 1998-11-30 2002-07-09 Microsoft Corporation Motion estimation and block matching pattern
US20050190844A1 (en) * 2004-02-27 2005-09-01 Shinya Kadono Motion estimation method and moving picture coding method
JP2005253015A (en) * 2004-03-08 2005-09-15 Matsushita Electric Ind Co Ltd Apparatus and method for detecting motion vector, and program
US20050243921A1 (en) * 2004-03-26 2005-11-03 The Hong Kong University Of Science And Technology Efficient multi-frame motion estimation for video compression
US20060056719A1 (en) * 2004-09-13 2006-03-16 Microsoft Corporation Variable block size early termination for video coding
US20100080297A1 (en) * 2008-09-30 2010-04-01 Microsoft Corporation Techniques to perform fast motion estimation
US20110051813A1 (en) * 2009-09-02 2011-03-03 Sony Computer Entertainment Inc. Utilizing thresholds and early termination to achieve fast motion estimation in a video encoder
CN101778281A (en) * 2010-01-13 2010-07-14 中国移动通信集团广东有限公司中山分公司 Method for estimating H.264-based fast motion on basis of structural similarity
CN101815218A (en) * 2010-04-02 2010-08-25 北京工业大学 Method for coding quick movement estimation video based on macro block characteristics
CN102238378A (en) * 2010-05-06 2011-11-09 北京科迪讯通科技有限公司 Fast motion search algorithm used in three-dimensional (3D) video image coding
CN108093259A (en) * 2017-12-14 2018-05-29 希诺麦田技术(深圳)有限公司 Picture motion estimating method, device and computer readable storage medium
CN109660811A (en) * 2018-12-17 2019-04-19 杭州当虹科技股份有限公司 A kind of quick HEVC inter-frame encoding methods

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIAOHAI HE ET AL: "Motion estimation-based fast intra prediction (ME-FIP)", Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 10th Meeting: Stockholm, SE, 11-20 July 2012 *
XIAOHAI HE ET AL: "Motion estimation-based fast intra prediction (ME-FIP)", Joint Collaborative Team on Video Coding (JCT-VC) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 10th Meeting: Stockholm, SE, 11-20 July 2012, 20 July 2012 (2012-07-20) *

Also Published As

Publication number Publication date
CN112738529B (en) 2023-07-07

Similar Documents

Publication Publication Date Title
US8705611B2 (en) Image prediction encoding device, image prediction encoding method, image prediction encoding program, image prediction decoding device, image prediction decoding method, and image prediction decoding program
US9621917B2 (en) Continuous block tracking for temporal prediction in video encoding
EP2805499B1 (en) Video decoder, video encoder, video decoding method, and video encoding method
CN116233463A (en) Motion vector correction for multi-reference prediction
US11076168B2 (en) Inter-prediction method and apparatus, and storage medium
CN103327327B (en) For the inter prediction encoding unit selection method of high-performance video coding HEVC
US20240089498A1 (en) Method and apparatus for encoding or decoding video data with sub-pixel motion vector refinement
CN112738529B (en) Inter prediction method, device, apparatus, storage medium, and program product
CN108401185B (en) Reference frame selection method, video transcoding method, electronic device and storage medium
CN104918047A (en) Bidirectional motion estimation elimination method and device
CN110876058B (en) Historical candidate list updating method and device
CN113099241B (en) Reference frame list updating method, device, equipment and storage medium
CN110557642A (en) Video frame coding motion searching method and image encoder
CN114040209A (en) Motion estimation method, motion estimation device, electronic equipment and storage medium
CN115661273B (en) Motion vector prediction method, motion vector prediction device, electronic equipment and storage medium
CN117061753A (en) Method and apparatus for predicting inter-coded motion vector
CN114513659B (en) Method, apparatus, electronic device and medium for determining picture prediction mode
CN113365077A (en) Inter-frame prediction method, encoder, decoder, computer-readable storage medium
CN113242427B (en) Rapid method and device based on adaptive motion vector precision in VVC
CN116708810A (en) Video coding method, device, equipment and storage medium
CN115190309B (en) Video frame processing method, training device, video frame processing equipment and storage medium
CN115037947A (en) Video encoding method, video encoding device, electronic equipment and storage medium
CN114040208A (en) Motion estimation method, motion estimation device, electronic equipment and storage medium
CN115988207A (en) Video coding method, video coding device, electronic equipment and video coding medium
CN117014602A (en) Training method, device and computer program product of reference frame screening model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant