CN112738529B - Inter prediction method, device, apparatus, storage medium, and program product - Google Patents

Inter prediction method, device, apparatus, storage medium, and program product Download PDF

Info

Publication number
CN112738529B
CN112738529B
Authority
CN
China
Prior art keywords
current
reference frame
target
motion vector
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011557753.4A
Other languages
Chinese (zh)
Other versions
CN112738529A (en)
Inventor
邹箭
丁文鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011557753.4A
Publication of CN112738529A
Application granted
Publication of CN112738529B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/147 Data rate or code amount at the encoder output according to rate distortion criteria
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/17 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being an image region, e.g. an object
    • H04N19/176 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the region being a block, e.g. a macroblock
    • H04N19/50 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding
    • H04N19/503 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using predictive coding involving temporal prediction
    • H04N19/51 Motion estimation or motion compensation
    • H04N19/513 Processing of motion vectors
    • H04N19/56 Motion estimation with initialisation of the vector search, e.g. estimating a good candidate to initiate a search
    • H04N19/567 Motion estimation based on rate distortion criteria

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The present disclosure provides an inter-frame prediction method, device, apparatus, storage medium, and program product, relating to technical fields such as computer vision and cloud computing. One embodiment of the method comprises the following steps: acquiring a sequence of encoded reference frames adjacent to a pixel block to be encoded; determining a current search start point, starting from the reference frame in the sequence closest to the pixel block to be encoded; performing a motion estimation process: searching for a matching block from the current search start point, obtaining the current best matching block, current best motion vector, and current best distortion, and recording the target best reference frame index number, target best motion vector, and target best distortion; and, if the difference between the current best distortion and the target best distortion is greater than a preset threshold, ending the motion estimation process and outputting the target best reference frame index number, target best motion vector, and target best distortion. This embodiment reduces the amount of computation in the motion estimation process.

Description

Inter prediction method, device, apparatus, storage medium, and program product
Technical Field
The present disclosure relates to the field of computer technology, in particular to the technical fields of computer vision and cloud computing, and more particularly to an inter-frame prediction method, apparatus, device, storage medium, and program product.
Background
HEVC (High Efficiency Video Coding) is a new-generation video coding compression standard. Compared with the previous-generation H.264/AVC (Advanced Video Coding) standard, it can save nearly 50% of the bit rate at the same visual quality, so HEVC is widely applicable to fields involving video compression, such as live streaming and video on demand. HEVC is built from core techniques such as prediction, transformation, quantization, loop filtering, and entropy coding. Among them, prediction is an important module of an encoder and is divided into intra prediction and inter prediction. Intra prediction predicts a pixel block to be encoded using the reconstructed pixel values of already-encoded image blocks in the same frame. Inter prediction predicts a pixel block to be encoded using pixel blocks in encoded forward or backward reference frames. Currently, inter prediction uses block-by-block matching to find the best matching block in a reference frame, a process called motion estimation.
Disclosure of Invention
The present disclosure provides an inter prediction method, apparatus, device, storage medium, and program product.
According to a first aspect of the present disclosure, there is provided an inter prediction method comprising: obtaining a sequence of encoded reference frames adjacent to a pixel block to be encoded; determining a current search start point, starting from the reference frame in the sequence closest to the pixel block to be encoded; performing a motion estimation process: searching for a matching block from the current search start point, obtaining the current best matching block, current best motion vector, and current best distortion, and recording the target best reference frame index number, target best motion vector, and target best distortion; and, if the difference between the current best distortion and the target best distortion is greater than a preset threshold, ending the motion estimation process and outputting the target best reference frame index number, target best motion vector, and target best distortion.
According to a second aspect of the present disclosure, there is provided an inter prediction apparatus comprising: an acquisition module configured to obtain a sequence of encoded reference frames adjacent to a pixel block to be encoded; a determining module configured to determine a current search start point, starting from the reference frame in the sequence closest to the pixel block to be encoded; an estimation module configured to perform a motion estimation process: searching for a matching block from the current search start point, obtaining the current best matching block, current best motion vector, and current best distortion, and recording the target best reference frame index number, target best motion vector, and target best distortion; and an output module configured to end the motion estimation process and output the target best reference frame index number, target best motion vector, and target best distortion if the difference between the current best distortion and the target best distortion is greater than a preset threshold.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect.
According to a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method as described in any one of the implementations of the first aspect.
According to a fifth aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect.
The present disclosure provides an inter-frame prediction method, apparatus, device, storage medium, and program product that first determine a current search start point, starting from the reference frame closest to the pixel block to be encoded in the adjacent sequence of encoded reference frames; then perform a motion estimation process: searching for a matching block from the current search start point, obtaining the current best matching block, current best motion vector, and current best distortion, and recording the target best reference frame index number, target best motion vector, and target best distortion; and finally, if the difference between the current best distortion and the target best distortion is greater than a preset threshold, end the motion estimation process and output the target best reference frame index number, target best motion vector, and target best distortion. Given that most target best matching blocks fall on the few reference frames closest or next-closest to the pixel block to be encoded, an early-exit mechanism for multi-reference-frame motion estimation is provided, which reduces the amount of computation of the motion estimation process in the multi-reference-frame case and improves encoding and transcoding efficiency.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for better understanding of the present solution and are not to be construed as limiting the present application. Wherein:
FIG. 1 is a flow chart of one embodiment of an inter prediction method according to the present disclosure;
FIG. 2 is a flow chart of yet another embodiment of an inter prediction method according to the present disclosure;
FIG. 3 is a scene diagram of an inter prediction method in which embodiments of the present disclosure may be implemented;
FIG. 4 is a schematic structural diagram of one embodiment of an inter prediction apparatus according to the present disclosure;
FIG. 5 is a block diagram of an electronic device used to implement the inter prediction method of an embodiment of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 shows a flowchart of one embodiment of an inter prediction method according to the present disclosure. The inter prediction method comprises the following steps:
step 101, a sequence of encoded reference frames adjacent to a pixel block to be encoded is obtained.
In this embodiment, the execution body of the inter prediction method may acquire the coded reference frame sequence adjacent to the pixel block to be coded.
Video in services such as live streaming and video on demand typically requires compression. HEVC is a new-generation video coding compression standard built from core techniques such as prediction, transformation, quantization, loop filtering, and entropy coding. Among them, prediction is an important module of an encoder and is divided into intra prediction and inter prediction. Intra prediction predicts a pixel block to be encoded using the reconstructed pixel values of already-encoded image blocks in the same frame. Inter prediction predicts a pixel block to be encoded using pixel blocks in encoded forward or backward reference frames. A video frame may be divided into multiple pixel blocks; the pixel blocks in the video frame to be encoded are the pixel blocks to be encoded. The forward or backward reference frames of the video frame to be encoded are the reference frames adjacent to the pixel block to be encoded, and it is this sequence of encoded reference frames adjacent to the pixel block to be encoded that is obtained here. The pixel blocks in the encoded reference frames may themselves have been encoded after inter prediction using the method provided by the present disclosure.
Inter prediction obtains the best matching block from the adjacent sequence of encoded reference frames, a process called motion estimation. The motion estimation process outputs the displacement of the matching block relative to the pixel block to be encoded, referred to as a motion vector; a motion vector is represented by two components in the x and y directions. Practice shows that motion estimation accounts for a huge amount of computation and is the most time-consuming part of encoding and transcoding, so improving the motion estimation process reduces computation and improves encoding and transcoding efficiency. A matching block is a pixel block in a reference frame matched against the pixel block to be encoded; the best matching block is the one with the smallest distortion.
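As an illustration of these two concepts, distortion and motion vector can be sketched as follows. This is a minimal Python sketch, not the patented implementation: SAD (sum of absolute differences) is used here as an example distortion metric, and all function and variable names are hypothetical.

```python
# Illustrative sketch: distortion of a candidate block (SAD) and the
# motion vector as the displacement of the match. Frames are assumed to
# be 2-D lists of grayscale pixel values.

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(
        abs(a - b)
        for row_a, row_b in zip(block_a, block_b)
        for a, b in zip(row_a, row_b)
    )

def motion_vector(block_pos, match_pos):
    """Displacement (dx, dy) of the matching block relative to the block."""
    return (match_pos[0] - block_pos[0], match_pos[1] - block_pos[1])

current = [[10, 12], [11, 13]]
candidate = [[10, 12], [11, 14]]
distortion = sad(current, candidate)    # 1
mv = motion_vector((8, 8), (10, 7))     # (2, -1)
```

In a real encoder the distortion term is usually combined with a rate term (bits to code the motion vector) into a rate-distortion cost, as the description notes for x265.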
Step 102, determining the current search start point, starting from the reference frame in the reference frame sequence closest to the pixel block to be encoded.
In this embodiment, the execution body may determine the current search start point from a reference frame closest to the pixel block to be encoded in the reference frame sequence. Here, the current search start point is a search start point on a reference frame nearest to the pixel block to be encoded.
In the prior art, x265 is the best open-source HEVC encoder available. Its multi-reference-frame motion estimation sequentially traverses the reference frames in the adjacent encoded reference frame sequence and computes the best matching reference frame and best matching block by weighing rate-distortion cost. The best matching reference frame is the reference frame containing the best matching block. Testing shows that although x265 performs motion estimation over multiple reference frames, most best matching blocks fall on the few reference frames closest or next-closest to the pixel block to be encoded, while motion search over the more distant reference frames contributes little to the encoding process. Embodiments of the present application therefore begin with the reference frame closest to the pixel block to be encoded.
Step 103, performing a motion estimation process: search for a matching block starting from the current search start point, obtain the current best matching block, current best motion vector, and current best distortion, and record the target best reference frame index number, target best motion vector, and target best distortion.
In this embodiment, the above-described execution body may execute the motion estimation process. For any reference frame in the adjacent coded reference frame sequence, the execution body may perform a matching block search for motion estimation from the current search start point of the reference frame, obtain a current best matching block, a current best motion vector, and a current best distortion degree, and record a target best reference frame index number, a target best motion vector, and a target best distortion degree.
Specifically, the execution body may match pixel blocks in the reference frame one by one, starting from the current search start point, calculate the motion vector and distortion corresponding to each candidate pixel block, and select the pixel block with the smallest distortion as the current best matching block. The motion vector corresponding to the current best matching block is the current best motion vector, and its distortion is the current best distortion.
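The per-frame matching loop described above can be sketched as follows. This is a hedged simplification, not the actual x265 search: the candidate pattern and distortion function are placeholders supplied by the caller, and the names are illustrative.

```python
# Sketch of the per-reference-frame matching loop: visit candidate
# positions starting from the search start point and keep the candidate
# with the smallest distortion.

def search_best_match(candidates, distortion_of, start):
    """candidates: iterable of (dx, dy) offsets from `start`;
    distortion_of: callable mapping an absolute position to a cost.
    Returns (best_mv, best_distortion)."""
    best_mv, best_cost = None, float("inf")
    for dx, dy in candidates:
        pos = (start[0] + dx, start[1] + dy)
        cost = distortion_of(pos)
        if cost < best_cost:
            best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

A real encoder would use a structured search pattern (e.g. diamond or hexagon search) rather than an arbitrary candidate list, but the selection-by-minimum-distortion logic is the same.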
In addition, the execution body may record the target best reference frame index number, target best motion vector, and target best distortion. This information is updated promptly during the motion estimation process to ensure accuracy.
Step 104, if the difference between the current best distortion and the target best distortion is greater than a preset threshold, ending the motion estimation process and outputting the target best reference frame index number, target best motion vector, and target best distortion.
In this embodiment, the execution body may calculate the difference between the current best distortion and the target best distortion and compare it with a preset threshold. If the difference is greater than the preset threshold, the motion estimation process is ended, and the target best reference frame index number, target best motion vector, and target best distortion are output.
Here, the threshold serves as the basis for exiting the motion estimation process early, reducing its amount of computation. When the difference between the current best distortion and the target best distortion is greater than the preset threshold, the subsequent reference frames are far from the pixel block to be encoded and are effectively invalid reference frames. Exiting the motion estimation process at this point does not affect the matching-block search result and saves a huge amount of computation. The threshold can be determined statistically through testing and is generally applicable.
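The early-exit test itself is a simple comparison. The sketch below shows the condition with an illustrative threshold value; the actual threshold would be chosen statistically as the description states.

```python
# Sketch of the early-exit condition: exit when the best distortion found
# on the current reference frame exceeds the recorded target best
# distortion by more than a preset threshold.

def should_exit_early(current_best_cost, target_best_cost, threshold):
    """True when farther reference frames can be skipped."""
    return (current_best_cost - target_best_cost) > threshold

# Example with a hypothetical threshold of 15 cost units:
exit_now = should_exit_early(120, 100, 15)   # 20 > 15, so exit
keep_going = should_exit_early(110, 100, 15) # 10 <= 15, so continue
```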
The inter-frame prediction method provided by this embodiment of the present disclosure first determines a current search start point, starting from the reference frame closest to the pixel block to be encoded in the adjacent sequence of encoded reference frames; then performs a motion estimation process: searching for a matching block from the current search start point, obtaining the current best matching block, current best motion vector, and current best distortion, and recording the target best reference frame index number, target best motion vector, and target best distortion; and finally, if the difference between the current best distortion and the target best distortion is greater than a preset threshold, ends the motion estimation process and outputs the target best reference frame index number, target best motion vector, and target best distortion. Given that most target best matching blocks fall on the few reference frames closest or next-closest to the pixel block to be encoded, an early-exit mechanism for multi-reference-frame motion estimation is provided, which reduces the amount of computation of the motion estimation process in the multi-reference-frame case and improves encoding and transcoding efficiency.
With continued reference to fig. 2, a flow 200 of yet another embodiment of an inter prediction method according to the present disclosure is shown. The inter prediction method comprises the following steps:
in step 201, a sequence of encoded reference frames adjacent to a pixel block to be encoded is obtained.
In this embodiment, the execution body of the inter prediction method may acquire the coded reference frame sequence adjacent to the pixel block to be coded.
Video in services such as live streaming and video on demand typically requires compression. HEVC is a new-generation video coding compression standard built from core techniques such as prediction, transformation, quantization, loop filtering, and entropy coding. Among them, prediction is an important module of an encoder and is divided into intra prediction and inter prediction. Intra prediction predicts a pixel block to be encoded using the reconstructed pixel values of already-encoded image blocks in the same frame. Inter prediction predicts a pixel block to be encoded using pixel blocks in encoded forward or backward reference frames. A video frame may be divided into multiple pixel blocks; the pixel blocks in the video frame to be encoded are the pixel blocks to be encoded. The forward or backward reference frames of the video frame to be encoded are the reference frames adjacent to the pixel block to be encoded, and it is this sequence of encoded reference frames adjacent to the pixel block to be encoded that is obtained here. The pixel blocks in the encoded reference frames may themselves have been encoded after inter prediction using the method provided by the present disclosure.
Inter prediction obtains the best matching block from the adjacent sequence of encoded reference frames, a process called motion estimation. The motion estimation process outputs the displacement of the matching block relative to the pixel block to be encoded, referred to as a motion vector; a motion vector is represented by two components in the x and y directions. Practice shows that motion estimation accounts for a huge amount of computation and is the most time-consuming part of encoding and transcoding, so improving the motion estimation process reduces computation and improves encoding and transcoding efficiency. A matching block is a pixel block in a reference frame matched against the pixel block to be encoded; the best matching block is the one with the smallest distortion.
Step 202, determining the current search start point, starting from the reference frame in the reference frame sequence closest to the pixel block to be encoded.
In this embodiment, the execution body may determine the current search start point from a reference frame closest to the pixel block to be encoded in the reference frame sequence. Here, the current search start point is a search start point on a reference frame nearest to the pixel block to be encoded.
In the prior art, x265 is the best open-source HEVC encoder available. Its multi-reference-frame motion estimation sequentially traverses the reference frames in the adjacent encoded reference frame sequence and computes the best matching reference frame and best matching block by weighing rate-distortion cost. The best matching reference frame is the reference frame containing the best matching block. Testing shows that although x265 performs motion estimation over multiple reference frames, most best matching blocks fall on the few reference frames closest or next-closest to the pixel block to be encoded, while motion search over the more distant reference frames contributes little to the encoding process. Embodiments of the present application therefore begin with the reference frame closest to the pixel block to be encoded.
In this embodiment, the execution body may calculate the search start point of motion estimation by obtaining the MVP (Motion Vector Prediction, i.e., the predicted motion vector).
In some optional implementations of this embodiment, the execution body may first divide the reference frame into CTUs (Coding Tree Units) to obtain a CTU set; then, for each CTU in the set, perform CU (Coding Unit) division to obtain a CU set; and finally, perform motion search over the CUs in the CU set in a preset scanning order at the CU level, determining the search start point on the reference frame. Commonly used CU-level motion vector prediction modes include, but are not limited to, ATMVP (Alternative Temporal Motion Vector Prediction) and STMVP (Spatial-Temporal Motion Vector Prediction).
In some optional implementations of this embodiment, the execution body may first determine a number of candidate predicted motion vectors in the reference frame based on the AMVP (Advanced Motion Vector Prediction) technique, then select the candidate with the smallest rate-distortion cost as the predicted motion vector, and take the position it points to as the current search start point.
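The candidate selection in this AMVP-style implementation can be sketched as follows. This is a hedged illustration, not the actual AMVP candidate derivation defined by the HEVC standard: the candidate list and the rate-distortion cost function are placeholders supplied by the caller.

```python
# Sketch of selecting the predicted motion vector (MVP): among the
# candidate motion vectors, keep the one with minimum rate-distortion cost.

def select_mvp(candidates, rd_cost):
    """candidates: list of (dx, dy) motion vectors;
    rd_cost: callable mapping a candidate to its rate-distortion cost.
    Returns the minimum-cost candidate."""
    return min(candidates, key=rd_cost)
```

The position pointed to by the selected MVP then becomes the current search start point, as the text describes.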
Step 203, performing a motion estimation process: search for a matching block starting from the current search start point, obtain the current best matching block, current best motion vector, and current best distortion, and record the target best reference frame index number, target best motion vector, and target best distortion.
In this embodiment, the above-described execution body may execute the motion estimation process. For any reference frame in the adjacent coded reference frame sequence, the execution body may perform a matching block search for motion estimation from the current search start point of the reference frame, obtain a current best matching block, a current best motion vector, and a current best distortion degree, and record a target best reference frame index number, a target best motion vector, and a target best distortion degree.
Specifically, the execution body may match pixel blocks in the reference frame one by one, starting from the current search start point, calculate the motion vector and distortion corresponding to each candidate pixel block, and select the pixel block with the smallest distortion as the current best matching block. The motion vector corresponding to the current best matching block is the current best motion vector, and its distortion is the current best distortion.
In addition, the execution body may record the target best reference frame index number, target best motion vector, and target best distortion. This information is updated promptly during the motion estimation process to ensure accuracy. Specifically, the execution body may compare the current best distortion with the previously recorded target best distortion. If the current best distortion is smaller, the execution body updates the target best reference frame index number to the index number of the current reference frame, the target best motion vector to the current best motion vector, and the target best distortion to the current best distortion. Otherwise, the execution body retains the previously recorded target best reference frame index number, target best motion vector, and target best distortion.
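The update rule just described can be sketched directly. This is a minimal illustration of the record-keeping logic, with hypothetical field names; the real encoder state would of course carry more context.

```python
# Sketch of updating the recorded target best values: only overwrite them
# when the current frame's best distortion improves on the recorded one.

def update_target(target, frame_idx, cur_mv, cur_cost):
    """target: dict with keys 'ref_idx', 'mv', 'cost'; updated in place
    if cur_cost is strictly smaller than the recorded cost."""
    if cur_cost < target["cost"]:
        target.update(ref_idx=frame_idx, mv=cur_mv, cost=cur_cost)
    return target
```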
Step 204, determining whether the difference between the current best distortion and the target best distortion is greater than a preset threshold.
In this embodiment, the execution body of the inter prediction method may calculate the difference between the current best distortion and the target best distortion and compare it with a preset threshold. If the difference is greater than the preset threshold, step 205 is executed; otherwise, step 206 is executed. The threshold can be determined by statistical testing, which makes it broadly applicable. For example, it may be determined from the reference sequences given by the relevant High Efficiency Video Coding (HEVC) standard.
Step 205, the motion estimation process is ended, and the target best reference frame index number, the target best motion vector, and the target best distortion are output.
In this embodiment, if the difference between the current best distortion and the target best distortion is greater than the preset threshold, the execution body may end the motion estimation process and output the target best reference frame index number, the target best motion vector, and the target best distortion. A difference exceeding the preset threshold indicates that the subsequent reference frames are too distant from the pixel block to be encoded and are invalid references. Exiting the motion estimation process at this point does not affect the matching-block search result and saves a large amount of computation.
At step 206, a current search starting point is determined from the next reference frame in the sequence of reference frames.
In this embodiment, if the difference between the current best distortion and the target best distortion is not greater than the preset threshold, the execution body may determine the current search start point from the next reference frame in the sequence of reference frames and return to step 203 to continue the motion estimation process. A difference not exceeding the preset threshold indicates that the subsequent reference frames may still contain valid references close to the pixel block to be encoded. Proceeding to the motion estimation process for the next reference frame therefore preserves the accuracy of the matching-block search result.
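The overall control flow of steps 203-206 — search one reference frame, update the target-best record, then either exit early or move to the next frame — can be sketched as follows. The function names, the dict record, and the pluggable `search_fn` are assumptions for illustration.

```python
def motion_estimate_with_early_exit(ref_frames, search_fn, threshold):
    """ref_frames: list of (index_number, frame) pairs ordered nearest-first.
    search_fn(frame): returns (best_mv, best_dist) for that frame's search.

    Scanning stops as soon as a frame's best distortion exceeds the running
    target best by more than `threshold`: the remaining, more distant frames
    are judged invalid references and never searched.
    """
    target = {"ref_idx": None, "mv": None, "dist": float("inf")}
    for idx, frame in ref_frames:
        mv, dist = search_fn(frame)
        if dist < target["dist"]:
            target = {"ref_idx": idx, "mv": mv, "dist": dist}
        if dist - target["dist"] > threshold:
            break  # early termination: skip the remaining reference frames
    return target
```

In the example below the third frame is never searched, because the second frame's distortion already exceeds the target best by more than the threshold.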
Generally, after prediction of the pixel blocks of all video frames in the video is completed, operations such as transformation, quantization, loop filtering, and entropy coding can then be performed to complete the encoding or transcoding of the video.
As can be seen from fig. 2, compared to the corresponding embodiment of fig. 1, the flow 200 of the inter prediction method in this embodiment adds steps for continuing the motion estimation process. Therefore, the scheme described in the embodiment can enter the motion estimation process of the next reference frame under the condition that an effective reference frame exists, so that the accuracy of the search result of the matching block of the motion estimation can be ensured.
With further reference to fig. 3, a scene diagram is shown in which the inter prediction method of embodiments of the present disclosure may be implemented. As shown in fig. 3, for a video frame to be encoded with index number 5 in the video, a sequence of encoded reference frames with index numbers 1-4 is obtained. For a pixel block to be encoded in the video frame, starting from the reference frame with index number 4, a current search start point is determined for that reference frame. The motion estimation process is performed: a matching-block search for motion estimation is carried out from the current search start point of the reference frame with index number 4, yielding its current best matching block, current best motion vector, and current best distortion; index number 4 is recorded as the target best reference frame index number, and the corresponding current best motion vector and current best distortion are recorded as the target best motion vector and target best distortion. Next, the current search start point of the reference frame with index number 3 is determined and the motion estimation process continues: a matching-block search is carried out from the current search start point of the reference frame with index number 3, yielding its current best matching block, current best motion vector, and current best distortion; if the current best distortion of the reference frame with index number 3 is not less than the target best distortion, the previously recorded target best reference frame index number, target best motion vector, and target best distortion are retained.
If the difference between the current best distortion of the reference frame with index number 3 and the target best distortion is greater than the preset threshold, the motion estimation process ends, and the target best reference frame index number, the target best motion vector, and the target best distortion are output.
With further reference to fig. 4, as an implementation of the method shown in the foregoing figures, the present disclosure provides an embodiment of an inter-frame prediction apparatus, where the apparatus embodiment corresponds to the method embodiment shown in fig. 1, and the apparatus may be specifically applied to various electronic devices.
As shown in fig. 4, the inter prediction apparatus 400 of the present embodiment may include: an acquisition module 401, a determination module 402, an estimation module 403, and an output module 404. Wherein the obtaining module 401 is configured to obtain a sequence of encoded reference frames adjacent to the pixel block to be encoded; a determining module 402 configured to determine a current search start point starting from a reference frame closest to a pixel block to be encoded in a sequence of reference frames; an estimation module 403 configured to perform a motion estimation process: searching a matching block for motion estimation from a current searching starting point, obtaining a current optimal matching block, a current optimal motion vector and a current optimal distortion degree, and recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree; an output module 404 configured to end the motion estimation process and output the target best reference frame index number, the target best motion vector, and the target best distortion if the difference between the current best distortion and the target best distortion is greater than a preset threshold.
In the present embodiment, in the inter prediction apparatus 400: the specific processes of the obtaining module 401, the determining module 402, the estimating module 403 and the output module 404 and the technical effects thereof may refer to the relevant descriptions of the steps 101 to 104 in the corresponding embodiment of fig. 1, and are not repeated herein.
In some optional implementations of this embodiment, the inter prediction apparatus 400 further includes: and the execution module is configured to determine a current searching starting point from the next reference frame in the reference frame sequence and continue to execute the motion estimation process if the difference value between the current optimal distortion degree and the target optimal distortion degree is not greater than a preset threshold value.
In some alternative implementations of the present embodiment, the estimation module 403 is further configured to: if the current best distortion is less than the target best distortion, updating the index number of the current reference frame to the target best reference frame index number, updating the current best motion vector to the target best motion vector, and updating the current best distortion to the target best distortion.
In some alternative implementations of the present embodiment, the determination module 402 is further configured to: dividing the reference frame into a set of coding tree units CTUs; dividing the CTUs in the CTU set into Coding Units (CUs) to obtain a CU set; and carrying out motion search on CUs in the CU set according to a preset scanning sequence, and determining a current searching starting point.
In some alternative implementations of the present embodiment, the determination module 402 is further configured to: determining a plurality of candidate predicted motion vectors in a reference frame based on an advanced motion vector prediction technique; selecting a candidate predicted motion vector with the minimum rate distortion cost from a plurality of candidate predicted motion vectors as a predicted motion vector, and taking the position pointed by the predicted motion vector as a current searching starting point.
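Selecting the search start point by minimum rate-distortion cost among AMVP-style candidates can be sketched as follows; the candidate list and the cost function are illustrative assumptions (real AMVP derives candidates from spatial and temporal neighbors).

```python
def pick_search_start(candidates, rd_cost):
    """candidates: candidate predicted motion vectors for the current block.
    rd_cost(mv): rate-distortion cost of using that candidate as predictor.

    The minimum-cost candidate becomes the predicted motion vector, and the
    position it points to serves as the current search start point.
    """
    return min(candidates, key=rd_cost)
```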
In some alternative implementations of this embodiment, the threshold value is determined according to a reference sequence given by a relevant standard for high efficiency video coding HEVC.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 5 illustrates a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 5, the apparatus 500 includes a computing unit 501 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, ROM 502, and RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to bus 504.
Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The calculation unit 501 performs the respective methods and processes described above, for example, an inter prediction method. For example, in some embodiments, the inter prediction method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When a computer program is loaded into RAM 503 and executed by computing unit 501, one or more steps of the inter prediction method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the inter prediction method by any other suitable means (e.g. by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (10)

1. An inter prediction method, comprising:
acquiring an encoded reference frame sequence adjacent to a pixel block to be encoded;
starting from the reference frame closest to the pixel block to be coded in the reference frame sequence, determining a current searching starting point by acquiring a predicted motion vector;
performing a motion estimation process: searching a matching block for motion estimation from a current searching starting point, obtaining a current optimal matching block, a current optimal motion vector and a current optimal distortion degree, and recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree;
if the difference value between the current best distortion degree and the target best distortion degree is larger than a preset threshold value, ending the motion estimation process, and outputting a target best reference frame index number, a target best motion vector and a target best distortion degree;
the determining the current searching starting point comprises the following steps:
dividing the reference frame into a set of coding tree units CTUs; dividing the CTUs in the CTU set into Coding Units (CUs) to obtain a CU set; performing motion search on CUs in the CU set according to a preset scanning sequence, and determining a current searching starting point;
the determining the current searching starting point comprises the following steps:
determining a plurality of candidate predicted motion vectors in a reference frame based on an advanced motion vector prediction technique;
selecting a candidate predicted motion vector with the minimum rate distortion cost from the plurality of candidate predicted motion vectors as a predicted motion vector, and taking the position pointed by the predicted motion vector as a current searching starting point.
2. The method of claim 1, wherein the method further comprises:
if the difference value between the current best distortion degree and the target best distortion degree is not greater than a preset threshold value, determining a current searching starting point from the next reference frame in the reference frame sequence, and continuing to execute the motion estimation process.
3. The method according to claim 1 or 2, wherein the recording of the target best reference frame index number, the target best motion vector and the target best distortion comprises:
if the current best distortion is less than the target best distortion, updating the index number of the current reference frame to the target best reference frame index number, updating the current best motion vector to the target best motion vector, and updating the current best distortion to the target best distortion.
4. The method of claim 1, wherein the threshold value is determined from a reference sequence given by a relevant standard for high efficiency video coding HEVC.
5. An inter prediction apparatus, comprising:
an acquisition module configured to acquire a sequence of encoded reference frames adjacent to a pixel block to be encoded;
the determining module is configured to determine a current searching starting point by acquiring a predicted motion vector from a reference frame closest to the pixel block to be encoded in the reference frame sequence;
an estimation module configured to perform a motion estimation process: searching a matching block for motion estimation from a current searching starting point, obtaining a current optimal matching block, a current optimal motion vector and a current optimal distortion degree, and recording a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree;
the output module is configured to finish the motion estimation process and output a target optimal reference frame index number, a target optimal motion vector and a target optimal distortion degree if the difference value between the current optimal distortion degree and the target optimal distortion degree is larger than a preset threshold value;
the determination module is further configured to:
dividing the reference frame into a set of coding tree units CTUs; dividing the CTUs in the CTU set into Coding Units (CUs) to obtain a CU set; performing motion search on CUs in the CU set according to a preset scanning sequence, and determining a current searching starting point;
the determination module is further configured to:
determining a plurality of candidate predicted motion vectors in a reference frame based on an advanced motion vector prediction technique;
selecting a candidate predicted motion vector with the minimum rate distortion cost from the plurality of candidate predicted motion vectors as a predicted motion vector, and taking the position pointed by the predicted motion vector as a current searching starting point.
6. The apparatus of claim 5, wherein the apparatus further comprises:
and the execution module is configured to determine a current searching starting point from the next reference frame in the reference frame sequence and continue to execute the motion estimation process if the difference value between the current optimal distortion degree and the target optimal distortion degree is not greater than a preset threshold value.
7. The apparatus of claim 5 or 6, wherein the estimation module is further configured to:
if the current best distortion is less than the target best distortion, updating the index number of the current reference frame to the target best reference frame index number, updating the current best motion vector to the target best motion vector, and updating the current best distortion to the target best distortion.
8. The apparatus of claim 5, wherein the threshold value is determined from a reference sequence given by a relevant standard for high efficiency video coding HEVC.
9. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-4.
10. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-4.
CN202011557753.4A 2020-12-23 2020-12-23 Inter prediction method, device, apparatus, storage medium, and program product Active CN112738529B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011557753.4A CN112738529B (en) 2020-12-23 2020-12-23 Inter prediction method, device, apparatus, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011557753.4A CN112738529B (en) 2020-12-23 2020-12-23 Inter prediction method, device, apparatus, storage medium, and program product

Publications (2)

Publication Number Publication Date
CN112738529A CN112738529A (en) 2021-04-30
CN112738529B true CN112738529B (en) 2023-07-07

Family

ID=75615735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011557753.4A Active CN112738529B (en) 2020-12-23 2020-12-23 Inter prediction method, device, apparatus, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN112738529B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113596475A (en) * 2021-06-24 2021-11-02 浙江大华技术股份有限公司 Image/video encoding method, apparatus, system, and computer-readable storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
US6418166B1 (en) * 1998-11-30 2002-07-09 Microsoft Corporation Motion estimation and block matching pattern
JP2005253015A (en) * 2004-03-08 2005-09-15 Matsushita Electric Ind Co Ltd Apparatus and method for detecting motion vector, and program
CN102238378A (en) * 2010-05-06 2011-11-09 北京科迪讯通科技有限公司 Fast motion search algorithm used in three-dimensional (3D) video image coding

Family Cites Families (9)

Publication number Priority date Publication date Assignee Title
US7894526B2 (en) * 2004-02-27 2011-02-22 Panasonic Corporation Motion estimation method and moving picture coding method
US7720148B2 (en) * 2004-03-26 2010-05-18 The Hong Kong University Of Science And Technology Efficient multi-frame motion estimation for video compression
US7697610B2 (en) * 2004-09-13 2010-04-13 Microsoft Corporation Variable block size early termination for video coding
US8363727B2 (en) * 2008-09-30 2013-01-29 Microsoft Corporation Techniques to perform fast motion estimation
US8848799B2 (en) * 2009-09-02 2014-09-30 Sony Computer Entertainment Inc. Utilizing thresholds and early termination to achieve fast motion estimation in a video encoder
CN101778281A (en) * 2010-01-13 2010-07-14 中国移动通信集团广东有限公司中山分公司 Method for estimating H.264-based fast motion on basis of structural similarity
CN101815218B (en) * 2010-04-02 2012-02-08 北京工业大学 Method for coding quick movement estimation video based on macro block characteristics
CN108093259B (en) * 2017-12-14 2021-10-08 希诺麦田技术(深圳)有限公司 Image motion estimation method, device and computer readable storage medium
CN109660811B (en) * 2018-12-17 2020-09-18 杭州当虹科技股份有限公司 Rapid HEVC inter-frame coding method


Also Published As

Publication number Publication date
CN112738529A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
US8705611B2 (en) Image prediction encoding device, image prediction encoding method, image prediction encoding program, image prediction decoding device, image prediction decoding method, and image prediction decoding program
EP2805499B1 (en) Video decoder, video encoder, video decoding method, and video encoding method
US11076168B2 (en) Inter-prediction method and apparatus, and storage medium
US10412409B2 (en) Encoding system using motion estimation and encoding method using motion estimation
CN103327327B (en) For the inter prediction encoding unit selection method of high-performance video coding HEVC
CN118055253A (en) Optical flow estimation for motion compensated prediction in video coding
CN112738529B (en) Inter prediction method, device, apparatus, storage medium, and program product
CN111277838B (en) Encoding mode selection method, device, electronic equipment and computer readable medium
CN104918047A (en) Bidirectional motion estimation elimination method and device
CN114040209A (en) Motion estimation method, motion estimation device, electronic equipment and storage medium
CN110557642A (en) Video frame coding motion searching method and image encoder
CN115661273B (en) Motion vector prediction method, motion vector prediction device, electronic equipment and storage medium
CN113099241A (en) Reference frame list updating method, device, equipment and storage medium
CN117061753A (en) Method and apparatus for predicting inter-coded motion vector
CN113242427B (en) Rapid method and device based on adaptive motion vector precision in VVC
CN115037947A (en) Video encoding method, video encoding device, electronic equipment and storage medium
CN116708810A (en) Video coding method, device, equipment and storage medium
CN114513659B (en) Method, apparatus, electronic device and medium for determining picture prediction mode
CN117014602A (en) Training method, device and computer program product of reference frame screening model
CN113099231A (en) Method and device for determining sub-pixel interpolation position, electronic equipment and storage medium
CN116962716A (en) Video processing method and device, electronic equipment and storage medium
CN115988207A (en) Video coding method, video coding device, electronic equipment and video coding medium
CN114040208A (en) Motion estimation method, motion estimation device, electronic equipment and storage medium
CN116112707A (en) Video processing method and device, electronic equipment and storage medium
TWI324482B (en) Algorithm of video coding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant