WO2023087637A1 - Video encoding method and apparatus, electronic device and computer-readable storage medium - Google Patents
- Publication number
- WO2023087637A1 (PCT/CN2022/092314)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image block
- intra
- rate
- prediction mode
- distortion cost
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/102—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
- H04N19/103—Selection of coding mode or of prediction mode
- H04N19/11—Selection of coding mode or of prediction mode among a plurality of spatial predictive coding modes
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/10—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
- H04N19/169—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
- H04N19/17—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object
- H04N19/176—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding the unit being an image region, e.g. an object the region being a block, e.g. a macroblock
Definitions
- the present disclosure relates to the technical field of video coding, and in particular, to a method and device for intra-frame prediction of video coding, electronic equipment, and a computer-readable storage medium.
- each frame of video needs to be divided into square image blocks of a fixed size as the basic unit, and each image block is encoded sequentially according to the raster order.
- the image block is first divided into coding blocks, and each coding block uses a reference image for intra/inter prediction, and the difference between the predicted block and the original block is a residual block.
- the generated residual blocks are transformed and quantized sequentially, and entropy coded together with the coding mode to form a code stream.
- the magnitude of the prediction residual is usually much smaller than the original pixel values, so encoding the pixel differences instead of directly encoding the original pixel values can greatly improve coding efficiency.
- adjacent reconstructed pixels directly above, above right, directly left, and below left may be used as reference pixels.
- multiple lines of reference pixels are also used to improve the prediction accuracy.
- reference pixels are usually quantized, so there will be different degrees of distortion. Since subsequent blocks use coded reconstructed pixels as reference pixels to predict the current block, the distortion of the reference pixels will affect the prediction accuracy of the current block. Therefore, a method for improving the prediction accuracy of video coding is needed.
- An embodiment of the present disclosure provides a video encoding method, which selectively reduces reference pixel distortion in combination with video content characteristics, thereby improving the accuracy of intra-frame prediction and improving the compression efficiency of an encoder.
- the distortion of the reference pixel can be reduced as much as possible, so that the video coding quality can be improved.
- a video coding method including: determining a first intra-frame prediction mode and reference pixels for intra-frame prediction for an image block divided from a video image frame; determining, based on the first intra-frame prediction modes of the surrounding image blocks of the current image block, the weights corresponding to the reference pixels in the current image block, where the weights represent the influence of the distortion of the reference pixels on the intra-frame prediction of the surrounding image blocks; determining a rate-distortion cost value of the current image block in at least one candidate intra-frame prediction mode based on a rate-distortion cost function, wherein the rate-distortion cost function includes a first rate-distortion cost item and a second rate-distortion cost item carrying the weights, the first rate-distortion cost item being a cost item for intra-frame prediction of the current image block and the second rate-distortion cost item being a weighted cost item for the reference pixels; and determining, according to the rate-distortion cost values in the at least one candidate intra-frame prediction mode, a second intra-frame prediction mode of the current image block and encoding the current image block using the second intra-frame prediction mode.
- determining a first intra-frame prediction mode for an image block divided from a video image frame includes: performing texture detection on the image block and determining a first intra-frame prediction mode suitable for the detected texture; determining the reference pixels for intra-frame prediction includes: determining the positions of the reference pixels for intra-frame prediction in the image block based on the adopted video coding standard.
- the reference pixels used for intra-frame prediction of the surrounding image blocks include at least one of the bottom pixel row, the rightmost pixel column, and the bottom-right corner pixel of the current image block.
- determining the weights corresponding to the reference pixels in the current image block includes: determining the weight A of the rate-distortion cost item for the rightmost pixel column of the current image block based on the prediction direction angle corresponding to the first intra prediction mode of the image block to the right of the current image block; determining the weight B of the rate-distortion cost item for the bottom pixel row of the current image block based on the prediction direction angle corresponding to the first intra prediction mode of the image block below the current image block; and determining the weight C of the rate-distortion cost item for the bottom-right corner pixel as a preset value MAX, where the values of A and B are in the range [0, MAX].
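The weight assignment above can be sketched as follows. This is a hypothetical illustration: the exact angle-to-weight mapping is not reproduced at this point in the text (a trigonometric mapping is mentioned later as one option), so `weight_from_angle` and the value `MAX_WEIGHT = 2.0` are assumptions for the sketch.

```python
import math

MAX_WEIGHT = 2.0  # the preset value MAX (2 is mentioned later as an example value)

def weight_from_angle(angle_deg: float) -> float:
    """Map a neighbor's prediction direction angle to a weight in [0, MAX_WEIGHT].

    Assumed mapping: the more the neighbor's prediction direction points back
    toward the shared boundary, the larger the weight."""
    return MAX_WEIGHT * abs(math.cos(math.radians(angle_deg)))

def reference_pixel_weights(ang_right_block: float, ang_lower_block: float):
    A = weight_from_angle(ang_right_block)  # rightmost pixel column of current block
    B = weight_from_angle(ang_lower_block)  # bottom pixel row of current block
    C = MAX_WEIGHT                          # bottom-right corner pixel, fixed at MAX
    return A, B, C
```

By construction, A and B stay in [0, MAX] as the claim requires, and C is pinned to MAX.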
- determining the second intra-frame prediction mode of the current image block according to the plurality of rate-distortion cost values in the at least one candidate intra-frame prediction mode includes: determining, from the at least one candidate intra prediction mode, the candidate intra prediction mode with the minimum rate-distortion cost value as the second intra prediction mode of the current image block.
- the rate-distortion cost function further includes a quantization parameter
- determining the rate-distortion cost value of the current image block in at least one candidate intra prediction mode based on the rate-distortion cost function includes: traversing multiple quantization parameters in each candidate intra prediction mode, and determining multiple rate-distortion cost values of the rate-distortion cost function of the current image block under each quantization parameter of the different candidate intra prediction modes; and determining the second intra-frame prediction mode according to the rate-distortion cost values in the at least one candidate intra-frame prediction mode includes: determining the minimum among the multiple rate-distortion cost values, and determining the candidate intra-frame prediction mode and quantization parameter corresponding to that minimum as the second intra-frame prediction mode and the quantization parameter for performing intra-frame prediction on the current image block.
- the determining the first intra-frame prediction mode of the image block includes: calculating the gradient angle of the image block through image gradient detection; acquiring an intra-frame prediction mode corresponding to the calculated gradient angle, as the first intra-frame prediction mode of the image block.
- a video encoding device including: a first mode determination module configured to determine a first intra-frame prediction mode and reference pixels for intra-frame prediction for an image block divided from a video image frame; a weight determination module configured to determine, based on the first intra-frame prediction modes of the surrounding image blocks of the current image block, the weights corresponding to the reference pixels in the current image block, where the weights represent the influence of the distortion of the reference pixels on the intra-frame prediction of the surrounding image blocks; a rate-distortion cost determination module configured to determine a rate-distortion cost value of the current image block in at least one candidate intra prediction mode based on a rate-distortion cost function, wherein the rate-distortion cost function includes a first rate-distortion cost item and a second rate-distortion cost item carrying the weights, the first rate-distortion cost item being a cost item for intra prediction of the current image block and the second rate-distortion cost item being a weighted cost item for the reference pixels; and a second mode determination module configured to determine a second intra-frame prediction mode of the current image block according to the rate-distortion cost values and to encode the current image block using the second intra-frame prediction mode.
- the first mode determination module is configured to perform texture detection on the image block, determine a first intra-frame prediction mode suitable for the detected texture, and determine the positions of the reference pixels used for intra prediction in the image block based on the video coding standard.
- the reference pixels used for intra prediction of the surrounding image blocks include at least one of the bottom pixel row, the rightmost pixel column, and the bottom-right corner pixel of the current image block.
- the weight determination module is configured to: determine the weight A of the rate-distortion cost item for the rightmost pixel column of the current image block based on the prediction direction angle corresponding to the first intra prediction mode of the image block to the right of the current image block; determine the weight B of the rate-distortion cost item for the bottom pixel row of the current image block based on the prediction direction angle corresponding to the first intra prediction mode of the image block below the current image block; and determine the weight C of the rate-distortion cost item for the bottom-right corner pixel as a preset value MAX, where the values of A and B are in the range [0, MAX].
- the second mode determination module is configured to determine, from the plurality of candidate intra prediction modes, the candidate intra prediction mode with the smallest rate-distortion cost value as the second intra-frame prediction mode of the current image block.
- the rate-distortion cost function further includes quantization parameters
- the rate-distortion cost determination module is configured to traverse multiple quantization parameters in each candidate intra prediction mode, and determine multiple rate-distortion cost values of the rate-distortion cost function of the current image block under the multiple quantization parameters of each candidate intra prediction mode; the second mode determination module is configured to determine the candidate intra-frame prediction mode and quantization parameter corresponding to the minimum among these rate-distortion cost values as the second intra-frame prediction mode and the quantization parameter for performing intra-frame prediction on the current image block.
- the first mode determination module is configured to calculate the gradient angle of the image block through image gradient detection, and obtain the intra prediction mode corresponding to the calculated gradient angle as the first intra prediction mode.
- an electronic device comprising: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when run by the at least one processor, cause the at least one processor to execute the video encoding method described in any one of the embodiments of the first aspect.
- a computer-readable storage medium is provided, and when the instructions in the computer-readable storage medium are executed by at least one processor, the at least one processor executes the video encoding method described in any one of the embodiments of the first aspect.
- a computer program product is provided, and instructions in the computer program product are executed by at least one processor to execute the video encoding method described in any one of the embodiments of the first aspect.
- a computer program includes computer program code, and when the computer program code is run on a computer, the computer executes the video encoding method described in any one of the embodiments of the first aspect.
- the distortion of the reference pixel can be reduced as much as possible, so that the video coding quality can be improved.
- FIG. 1 is a schematic diagram illustrating an overall framework of a video encoding scheme according to an exemplary embodiment of the present disclosure.
- FIG. 2 is a schematic diagram illustrating intra prediction used in a video coding scheme.
- FIG. 3 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure.
- FIG. 4 is a schematic diagram illustrating image blocks and prediction directions thereof for intra prediction of HEVC encoding according to an exemplary embodiment of the present disclosure.
- FIG. 5 is a block diagram illustrating a video encoding device according to an exemplary embodiment of the present disclosure.
- FIG. 6 is a schematic diagram illustrating an electronic device for video encoding according to an exemplary embodiment of the present disclosure.
- FIG. 7 is a schematic diagram illustrating an electronic device for video encoding according to another exemplary embodiment of the present disclosure.
- FIG. 1 is a schematic diagram of an overall framework of a video coding scheme 100 according to an exemplary embodiment of the present disclosure.
- the frame image is divided into at least one coding unit.
- the second step is to input the frame image into the encoder for encoding prediction.
- This process mainly utilizes the spatial correlation and temporal correlation of video data, and uses intra prediction 103 or inter prediction (corresponding to 104 and 105) to remove the spatio-temporal redundancy of the blocks to be coded in each CU, obtaining the matching block of each block in the reference frame 106.
- intra-frame prediction coded reconstructed pixels of the current frame are used to predict uncoded image blocks, thereby removing spatial redundancy in video.
- the image content inside the box is similar to the pixel values outside the box, and has a directional texture.
- the reference pixels can use adjacent reconstructed pixels directly above, above right, directly left and below left. When the reconstructed pixels do not exist, they will be filled according to certain rules. In the VVC coding standard, multiple rows of reference pixels are also used to improve prediction accuracy.
- the horizontal mode (mode 10) uses the column of pixels to the left of the current block as reference pixels (the left diagonally shaded pixels in Figure 2(c)), while the vertical mode (mode 26) uses the row of reconstructed pixels above the current block as reference pixels (the upper diagonally shaded pixels in Figure 2(c)). Therefore, for coding blocks with different contents, the positions and number of reference pixels to be adjusted differ: for the horizontal mode (mode 10), the distortion of the reference pixels in the left diagonally shaded area needs to be compensated, and for the vertical mode (mode 26), the distortion of the reference pixels in the upper diagonally shaded area needs to be compensated.
- the matching block is subtracted from the corresponding coding block to obtain a residual block, and the residual block is transformed 107 and quantized 108 to obtain quantized transform coefficients.
- the transform may include discrete cosine transform (DCT), fast Fourier transform (FFT), and the like.
- Quantization processing is a commonly used technology in the field of digital signal processing, which refers to the process of approximating continuous values (or a large number of possible discrete values) of a signal to a finite number (or less) of discrete values. Quantization processing is mainly used in the conversion from continuous signals to digital signals. Continuous signals become discrete signals after sampling, and discrete signals become digital signals after quantization.
- the fourth step is to perform entropy coding 109 on the quantized transform coefficients to obtain a part of the code stream and output it.
- the fifth step is to perform inverse quantization 110 and inverse transformation 111 on the quantized transform coefficients to obtain a reconstructed residual block, and then add the reconstructed residual block to the prediction block to obtain a reconstructed image.
- the sixth step is to add the reconstructed image, after DB (deblocking filter 112) and SAO (sample adaptive offset 113) processing, to the reference frame queue, to be used as the reference frame for the next frame image.
- the video image can be coded frame by frame by performing the first step to the sixth step above in a loop.
- when performing prediction mode selection 102 in the second step, the prediction mode may be selected according to the rate-distortion cost of the residual block under different prediction modes.
- the rate-distortion cost may be calculated by methods such as sum of squared differences (SSE), sum of absolute transform differences (SATD), and the like.
- SSE is used as an example for calculating the rate-distortion cost, but the disclosure is not limited thereto.
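The SSE measure used below can be sketched in a few lines; the block contents here are arbitrary illustration values, not data from the disclosure.

```python
import numpy as np

def sse(original: np.ndarray, reconstructed: np.ndarray) -> int:
    """Sum of squared differences between original and reconstructed pixels."""
    # Cast to a wide signed type so uint8 subtraction cannot wrap around.
    diff = original.astype(np.int64) - reconstructed.astype(np.int64)
    return int(np.sum(diff * diff))

orig = np.array([[10, 20], [30, 40]], dtype=np.uint8)
recon = np.array([[12, 20], [30, 37]], dtype=np.uint8)
print(sse(orig, recon))  # 2*2 + 3*3 = 13
```

SATD would replace the squared pixel differences with absolute values of Hadamard-transformed differences, but the structure of the cost comparison is the same.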
- FIG. 3 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure.
- the video encoding method according to the exemplary embodiment of the present disclosure can be implemented on a device having video codec processing capability.
- it can be implemented on mobile phones, tablet computers, desktop computers, laptop computers, handheld computers, notebook computers, netbooks, personal digital assistants (PDA), and augmented reality (AR) / virtual reality (VR) devices.
- the method includes the following steps S310 to S340.
- Step S310: determine a first intra-frame prediction mode and reference pixels of the image block.
- a first intra-frame prediction mode of an image block divided from a video frame and reference pixels for intra-frame prediction are determined.
- the current video image frame can be divided into N×N image blocks B_i (N can be any value supported by the standard, such as 64, 32, or 16, and i is the image block index); then texture detection is performed on each B_i to determine the intra prediction mode applicable to the detected texture.
- the reference pixels used for intra prediction may be determined according to a specific video codec standard.
- at least one of the pixels at the above positions can be determined as reference pixels for intra prediction of the surrounding image blocks.
- the corresponding intra prediction mode can be directly determined by using the intra prediction mode selection method defined in the HEVC and VVC standards.
- the selection of the intra prediction mode is designed to be consistent with the texture of the image block, so the initial intra-frame prediction mode, i.e., the first intra-frame prediction mode, can be obtained directly.
- an image gradient detection method may also be used: the gradient angle is calculated after filtering the image block with a Sobel operator, and the prediction direction corresponding to the gradient angle is then determined as the first intra prediction mode.
- the detected result can correspond to any one of modes 2 to 34.
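The gradient-based detection can be sketched as below. This is a minimal illustration: real encoders aggregate per-pixel gradients more carefully, and the final mapping from the angle to the nearest angular mode in 2..34 is standard-specific and omitted here.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def _filter(block: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 3x3 cross-correlation over the block interior."""
    h, w = block.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * block[i:h - 2 + i, j:w - 2 + j]
    return out

def gradient_angle(block: np.ndarray) -> float:
    """Dominant gradient angle of the block, in degrees."""
    b = block.astype(np.float64)
    gx = _filter(b, SOBEL_X).sum()
    gy = _filter(b, SOBEL_Y).sum()
    return float(np.degrees(np.arctan2(gy, gx)))
```

A purely vertical edge yields an angle near 0° (horizontal gradient) and a horizontal edge an angle near 90°, which the mode-mapping step would translate into the corresponding angular prediction direction.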
- the information of the first intra prediction mode (M_i) determined in step S310 may be saved and used in a subsequent encoding process.
- Step S320: determine the weights of the reference pixels in the current image block based on the first intra-frame prediction modes of the surrounding image blocks of the current image block.
- in step S320, based on the first intra prediction modes of the surrounding image blocks of the current image block, the weights corresponding to the reference pixels in the current image block are determined, and the weights represent the influence of the distortion of the reference pixels on the intra prediction of the surrounding image blocks.
- Step S330: determine the rate-distortion cost value of the intra prediction of the current image block based on the rate-distortion cost function.
- the rate-distortion cost value of the current image block in at least one candidate intra prediction mode is determined based on the rate-distortion cost function, wherein the rate-distortion cost function includes a rate-distortion cost item for the intra prediction of the current image block (hereinafter also referred to as the first rate-distortion cost item) and a weighted rate-distortion cost item for the reference pixels used for intra prediction of the surrounding image blocks of the current image block (hereinafter also referred to as the second rate-distortion cost item), where the weight of the weighted rate-distortion cost item for the reference pixels is determined in step S320 based on the first intra-frame prediction modes of the surrounding image blocks of the current image block.
- the reference pixels used for intra prediction of the surrounding image blocks of a current image block may include at least one of the bottom pixel row, the rightmost pixel column, and the bottom-right corner pixel of the current image block.
- the bottom row of pixels, the rightmost column of pixels, and the bottom-right corner pixel of the current image block can be used as reference pixels for intra prediction of the lower adjacent image block, the right adjacent image block, and the lower-right image block, respectively, while more rows or columns of reference pixels can be used in the VVC standard. It should be understood that reference pixels at different positions may be used depending on the reference pixels employed for intra-frame prediction in the coding standard.
- Reference pixels are selectively adjusted to compensate for pixel distortion.
- the rightmost column of reference pixels (col_right) of the current image block i_TL may affect the image block i_T, the bottom row of reference pixels (row_bottom) may affect the image block i_L, and the bottom-right corner reference pixel (pixel_RB) may affect the image block i. Therefore, when selecting the intra prediction mode of the image block i_TL, the influence on the surrounding image blocks needs to be considered.
- in step S310, the first intra-frame prediction mode information M_i has already been detected for each image block, so when the intra-frame prediction mode is selected for the current image block i_TL, rate-distortion cost items for the pixels that may serve as reference pixels can be introduced into the rate-distortion optimization function.
- the original rate-distortion cost function usable in HEVC encoding is the following equation (1), which includes only the rate-distortion cost term for the image block itself:
- J(mode_j) = SSE(mode_j) + lambda * R(mode_j)    (1)
- J represents the rate-distortion cost of the image block i_TL in mode_j (here, j can be any one of the intra prediction mode indexes 0-34), SSE(mode_j) represents the sum of squared differences between the reconstructed pixels and the original pixels under mode_j, R(mode_j) represents the code rate of mode_j, and lambda represents the rate-distortion coefficient of the code rate.
- the above rate-distortion cost function may be modified to include weighted rate-distortion cost items for the reference pixels used for intra prediction of the surrounding image blocks, and the modified rate-distortion cost function can be the following equation (3):
- J(mode_j) = SSE(mode_j) + lambda * R(mode_j) + A * SSE(col_right) + B * SSE(row_bottom) + C * SSE(pixel_RB)    (3)
- A, B, and C are weight parameters for the rightmost pixel column, the bottom pixel row, and the bottom-right corner pixel, respectively. That is to say, the part of equation (3) that is the same as equation (1), namely SSE(mode_j) + lambda * R(mode_j), can be considered the first rate-distortion cost item, while the added part, namely A * SSE(col_right) + B * SSE(row_bottom) + C * SSE(pixel_RB), can be considered the above-mentioned second rate-distortion cost item.
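Equation (3) transcribes directly into code. The numeric inputs below are arbitrary illustration values standing in for precomputed SSE and rate measurements.

```python
def rd_cost(sse_mode, rate_mode, lam,
            sse_col_right, sse_row_bottom, sse_pixel_rb,
            A, B, C):
    """Weighted rate-distortion cost of equation (3)."""
    first = sse_mode + lam * rate_mode  # first item: the block's own prediction cost
    # second item: weighted distortion of the pixels that will serve as
    # references for the right, lower, and lower-right neighbors
    second = A * sse_col_right + B * sse_row_bottom + C * sse_pixel_rb
    return first + second

print(rd_cost(100, 40, 0.5, 10, 20, 5, A=1.0, B=0.5, C=2.0))  # 100 + 20 + 10 + 10 + 10 = 150.0
```

Setting A = B = C = 0 recovers the original cost of equation (1), so the modification only penalizes modes that distort future reference pixels.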
- the weight of the rate-distortion cost item for the bottom pixel row of the current image block may be determined based on the first intra prediction mode of the lower image block adjacent to the current image block, the weight of the rate-distortion cost item for the rightmost pixel column of the current image block may be determined based on the first intra-frame prediction mode of the right image block adjacent to the current image block, and the weight of the rate-distortion cost item for the bottom-right corner pixel can be a fixed value.
- since the influence of the reference pixels on the surrounding image blocks is related to the prediction modes (i.e., directions) of the surrounding image blocks, introducing rate-distortion cost items that consider the prediction directions of the surrounding blocks into the rate-distortion cost function can better compensate for the distortion of the reference pixels.
- the weight A of the rate-distortion cost item for the rightmost pixel column of the current image block is determined based on the prediction direction angle corresponding to the first intra prediction mode of the right image block of the current image block, the weight B of the rate-distortion cost item for the bottom pixel row of the current image block is determined based on the prediction direction angle corresponding to the first intra prediction mode of the lower image block of the current image block, and the weight C of the rate-distortion cost item for the bottom-right corner pixel may be a preset value MAX, where the values of A and B are within the predetermined range [0, MAX].
- the weight A of the rate-distortion cost item for the rightmost pixel column of the current image block and the weight B of the rate-distortion cost item for the bottom pixel row of the current image block can be determined according to the following equations:
- ang_M_i_T represents the prediction direction angle corresponding to the first intra prediction mode of the right image block adjacent to the current image block
- ang_M_i_L represents the prediction direction angle corresponding to the first intra prediction mode of the lower image block adjacent to the current image block.
- MAX is a preset value.
- MAX may take a value of 2.
- the above values may be obtained using, for example, trigonometric functions; a corresponding calculation method can be adopted according to the video encoding method used, as long as the weights reflect the influence of the reference pixels in the current image block, which are used by the surrounding image blocks, on the prediction of those surrounding blocks.
- Step S340: determine a second intra-frame prediction mode of the current image block according to the rate-distortion cost values, and encode the current image block using the second intra-frame prediction mode.
- in step S340, the second intra-frame prediction mode of the current image block is determined according to the rate-distortion cost values in the at least one candidate intra-frame prediction mode, and the current image block is encoded using the second intra-frame prediction mode. That is to say, the mode_j corresponding to the minimum rate-distortion cost value of equation (3) is determined as the final second intra-frame prediction mode for the image block i_TL, and the image block is encoded using this intra-frame prediction mode.
- the mode_j here may be one of multiple intra-frame prediction modes stipulated according to the video codec standard.
- the rate-distortion cost values of the rate-distortion cost function of the current image block in a plurality of candidate intra-frame prediction modes may be determined, and the candidate intra-frame prediction mode with the smallest rate-distortion cost value is determined as the second intra-frame prediction mode for the current image block.
- the distortion of the reference pixels used in the intra-frame prediction can be reduced, thereby improving the quality of video encoding.
- the rate-distortion cost function under multiple quantization parameters may be considered, and the optimal intra prediction mode and the optimal quantization parameter may be found by traversing the multiple quantization parameters. That is to say, in step S330, multiple rate-distortion cost values of the rate-distortion cost function of the current image block under different candidate intra prediction modes and quantization parameters can be determined, and in step S340, the candidate intra prediction mode and quantization parameter corresponding to the minimum among these rate-distortion cost values can be determined as the second intra prediction mode and the quantization parameter for performing intra prediction on the current block.
- a quantization parameter (QP) can be introduced and multiple QPs traversed downwards, so that the rate-distortion cost of equation (3) becomes the following equation (4):
- J(mode_j, QP_k) = SSE(mode_j, QP_k) + lambda * R(mode_j, QP_k) + A * SSE(col_right, QP_k) + B * SSE(row_bottom, QP_k) + C * SSE(pixel_RB, QP_k)    (4)
- where QP_k belongs to {32, 31, 30, ...}.
- the number of QPs that need to be traversed can be specified, and is generally 2.
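The joint mode/QP search of equation (4) amounts to a minimum over a small grid. In this sketch, the `evaluate(mode, qp)` callable stands in for running prediction, transform, quantization, and computing the equation (4) cost; it and the toy cost surface below are hypothetical stand-ins for illustration.

```python
def select_mode_and_qp(candidate_modes, qps, evaluate):
    """Return the (mode, QP) pair with the minimum rate-distortion cost."""
    best_cost, best_mode, best_qp = float("inf"), None, None
    for mode in candidate_modes:
        for qp in qps:  # e.g. traverse 2 QPs, such as 32 and 31
            cost = evaluate(mode, qp)
            if cost < best_cost:
                best_cost, best_mode, best_qp = cost, mode, qp
    return best_mode, best_qp

# toy cost surface in which mode 26 at QP 31 is cheapest
mode, qp = select_mode_and_qp([10, 26], [32, 31],
                              lambda m, q: abs(m - 26) * 10 + abs(q - 31))
print(mode, qp)  # 26 31
```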
- the distortion of the reference pixels of intra-frame prediction can be better reflected, thereby further improving the efficiency and quality of video coding.
- FIG. 5 is a block diagram illustrating a video encoding device according to an exemplary embodiment of the present disclosure.
- the video encoding apparatus according to the exemplary embodiments of the present disclosure may be implemented in a device having a video encoding function in hardware, software, and/or a combination of software and hardware.
- a video encoding apparatus 500 may include a first mode determination module 510 , a weight determination module 520 , a rate-distortion cost determination module 530 and a second mode determination module 540 .
- the first mode determination module 510 is configured to determine a first intra prediction mode of an image block divided from a video frame and reference pixels for intra prediction.
- the first intra prediction mode for the image block may be determined according to the texture detection result of the image block.
- the rate-distortion cost may be directly used to determine an intra-frame prediction mode from multiple intra-frame prediction modes as the first intra-frame prediction mode.
- the weight determination module 520 is configured to determine, based on the first intra prediction modes of the surrounding image blocks of the current image block, the weights corresponding to the reference pixels in the current image block that are used for the intra prediction of the surrounding image blocks, where the weights represent the influence of the distortion of the reference pixels on the intra prediction of the surrounding image blocks.
- the rate-distortion cost determination module 530 is configured to determine rate-distortion cost values for the current image block under at least one candidate intra prediction mode based on a rate-distortion cost function, where the rate-distortion cost function includes a rate-distortion cost term for the intra prediction of the current image block and a weighted rate-distortion cost term for the reference pixels used for the intra prediction of the surrounding image blocks of the current image block, the weight of the weighted rate-distortion cost term being determined based on the first intra prediction modes of the surrounding image blocks of the current image block.
- the second mode determination module 540 is configured to determine a second intra prediction mode of the current image block according to the rate-distortion cost values under the at least one candidate intra prediction mode, and to encode the current image block using the second intra prediction mode.
- the first mode determination module 510 is configured to perform texture detection on the image block to determine a first intra prediction mode suited to the detected texture, and the weight determination module 520 is configured to determine the positions of the reference pixels according to the video coding standard adopted by the encoding method.
- the reference pixels used for intra prediction of surrounding image blocks of the current image block include at least one of a lower pixel row, a right pixel column, and a lower right pixel of the current image block.
- the bottom pixel row, the rightmost pixel column, and the bottom-right corner pixel of an image block can be used as reference pixels for the intra prediction of the adjacent image block below, the adjacent image block to the right, and the image block to the lower right, respectively, while in the VVC standard more rows or columns of reference pixels may be used. It should be understood that reference pixels at different positions may be used depending on the reference pixels that the coding standard employs for intra prediction.
- the weight of the rate-distortion cost term for the bottom pixel row of the current image block is determined based on the first intra prediction mode of the image block adjacent below the current image block, the weight of the rate-distortion cost term for the right pixel column of the current image block is determined based on the first intra prediction mode of the image block adjacent to the right of the current image block, and the weight of the rate-distortion cost term for the bottom-right corner pixel is a fixed value.
- the weight A of the rate-distortion cost term for the right pixel column of the current image block is determined based on the prediction direction angle corresponding to the first intra prediction mode of the image block to the right of the current image block, the weight B of the rate-distortion cost term for the bottom pixel row of the current image block is determined based on the prediction direction angle corresponding to the first intra prediction mode of the image block below the current image block, and the weight C of the rate-distortion cost term for the bottom-right corner pixel may be a preset value MAX, where the values of A and B lie in the range [0, MAX].
- the weight A of the rate-distortion cost term for the right pixel column of the current image block, the weight B of the rate-distortion cost term for the bottom pixel row of the current image block, and the weight C of the rate-distortion cost term for the bottom-right corner pixel are determined by:
- A = clip3(0, MAX, abs(cot(ang_M_i_T))), B = clip3(0, MAX, abs(tan(ang_M_i_L))), C = MAX,
- where ang_M_i_T represents the prediction direction angle corresponding to the first intra prediction mode of the image block adjacent to the right of the current image block, ang_M_i_L represents the prediction direction angle corresponding to the first intra prediction mode of the image block adjacent below the current image block, and MAX is a preset value.
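The weight computation above can be sketched in Python. This is an illustrative sketch, not the patent's implementation: the degree-valued angle inputs, the `clip3` helper and the default `MAX = 2` follow the description above, while the function and parameter names are hypothetical.

```python
import math

def clip3(lo, hi, x):
    """Clamp x to [lo, hi], mirroring the clip3 function of video coding specifications."""
    return max(lo, min(hi, x))

def reference_pixel_weights(ang_M_i_T, ang_M_i_L, MAX=2.0):
    """Weights A, B, C for the reference-pixel rate-distortion cost terms.

    ang_M_i_T: prediction direction angle (degrees, an assumed unit) of the first
               intra mode of the block to the right of the current block.
    ang_M_i_L: prediction direction angle (degrees) of the first intra mode of the
               block below the current block.
    """
    A = clip3(0.0, MAX, abs(1.0 / math.tan(math.radians(ang_M_i_T))))  # |cot|
    B = clip3(0.0, MAX, abs(math.tan(math.radians(ang_M_i_L))))        # |tan|
    C = MAX  # the bottom-right corner pixel always uses the preset maximum
    return A, B, C
```

For the planar (INTRA_PLANAR) and DC modes, the description fixes A = B = C = 1 instead; angles whose tangent is zero would make the cotangent undefined, so a real encoder would also special-case those modes.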
- the rate-distortion cost determination module 530 is configured to determine a plurality of rate-distortion cost values of the rate-distortion cost function of the current image block under different candidate intra prediction modes and quantization parameters, and the second mode determination module 540 is configured to determine the candidate intra prediction mode and quantization parameter corresponding to the minimum rate-distortion cost value among the plurality of rate-distortion cost values as the second intra prediction mode and the quantization parameter for performing intra prediction on the image block.
- the first mode determination module 510 is configured to determine the gradient angle of the image block through image gradient detection, and determine the intra prediction mode corresponding to the determined gradient angle as the first intra prediction mode of the image block.
- the rate-distortion cost determination module 530 is configured to determine the rate-distortion cost values of the rate-distortion cost function of the image block under a plurality of candidate intra prediction modes, and the second mode determination module 540 is configured to determine the candidate intra prediction mode with the smallest rate-distortion cost value as the second intra prediction mode for the image block.
- FIG. 6 is a structural block diagram illustrating an electronic device 600 for video encoding according to an exemplary embodiment of the present disclosure.
- the electronic device 600 can be, for example, a smart phone, a tablet computer, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer or a desktop computer.
- the electronic device 600 may also be called user equipment, portable terminal, laptop terminal, desktop terminal and other names.
- the electronic device 600 includes: a processor 601 and a memory 602 .
- the processor 601 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
- the processor 601 can be realized by at least one hardware form of DSP (Digital Signal Processing, digital signal processing), FPGA (Field Programmable Gate Array, field programmable gate array), and PLA (Programmable Logic Array, programmable logic array).
- the processor 601 may also include a main processor and a coprocessor. The main processor, also known as a CPU (Central Processing Unit), is a processor for processing data in the awake state; the coprocessor is a low-power processor for processing data in the standby state.
- the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is used for rendering and drawing the content to be displayed on the display screen.
- the processor 601 may further include an AI (Artificial Intelligence) processor, which is configured to handle computing operations related to machine learning.
- Memory 602 may include one or more computer-readable storage media, which may be non-transitory.
- the memory 602 may also include high-speed random access memory and non-volatile memory, such as one or more magnetic disk storage devices and flash memory storage devices.
- the non-transitory computer-readable storage medium in the memory 602 is used to store at least one instruction, and the at least one instruction is executed by the processor 601 to implement the video encoding method of the exemplary embodiments of the present disclosure.
- the electronic device 600 may optionally further include: a peripheral device interface 603 and at least one peripheral device.
- the processor 601, the memory 602, and the peripheral device interface 603 may be connected through buses or signal lines.
- Each peripheral device can be connected to the peripheral device interface 603 through a bus, a signal line or a circuit board.
- the peripheral device includes: at least one of a radio frequency circuit 604 , a touch screen 605 , a camera 606 , an audio circuit 607 , a positioning component 608 and a power supply 609 .
- the peripheral device interface 603 may be used to connect at least one peripheral device related to I/O (Input/Output, input/output) to the processor 601 and the memory 602 .
- the processor 601, memory 602 and peripheral device interface 603 are integrated on the same chip or circuit board.
- any one or two of the processor 601 , memory 602 and peripheral device interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
- the radio frequency circuit 604 is used to receive and transmit RF (Radio Frequency, radio frequency) signals, also called electromagnetic signals.
- the radio frequency circuit 604 communicates with the communication network and other communication devices through electromagnetic signals.
- the radio frequency circuit 604 converts electrical signals into electromagnetic signals for transmission, or converts received electromagnetic signals into electrical signals.
- the radio frequency circuit 604 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like.
- the radio frequency circuit 604 can communicate with other terminals through at least one wireless communication protocol.
- the wireless communication protocol includes but is not limited to: metropolitan area network, mobile communication networks of various generations (2G, 3G, 4G and 5G), wireless local area network and/or WiFi (Wireless Fidelity, wireless fidelity) network.
- the radio frequency circuit 604 may also include circuits related to NFC (Near Field Communication, short-range wireless communication), which is not limited in the present disclosure.
- the display screen 605 is used to display a UI (User Interface, user interface).
- the UI can include graphics, text, icons, video, and any combination thereof.
- when the display screen 605 is a touch display screen, it also has the ability to collect touch signals on or above its surface.
- the touch signal can be input to the processor 601 as a control signal for processing.
- the display screen 605 can also be used to provide virtual buttons and/or virtual keyboards, also called soft buttons and/or soft keyboards.
- there may be one display screen 605 which is set on the front panel of the electronic device 600 .
- the display screen 605 may be a flexible display screen disposed on a curved or folded surface of the terminal 600; it may even be set as a non-rectangular irregular shape, that is, a special-shaped screen.
- the display screen 605 can be made of LCD (Liquid Crystal Display, liquid crystal display), OLED (Organic Light-Emitting Diode, organic light-emitting diode) and other materials.
- the camera assembly 606 is used to capture images or videos.
- the camera assembly 606 includes a front camera and a rear camera.
- the front camera is set on the front panel of the terminal, and the rear camera is set on the back of the terminal.
- there are at least two rear cameras, each being one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so that, for example, the main camera and the depth-of-field camera can be fused to realize the background blur function.
- camera assembly 606 may also include a flash.
- the flash can be a single-color temperature flash or a dual-color temperature flash. Dual color temperature flash refers to the combination of warm light flash and cold light flash, which can be used for light compensation under different color temperatures.
- Audio circuitry 607 may include a microphone and speakers.
- the microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals and input them to the processor 601 for processing, or input them to the radio frequency circuit 604 to realize voice communication.
- the microphone can also be an array microphone or an omnidirectional collection microphone.
- the speaker is used to convert the electrical signal from the processor 601 or the radio frequency circuit 604 into sound waves.
- the loudspeaker can be a conventional membrane loudspeaker or a piezoelectric ceramic loudspeaker.
- the audio circuit 607 may also include a headphone jack.
- the positioning component 608 is used to locate the current geographic location of the electronic device 600, so as to realize navigation or LBS (Location Based Service, location-based service).
- the positioning component 608 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
- the power supply 609 is used to supply power to various components in the electronic device 600 .
- Power source 609 may be AC, DC, disposable or rechargeable batteries.
- the rechargeable battery may support wired charging or wireless charging.
- the rechargeable battery can also be used to support fast charging technology.
- the electronic device 600 further includes one or more sensors 610 .
- the one or more sensors 610 include, but are not limited to: an acceleration sensor 611 , a gyro sensor 612 , a pressure sensor 613 , a fingerprint sensor 614 , an optical sensor 615 and a proximity sensor 616 .
- the acceleration sensor 611 can detect the acceleration on the three coordinate axes of the coordinate system established by the terminal 600 .
- the acceleration sensor 611 can be used to detect the components of the acceleration of gravity on the three coordinate axes.
- the processor 601 may control the touch screen 605 to display a user interface in a landscape view or a portrait view according to the gravitational acceleration signal collected by the acceleration sensor 611 .
- the acceleration sensor 611 can also be used for collecting game or user's motion data.
- the gyro sensor 612 can detect the body direction and rotation angle of the terminal 600 , and the gyro sensor 612 can cooperate with the acceleration sensor 611 to collect 3D actions of the user on the terminal 600 .
- the processor 601 can realize the following functions: motion sensing (such as changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
- the pressure sensor 613 may be disposed on a side frame of the terminal 600 and/or a lower layer of the touch display screen 605 .
- when the pressure sensor 613 is installed on the side frame of the terminal 600 , it can detect the user's grip signal on the terminal 600 , and the processor 601 performs left/right-hand recognition or shortcut operations according to the grip signal collected by the pressure sensor 613 .
- the processor 601 controls the operable controls on the UI according to the user's pressure operation on the touch screen 605.
- the operable controls include at least one of button controls, scroll bar controls, icon controls, and menu controls.
- the fingerprint sensor 614 is used to collect the user's fingerprint, and either the processor 601 or the fingerprint sensor 614 itself identifies the user according to the collected fingerprint. When the user's identity is recognized as trusted, the processor 601 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments and changing settings.
- the fingerprint sensor 614 may be disposed on the front, back or side of the electronic device 600 . When the electronic device 600 is provided with a physical button or a manufacturer's logo, the fingerprint sensor 614 may be integrated with the physical button or the manufacturer's Logo.
- the optical sensor 615 is used to collect ambient light intensity.
- the processor 601 can control the display brightness of the touch screen 605 according to the ambient light intensity collected by the optical sensor 615 . Specifically, when the ambient light intensity is high, the display brightness of the touch screen 605 is increased; when the ambient light intensity is low, the display brightness of the touch screen 605 is decreased.
- the processor 601 may also dynamically adjust shooting parameters of the camera assembly 606 according to the ambient light intensity collected by the optical sensor 615 .
- the proximity sensor 616 also called a distance sensor, is usually arranged on the front panel of the electronic device 600 .
- the proximity sensor 616 is used to collect the distance between the user and the front of the electronic device 600 .
- when the proximity sensor 616 detects that the distance between the user and the front of the electronic device 600 gradually decreases, the processor 601 controls the touch display screen 605 to switch from the on-screen state to the off-screen state; when the proximity sensor 616 detects that the distance between the user and the front of the electronic device 600 gradually increases, the processor 601 controls the touch display screen 605 to switch from the off-screen state to the on-screen state.
- the structure shown in FIG. 6 does not constitute a limitation to the electronic device 600, which may include more or fewer components than shown in the figure, combine some components, or adopt a different arrangement of components.
- FIG. 7 is a structural block diagram of another electronic device 700 .
- the electronic device 700 may be provided as a server.
- an electronic device 700 includes one or more processors 710 and a memory 720 .
- the memory 720 may include one or more programs for executing the above video encoding method.
- the electronic device 700 may also include a power supply component 730 configured to perform power management of the electronic device 700, a wired or wireless network interface 740 configured to connect the electronic device 700 to a network, and an input-output (I/O) interface 750 .
- the electronic device 700 can operate based on an operating system stored in the memory 720, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM or the like.
- a computer-readable storage medium storing instructions is provided, wherein, when the instructions are executed by at least one processor, the at least one processor is caused to execute the video encoding method described in any one of the embodiments of the present disclosure.
- examples of computer-readable storage media herein include: Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical memory, Hard Disk Drive (HDD), Solid-State Drive (SSD), memory cards (such as MultiMediaCards, Secure Digital (SD) or eXtreme Digital (XD) cards), magnetic tapes, floppy disks, magneto-optical data storage devices, optical data storage devices, hard disks, solid-state disks, and any other device configured to store a computer program and any associated data, data files and data structures.
- the computer program in the above computer-readable storage medium can run in an environment deployed in computer equipment such as a client, a host, an agent device or a server. In one example, the computer program and any associated data, data files and data structures are distributed over a networked computer system such that they are stored, accessed and executed in a distributed fashion by one or more processors, servers or computers.
- a computer program product is also provided, and instructions in the computer program product can be used by a processor of a computer device to execute the video encoding method described in any one of the embodiments of the present disclosure.
- a computer program is provided, including computer program code which, when run on a computer, causes the computer to execute the video encoding method described in any one of the embodiments of the present disclosure.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
Disclosed are a video encoding method and apparatus, an electronic device and a computer-readable storage medium. The video encoding method includes: determining a first intra prediction mode for an image block divided from a video frame, and reference pixels used for intra prediction; determining, based on the first intra prediction modes of the surrounding image blocks of the current image block, weights corresponding to the reference pixels in the current image block; determining, based on a rate-distortion cost function, rate-distortion cost values of the current image block under at least one candidate intra prediction mode; and determining a second intra prediction mode for the image block according to the rate-distortion cost values under the at least one candidate intra prediction mode, and encoding the image block using the second intra prediction mode.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on and claims priority to Chinese patent application No. 202111370720.3, filed on November 18, 2021, the entire contents of which are incorporated herein by reference.
The present disclosure relates to the technical field of video coding, and in particular to a method and apparatus for intra prediction in video encoding, an electronic device and a computer-readable storage medium.
Existing video coding standards, such as the international standards High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC), include both intra and inter prediction methods. Taking the HEVC standard as an example, each frame of a video is divided into square image blocks of a fixed size as basic units, and the image blocks are encoded one by one in raster order. An image block is first partitioned into coding blocks; each coding block is intra or inter predicted using reference pictures, and the difference between the prediction block and the original block is the residual block. The resulting residual block is transformed and quantized in turn, and entropy-coded together with the coding mode to form the bitstream. The amplitude of the predicted residual is usually much smaller than the original pixel values, so coding the pixel differences instead of directly coding the original pixel values can greatly improve coding efficiency. In the HEVC standard, the reference pixels may be the adjacent reconstructed pixels directly above, to the upper right, directly to the left and to the lower left. In VVC, multiple rows of reference pixels are also used to improve prediction accuracy.
In view of the above, in lossy video compression the reference pixels have usually gone through the quantization process and therefore exhibit varying degrees of distortion. Since subsequent blocks use already-encoded reconstructed pixels as reference pixels when predicting the current block, the distortion of the reference pixels affects the prediction accuracy of the current block; therefore, a method for improving the prediction accuracy of video encoding is needed.
SUMMARY
Embodiments of the present disclosure provide a video encoding method that, taking the characteristics of the video content into account, selectively reduces reference-pixel distortion, thereby improving intra prediction accuracy and encoder compression efficiency. By adjusting for the distortion of the reference pixels, the distortion of the reference pixels can be reduced as much as possible, so that video encoding quality can be improved.
According to an embodiment of the first aspect of the present disclosure, a video encoding method is provided, including: determining a first intra prediction mode for an image block divided from a video frame, and reference pixels used for intra prediction; determining, based on the first intra prediction modes of the surrounding image blocks of the current image block, weights corresponding to the reference pixels in the current image block, the weights representing the influence of the distortion of the reference pixels on the intra prediction of the surrounding image blocks; determining, based on a rate-distortion cost function, rate-distortion cost values of the current image block under at least one candidate intra prediction mode, where the rate-distortion cost function includes a first rate-distortion cost term and a second rate-distortion cost term carrying the weights, the first rate-distortion cost term being the cost term for the intra prediction of the current image block and the second rate-distortion cost term being the weighted cost term for the reference pixels; and determining a second intra prediction mode for the current image block according to the rate-distortion cost values under the at least one candidate intra prediction mode, and encoding the current image block using the second intra prediction mode.
According to an embodiment of the first aspect of the present disclosure, determining the first intra prediction mode for the image block divided from the video frame includes: performing texture detection on the image block and determining a first intra prediction mode suited to the detected texture; determining the reference pixels used for intra prediction includes: determining, based on the adopted video coding standard, the positions of the reference pixels in the image block that are used for intra prediction.
According to an embodiment of the first aspect of the present disclosure, the reference pixels used for the intra prediction of the surrounding image blocks include at least one of the bottom pixel row, the right pixel column and the bottom-right corner pixel of the current image block.
According to an embodiment of the first aspect of the present disclosure, determining the weights corresponding to the reference pixels in the current image block includes: determining, based on the prediction direction angle corresponding to the first intra prediction mode of the image block to the right of the current image block, a weight A for the rate-distortion cost term of the right pixel column of the current image block; determining, based on the prediction direction angle corresponding to the first intra prediction mode of the image block below the current image block, a weight B for the rate-distortion cost term of the bottom pixel row of the current image block; and determining the weight C for the rate-distortion cost term of the bottom-right corner pixel as a preset value MAX, where the values of A and B lie in the range [0, MAX].
According to an embodiment of the first aspect of the present disclosure, determining the second intra prediction mode of the current image block according to the rate-distortion cost values under the at least one candidate intra prediction mode includes: determining, from the at least one candidate intra prediction mode, the candidate intra prediction mode with the smallest rate-distortion cost value as the second intra prediction mode for the current image block.
According to an embodiment of the first aspect of the present disclosure, the rate-distortion cost function further includes a quantization parameter. Determining, based on the rate-distortion cost function, the rate-distortion cost values of the current image block under the at least one candidate intra prediction mode includes: traversing multiple quantization parameters under each candidate intra prediction mode, and determining multiple rate-distortion cost values of the rate-distortion cost function of the current image block under each quantization parameter of the different candidate intra prediction modes. Determining the second intra prediction mode according to the rate-distortion cost values under the at least one candidate intra prediction mode includes: determining the minimum rate-distortion cost value among the multiple rate-distortion cost values, and determining the candidate intra prediction mode and quantization parameter corresponding to the minimum rate-distortion cost value as the second intra prediction mode and the quantization parameter for performing intra prediction on the current image block.
According to an embodiment of the first aspect of the present disclosure, determining the first intra prediction mode of the image block includes: computing the gradient angle of the image block through image gradient detection, and obtaining the intra prediction mode corresponding to the computed gradient angle as the first intra prediction mode of the image block.
According to an embodiment of the second aspect of the present disclosure, a video encoding apparatus is provided, including: a first mode determination module configured to determine a first intra prediction mode for an image block divided from a video frame and reference pixels used for intra prediction; a weight determination module configured to determine, based on the first intra prediction modes of the surrounding image blocks of the current image block, weights corresponding to the reference pixels in the current image block, the weights representing the influence of the distortion of the reference pixels on the intra prediction of the surrounding image blocks; a rate-distortion cost determination module configured to determine, based on a rate-distortion cost function, rate-distortion cost values of the current image block under at least one candidate intra prediction mode, where the rate-distortion cost function includes a first rate-distortion cost term and a second rate-distortion cost term carrying the weights, the first rate-distortion cost term being the cost term for the intra prediction of the current image block and the second rate-distortion cost term being the weighted cost term for the reference pixels; and a second mode determination module configured to determine a second intra prediction mode of the current image block according to the rate-distortion cost values under the at least one candidate intra prediction mode, and to encode the current image block using the second intra prediction mode.
According to an embodiment of the second aspect of the present disclosure, the first mode determination module is configured to perform texture detection on the image block, determine a first intra prediction mode suited to the detected texture, and determine, based on the video coding standard, the positions of the reference pixels in the image block that are used for intra prediction.
According to an embodiment of the second aspect of the present disclosure, the reference pixels used for the intra prediction of the surrounding image blocks of the current image block include at least one of the bottom pixel row, the right pixel column and the bottom-right corner pixel of the current image block.
According to an embodiment of the second aspect of the present disclosure, the weight determination module is configured to: determine, based on the prediction direction angle corresponding to the first intra prediction mode of the image block to the right of the current image block, a weight A for the rate-distortion cost term of the right pixel column of the current image block; determine, based on the prediction direction angle corresponding to the first intra prediction mode of the image block below the current image block, a weight B for the rate-distortion cost term of the bottom pixel row of the current image block; and determine the weight C for the rate-distortion cost term of the bottom-right corner pixel as a preset value MAX, where the values of A and B lie in the range [0, MAX].
According to an embodiment of the second aspect of the present disclosure, the second mode determination module is configured to determine, from the multiple candidate intra prediction modes, the candidate intra prediction mode with the smallest rate-distortion cost value as the second intra prediction mode for the current image block.
According to an embodiment of the second aspect of the present disclosure, the rate-distortion cost function further includes a quantization parameter; the rate-distortion cost determination module is configured to traverse multiple quantization parameters under each candidate intra prediction mode and determine multiple rate-distortion cost values of the rate-distortion cost function of the current image block under the multiple quantization parameters of each candidate intra prediction mode; and the second mode determination module is configured to determine the candidate intra prediction mode and quantization parameter corresponding to the minimum rate-distortion cost value among the multiple rate-distortion cost values, and to determine that candidate intra prediction mode and quantization parameter as the second intra prediction mode and the quantization parameter for performing intra prediction on the current image block.
According to an embodiment of the second aspect of the present disclosure, the first mode determination module is configured to compute the gradient angle of the image block through image gradient detection, and to obtain the intra prediction mode corresponding to the computed gradient angle as the first intra prediction mode for the image block.
According to an embodiment of the third aspect of the present disclosure, an electronic device is provided, including: at least one processor; and at least one memory storing computer-executable instructions, where the computer-executable instructions, when run by the at least one processor, cause the at least one processor to execute the video encoding method of any one of the embodiments of the first aspect.
According to an embodiment of the fourth aspect of the present disclosure, a computer-readable storage medium is provided; when instructions in the computer-readable storage medium are executed by at least one processor, the at least one processor is enabled to execute the video encoding method of any one of the embodiments of the first aspect.
According to an embodiment of the fifth aspect of the present disclosure, a computer program product is provided; instructions in the computer program product are run by at least one processor to execute the video encoding method of any one of the embodiments of the first aspect.
According to an embodiment of the sixth aspect of the present disclosure, a computer program is provided, including computer program code which, when run on a computer, causes the computer to execute the video encoding method of any one of the embodiments of the first aspect.
By adjusting for the distortion of the reference pixels, the distortion of the reference pixels can be reduced as much as possible, so that video encoding quality can be improved.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present disclosure.
The accompanying drawings, which are incorporated into and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure; they do not unduly limit the present disclosure.
FIG. 1 is a schematic diagram of the overall framework of a video encoding scheme according to an exemplary embodiment of the present disclosure.
FIG. 2 is a schematic diagram of intra prediction used in a video encoding scheme.
FIG. 3 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure.
FIG. 4 is a schematic diagram of image blocks and their prediction directions for intra prediction in HEVC encoding according to an exemplary embodiment of the present disclosure.
FIG. 5 is a block diagram illustrating a video encoding apparatus according to an exemplary embodiment of the present disclosure.
FIG. 6 is a schematic diagram of an electronic device for video encoding according to an exemplary embodiment of the present disclosure.
FIG. 7 is a schematic diagram of an electronic device for video encoding according to another exemplary embodiment of the present disclosure.
To enable those of ordinary skill in the art to better understand the technical solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the accompanying drawings.
It should be noted that the terms "first", "second" and the like in the specification, claims and above drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present disclosure described herein can be implemented in orders other than those illustrated or described herein. The implementations described in the following embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
It should be noted here that "at least one of several items" in the present disclosure covers the three parallel cases of "any one of the items", "any combination of several of the items" and "all of the items". For example, "including at least one of A and B" includes the following three parallel cases: (1) including A; (2) including B; (3) including A and B. Similarly, "performing at least one of step one and step two" covers the following three parallel cases: (1) performing step one; (2) performing step two; (3) performing step one and step two.
FIG. 1 is a schematic diagram of the overall framework of a video encoding scheme 100 according to an exemplary embodiment of the present disclosure.
In the first step, any video image frame 101 is divided into at least one coding unit.
In the second step, the frame is input into the encoder for prediction. This process mainly exploits the spatial and temporal correlation of the video data, using intra prediction 103 or inter prediction (corresponding to 104 and 105) to remove the spatial-temporal redundancy of the block to be encoded in each coding unit and to obtain the matching block of each block in the reference frame 106. Specifically, in intra prediction, reconstructed pixels already encoded in the current frame are used to predict image blocks not yet encoded, thereby removing spatial redundancy in the video.
As shown in FIG. 2(a), the image content inside the box is similar to the pixel values outside the box and has a directional texture. As shown in FIG. 2(b), in the HEVC standard the reference pixels may be the adjacent reconstructed pixels directly above, to the upper right, directly to the left and to the lower left; when a reconstructed pixel does not exist, it is filled according to certain rules. In the VVC standard, multiple rows of reference pixels are also used to improve prediction accuracy.
Different intra prediction modes use reference pixels at different positions. For example, as shown in FIG. 2(c), the horizontal mode (mode 10) uses the column of pixels to the left of the current block as reference pixels (the pixels in the left hatched area in FIG. 2(c)), while the vertical mode (mode 26) uses the row of reconstructed pixels above the current block as reference pixels (the pixels in the upper hatched area in FIG. 2(c)). Therefore, for coding blocks with different content, the positions and numbers of reference pixels that need to be adjusted differ. For example, for the horizontal mode (mode 10) the distortion of the reference pixels in the left hatched area needs to be compensated, and for the vertical mode (mode 26) the distortion of the reference pixels in the upper hatched area needs to be compensated.
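The mode-dependent use of reference pixels described above can be illustrated with a minimal sketch of the two pure-direction HEVC modes. This is a simplified illustration (no reference filtering, no boundary handling), and the function name and list-based layout are assumptions, not the standard's implementation.

```python
def intra_predict(size, top_row, left_col, mode):
    """Simplified directional intra prediction for an HEVC-style size x size block.

    mode 26 (vertical):   every row copies the reconstructed pixels directly above.
    mode 10 (horizontal): every column copies the reconstructed pixel to its left.
    """
    if mode == 26:
        return [list(top_row) for _ in range(size)]          # replicate the top row downward
    if mode == 10:
        return [[left_col[r]] * size for r in range(size)]   # replicate the left column rightward
    raise NotImplementedError("only the two pure-direction modes are sketched here")
```

Distortion in `top_row` would propagate into every predicted row under mode 26, while distortion in `left_col` would propagate under mode 10, which is why the positions of the reference pixels that need compensation differ per mode.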
In the third step, the matching block is subtracted from the corresponding coding block to obtain the residual block, and the residual block undergoes transform 107 and quantization 108 to obtain quantized transform coefficients. Here, the transform may include the Discrete Cosine Transform (DCT), the Fast Fourier Transform (FFT), and the like. Quantization is a common technique in digital signal processing that approximates the continuous values of a signal (or a large number of possible discrete values) by a finite number of (or fewer) discrete values. It is mainly used in the conversion from continuous signals to digital signals: a continuous signal becomes a discrete signal through sampling, and the discrete signal becomes a digital signal through quantization.
In the fourth step, the quantized transform coefficients are entropy-coded 109 to obtain part of the bitstream, which is output.
In the fifth step, the quantized transform coefficients undergo inverse quantization 110 and inverse transform 111 to obtain the reconstructed residual block, and the reconstructed residual block is added to the prediction block to obtain the reconstructed image.
In the sixth step, the reconstructed image is processed by DB (Deblocking Filter 112) and SAO (Sample Adaptive Offset 113) and added to the reference frame queue as the reference frame for the next image frame. By performing the above first to sixth steps in a loop, the video images can be encoded frame by frame.
According to an exemplary embodiment of the present disclosure, when selecting the prediction mode 102 in the second step, the prediction mode may be selected according to the rate-distortion cost of the residual block under different prediction modes. According to exemplary embodiments of the present disclosure, in different video coding standards the rate-distortion cost may be computed by means such as the Sum of Squared Errors (SSE) or the Sum of Absolute Transformed Differences (SATD). In the following description, SSE is used as the example for computing the rate-distortion cost, but the present disclosure is not limited thereto.
FIG. 3 is a flowchart illustrating a video encoding method according to an exemplary embodiment of the present disclosure. It should be understood that the video encoding method according to the exemplary embodiments of the present disclosure can be implemented on devices with video encoding and decoding capabilities, for example, mobile phones, tablet computers, desktop, laptop and handheld computers, notebook computers, netbooks, personal digital assistants (PDA), and augmented reality (AR)/virtual reality (VR) devices.
As shown in FIG. 3, the method includes the following steps S310 to S340.
In step S310, the first intra prediction mode of an image block and the reference pixels are determined.
In step S310, the first intra prediction mode of the image block divided from the video frame and the reference pixels used for intra prediction are determined. For example, in a video encoder supporting HEVC, the current video image frame may be divided into N×N image blocks B_i (where N may be any value supported by the standard, such as 64, 32 or 16, and i is the image block index), and texture detection is then performed on each B_i to determine an intra prediction mode suited to the detected texture. Here, the reference pixels used for intra prediction may be determined according to the specific video codec standard. For example, in the HEVC standard, since the right pixel column, the bottom pixel row and the bottom-right corner pixel of an image block may each be used for the intra prediction of surrounding pixel blocks, at least one of the pixels at these positions may be determined as the reference pixels used for the intra prediction of the surrounding pixel blocks.
According to exemplary embodiments of the present disclosure, the intra prediction mode selection methods defined in the HEVC and VVC standards may be used to directly determine the corresponding intra prediction mode; in these standards the design of intra prediction mode selection is consistent with the texture of the image block, so the initial intra prediction mode, that is, the first intra prediction mode, can be obtained directly.
According to exemplary embodiments of the present disclosure, an image gradient detection method may also be used: the image block is filtered with the Sobel operator, the gradient angle is computed, and the prediction direction corresponding to the gradient angle is determined as the first intra prediction mode. For example, among the prediction direction angles corresponding to the HEVC intra prediction modes shown in FIG. 4(a), the detected result may be mapped to any one of modes 2, 3, ..., 34.
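The Sobel-based gradient detection mentioned above can be sketched as follows. This is a rough illustration that accumulates absolute Sobel responses over the whole block (a real detector would typically histogram per-pixel angles), and the function name and degree-valued output are assumptions.

```python
import math

# Standard 3x3 Sobel kernels for the horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def dominant_gradient_angle(block):
    """Estimate a dominant gradient angle (degrees) for a 2-D list of luma samples."""
    h, w = len(block), len(block[0])
    gx_sum = gy_sum = 0.0
    for y in range(1, h - 1):                 # skip the border pixels
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * block[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * block[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gx_sum += abs(gx)
            gy_sum += abs(gy)
    return math.degrees(math.atan2(gy_sum, gx_sum))
```

The resulting angle would then be mapped to the nearest of the angular modes 2-34.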
The information of the first intra prediction mode (M_i) determined in step S310 may be saved and used in the subsequent encoding process.
In step S320, the weights of the reference pixels in the current image block are determined based on the first intra prediction modes of the surrounding image blocks of the current image block.
In step S320, based on the first intra prediction modes of the surrounding image blocks of the current image block, the weights corresponding to the reference pixels in the current image block are determined; the weights represent the influence of the distortion of the reference pixels on the intra prediction of the surrounding image blocks.
In step S330, the rate-distortion cost values of the intra prediction of the current image block are determined based on the rate-distortion cost function.
In step S330, the rate-distortion cost values of the current image block under at least one candidate intra prediction mode are determined based on a rate-distortion cost function, where the rate-distortion cost function includes a rate-distortion cost term for the intra prediction of the current image block (hereinafter also called the first rate-distortion cost term) and a weighted rate-distortion cost term for the reference pixels used for the intra prediction of the surrounding image blocks of the current image block (hereinafter also called the second rate-distortion cost term); the weight of the weighted rate-distortion cost term for the reference pixels is determined in step S320 based on the first intra prediction modes of the surrounding image blocks of the current image block.
As described above, according to exemplary embodiments of the present disclosure, in an HEVC-compliant encoder the reference pixels used for the intra prediction of the surrounding image blocks of the current image block may include at least one of the bottom pixel row, the right pixel column and the bottom-right corner pixel of the current image block. For example, the bottom pixel row, the rightmost pixel column and the bottom-right corner pixel of the current image block may be used as reference pixels for the intra prediction of the adjacent image block below, the adjacent image block to the right and the image block to the lower right, respectively, while in the VVC standard more rows or columns of reference pixels may be used. It should be understood that reference pixels at different positions may be used depending on the reference pixels that the coding standard employs for intra prediction.
As described above, in order to take into account the influence that the currently encoded image block, when used as reference pixels for intra prediction, has on the encoding of subsequent image blocks, the reference pixels that may be used for the intra prediction of the surrounding image blocks need to be selectively adjusted to compensate for pixel distortion. In the HEVC standard, as shown in FIG. 4(b), when encoding the current image block i_TL, the rightmost column of reference pixels (col_right) of i_TL may affect image block i_T, the bottom row of reference pixels (row_bottom) may affect pixel block i_L, and the bottom-right corner reference pixel (pixel_RB) may affect image block i. Therefore, when selecting the intra prediction mode of image block i_TL, the influence on the surrounding image blocks needs to be considered.
Since the information M_i of the first intra prediction mode has already been detected for each image block in step S310, when selecting the intra prediction mode for the current image block i_TL, rate-distortion cost terms for the pixels that may be used as reference pixels can be introduced into the rate-distortion optimization function.
Specifically, for example, the original rate-distortion cost function usable in HEVC encoding, including the rate-distortion cost term for the image block, is the following equation (1):
J(i_TL_mode_j) = SSE(mode_j) + lambda * R(mode_j)    (1),
where J denotes the rate-distortion cost of image block i_TL under mode_j (here, j may be any one of the intra prediction mode indices 0-34), SSE denotes the sum of squared differences between the reconstructed pixels of mode_j and the original pixels, R(mode_j) denotes the bit rate of mode_j, and lambda denotes the rate-distortion cost coefficient of the bit rate.
According to exemplary embodiments of the present disclosure, the above rate-distortion cost function may be modified to include weighted rate-distortion cost terms for the reference pixels used for the intra prediction of the surrounding image blocks; the modified rate-distortion cost function may be the following equation (3):
(i_TL_mode_best) = min{ J(i_TL_mode_j) = SSE(mode_j) + lambda * R(mode_j) + A * SSE(col_right) + B * SSE(row_bottom) + C * SSE(pixel_RB) }    (3),
where A, B and C are the weight parameters for the right pixel column, the bottom pixel row and the bottom-right corner pixel, respectively. That is, the part of equation (3) that is the same as equation (1), i.e., SSE(mode_j) + lambda * R(mode_j), can be regarded as the first rate-distortion cost term, and the part related to the reference pixels, i.e., A * SSE(col_right) + B * SSE(row_bottom) + C * SSE(pixel_RB), can be regarded as the above second rate-distortion cost term.
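Equation (3) can be sketched numerically as follows. The dictionary layout for the three reference-pixel groups and the function names are illustrative assumptions; an encoder would compute these SSE terms over the actual original and reconstructed samples.

```python
def sse(orig, recon):
    """Sum of squared errors between two equal-length pixel sequences."""
    return sum((o - r) ** 2 for o, r in zip(orig, recon))

def weighted_rd_cost(orig, recon, rate, lam, refs, weights):
    """Rate-distortion cost of equation (3): the usual SSE + lambda * R term for the
    current block, plus weighted SSE terms for the reference pixels that the
    surrounding blocks will predict from.

    refs    : {'col_right': (orig, recon), 'row_bottom': (...), 'pixel_RB': (...)}
    weights : (A, B, C) derived from the neighbours' first intra prediction modes.
    """
    A, B, C = weights
    cost = sse(orig, recon) + lam * rate       # first rate-distortion cost term
    cost += A * sse(*refs['col_right'])        # second (weighted) cost term, right column
    cost += B * sse(*refs['row_bottom'])       # ... bottom row
    cost += C * sse(*refs['pixel_RB'])         # ... bottom-right corner pixel
    return cost
```

The mode minimising this cost over all candidates is then selected, exactly as the minimisation in equation (3) prescribes.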
根据本公开的示例性实施例,用于当前图像块的下侧像素行的率失真代价项的权重可基于与当前图像块相邻的下侧图像块的第一帧内预测模式被确定,用于当前图像块的右侧像素列的率失真代价项的权重可基于与当前图像块相邻的右侧图像块的第一帧内预测模式被确定,用于右下角像素的率失真代价项的权重可以为固定值。也就是说,由于参考像素对于周围图像块的影响与周围图像块的预测模式(即,方向)相关,因此,通过在率失真代价函数中引入考虑了周围像素块的参考方向的率失真代价项,可以更好地补偿参考像素的失真。
根据本公开的示例性实施例,用于当前图像块的右侧像素行的率失真代价项的权重A基于当前图像块的右侧图像块的第一帧内预测模式所对应的预测方向角度被确定,用于当前图像块的下侧像素列的率失真代价项的权重B基于当前图像块的下侧图像块的第一帧内预测模式对应的预测方向角度被确定,而用于右下角像素的率失真代价项的权重值C可以为预设值MAX,其中,A和B的数值在预定范围[0,MAX]内。
例如,可根据以下等式来确定用于当前图像块的右侧像素列的率失真代价项的权重A、用于当前图像块的下侧像素行的率失真代价项的权重B和用于右下角像素的率失真代价项的权重值C:
A=clip3(0,MAX,abs(cot(ang_M_i_T))),
B=clip3(0,MAX,abs(tan(ang_M_i_L))),
C=MAX,
其中,ang_M_i_T表示与当前图像块相邻的右侧图像块的第一帧内预测模式对应的预测方向角度,ang_M_i_L表示与当前图像块相邻的下侧图像块的第一帧内预测模式对应的预测方向角度,MAX为预先设置的值。通过clip3函数,可将A、B的值限定在预定的范围[0,MAX]内。也就是说,如果abs(cot(ang_M_i_T))的值大于MAX,则A=MAX,如果abs(cot(ang_M_i_T))的值小于0,则A=0。同样地,如果abs(tan(ang_M_i_L))的值大于MAX,则B=MAX,如果abs(tan(ang_M_i_L))的值小于0,则B=0。根据本公开的示例性实施例,MAX可以取值2。特别地,在HEVC和VVC标准中,对于平面模式(INTRA_PLANAR)和DC模式,可以取值A=B=C=1。
It should be understood that the manner of obtaining the weight values in the above rate-distortion cost terms (e.g., trigonometric functions) is merely an example; a corresponding manner may be adopted depending on the video coding scheme, as long as the weight reflects the influence of the reference pixels in the current image block, used by the surrounding image blocks, on the prediction of those surrounding image blocks.
Step S340: determine the second intra prediction mode of the current image block according to the rate-distortion cost values, and encode the current image block using the second intra prediction mode.
In step S340, the second intra prediction mode of the current image block is determined according to the rate-distortion cost values under the at least one candidate intra prediction mode, and the current image block is encoded using the second intra prediction mode. That is, the mode_j corresponding to the minimum rate-distortion cost value determined by equation (3) may be determined as the final second intra prediction mode for image block i_TL, and the image block is encoded using this intra prediction mode. Here, mode_j may be one of the plurality of intra prediction modes specified by the video codec standard. In other words, the rate-distortion cost values of the rate-distortion cost function of the current image block under a plurality of candidate intra prediction modes may be determined, and the candidate intra prediction mode with the minimum rate-distortion cost value may be determined as the second intra prediction mode for the current image block.
By adopting the above video encoding method for intra prediction mode selection, the distortion of the reference pixels used in intra prediction can be reduced, thereby improving the quality of video encoding.
Furthermore, according to exemplary embodiments of the present disclosure, the rate-distortion cost function may be considered under multiple quantization parameters, and the best intra prediction mode and the best quantization parameter may be found by traversing the multiple quantization parameters. That is, in step S330, multiple rate-distortion cost values of the rate-distortion cost function of the current image block under different candidate intra prediction modes and quantization parameters may be determined, and in step S340, the candidate intra prediction mode and quantization parameter corresponding to the minimum rate-distortion cost value among the multiple rate-distortion cost values are determined as the second intra prediction mode and the quantization parameter used to perform intra prediction on the current block.
For example, when computing the rate-distortion cost of image block i_TL, a quantization parameter (QP) may be introduced and multiple QPs may be traversed downward, so that the rate-distortion cost of equation (3) becomes the following equation (4):
(i_TL_QP_best,i_TL_mode_best)=argmin_{j,k}{J(i_TL,mode_j,QP_k)=SSE(mode_j,QP_k)+lambda*R(mode_j,QP_k)+A*SSE(col_right,QP_k)+B*SSE(row_bottom,QP_k)+C*SSE(pixel_RB,QP_k)} (4)
Assuming the currently given QP is 32, QP_k belongs to {32, 31, 30, ...}. The number of QPs to traverse can be specified and is typically 2.
By incorporating the quantization parameter into the rate-distortion cost computation, the distortion of the reference pixels for intra prediction can be reflected more accurately, further improving the efficiency and quality of video encoding.
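The joint mode/QP search of equation (4) can be sketched as a nested loop. Here `evaluate_terms` stands in for the encoder's reconstruct-and-measure step and is a hypothetical callback, not an API of any real encoder; the toy cost model below exists only to make the example runnable.

```python
def search_mode_and_qp(modes, base_qp, num_qps, evaluate_terms, lam, A, B, C):
    """Traverse candidate modes and QPs downward from base_qp per eq. (4).

    evaluate_terms(mode, qp) must return a tuple
    (sse, rate, sse_col, sse_row, sse_rb) for that mode/QP pair.
    """
    best = None  # (cost, mode, qp)
    for qp in range(base_qp, base_qp - num_qps, -1):  # e.g. 32, 31
        for mode in modes:
            sse, rate, s_col, s_row, s_rb = evaluate_terms(mode, qp)
            cost = sse + lam * rate + A * s_col + B * s_row + C * s_rb
            if best is None or cost < best[0]:
                best = (cost, mode, qp)
    return best


# Toy stand-in: pretend lowering the QP by 1 trades 3 units of distortion
# for 1 extra bit, and that mode 10 predicts this block slightly better.
def fake_terms(mode, qp):
    sse = (50 if mode == 10 else 60) + 3 * (qp - 31)
    return (sse, 8 + (32 - qp), 2.0, 2.0, 1.0)

cost, mode, qp = search_mode_and_qp([0, 10, 26], base_qp=32, num_qps=2,
                                    evaluate_terms=fake_terms,
                                    lam=2.0, A=1.0, B=1.0, C=2.0)
```

With num_qps=2 and base_qp=32, only QP 32 and QP 31 are tried, matching the "typically 2" traversal count stated above.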
FIG. 5 is a block diagram illustrating a video encoding apparatus according to an exemplary embodiment of the present disclosure. The video encoding apparatus according to exemplary embodiments of the present disclosure may be implemented in a device having a video encoding function in hardware, software, and/or a combination of software and hardware.
As shown in FIG. 5, the video encoding apparatus 500 according to an exemplary embodiment of the present disclosure may include a first mode determining module 510, a weight determining module 520, a rate-distortion cost determining module 530, and a second mode determining module 540.
The first mode determining module 510 is configured to determine a first intra prediction mode for an image block partitioned from a video frame, and reference pixels used for intra prediction. Here, as described above, the first intra prediction mode for the image block may be determined according to the texture detection result of the image block. In some embodiments, in video encoding under the VVC and HEVC standards, the rate-distortion cost may be used directly to determine one intra prediction mode from a plurality of intra prediction modes as the first intra prediction mode.
The weight determining module 520 is configured to determine, based on the first intra prediction modes of the surrounding image blocks of the current image block, the weights corresponding to the reference pixels in the current image block used for intra prediction of the surrounding image blocks, where the weights characterize the influence of the distortion of the reference pixels on the intra prediction of the surrounding image blocks.
The rate-distortion cost determining module 530 is configured to determine, based on a rate-distortion cost function, rate-distortion cost values of the current image block under at least one candidate intra prediction mode, where the rate-distortion cost function includes a rate-distortion cost term for the intra prediction of the current image block and weighted rate-distortion cost terms for the reference pixels used for intra prediction of the surrounding image blocks of the current image block, and where the weights of the weighted rate-distortion cost terms for the reference pixels are determined based on the first intra prediction modes of the surrounding image blocks of the current image block.
The second mode determining module 540 is configured to determine the second intra prediction mode of the current image block according to the rate-distortion cost values under the at least one candidate intra prediction mode, and to encode the current image block using the second intra prediction mode.
According to exemplary embodiments of the present disclosure, the first mode determining module 510 is configured to perform texture detection on the image block to determine a first intra prediction mode suitable for the detected texture, and the weight determining module 520 is configured to determine the positions of the reference pixels according to the video coding standard adopted by the video encoding method.
According to exemplary embodiments of the present disclosure, the reference pixels used for intra prediction of the surrounding image blocks of the current image block include at least one of a bottom pixel row, a right pixel column and a bottom-right corner pixel of the current image block. For example, in an encoder conforming to the HEVC standard, the bottom-most row of pixels, the right-most column of pixels and the single bottom-right corner pixel of an image block may be used as reference pixels for intra prediction of the lower neighboring image block, the right neighboring image block and the bottom-right image block, respectively, whereas the VVC standard may use more rows or columns of reference pixels. It should be understood that reference pixels at different positions may be used depending on the intra prediction reference pixels adopted by the coding standard.
According to exemplary embodiments of the present disclosure, the weight of the rate-distortion cost term for the bottom pixel row of the current image block is determined based on the first intra prediction mode of the lower image block adjacent to the current image block, the weight value of the rate-distortion cost term for the right pixel column of the current image block is determined based on the first intra prediction mode of the right image block adjacent to the current image block, and the weight value of the rate-distortion cost term for the bottom-right corner pixel is a fixed value.
According to exemplary embodiments of the present disclosure, the weight A of the rate-distortion cost term for the right pixel column of the current image block is determined based on the prediction direction angle corresponding to the first intra prediction mode of the right image block of the current image block, the weight B of the rate-distortion cost term for the bottom pixel row of the current image block is determined based on the prediction direction angle corresponding to the first intra prediction mode of the lower image block of the current image block, and the weight value C of the rate-distortion cost term for the bottom-right corner pixel may be a preset value MAX, where the values of A and B lie within the range [0, MAX].
For example, the weight A of the rate-distortion cost term for the right pixel column of the current image block, the weight B of the rate-distortion cost term for the bottom pixel row of the current image block, and the weight value C of the rate-distortion cost term for the bottom-right corner pixel are determined as follows:
A=clip3(0,MAX,abs(cot(ang_M_i_T))),
B=clip3(0,MAX,abs(tan(ang_M_i_L))),
C=MAX,
where ang_M_i_T denotes the prediction direction angle corresponding to the first intra prediction mode of the right image block adjacent to the current image block, ang_M_i_L denotes the prediction direction angle corresponding to the first intra prediction mode of the lower image block adjacent to the current image block, and MAX is a preset value.
According to exemplary embodiments of the present disclosure, the rate-distortion cost determining module 530 is configured to determine multiple rate-distortion cost values of the rate-distortion cost function of the current image block under different candidate intra prediction modes and quantization parameters, and the second mode determining module 540 is configured to determine the candidate intra prediction mode and quantization parameter corresponding to the minimum rate-distortion cost value among the multiple rate-distortion cost values as the second intra prediction mode and the quantization parameter used to perform intra prediction on the image block.
The first mode determining module 510 is configured to determine the gradient angle of the image block through image gradient detection, and to determine the intra prediction mode corresponding to the determined gradient angle as the first intra prediction mode of the image block.
According to exemplary embodiments of the present disclosure, the rate-distortion cost determining module 530 is configured to determine rate-distortion cost values of the rate-distortion cost function of the image block under a plurality of candidate intra prediction modes, and the second mode determining module 540 is configured to determine the candidate intra prediction mode with the minimum rate-distortion cost value as the second intra prediction mode for the image block.
Details of the operations performed by the modules of the video encoding apparatus 500 have been described above with reference to FIGS. 3 and 4, and are not repeated here.
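As a hedged illustration of the gradient-based first-mode detection performed by module 510, the sketch below estimates a block's dominant gradient angle with the standard 3x3 Sobel operators and maps the perpendicular edge direction onto an angular mode. The linear angle-to-mode mapping (horizontal edge to mode 10, vertical edge to mode 26) is a simplified assumption for illustration; HEVC's actual intraPredAngle table is non-uniform.

```python
import math

def block_gradient_angle(block):
    """Estimate the dominant gradient angle (degrees in [0, 180)) of a
    2D list-of-lists luma block using 3x3 Sobel operators."""
    h, w = len(block), len(block[0])
    gx_sum = gy_sum = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal and vertical Sobel responses at (x, y).
            gx = (block[y-1][x+1] + 2*block[y][x+1] + block[y+1][x+1]
                  - block[y-1][x-1] - 2*block[y][x-1] - block[y+1][x-1])
            gy = (block[y+1][x-1] + 2*block[y+1][x] + block[y+1][x+1]
                  - block[y-1][x-1] - 2*block[y-1][x] - block[y-1][x+1])
            gx_sum += abs(gx)  # accumulate per-axis magnitudes
            gy_sum += abs(gy)
    return math.degrees(math.atan2(gy_sum, gx_sum)) % 180.0

def angle_to_first_mode(gradient_angle_deg):
    # Edges run perpendicular to the gradient; predict along the edge.
    edge_angle = (gradient_angle_deg + 90.0) % 180.0
    # Simplified linear mapping onto angular modes 2..34: a horizontal
    # edge maps to mode 10 (horizontal prediction) and a vertical edge
    # to mode 26 (vertical prediction) -- an illustrative assumption.
    m = 10 + round(edge_angle / 180.0 * 32)
    return m - 32 if m > 34 else m

# A block whose left half is dark and right half bright has vertical
# edges (horizontal gradient), so vertical prediction should be chosen.
block = [[0, 0, 255, 255]] * 4
mode = angle_to_first_mode(block_gradient_angle(block))
```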
FIG. 6 is a structural block diagram illustrating an electronic device 600 for video encoding according to an exemplary embodiment of the present disclosure. The electronic device 600 may be, for example, a smartphone, a tablet computer, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer. The electronic device 600 may also be referred to by other names such as user equipment, portable terminal, laptop terminal, or desktop terminal.
Generally, the electronic device 600 includes a processor 601 and a memory 602.
The processor 601 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 601 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array), and PLA (Programmable Logic Array). The processor 601 may also include a main processor and a coprocessor. The main processor is a processor for processing data in an awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor for processing data in a standby state. In some embodiments, the processor 601 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In exemplary embodiments of the present disclosure, the processor 601 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 602 may include one or more computer-readable storage media, which may be non-transitory. The memory 602 may also include high-speed random-access memory and non-volatile memory, such as one or more magnetic disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 602 is used to store at least one instruction, which is executed by the processor 601 to implement the video encoding method of the exemplary embodiments of the present disclosure.
In some embodiments, the electronic device 600 may optionally further include a peripheral device interface 603 and at least one peripheral device. The processor 601, the memory 602 and the peripheral device interface 603 may be connected via a bus or signal lines, and each peripheral device may be connected to the peripheral device interface 603 via a bus, a signal line or a circuit board. Specifically, the peripheral devices include at least one of: a radio frequency circuit 604, a touch display screen 605, a camera 606, an audio circuit 607, a positioning component 608 and a power supply 609.
The peripheral device interface 603 may be used to connect at least one I/O (Input/Output)-related peripheral device to the processor 601 and the memory 602. In some embodiments, the processor 601, the memory 602 and the peripheral device interface 603 are integrated on the same chip or circuit board. In some other embodiments, any one or two of the processor 601, the memory 602 and the peripheral device interface 603 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 604 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 604 communicates with communication networks and other communication devices via electromagnetic signals, converting electrical signals into electromagnetic signals for transmission or converting received electromagnetic signals into electrical signals. In some embodiments, the radio frequency circuit 604 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and the like. The radio frequency circuit 604 may communicate with other terminals via at least one wireless communication protocol, including but not limited to metropolitan area networks, mobile communication networks of various generations (2G, 3G, 4G and 5G), wireless local area networks and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency circuit 604 may further include NFC (Near Field Communication)-related circuits, which is not limited in the present disclosure.
The display screen 605 is used to display a UI (User Interface), which may include graphics, text, icons, video and any combination thereof. When the display screen 605 is a touch display screen, it also has the ability to capture touch signals on or above its surface. The touch signal may be input to the processor 601 as a control signal for processing. In this case, the display screen 605 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 605, arranged on the front panel of the electronic device 600; in other embodiments, there may be at least two display screens 605, arranged on different surfaces of the terminal 600 or in a folded design; in still other embodiments, the display screen 605 may be a flexible display screen, arranged on a curved or folded surface of the terminal 600. The display screen 605 may even be arranged as a non-rectangular irregular shape, i.e., a shaped screen. The display screen 605 may be made of materials such as LCD (Liquid Crystal Display) or OLED (Organic Light-Emitting Diode).
The camera assembly 606 is used to capture images or video. In some embodiments, the camera assembly 606 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the terminal and the rear camera on the back. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so as to achieve a background-blur function by fusing the main camera and the depth-of-field camera, panoramic and VR (Virtual Reality) shooting functions by fusing the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 606 may further include a flash, which may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm flash and a cold flash, and can be used for light compensation at different color temperatures.
The audio circuit 607 may include a microphone and a speaker. The microphone is used to capture sound waves of the user and the environment and convert them into electrical signals input to the processor 601 for processing, or to the radio frequency circuit 604 for voice communication. For stereo capture or noise reduction, there may be multiple microphones arranged at different parts of the terminal 600. The microphone may also be an array microphone or an omnidirectional capture microphone. The speaker is used to convert electrical signals from the processor 601 or the radio frequency circuit 604 into sound waves. The speaker may be a conventional membrane speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans, but also into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 607 may further include a headphone jack.
The positioning component 608 is used to determine the current geographical location of the electronic device 600 to implement navigation or LBS (Location Based Service). The positioning component 608 may be based on the GPS (Global Positioning System) of the United States, the BeiDou system of China, the GLONASS system of Russia, or the Galileo system of the European Union.
The power supply 609 is used to supply power to the components in the electronic device 600. The power supply 609 may be alternating current, direct current, a disposable battery or a rechargeable battery. When the power supply 609 includes a rechargeable battery, the rechargeable battery may support wired or wireless charging, and may also support fast-charging technology.
In some embodiments, the electronic device 600 further includes one or more sensors 610, including but not limited to an acceleration sensor 611, a gyroscope sensor 612, a pressure sensor 613, a fingerprint sensor 614, an optical sensor 615 and a proximity sensor 616.
The acceleration sensor 611 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established with the terminal 600. For example, the acceleration sensor 611 can be used to detect the components of gravitational acceleration on the three coordinate axes. The processor 601 can control the touch display screen 605 to display the user interface in landscape or portrait view according to the gravitational acceleration signal captured by the acceleration sensor 611. The acceleration sensor 611 can also be used for capturing motion data in games or of the user.
The gyroscope sensor 612 can detect the body orientation and rotation angle of the terminal 600, and can cooperate with the acceleration sensor 611 to capture the user's 3D motion on the terminal 600. From the data captured by the gyroscope sensor 612, the processor 601 can implement functions such as motion sensing (e.g., changing the UI according to the user's tilt operation), image stabilization during shooting, game control and inertial navigation.
The pressure sensor 613 may be arranged on the side frame of the terminal 600 and/or the lower layer of the touch display screen 605. When the pressure sensor 613 is arranged on the side frame of the terminal 600, it can detect the user's grip signal on the terminal 600, and the processor 601 performs left/right-hand recognition or shortcut operations according to the grip signal captured by the pressure sensor 613. When the pressure sensor 613 is arranged on the lower layer of the touch display screen 605, the processor 601 controls the operable controls on the UI according to the user's pressure operation on the touch display screen 605. The operable controls include at least one of button controls, scroll-bar controls, icon controls and menu controls.
The fingerprint sensor 614 is used to capture the user's fingerprint; the processor 601 identifies the user's identity from the fingerprint captured by the fingerprint sensor 614, or the fingerprint sensor 614 identifies the user's identity from the captured fingerprint. When the user's identity is identified as trusted, the processor 601 authorizes the user to perform related sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, changing settings, and the like. The fingerprint sensor 614 may be arranged on the front, back or side of the electronic device 600. When a physical button or manufacturer logo is provided on the electronic device 600, the fingerprint sensor 614 may be integrated with the physical button or manufacturer logo.
The optical sensor 615 is used to capture ambient light intensity. In one embodiment, the processor 601 can control the display brightness of the touch display screen 605 according to the ambient light intensity captured by the optical sensor 615: when the ambient light intensity is high, the display brightness of the touch display screen 605 is increased; when it is low, the display brightness is decreased. In another embodiment, the processor 601 can also dynamically adjust the shooting parameters of the camera assembly 606 according to the ambient light intensity captured by the optical sensor 615.
The proximity sensor 616, also called a distance sensor, is typically arranged on the front panel of the electronic device 600 and is used to capture the distance between the user and the front of the electronic device 600. In one embodiment, when the proximity sensor 616 detects that this distance gradually decreases, the processor 601 controls the touch display screen 605 to switch from the screen-on state to the screen-off state; when the proximity sensor 616 detects that this distance gradually increases, the processor 601 controls the touch display screen 605 to switch from the screen-off state to the screen-on state.
Those skilled in the art will understand that the structure shown in FIG. 6 does not limit the electronic device 600, which may include more or fewer components than shown, combine certain components, or adopt a different component arrangement.
FIG. 7 is a structural block diagram of another electronic device 700. For example, the electronic device 700 may be provided as a server. Referring to FIG. 7, the electronic device 700 includes one or more processors 710 and a memory 720. The memory 720 may include one or more programs for performing the above video encoding method. The electronic device 700 may further include a power supply component 730 configured to perform power management of the electronic device 700, a wired or wireless network interface 740 configured to connect the electronic device 700 to a network, and an input/output (I/O) interface 750. The electronic device 700 may operate based on an operating system stored in the memory 720, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
According to an embodiment of the present disclosure, a computer-readable storage medium storing instructions is also provided, wherein the instructions, when run by at least one processor, cause the at least one processor to perform the video encoding method of any of the embodiments of the present disclosure. Examples of the computer-readable storage medium here include: read-only memory (ROM), random-access programmable read-only memory (PROM), electrically erasable programmable read-only memory (EEPROM), random-access memory (RAM), dynamic random-access memory (DRAM), static random-access memory (SRAM), flash memory, non-volatile memory, CD-ROM, CD-R, CD+R, CD-RW, CD+RW, DVD-ROM, DVD-R, DVD+R, DVD-RW, DVD+RW, DVD-RAM, BD-ROM, BD-R, BD-R LTH, BD-RE, Blu-ray or optical disc storage, hard disk drive (HDD), solid-state drive (SSD), card-type memory (such as a multimedia card, a secure digital (SD) card or an extreme digital (XD) card), magnetic tape, floppy disk, magneto-optical data storage device, optical data storage device, hard disk, solid-state disk, and any other device configured to store, in a non-transitory manner, a computer program and any associated data, data files and data structures, and to provide the computer program and any associated data, data files and data structures to a processor or computer so that the processor or computer can execute the computer program. The computer program in the above computer-readable storage medium can run in an environment deployed in computer devices such as a client, a host, a proxy device or a server. Furthermore, in one example, the computer program and any associated data, data files and data structures are distributed over networked computer systems, so that the computer program and any associated data, data files and data structures are stored, accessed and executed in a distributed manner by one or more processors or computers.
According to an embodiment of the present disclosure, a computer program product is also provided, the instructions in which are executable by a processor of a computer device to perform the video encoding method of any of the embodiments of the present disclosure.
According to an embodiment of the present disclosure, a computer program is also provided, the computer program including computer program code which, when run on a computer, causes the computer to perform the video encoding method of any of the embodiments of the present disclosure.
According to the video encoding scheme of the present disclosure, when selecting the intra prediction mode of the current image block, weighted rate-distortion cost terms that take into account the reference directions of the surrounding image blocks are introduced into the rate-distortion cost function. This compensates for the distortion of the pixels that will later serve as intra prediction references for subsequent image blocks, reducing the reference-pixel distortion and thereby improving the quality and efficiency of video encoding.
Other embodiments of the present disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the solutions disclosed herein. The present disclosure is intended to cover any variations, uses or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope of the present disclosure being indicated by the following claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.
All embodiments of the present disclosure may be performed individually or in combination with other embodiments, all of which are regarded as within the scope of protection claimed by the present disclosure.
Claims (18)
- A video encoding method, characterized by comprising: determining a first intra prediction mode for an image block partitioned from a video image frame, and reference pixels used for intra prediction; determining, based on first intra prediction modes of surrounding image blocks of a current image block, weights corresponding to the reference pixels in the current image block, the weights characterizing the influence of distortion of the reference pixels on the intra prediction of the surrounding image blocks; determining, based on a rate-distortion cost function, rate-distortion cost values of the current image block under at least one candidate intra prediction mode, wherein the rate-distortion cost function includes a first rate-distortion cost term and a second rate-distortion cost term having the weights, the first rate-distortion cost term being a cost term for the intra prediction of the current image block and the second rate-distortion cost term being a weighted cost term for the reference pixels; and determining a second intra prediction mode of the current image block according to the rate-distortion cost values under the at least one candidate intra prediction mode, and encoding the current image block using the second intra prediction mode.
- The method of claim 1, characterized in that determining the first intra prediction mode for the image block partitioned from the video image frame comprises: performing texture detection on the image block, and determining a first intra prediction mode suitable for the detected texture; and determining the reference pixels used for intra prediction comprises: determining, based on the adopted video coding standard, positions of the reference pixels in the image block used for intra prediction.
- The method of claim 1 or 2, characterized in that the reference pixels used for intra prediction of the surrounding image blocks comprise at least one of: a bottom pixel row, a right pixel column and a bottom-right corner pixel of the current image block.
- The method of any one of claims 1 to 3, characterized in that determining the weights corresponding to the reference pixels in the current image block comprises: determining, based on a prediction direction angle corresponding to the first intra prediction mode of a right image block of the current image block, a weight A of the rate-distortion cost term for the right pixel column of the current image block; determining, based on a prediction direction angle corresponding to the first intra prediction mode of a lower image block of the current image block, a weight B of the rate-distortion cost term for the bottom pixel row of the current image block; and determining a weight value C of the rate-distortion cost term for the bottom-right corner pixel as a preset value MAX; wherein the values of A and B lie within the range [0, MAX].
- The method of any one of claims 1 to 4, characterized in that determining the second intra prediction mode of the current image block according to the rate-distortion cost values under the at least one candidate intra prediction mode comprises: determining, from the at least one candidate intra prediction mode, the candidate intra prediction mode with the minimum rate-distortion cost value as the second intra prediction mode for the current image block.
- The method of any one of claims 1 to 5, characterized in that the rate-distortion cost function further includes a quantization parameter, and determining, based on the rate-distortion cost function, the rate-distortion cost values of the current image block under the at least one candidate intra prediction mode comprises: traversing multiple quantization parameters under each candidate intra prediction mode, and determining multiple rate-distortion cost values of the rate-distortion cost function of the current image block under each quantization parameter of the different candidate intra prediction modes; wherein determining the second intra prediction mode according to the rate-distortion cost values under the at least one candidate intra prediction mode comprises: determining the minimum rate-distortion cost value among the multiple rate-distortion cost values, and determining the candidate intra prediction mode and quantization parameter corresponding to the minimum rate-distortion cost value as the second intra prediction mode and the quantization parameter used to perform intra prediction on the current image block.
- The method of any one of claims 1 to 6, characterized in that determining the first intra prediction mode of the image block comprises: computing a gradient angle of the image block through image gradient detection; and obtaining the intra prediction mode corresponding to the computed gradient angle as the first intra prediction mode of the image block.
- A video encoding apparatus, characterized by comprising: a first mode determining module configured to determine a first intra prediction mode for an image block partitioned from a video image frame, and reference pixels used for intra prediction; a weight determining module configured to determine, based on first intra prediction modes of surrounding image blocks of a current image block, weights corresponding to the reference pixels in the current image block, the weights characterizing the influence of distortion of the reference pixels on the intra prediction of the surrounding image blocks; a rate-distortion cost determining module configured to determine, based on a rate-distortion cost function, rate-distortion cost values of the current image block under at least one candidate intra prediction mode, wherein the rate-distortion cost function includes a first rate-distortion cost term and a second rate-distortion cost term having the weights, the first rate-distortion cost term being a cost term for the intra prediction of the current image block and the second rate-distortion cost term being a weighted cost term for the reference pixels; and a second mode determining module configured to determine a second intra prediction mode of the current image block according to the rate-distortion cost values under the at least one candidate intra prediction mode, and to encode the current image block using the second intra prediction mode.
- The apparatus of claim 8, characterized in that the first mode determining module is configured to perform texture detection on the image block, determine a first intra prediction mode suitable for the detected texture, and determine, based on the video coding standard, positions of the reference pixels in the image block used for intra prediction.
- The apparatus of claim 8 or 9, characterized in that the reference pixels used for intra prediction of the surrounding image blocks comprise at least one of a bottom pixel row, a right pixel column and a bottom-right corner pixel of the current image block.
- The apparatus of any one of claims 8 to 10, characterized in that the weight determining module is configured to: determine, based on a prediction direction angle corresponding to the first intra prediction mode of a right image block of the current image block, a weight A of the rate-distortion cost term for the right pixel column of the current image block; determine, based on a prediction direction angle corresponding to the first intra prediction mode of a lower image block of the current image block, a weight B of the rate-distortion cost term for the bottom pixel row of the current image block; and determine a weight value C of the rate-distortion cost term for the bottom-right corner pixel as a preset value MAX, wherein the values of A and B lie within the range [0, MAX].
- The apparatus of any one of claims 8 to 11, characterized in that the second mode determining module is configured to determine, from the candidate intra prediction modes, the candidate intra prediction mode with the minimum rate-distortion cost value as the second intra prediction mode for the current image block.
- The apparatus of any one of claims 8 to 12, characterized in that the rate-distortion cost function further includes a quantization parameter; the rate-distortion cost determining module is configured to traverse multiple quantization parameters under each candidate intra prediction mode, and to determine multiple rate-distortion cost values of the rate-distortion cost function of the current image block under the multiple quantization parameters of each candidate intra prediction mode; and the second mode determining module is configured to determine the candidate intra prediction mode and quantization parameter corresponding to the minimum rate-distortion cost value among the multiple rate-distortion cost values, and to determine that candidate intra prediction mode and quantization parameter as the second intra prediction mode and the quantization parameter used to perform intra prediction on the current image block.
- The apparatus of any one of claims 8 to 13, characterized in that the first mode determining module is configured to compute a gradient angle of the image block through image gradient detection, and to obtain the intra prediction mode corresponding to the computed gradient angle as the first intra prediction mode for the image block.
- An electronic device, characterized by comprising: at least one processor; and at least one memory storing computer-executable instructions, wherein the computer-executable instructions, when run by the at least one processor, cause the at least one processor to perform the video encoding method of any one of claims 1 to 7.
- A computer-readable storage medium, characterized in that, when instructions in the computer-readable storage medium are executed by at least one processor, the at least one processor is enabled to perform the video encoding method of any one of claims 1 to 7.
- A computer program product, characterized in that instructions in the computer program product are run by at least one processor to perform the video encoding method of any one of claims 1 to 7.
- A computer program, characterized in that the computer program includes computer program code which, when run on a computer, causes the computer to perform the video encoding method of any one of claims 1 to 7.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111370720.3A CN113891074B (zh) | 2021-11-18 | 2021-11-18 | 视频编码方法和装置、电子装置和计算机可读存储介质 |
CN202111370720.3 | 2021-11-18 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023087637A1 true WO2023087637A1 (zh) | 2023-05-25 |
Family
ID=79015750
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/092314 WO2023087637A1 (zh) | 2021-11-18 | 2022-05-11 | 视频编码方法和装置、电子设备和计算机可读存储介质 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113891074B (zh) |
WO (1) | WO2023087637A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117692648A (zh) * | 2024-02-02 | 2024-03-12 | 腾讯科技(深圳)有限公司 | 视频编码方法、装置、设备、存储介质和计算机程序产品 |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113891074B (zh) * | 2021-11-18 | 2023-08-01 | 北京达佳互联信息技术有限公司 | 视频编码方法和装置、电子装置和计算机可读存储介质 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101141649A (zh) * | 2007-07-31 | 2008-03-12 | 北京大学 | 用于视频编码的帧内预测编码最佳模式的选取方法及装置 |
CN109889827A (zh) * | 2019-04-11 | 2019-06-14 | 腾讯科技(深圳)有限公司 | 帧内预测编码方法、装置、电子设备及计算机存储介质 |
WO2020102750A1 (en) * | 2018-11-16 | 2020-05-22 | Qualcomm Incorporated | Position-dependent intra-inter prediction combination in video coding |
WO2021045655A2 (en) * | 2019-12-31 | 2021-03-11 | Huawei Technologies Co., Ltd. | Method and apparatus for intra prediction |
CN112789863A (zh) * | 2018-10-05 | 2021-05-11 | 华为技术有限公司 | 帧内预测方法及设备 |
CN113891074A (zh) * | 2021-11-18 | 2022-01-04 | 北京达佳互联信息技术有限公司 | 视频编码方法和装置、电子装置和计算机可读存储介质 |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20170072637A (ko) * | 2015-12-17 | 2017-06-27 | 한국전자통신연구원 | 영상 부호화/복호화 방법 및 그 장치 |
WO2017184970A1 (en) * | 2016-04-22 | 2017-10-26 | Vid Scale, Inc. | Prediction systems and methods for video coding based on filtering nearest neighboring pixels |
CN112740684A (zh) * | 2018-09-19 | 2021-04-30 | 韩国电子通信研究院 | 用于对图像进行编码/解码的方法和装置以及用于存储比特流的记录介质 |
CN111669584B (zh) * | 2020-06-11 | 2022-10-28 | 浙江大华技术股份有限公司 | 一种帧间预测滤波方法、装置和计算机可读存储介质 |
CN112532975B (zh) * | 2020-11-25 | 2021-09-21 | 腾讯科技(深圳)有限公司 | 视频编码方法、装置、计算机设备及存储介质 |
- 2021-11-18 CN CN202111370720.3A patent/CN113891074B/zh active Active
- 2022-05-11 WO PCT/CN2022/092314 patent/WO2023087637A1/zh unknown
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101141649A (zh) * | 2007-07-31 | 2008-03-12 | 北京大学 | 用于视频编码的帧内预测编码最佳模式的选取方法及装置 |
CN112789863A (zh) * | 2018-10-05 | 2021-05-11 | 华为技术有限公司 | 帧内预测方法及设备 |
WO2020102750A1 (en) * | 2018-11-16 | 2020-05-22 | Qualcomm Incorporated | Position-dependent intra-inter prediction combination in video coding |
CN109889827A (zh) * | 2019-04-11 | 2019-06-14 | 腾讯科技(深圳)有限公司 | 帧内预测编码方法、装置、电子设备及计算机存储介质 |
WO2021045655A2 (en) * | 2019-12-31 | 2021-03-11 | Huawei Technologies Co., Ltd. | Method and apparatus for intra prediction |
CN113891074A (zh) * | 2021-11-18 | 2022-01-04 | 北京达佳互联信息技术有限公司 | 视频编码方法和装置、电子装置和计算机可读存储介质 |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117692648A (zh) * | 2024-02-02 | 2024-03-12 | 腾讯科技(深圳)有限公司 | 视频编码方法、装置、设备、存储介质和计算机程序产品 |
CN117692648B (zh) * | 2024-02-02 | 2024-05-17 | 腾讯科技(深圳)有限公司 | 视频编码方法、装置、设备、存储介质和计算机程序产品 |
Also Published As
Publication number | Publication date |
---|---|
CN113891074B (zh) | 2023-08-01 |
CN113891074A (zh) | 2022-01-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020182158A1 (zh) | 编码方法、解码方法及装置 | |
US11202072B2 (en) | Video encoding method, apparatus, and device, and storage medium | |
EP3962086A1 (en) | Prediction mode decoding method and apparatus and prediction mode encoding method and apparatus | |
US11388403B2 (en) | Video encoding method and apparatus, storage medium, and device | |
WO2023087637A1 (zh) | 视频编码方法和装置、电子设备和计算机可读存储介质 | |
CN110933334B (zh) | 视频降噪方法、装置、终端及存储介质 | |
CN112532975B (zh) | 视频编码方法、装置、计算机设备及存储介质 | |
WO2020083385A1 (zh) | 图像处理的方法、装置及系统 | |
CN114302137B (zh) | 用于视频的时域滤波方法、装置、存储介质及电子设备 | |
WO2022194017A1 (zh) | 基于自适应帧内刷新机制的解码、编码 | |
CN111770339B (zh) | 视频编码方法、装置、设备及存储介质 | |
CN114268797B (zh) | 用于视频的时域滤波的方法、装置、存储介质及电子设备 | |
WO2019141258A1 (zh) | 一种视频编码方法、视频解码方法、装置及系统 | |
CN114422782B (zh) | 视频编码方法、装置、存储介质及电子设备 | |
CN113038124B (zh) | 视频编码方法、装置、存储介质及电子设备 | |
CN113079372B (zh) | 帧间预测的编码方法、装置、设备及可读存储介质 | |
CN110062225B (zh) | 一种图片滤波的方法及装置 | |
CN113891090A (zh) | 视频编码方法、装置、存储介质及电子设备 | |
CN113938689A (zh) | 量化参数确定方法和装置 | |
CN112218071A (zh) | 视频编码方法、装置、存储介质及电子设备 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22894182 Country of ref document: EP Kind code of ref document: A1 |