CN113610055B - Gradient information-based full-optical video sequence intra-frame prediction method

Info

Publication number
CN113610055B
CN113610055B (granted from application CN202111003763.8A; also published as CN113610055A)
Authority
CN
China
Prior art keywords
pixel
block
pixels
current block
reference block
Legal status
Active
Application number
CN202111003763.8A
Other languages
Chinese (zh)
Other versions
CN113610055A (en)
Inventor
金欣 (Jin Xin)
江帆 (Jiang Fan)
Current Assignee
Shenzhen International Graduate School of Tsinghua University
Original Assignee
Shenzhen International Graduate School of Tsinghua University
Priority date
Filing date
Publication date
Application filed by Shenzhen International Graduate School of Tsinghua University
Priority to CN202111003763.8A
Publication of CN113610055A
Application granted
Publication of CN113610055B


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation


Abstract

An all-optical (plenoptic) video sequence intra-frame prediction method based on gradient information comprises the following steps. A1: obtain the first frame of a video sequence, perform intra-frame motion estimation, and calculate the scaling coefficient of the reference block size and the distance parameters between corresponding reference blocks under different microlens focal lengths. A2: find the positions of the left, upper-left and upper reference blocks corresponding to the current block according to the calculated distance parameters, and obtain scaled reference blocks from these positions and the calculated size scaling coefficients. A3: divide each reference block into a macro-pixel boundary region and a non-boundary region according to texture information, and reshape the two regions separately so that the reference block regains the size of the original reference block before scaling. A4: smooth the reshaped reference block along the macro-pixel boundary region, then predict the current block by a weighting operation. The present application efficiently and accurately predicts an uncoded plenoptic image from coded images.

Description

Gradient information-based full-optical video sequence intra-frame prediction method
Technical Field
The present application relates to the fields of computer vision and image processing, and in particular to an intra-frame prediction method for video compression.
Background
Hand-held light field cameras have recently found widespread commercial use. Unlike conventional cameras, a plenoptic camera records not only the time-varying spatial light intensity but also the propagation direction of light, via a microlens array inserted between the main lens and the sensor; this additional information facilitates depth estimation, refocusing, and 3D reconstruction. Because the focused plenoptic camera obtains higher spatial resolution and strikes a better balance between the spatial and angular resolution of the light field, it benefits applications such as microscopy, holographic imaging, and VR/AR, and is therefore more broadly applicable than the conventional plenoptic camera. However, its different optical structure produces a different intensity distribution across imaging pixels and a complex macro-pixel structure, which in turn generates enormous amounts of data to transmit and compress; this poses a challenge for further applications of focused plenoptic cameras.
Most existing plenoptic video coding methods target the conventional plenoptic camera and mainly exploit spatial or temporal correlation to reduce redundancy. For example, motion estimation and motion compensation on spatial- or temporal-domain reference frames find a matching block, which then predicts the current block directly or after homography transformation, averaging, and similar processing; multiple matching blocks may also be searched for weighted prediction of the current block. Autocorrelation compensation and disparity compensation have likewise been used to find and process matching blocks, further improving efficiency.
However, current methods suffer from two main problems. First, most existing methods target video shot by conventional plenoptic cameras and ignore the differences in pixel intensity distribution and macro-pixel structure that arise from the different imaging principles of conventional and focused plenoptic cameras. For the same scene, a macro-pixel in a conventional plenoptic camera carries only angular information, whereas in a focused plenoptic camera it is a micro-image depicting the imaged target; the two differ greatly. Even the few existing methods that compress focused plenoptic video do not fully exploit these imaging characteristics, so their compression efficiency is low. Second, although sub-aperture images can be extracted from both kinds of video for compression, the extraction-rendering process is reversible for conventional plenoptic video but not for focused plenoptic video, where pixels and views are not in one-to-one correspondence. A multiview video coding (MVC) approach therefore loses the true pixel values of the original video, which hinders further applications of the focused plenoptic camera. In addition, existing compression methods for focused plenoptic video do not consider the multi-focus case and target only single-focus video sequences, so they cannot fully exploit the correlation present in video shot by a focused plenoptic camera.
It should be noted that the information disclosed in the background section above is provided only to aid understanding of the background of the present application, and may therefore include information that does not constitute prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
The main object of the present application is to provide a gradient-information-based intra-frame prediction method for all-optical video sequences that achieves efficient predictive compression of the new focused plenoptic video and overcomes the prior art's failure to fully exploit the correlation of its distinctive macro-pixels.
In order to achieve the above purpose, the present application adopts the following technical scheme:
an all-optical video sequence intra-frame prediction method based on gradient information comprises the following steps:
a1: obtaining a first frame of a video sequence, carrying out intra-frame motion estimation, and calculating to obtain a scaling coefficient of the size of a reference block and distance parameters between corresponding reference blocks under different microlens focal lengths;
a2: finding the positions of the left, upper left and upper reference blocks corresponding to the current block according to the calculated distance parameters, and obtaining scaled reference blocks according to the positions and the calculated size scaling coefficients;
a3: dividing the reference block into a macro pixel boundary area and a non-boundary area according to texture information, and respectively shaping to ensure that the reference block is equal to the original reference block before scaling of the sequence;
a4: the shaped reference block is smoothed along the macro-pixel boundary region and then weighted to predict the current block.
Further:
the first frame of the video sequence obtained in the step A1 is subjected to motion estimation, and the frame is divided into 2 n ×2 n And (3) performing intra-frame motion estimation on the current block with the size, wherein n=3, 4,5 and 6.
The motion-estimation range in step A1 is a circle centred on the centre of the current block, with a radius equal to a multiple of the macro-pixel diameter; the search range comprises the macro-pixel regions inside this circle whose category differs from that of the current macro-pixel.
The matching criterion in step A1 is the sum of absolute differences (SAD):

$$\mathrm{SAD} = \sum_{i=1}^{M}\sum_{j=1}^{N}\left|U(i,j)-V(i,j)\right|$$

where M and N are the length and width, in pixels, of the current block, and U(i, j) and V(i, j) are the pixels at position (i, j) of the current block and of the searched reference block, respectively.
In step A1, the best-matching reference block is selected by minimum SAD; the matching block i and the current block j are then binarized to obtain the imaging areas of the same object in the two blocks, and their ratio is calculated:

$$\lambda = \frac{s(i)}{s(j)}$$

where i and j denote the types of the reference block and the current block, and s(i) and s(j) denote the binarized areas representing the same object in the reference block and the current block. Averaging all scaling factors of the same type pair and block size gives the final block-size scaling factor $\lambda_{ij}$; similarly, the distance between the top-left pixels of the current block and of the matching block is the block distance, and averaging gives the final distance parameter $S_{ij}$.
In step A2, the boundary pixels of the rightmost column and the bottommost row are obtained from surrounding reference pixels at known integer positions. Each rightmost-column boundary pixel is interpolated from the nearest integer-position pixels to its left and right, with interpolation weights given by the difference between each of these two integer-position pixels and its own left neighbour, i.e. the horizontal gradient; each bottommost-row boundary pixel is interpolated from the nearest integer-position pixels above and below it, with weights given by the difference between each of these two integer-position pixels and its own upper neighbour, i.e. the vertical gradient. The interpolation formula (for the horizontal case) is:

$$p(x', y) = w_0\, p(x, y) + w_1\, p(x+1, y)$$

where p denotes the pixel value, x' is the abscissa of the pixel to be interpolated, x is the abscissa of the nearest integer pixel position to its left, and y is its ordinate; the vertical case is analogous. The gradient-dependent weighting coefficients $w_0$ and $w_1$ are computed from the two gradients using a rounding operation [·] and a sharpness control constant k, preferably set to 0.05.
In step A3, reference blocks lying entirely inside one macro-pixel are skipped without scaling or reshaping. Reference blocks spanning several macro-pixels are divided, according to texture information, into a macro-pixel boundary region and a non-boundary region, on which different interpolation operations are performed to reshape the block back to its original pre-scaling size; preferably, bilinear interpolation is used for the non-boundary region and nearest-neighbour interpolation for the boundary region.
In step A4, the filtered value of each boundary pixel is the weighted average of its four neighbours (up, down, left and right):

$$\tilde p(x,y) = \frac{h_l\,p(x-1,y) + h_r\,p(x+1,y) + v_u\,p(x,y-1) + v_l\,p(x,y+1)}{h_l + h_r + v_u + v_l}$$

where the weighting coefficients $h_l$, $h_r$, $v_u$ and $v_l$ are the inverse gradients between the pixel and its left, right, upper and lower neighbours, e.g. $h_l = 1/\left|p(x-1,y)-p(x,y)\right|$.
in the step A4, weighting the shaped and filtered reference block to predict the current block; predicting boundary pixels of the current block by using the reference block boundary pixel weighting, obtaining a weighting coefficient, and predicting pixel values of the current block by using the weighting coefficient to match all pixels of the reference block in a weighting manner; specifically, the method comprises the following formula:
wherein x i Boundary pixels of the left column and upper row of the i-th reference block; y' boundary pixels of the left column and upper row of the current block.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the above intra prediction method.
The application has the following beneficial effects:
the application provides an intra-frame prediction method of an all-optical video sequence based on gradient information, which comprises the steps of calculating the distance and the size scaling coefficient of a reference block through intra-frame motion estimation, then carrying out scaling and shaping operations on the found reference block in different focusing states based on the calculated coefficient, carrying out filtering smoothing operation, and designing an intra-frame prediction model of the all-optical video sequence given with the gradient information based on the filtering smoothing operation, thereby efficiently realizing the accurate prediction of an uncoded all-optical image by using a coded image. The application realizes the efficient predictive compression of the novel focusing all-optical video, realizes the efficient predictive compression of the all-optical video, and solves the problem that the correlation of unique macro pixels cannot be fully utilized in the prior art. The application can realize the improvement of the focusing type all-optical image coding efficiency and has great significance for the research of all-optical image compression coding.
Drawings
Fig. 1 is a flowchart of an intra-frame prediction method of an all-optical video sequence based on gradient information according to an embodiment of the present application.
Fig. 2 is a schematic diagram of an intra motion estimation range according to an embodiment of the present application.
Fig. 3 is a schematic diagram of intra motion estimation and binarization according to an embodiment of the present application.
Fig. 4 is a schematic diagram of the scaling process according to an embodiment of the present application.
Fig. 5 is a schematic diagram of filtered reference-pixel selection according to an embodiment of the present application.
Detailed Description
The following describes embodiments of the present application in detail. It should be emphasized that the following description is merely exemplary in nature and is in no way intended to limit the scope of the application or its applications.
As shown in Fig. 1, an embodiment of the present application provides an intra-frame prediction method for focused plenoptic video sequences, which comprises the following steps:
a1: acquiring a first frame of a video sequence, performing intra-frame motion estimation, and calculating to obtain a scaling factor of the size of a reference block and distance parameters between corresponding reference blocks under different microlens focal lengths;
a2: finding the positions of the left, upper left and upper reference blocks corresponding to the current block according to the calculated distance parameters, and obtaining scaled reference blocks according to the positions and the calculated size scaling coefficients;
a3: dividing the reference block into a macro pixel boundary area and a non-boundary area according to texture information, and respectively shaping to ensure that the reference block is equal to the original reference block before scaling of the sequence;
a4: and smoothing and filtering the shaped reference block along the macro pixel boundary region, so as to facilitate the subsequent weighted prediction of the current block.
The present application calculates the reference-block distance and size scaling coefficients by intra-frame motion estimation, scales and reshapes the found reference blocks in different focus states based on the calculated coefficients, applies a filtering and smoothing operation, and on this basis designs a gradient-information-driven intra-frame prediction model for plenoptic video sequences, thereby efficiently and accurately predicting an uncoded plenoptic image from coded images.
In a further scheme, the designed gradient-information-based intra-frame prediction model for plenoptic video sequences can be embedded alongside the original intra-frame prediction modes of the HM coding platform, ultimately improving focused plenoptic image coding efficiency, which is of great significance for the application of plenoptic image compression coding.
In one embodiment of the application, the acquired optical parameters of the plenoptic video sequence comprise the macro-pixel diameter, the microlens focal length, the distance from the main-lens image plane to the microlenses, and the distance from the microlenses to their image; the parameters carried by the sequence itself are the macro-pixel diameter and the microlens type.
In a specific embodiment, the plenoptic video sequence ChessPieces-movingCamera-4-5x5, captured by a Raytrix camera with a spatial resolution of 3840 × 2160 and an angular resolution of 5 × 5, is used; its images are generated by three different microlens types in different in-focus and out-of-focus states.
In step A1, the first frame of the video sequence is acquired, intra-frame motion estimation is performed, and the scaling factor of the reference block size and the distance parameters between corresponding reference blocks under different microlens focal lengths are calculated:
(1) For motion estimation, the acquired first frame of the video sequence is divided into current blocks of size $2^n \times 2^n$, where n = 3, 4, 5, 6, and intra-frame motion estimation is performed on each current block. The motion-estimation range is a circle centred on the current block; its radius is a multiple of the macro-pixel diameter, 1.5 macro-pixel diameters in the example of Fig. 2. The search range comprises the macro-pixel regions inside this circle whose category differs from that of the current macro-pixel, and this reduced range lowers complexity. The matching criterion is the sum of absolute differences (SAD):

$$\mathrm{SAD} = \sum_{i=1}^{M}\sum_{j=1}^{N}\left|U(i,j)-V(i,j)\right|$$

where M and N are the length and width, in pixels, of the current block, and U(i, j) and V(i, j) are the pixels at position (i, j) of the current block and of the searched reference block, respectively.
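As an illustration of this matching step, the following is a minimal numpy sketch of the SAD criterion and a brute-force search over a circular region around the current block; the function names are illustrative, and the macro-pixel category filter described above is left out for brevity.

```python
import numpy as np

def sad(current, reference):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(current.astype(np.int64) - reference.astype(np.int64)).sum())

def intra_motion_search(frame, top, left, size, radius):
    """Scan every block position whose centre lies within `radius` pixels of the
    current block's centre and return the minimum-SAD match.  The category test
    that restricts the search to differently focused macro-pixels is omitted."""
    block = frame[top:top + size, left:left + size]
    cy, cx = top + size / 2, left + size / 2
    best_pos, best_cost = None, float("inf")
    for y in range(frame.shape[0] - size + 1):
        for x in range(frame.shape[1] - size + 1):
            if (y + size / 2 - cy) ** 2 + (x + size / 2 - cx) ** 2 > radius ** 2:
                continue
            if (y, x) == (top, left):
                continue  # do not match the current block against itself
            cost = sad(block, frame[y:y + size, x:x + size])
            if cost < best_cost:
                best_pos, best_cost = (y, x), cost
    return best_pos, best_cost
```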
(2) The best-matching reference block is selected by minimum SAD; the matching block i and the current block j are then binarized to obtain the imaging areas of the same object in the two blocks, and the ratio is calculated:

$$\lambda = \frac{s(i)}{s(j)}$$

where i and j denote the types of the reference block and the current block, and s(i) and s(j) denote the binarized areas representing the same object (the white regions after binarization in Fig. 3). Averaging all scaling factors of the same type pair and block size gives the final block-size scaling factor $\lambda_{ij}$; similarly, the distance between the top-left pixels of the current block and of the matching block is the block distance, and averaging gives the final distance parameter $S_{ij}$.
The results of the motion-estimation search over the selected regions in this example are as follows:
microlens type Image size scaling factor Image distance
Type1&Type2 λ 12 =1.0367 S 12 =22.8254
Type1&Type3 λ 13 =1.2276 S 13 =20.6155
Type2&Type3 λ 23 =1.1841 S 23 =22.0227
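A minimal sketch of how $\lambda_{ij}$ and $S_{ij}$ could be estimated from a set of matched block pairs is given below; the mean-based binarization threshold is an assumption, since the text does not specify the thresholding method.

```python
import numpy as np

def size_scaling_factor(ref_block, cur_block):
    """Ratio of binarized object areas s(i)/s(j) for one matched pair.
    Thresholding at the per-block mean is an assumed stand-in for the
    unspecified binarization step."""
    s_i = int((ref_block > ref_block.mean()).sum())  # object area in reference block i
    s_j = int((cur_block > cur_block.mean()).sum())  # object area in current block j
    return s_i / s_j

def final_parameters(matched_pairs):
    """Average per-pair scaling factors and top-left distances over all matched
    pairs of the same microlens-type combination to obtain lambda_ij and S_ij.
    Each entry is (ref_block, cur_block, (dy, dx))."""
    lams = [size_scaling_factor(r, c) for r, c, _ in matched_pairs]
    dists = [float(np.hypot(dy, dx)) for _, _, (dy, dx) in matched_pairs]
    return float(np.mean(lams)), float(np.mean(dists))
```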
In step A2, the positions of the left, upper-left and upper reference blocks corresponding to the current block are found from the calculated distance parameters, and the scaled reference blocks are obtained at those positions, each with a size of $\lambda_{ij}$ times the current block size.

The edge positions of a scaled reference block are usually fractional. The boundary pixels, i.e. the rightmost column and the bottommost row, are obtained from surrounding reference pixels at known integer positions. Each rightmost-column pixel is interpolated from the nearest integer-position pixels to its left and right, with interpolation weights given by the difference between each of these two integer-position pixels and its own left neighbour, i.e. the horizontal gradient; each bottommost-row pixel is interpolated from the nearest integer-position pixels above and below it, with weights given by the difference between each of these two integer-position pixels and its own upper neighbour, i.e. the vertical gradient. As shown in Fig. 4, the interpolation formula (for the horizontal case) is:
$$p(x', y) = w_0\, p(x, y) + w_1\, p(x+1, y)$$

where p denotes the pixel value, x' is the abscissa of the pixel to be interpolated, x is the abscissa of the nearest integer pixel position to its left, and y is its ordinate; the vertical case is analogous. The gradient-dependent weighting coefficients $w_0$ and $w_1$ are computed from the two gradients using a rounding operation [·], with a multiplication by 256 to avoid floating-point operations and improve accuracy, and a sharpness control constant k, set to 0.05 for simplicity.
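The fixed-point weight formula itself is not reproduced in this text, so the sketch below fills it in under an explicit assumption: each integer neighbour's weight decays exponentially with its own left-hand gradient (sharpness k = 0.05), normalized and rounded to 1/256 fixed point. Only the rightmost-column (horizontal) case is shown; the bottom-row case is symmetric.

```python
import numpy as np

def interp_right_column_pixel(img, x_frac, y, k=0.05):
    """Gradient-weighted horizontal interpolation of one fractional-position
    boundary pixel.  The exponential weight model is an assumption consistent
    with the description: a stronger local gradient gives a smaller weight."""
    x = int(np.floor(x_frac))                       # nearest integer position to the left
    g0 = abs(int(img[y, x]) - int(img[y, x - 1]))   # gradient of left integer pixel
    g1 = abs(int(img[y, x + 1]) - int(img[y, x]))   # gradient of right integer pixel
    e0, e1 = np.exp(-k * g0), np.exp(-k * g1)
    w0 = int(round(256 * e0 / (e0 + e1)))           # [.] rounding; x256 avoids floats
    w1 = 256 - w0
    return (w0 * int(img[y, x]) + w1 * int(img[y, x + 1]) + 128) >> 8
```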
Step A3: reference blocks lying entirely inside one macro-pixel are skipped, without scaling or reshaping, which avoids the macro-pixel boundary distortion that scaling would cause. Reference blocks spanning several macro-pixels are divided, according to texture information, into a macro-pixel boundary region and a non-boundary region, and the two regions are reshaped by different interpolation methods back to the original block size before scaling, so that each reference block again equals the pre-scaling reference block size: the non-boundary region uses bilinear interpolation to preserve detail, while the boundary region uses nearest-neighbour interpolation to reduce complexity.
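A sketch of this region-wise reshaping using OpenCV resizing is shown below; how the boundary mask is derived from texture information is not detailed here, so `boundary_mask` is assumed to be given.

```python
import numpy as np
import cv2

def reshape_reference(scaled_ref, target_h, target_w, boundary_mask):
    """Reshape a scaled reference block back to the pre-scaling size:
    bilinear interpolation for the non-boundary region (richer detail),
    nearest-neighbour for the macro-pixel boundary region (lower complexity).
    `boundary_mask` marks macro-pixel boundary pixels with 1."""
    smooth = cv2.resize(scaled_ref, (target_w, target_h), interpolation=cv2.INTER_LINEAR)
    sharp = cv2.resize(scaled_ref, (target_w, target_h), interpolation=cv2.INTER_NEAREST)
    mask = cv2.resize(boundary_mask.astype(np.uint8), (target_w, target_h),
                      interpolation=cv2.INTER_NEAREST).astype(bool)
    return np.where(mask, sharp, smooth)
```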
Step A4: the reshaped reference block is smoothed along the macro-pixel boundary region, which facilitates the subsequent weighted prediction of the current block:
(1) The reshaped reference block is smoothed along the macro-pixel boundary region to eliminate the seam discontinuities caused by applying different interpolation methods to the boundary and non-boundary regions. The filtered value of each boundary pixel is the weighted average of its four neighbours, up, down, left and right (the reference-pixel selection is shown in Fig. 5):

$$\tilde p(x,y) = \frac{h_l\,p(x-1,y) + h_r\,p(x+1,y) + v_u\,p(x,y-1) + v_l\,p(x,y+1)}{h_l + h_r + v_u + v_l}$$

where the weighting coefficients $h_l$, $h_r$, $v_u$ and $v_l$ are the inverse gradients between the pixel and its left, right, upper and lower neighbours, e.g. $h_l = 1/\left|p(x-1,y)-p(x,y)\right|$, and analogously for the other three directions.
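A sketch of the inverse-gradient smoothing of a single boundary pixel follows; the small epsilon guarding against division by zero when two neighbouring pixels are equal is an added assumption.

```python
import numpy as np

def smooth_boundary_pixel(img, y, x, eps=1.0):
    """Replace one macro-pixel-boundary pixel by the inverse-gradient-weighted
    average of its four neighbours (left, right, up, down)."""
    p = img.astype(np.float64)
    centre = p[y, x]
    neighbours = [p[y, x - 1], p[y, x + 1], p[y - 1, x], p[y + 1, x]]
    weights = [1.0 / (abs(n - centre) + eps) for n in neighbours]  # h_l, h_r, v_u, v_l
    return sum(w * n for w, n in zip(weights, neighbours)) / sum(weights)
```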
(2) The boundary pixels of the current block are predicted as a weighted combination of the reference blocks' boundary pixels, which yields the weighting coefficients; these coefficients are then used to weight all pixels of the matched reference blocks to predict the current block's pixel values. Specifically, the weights $\omega_i$ are obtained by fitting

$$y' \approx \sum_i \omega_i\, x_i$$

and the current block is then predicted as $\hat y = \sum_i \omega_i X_i$, where $x_i$ denotes the boundary pixels of the left column and upper row of the i-th reference block, y' denotes the boundary pixels of the left column and upper row of the current block, and $X_i$ denotes all pixels of the i-th reference block.
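Reading the weight-fitting step as an ordinary least-squares fit of the current block's boundary pixels against the reference blocks' boundary pixels is an interpretation, since the closed-form solver is not shown in this text; under that assumption the prediction can be sketched as:

```python
import numpy as np

def predict_current_block(cur_boundary, ref_boundaries, ref_blocks):
    """Fit weights w_i so that sum_i w_i * x_i approximates the current block's
    left-column/top-row boundary pixels y', then apply the same weights to the
    full reference blocks X_i to predict the current block."""
    X = np.stack([b.ravel().astype(np.float64) for b in ref_boundaries], axis=1)
    w, *_ = np.linalg.lstsq(X, cur_boundary.ravel().astype(np.float64), rcond=None)
    prediction = sum(wi * blk.astype(np.float64) for wi, blk in zip(w, ref_blocks))
    return np.clip(prediction, 0, 255)
```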
Furthermore, the gradient-information-based intra-frame prediction method for all-optical video sequences is embedded into the HM coding platform as an additional mode, in parallel with the other thirty-five HM intra-prediction modes; after the model is embedded, algorithm testing and performance analysis are completed, further improving plenoptic image compression-coding efficiency.
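Conceptually, embedding the model means letting it compete with HM's existing intra modes under the usual rate-distortion criterion; the toy sketch below illustrates that mode decision (all names, and the use of plain SSE plus λ·rate, are illustrative simplifications of HM's actual cost functions).

```python
def choose_intra_mode(block, hm_predictions, plenoptic_prediction, lam, rate_of):
    """Toy mode decision: the proposed gradient-information mode is appended as
    one extra candidate (index 35) beside HM's 35 standard intra modes, and the
    candidate with the lowest rate-distortion cost wins."""
    candidates = dict(hm_predictions)       # mode index -> predicted block (numpy array)
    candidates[35] = plenoptic_prediction   # proposed mode as an additional index
    def rd_cost(mode, pred):
        sse = float(((block - pred) ** 2).sum())
        return sse + lam * rate_of(mode)    # rate_of: assumed bit-cost callback
    best_mode, _ = min(candidates.items(), key=lambda kv: rd_cost(kv[0], kv[1]))
    return best_mode
```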
For the raw plenoptic image formed of micro-lens images, the preferred embodiment of the application combines the structure of the plenoptic image with the HM coding platform as the compression tool and provides a gradient-information-based coding method for plenoptic video sequences. The reference-block distance and size scaling coefficients are calculated by intra-frame motion estimation; the found reference blocks in different focus states are scaled and reshaped based on the calculated coefficients and then filtered and smoothed; and on this basis a gradient-information-driven intra-frame prediction model for plenoptic video sequences is designed. This model is embedded as an intra-frame prediction mode alongside HM's original intra-frame prediction modes, achieving efficient compression coding of plenoptic images, which is of great significance for research on plenoptic image compression. In experiments on the Raytrix sequence ChessPieces-movingCamera-4-5x5, under an all-intra configuration over 50 frames, the proposed algorithm saves 29.15% of the bit rate compared with the original HM platform, demonstrating the excellent performance of the proposed method.
The embodiments of the present application also provide a storage medium storing a computer program which, when executed, performs at least the method as described above.
The embodiments of the present application also provide a processor executing a computer program, at least performing the method as described above.
The background section of the present application may contain background information about the problems or environments of the present application and is not necessarily descriptive of the prior art. Accordingly, inclusion in the background section is not an admission of prior art by the applicant.
The foregoing is a further detailed description of the application in connection with specific/preferred embodiments, and it is not intended that the application be limited to such description. It will be apparent to those skilled in the art that several alternatives or modifications can be made to the described embodiments without departing from the spirit of the application, and these alternatives or modifications should be considered to be within the scope of the application. In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "preferred embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Those skilled in the art may combine the features of the different embodiments or examples described in this specification without contradiction. Although embodiments of the present application and their advantages have been described in detail, it should be understood that various changes, substitutions and alterations can be made herein without departing from the scope of the application as defined by the appended claims.

Claims (9)

1. An all-optical video sequence intra-frame prediction method based on gradient information is characterized by comprising the following steps:
a1: obtaining a first frame of a video sequence, carrying out intra-frame motion estimation, and calculating to obtain a scaling coefficient of the size of a reference block and distance parameters between corresponding reference blocks under different microlens focal lengths;
a2: finding the positions of the left, upper left and upper reference blocks corresponding to the current block according to the calculated distance parameters, and obtaining scaled reference blocks according to the positions and the calculated size scaling coefficients;
a3: dividing the reference block into a macro pixel boundary area and a non-boundary area according to texture information, and respectively shaping to ensure that the reference block is equal to the original reference block before scaling of the sequence;
a4: smoothing and filtering the shaped reference block along the macro pixel boundary area, and then carrying out weighting operation to predict the current block;
in the step A2, boundary pixels of the rightmost column and the bottommost row are obtained from surrounding reference pixels at known integer positions; each rightmost-column boundary pixel is interpolated from the nearest integer-position pixels to its left and right, with interpolation weights given by the difference between each of these two integer-position pixels and its own left neighbour, i.e. the horizontal gradient; each bottommost-row boundary pixel is interpolated from the nearest integer-position pixels above and below it, with weights given by the difference between each of these two integer-position pixels and its own upper neighbour, i.e. the vertical gradient; the interpolation formula (for the horizontal case) is:

$$p(x', y) = w_0\, p(x, y) + w_1\, p(x+1, y)$$

where p denotes the pixel value, x' is the abscissa of the pixel to be interpolated, x is the abscissa of the nearest integer pixel position to its left, y is its ordinate, and $w_0$ and $w_1$ are gradient-dependent weighting coefficients computed from the two gradients using a rounding operation [·] and a sharpness control constant k;
in the step A4, the filtered value of each boundary pixel is the weighted average of its four neighbours (up, down, left and right):

$$\tilde p(x,y) = \frac{h_l\,p(x-1,y) + h_r\,p(x+1,y) + v_u\,p(x,y-1) + v_l\,p(x,y+1)}{h_l + h_r + v_u + v_l}$$

where the weighting coefficients $h_l$, $h_r$, $v_u$ and $v_l$ are the inverse gradients between the pixel and its left, right, upper and lower neighbours.
2. The intra prediction method according to claim 1, wherein the first frame of the video sequence obtained in step A1 is divided into current blocks of size $2^n \times 2^n$, where n = 3, 4, 5, 6, and intra-frame motion estimation is performed on each current block.
3. The intra prediction method according to claim 1, wherein the motion-estimation range in step A1 is a circle centred on the centre of the current block with a radius equal to a multiple of the macro-pixel diameter, and the search range comprises the macro-pixel regions inside the circle whose category differs from that of the current macro-pixel.
4. The intra prediction method according to claim 1, wherein the matching criterion in step A1 is the sum of absolute differences (SAD):

$$\mathrm{SAD} = \sum_{i=1}^{M}\sum_{j=1}^{N}\left|U(i,j)-V(i,j)\right|$$

where M and N are the length and width, in pixels, of the current block, and U(i, j) and V(i, j) are the pixels at position (i, j) of the current block and of the searched reference block, respectively.
5. The intra prediction method according to claim 1, wherein in step A1 the best-matching reference block is selected by minimum SAD, and the matching block i and the current block j are then binarized to obtain the ratio of the imaging areas of the same object in the two blocks:

$$\lambda = \frac{s(i)}{s(j)}$$

where i and j denote the types of the reference block and the current block, and s(i) and s(j) denote the binarized areas representing the same object in the reference block and the current block; averaging all scaling factors of the same type pair and block size gives the final block-size scaling factor $\lambda_{ij}$; similarly, the distance between the top-left pixels of the current block and of the matching block is the block distance, and averaging gives the final distance parameter $S_{ij}$.
6. The intra prediction method according to any one of claims 1 to 5, wherein in the step A2, k is set to 0.05.
7. The method according to any one of claims 1 to 5, wherein in step A3 reference blocks lying entirely inside one macro-pixel are skipped without scaling or reshaping, reference blocks spanning several macro-pixels are divided, according to texture information, into a macro-pixel boundary region and a non-boundary region, and different interpolation operations are performed on the two regions to reshape the block back to its original pre-scaling size, the non-boundary region using bilinear interpolation and the boundary region using nearest-neighbour interpolation.
8. The method according to any one of claims 1 to 5, wherein in step A4 the reshaped and filtered reference blocks are weighted to predict the current block: the boundary pixels of the current block are predicted as a weighted combination of the reference blocks' boundary pixels to obtain the weighting coefficients, and these coefficients are then used to weight all pixels of the matched reference blocks to predict the current block's pixel values; specifically, the weights $\omega_i$ are obtained by fitting

$$y' \approx \sum_i \omega_i\, x_i$$

and the current block is predicted as $\hat y = \sum_i \omega_i X_i$, where $x_i$ denotes the boundary pixels of the left column and upper row of the i-th reference block, y' denotes the boundary pixels of the left column and upper row of the current block, and $X_i$ denotes all pixels of the i-th reference block.
9. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the intra prediction method according to any one of claims 1 to 8.
CN202111003763.8A 2021-08-30 2021-08-30 Gradient information-based full-optical video sequence intra-frame prediction method Active CN113610055B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111003763.8A CN113610055B (en) 2021-08-30 2021-08-30 Gradient information-based full-optical video sequence intra-frame prediction method


Publications (2)

Publication Number Publication Date
CN113610055A CN113610055A (en) 2021-11-05
CN113610055B (en) 2023-09-26

Family

ID=78309682


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024212255A1 (en) * 2023-04-14 2024-10-17 Oppo广东移动通信有限公司 Encoding/decoding method, code stream, encoder, decoder, and storage medium


Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JP6614472B2 (en) * 2013-09-30 2019-12-04 サン パテント トラスト Image encoding method, image decoding method, image encoding device, and image decoding device
KR102272108B1 (en) * 2015-02-27 2021-07-05 삼성전자주식회사 Image processing apparatus and method
EP3410719A1 (en) * 2017-05-30 2018-12-05 Thomson Licensing Method and device for picture encoding and decoding

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
CN112435168A (en) * 2020-12-01 2021-03-02 清华大学深圳国际研究生院 Reference block scaling method and computer-readable storage medium
CN112492288A (en) * 2020-12-01 2021-03-12 清华大学深圳国际研究生院 Intra-frame prediction method and storage medium for focusing all-optical video sequence

Non-Patent Citations (2)

Title
Liu Yao (刘瑶); Design and Implementation of an HEVC Pixel-Gradient Intra Prediction Algorithm (HEVC像素梯度帧内预测算法设计与实现); China Master's Theses Full-text Database, Information Science and Technology Series; full text *
Zhipeng Jin et al.; Video intra prediction using convolutional encoder decoder network; Neurocomputing; full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant