CN111147848B - Light field video coding method based on content self-adaptation - Google Patents


Publication number: CN111147848B
Authority: CN (China)
Prior art keywords: residual, video, sub-images, frames
Legal status: Active
Application number: CN201911399373.XA
Other languages: Chinese (zh)
Other versions: CN111147848A
Inventors: 金欣, 涂望, 李羚俊, 颜成钢, 戴琼海
Current Assignee: Shenzhen International Graduate School of Tsinghua University
Original Assignee: Shenzhen International Graduate School of Tsinghua University
Application filed by Shenzhen International Graduate School of Tsinghua University
Priority to CN201911399373.XA
Publication of CN111147848A
Application granted; publication of CN111147848B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: using adaptive coding
    • H04N19/102: characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/103: Selection of coding mode or of prediction mode
    • H04N19/107: Selection of coding mode or of prediction mode between spatial and temporal predictive coding, e.g. picture refresh
    • H04N19/50: using predictive coding
    • H04N19/587: involving temporal sub-sampling or interpolation, e.g. decimation or subsequent interpolation of pictures in a video sequence
    • H04N19/59: involving spatial sub-sampling or interpolation, e.g. alteration of picture size or resolution
    • H04N19/597: specially adapted for multi-view video sequence encoding

Abstract

The invention discloses a content-adaptive light field video coding method, which comprises the following steps: selecting the central viewpoint sub-images and their neighboring sub-images among the sub-images of the light field image; after the central viewpoint sub-images are coded, subtracting their light field video from that of the neighboring sub-images to obtain a residual video; and, according to the relation between the absolute value of the inter-frame correlation value R1 of the residual video and the correlation threshold τ, either reordering the residual video into a spatial domain residual video, or further calculating the residual energy value E1 of the residual video between frames and the residual energy value E2 between residual sub-images and selecting the corresponding coding method. The invention adaptively selects the better of the spatial domain and temporal domain multi-view coding methods for the residual video, reduces the residual video code stream, improves its coding efficiency, and thereby improves the coding efficiency of the light field video as a whole.

Description

Light field video coding method based on content self-adaptation
Technical Field
The invention relates to the field of light field coding, in particular to a light field video coding method based on content self-adaptation.
Background
Since its introduction, light field technology has attracted the attention of many researchers and enterprises. Companies such as Lytro have worked on consumer-grade light field cameras and have launched two generations of products so far, but light field cameras still suffer from low resolution, which is mainly a limitation of current hardware. In the long run, light field technology has substantial potential in future VR and AR applications, and therefore continues to attract researchers to invest resources in it.
An original light field image captured by a light field camera records angular information in addition to the planar scene, so its size is tens of times that of a photo taken by an ordinary camera. As light field technology reaches consumer-grade AR and VR applications in the future, the file size of light field images will place extreme demands on the storage and transmission of light field content, whether for film and television works or for real-time AR live broadcast and conversation. To guarantee the quality of the light field content, the raw file size of a light field image cannot be reduced and will only grow, so encoding becomes the key to solving this problem.
There are currently two main categories of light field coding: encoding a single light field image and encoding light field video. The idea of encoding a single light field image is simple: a number of sub-images recording different angular information are extracted from the light field image, concatenated into a video in a specific order, and the resulting sequence is encoded with HEVC. For single light field images, researchers have focused on designing particular scan orders for concatenating the sub-images so as to exploit the redundancy between them more efficiently. Encoding light field video is slightly more complex: the object to be encoded is a sequence of temporally continuous light field images. Each light field image is called a frame, each frame yields an equal number of sub-images, and each sub-image position represents a view, which is encoded with multi-view coding techniques. However, when existing multi-view coding is applied to light field video, the correlation between sub-views is not fully exploited, coding efficiency is low, and it cannot cope with the ever-growing raw files of future AR and VR applications.
Disclosure of Invention
The invention aims to solve the prior-art problem that the large native files of light field video require highly efficient coding, and provides a content-adaptive light field video coding method.
The invention provides a content-adaptive light field video coding method, which comprises the following steps: S1, acquiring a light field image and selecting the central viewpoint sub-images among its sub-images, the remaining sub-images being neighboring sub-images; S2, coding the central viewpoint sub-images with a multi-view coding method to obtain a central viewpoint light field video; S3, subtracting the central viewpoint light field video from the light field video of the neighboring sub-images to obtain a residual video; S4, calculating the inter-frame correlation value R1 of the residual video and comparing it with the inter-frame correlation threshold τ; S5, if the absolute value of the correlation value R1 is smaller than the correlation threshold τ, reordering the residual video into a spatial domain residual video and encoding it into a spatial domain residual video code stream for output; if the absolute value of the correlation value R1 is greater than the correlation threshold τ, going to step S6; S6, calculating the residual energy value E1 of the residual video between frames and the residual energy value E2 between residual sub-images; S7, when the inter-frame residual energy value E1 is smaller than the inter-sub-image residual energy value E2, reordering the residual video into a temporal domain residual video and encoding it into a temporal domain residual video code stream for output; otherwise, reordering the residual video into a spatial domain residual video and encoding it into a spatial domain residual video code stream for output.
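The adaptive selection in steps S4 to S7 can be sketched as follows. This is an illustrative sketch only; the function name `select_coding_mode` and the callable-based interface are our assumptions, not part of the patent.

```python
def select_coding_mode(r1, tau, compute_e1, compute_e2):
    """Choose the residual-video coding mode per steps S4-S7.

    r1          -- inter-frame correlation value R1 of the residual video
    tau         -- inter-frame correlation threshold
    compute_e1  -- callable returning the inter-frame residual energy E1
    compute_e2  -- callable returning the inter-sub-image residual energy E2
    """
    if abs(r1) < tau:
        # S5: weak inter-frame correlation, use spatial-domain multi-view coding
        return "spatial"
    # S6: the energies are only computed when |R1| exceeds the threshold
    e1, e2 = compute_e1(), compute_e2()
    # S7: temporal-domain coding only when the inter-frame energy is lower
    return "temporal" if e1 < e2 else "spatial"
```

Passing the energies as callables mirrors the patent's ordering, where E1 and E2 are calculated only after the correlation test has been passed.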
Preferably, the light field video comprises N frames, i.e. N light field images, with 140 residual sub-images per frame.
Preferably, the step of calculating the inter-frame correlation value R1 of the residual video comprises: S41, for all frames in the time domain, randomly selecting 3 residual sub-images from the 140 residual sub-images of each frame; S42, for each selected residual sub-image, calculating the correlation value between each frame in the light field video and each reference frame used in its actual encoding; each frame has n1 reference frames, so a single frame yields n1 correlation values, whose average is taken as the correlation value of that frame; the N frames yield N correlation values, whose average is the correlation value of the residual sub-image; S43, averaging the 3 correlation values obtained from the 3 residual sub-images to obtain the inter-frame correlation value R1 of the residual video.
Preferably, the algorithm for the inter-frame correlation value R1 of the residual sub-images in step S4 is based on MATLAB's built-in function corr2.
Preferably, the residual energy is calculated as E = (A − B)², where A and B are two matrices and E represents the residual energy between A and B.
Preferably, the step of calculating the residual energy value E1 in step S6 comprises: S611, for all frames in the time domain, randomly selecting 3 residual sub-images from the 140 residual sub-images of each frame; S612, for each selected residual sub-image, calculating the energy between each frame and each reference frame used in its actual encoding; each frame has n1 reference frames, so a single frame yields n1 energy values, whose average is taken as the energy value of that frame; the N frames yield N energy values, whose average is the energy value of the residual sub-image; S613, averaging the 3 energy values obtained from the 3 residual sub-images to obtain the inter-frame energy value E1 of the residual video.
Preferably, the step of calculating the residual energy value E2 in step S6 comprises: S621, randomly selecting 3 light field images from the N light field images, together with M residual sub-images on those light field images; S622, for each selected residual sub-image, calculating the energy between it and each reference residual sub-image used in its actual encoding; each residual sub-image has n1 reference residual sub-images, so a single residual sub-image yields n1 energy values, whose average is taken as its energy value; the M residual sub-images yield M energy values; S623, averaging the M energy values to obtain the inter-sub-image energy value E2.
Preferably, the spatial domain residual video is encoded by a spatial domain multi-view encoding method; and the time domain residual video is coded by adopting a time domain multi-view coding method.
Preferably, the step of reordering the residual video into a time-domain residual video comprises concatenating all sub-pictures in each frame into one video sequence, resulting in a plurality of video sequences corresponding to the number of frames; the step of reordering the residual video to spatial domain residual video comprises concatenating all frames of each view into one video sequence, resulting in a number of video sequences corresponding to the number of views.
The invention also provides a computer-readable storage medium, in which a computer program is stored, which computer program, when being executed by a processor, is adapted to carry out the method of any of the above-mentioned embodiments.
The beneficial effects of the invention include: on the basis of coding the central viewpoint sub-image video and the residual video separately, different coding methods are selected by analyzing the correlation and residual energy between the views and frames of the residual video, further improving the coding efficiency of the residual part. Because correlation and residual energy are intrinsic properties of the light field video, the proposed method selects the better coding method adaptively, based purely on the residual video content, and thus makes full use of the redundant information between sub-images. Adaptively choosing the better of the spatial domain and temporal domain multi-view coding methods for the residual video reduces the residual code stream, improves its coding efficiency, and thereby improves the coding efficiency of the light field video.
Drawings
Fig. 1 is a flow chart of a light field video coding method based on content adaptation according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of obtaining a sub-image from an original light field image in an embodiment of the present invention.
Fig. 3 is a schematic diagram of sub-image division in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings. It should be emphasized that the following description is merely exemplary in nature and is not intended to limit the scope of the invention or its application.
Non-limiting and non-exclusive embodiments will be described with reference to the following figures, wherein like reference numerals refer to like parts, unless otherwise specified.
As shown in fig. 1, the present embodiment provides a light field video coding method based on content adaptation, which includes the following steps:
s1, acquiring a light field image, and selecting a central viewpoint sub-image of the sub-images in the light field image, wherein the sub-images except the central viewpoint sub-image are adjacent sub-images;
s2, coding the central viewpoint subimages by adopting a multi-view coding method to obtain a central viewpoint light field video;
s3, subtracting the light field video of the central viewpoint and the light field video of the adjacent sub-images to obtain a residual video;
s4, calculating a correlation value R1 of the residual video between frames, and comparing the correlation value with a correlation threshold tau between the frames;
s5, if the absolute value of the correlation value R1 is smaller than the correlation threshold τ, reordering the residual video into a spatial domain residual video, and encoding the spatial domain residual video into a spatial domain residual video code stream and then outputting the spatial domain residual video code stream;
if the absolute value of the correlation value R1 is greater than the correlation threshold τ, go to step S6;
s6, calculating the residual energy value E1 of the residual video between frames and the residual energy value E2 between residual sub-images;
s7, when the residual energy value E1 between frames is smaller than the residual energy value E2 between residual sub-images, reordering the residual video into a temporal domain residual video, and encoding the temporal domain residual video into a temporal domain residual video code stream for output; otherwise, reordering the residual video into a spatial domain residual video, and encoding the spatial domain residual video into a spatial domain residual video code stream and then outputting it.
In one embodiment of the invention, the method further comprises preprocessing the sub-images in the light field image. The light field video comprises N frames, i.e. N light field images; the number of sub-images in a single light field image is n × n, and varies with the lens array of the light field camera. To facilitate subsequent blocking, the sub-image count of a single light field image is first processed: when n mod 3 = 0, all n × n sub-images are used; when n mod 3 = 1, the sub-images in the first row and first column are removed, leaving (n-1) × (n-1) sub-images; when n mod 3 = 2, the first and last rows and columns are removed, leaving (n-2) × (n-2) sub-images. Secondly, because the sub-images at the four corners of the light field image are too dark and carry very little useful information, the four corner sub-images and the outermost ring of sub-images are discarded during encoding. For example, assuming n × n sub-images remain after the first step, discarding the outermost ring and the four corner sub-images yields a light field image containing (n-2) × (n-2) - 4 effective sub-images. Each frame of the light field video is processed in the same way, giving the preprocessed light field video. Specifically, as shown in fig. 2, the light field video comprises N frames, i.e. N light field images; a light field image captured by a Lytro camera is used for illustration.
The number of sub-images in a single light field image is 15 × 15 = 225. Since the sub-images at the four corners of the light field image are too dark and carry very little useful information, the four corner sub-images and the outermost ring of sub-images are discarded during encoding, giving a light field image with 165 effective sub-images. Each frame of the light field video is processed in the same way, giving the preprocessed light field video.
Fig. 3 shows the sub-image division of fig. 2, in which white squares are invalid sub-images, black squares are central viewpoint sub-images, and gray squares are neighboring sub-images. Specifically, the n × n sub-images obtained after preprocessing one light field image are first divided into (n/3) × (n/3) blocks: each block contains 9 sub-images, and each block has one central viewpoint sub-image. In every block except the four corner blocks, the sub-image at the center of the block is the central viewpoint sub-image; for the four corner blocks, the sub-image at the lower right of the upper-left block, at the lower left of the upper-right block, at the upper right of the lower-left block, and at the upper left of the lower-right block are the central viewpoint sub-images of the corresponding blocks. The other sub-images in a block are called the neighboring sub-images of its central viewpoint sub-image. Fig. 3 shows the 15 × 15 sub-images of a light field image divided into 25 blocks, i.e. 5 × 5 blocks. The central viewpoint sub-image video is coded to obtain a primarily coded central viewpoint light field video, which is subtracted from the preprocessed neighboring sub-image video to obtain the residual video; a single light field image (i.e. each frame) yields 140 residual sub-images.
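The sub-image counts above can be checked with a short sketch; the helper names are hypothetical, but the arithmetic follows the preprocessing and blocking rules just described (trim n to a multiple of 3, drop the outermost ring and the four corner sub-images, one central viewpoint per 3 × 3 block).

```python
def effective_subimages(n):
    """Effective sub-images per frame for an n x n sub-image grid."""
    n_t = n - n % 3              # trim rows/columns so the count is divisible by 3
    return (n_t - 2) ** 2 - 4    # drop the outermost ring, then the 4 corners

def residual_subimages(n):
    """Residual sub-images per frame: effective count minus central viewpoints."""
    n_t = n - n % 3
    blocks = (n_t // 3) ** 2     # one central viewpoint sub-image per 3x3 block
    return effective_subimages(n) - blocks
```

For the Lytro 15 × 15 grid this gives 165 effective and 140 residual sub-images per frame, matching the values in the text (165 = 13 × 13 − 4, and 140 = 165 − 25 central viewpoints).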
The inter-frame correlation value R1 of the residual video is calculated as follows: a correlation algorithm computes the correlation R between two sub-images. The algorithm is based on MATLAB's built-in function corr2, with the formula:

R = Σ_m Σ_n (A_mn − Ā)(B_mn − B̄) / sqrt( [Σ_m Σ_n (A_mn − Ā)²] · [Σ_m Σ_n (B_mn − B̄)²] )  (1)

where A and B are two sub-images, m and n index the two dimensions of the sub-images, and Ā = (1/(mn)) Σ_m Σ_n A_mn and B̄ = (1/(mn)) Σ_m Σ_n B_mn are the means of A and B respectively.

R represents the correlation between A and B (−1 ≤ R ≤ 1); the larger the absolute value of R, the stronger the correlation.
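A minimal NumPy equivalent of formula (1), matching the behaviour of MATLAB's corr2 (the function name and NumPy usage are ours, not from the patent):

```python
import numpy as np

def corr2(a, b):
    """2-D correlation coefficient of two equally sized images, per formula (1)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a0 = a - a.mean()            # A_mn minus the mean of A
    b0 = b - b.mean()            # B_mn minus the mean of B
    return (a0 * b0).sum() / np.sqrt((a0 * a0).sum() * (b0 * b0).sum())
```

As expected of a correlation coefficient, identical images give R = 1 and an image paired with its negation gives R = −1.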
For all frames in the time domain (say N frames), each light field image has 140 residual sub-images. 3 residual sub-images are randomly selected from the 140 residual sub-images of one frame, and the 3 residual sub-images at the same positions in the remaining N-1 frames are selected correspondingly. For each selected residual sub-image, the correlation between each frame and each reference frame used in its actual encoding is calculated. Each frame has n1 reference frames, so a single frame yields n1 correlation values, whose average is taken as the correlation value of that frame. The N correlation values obtained from the N frames are averaged to give the correlation value of the residual sub-image. Finally, the 3 correlation values obtained from the 3 residual sub-images are averaged to give the inter-frame correlation value R1 of the residual video. To elaborate on how R1 is obtained: each frame contributes one residual sub-image at a given position, so the N frames give N residual sub-images at that position, and the correlation among these N residual sub-images yields one correlation value; the 3 randomly selected positions therefore yield 3 correlation values, whose average is the inter-frame correlation value R1 of the residual video.
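The sampling procedure above can be sketched as follows, assuming `residual[t][p]` holds the residual sub-image at position p in frame t and `ref_idx[t]` lists the reference-frame indices used when frame t is actually encoded; all names and the data layout are illustrative, not from the patent.

```python
import numpy as np

def corr2(a, b):
    """MATLAB-style 2-D correlation coefficient, per formula (1)."""
    a0 = np.asarray(a, dtype=float) - np.mean(a)
    b0 = np.asarray(b, dtype=float) - np.mean(b)
    return (a0 * b0).sum() / np.sqrt((a0 * a0).sum() * (b0 * b0).sum())

def interframe_correlation(residual, ref_idx, positions):
    """Estimate R1 from a few randomly chosen sub-image positions."""
    per_position = []
    for p in positions:                          # e.g. 3 random positions
        per_frame = []
        for t, refs in enumerate(ref_idx):
            if not refs:
                continue                         # frame coded without references
            vals = [corr2(residual[t][p], residual[r][p]) for r in refs]
            per_frame.append(np.mean(vals))      # average over the n1 references
        per_position.append(np.mean(per_frame))  # average over the frames
    return float(np.mean(per_position))          # average over sampled positions
```

In the actual method the positions are drawn at random from the 140 available, and the reference structure comes from the multi-view encoder.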
The absolute value of the correlation value R1 between frames is compared to the correlation threshold τ between frames:
When |R1| ≤ τ, the residual video is reordered into a spatial domain residual video suited to the spatial domain multi-view coding method, which then encodes it into a spatial domain residual video code stream.
When |R1| > τ, the residual energy E is calculated. The calculation formula is as follows:
E = (A − B)²  (2)
where A and B are two sub-images.
When calculating the inter-frame residual energy, for all frames in the time domain, 3 residual sub-images are randomly selected from the 140 residual sub-images of each frame. For each residual sub-image, the energy between each frame and each reference frame used in its actual encoding is calculated. Each frame has n1 reference frames, so a single frame yields n1 energy values, whose average is taken as the energy value of that frame. The N energy values obtained from the N frames are averaged to give the energy value of the residual sub-image. Finally, the 3 energy values obtained from the 3 residual sub-images are averaged to give the inter-frame residual energy value E1 of the residual video. When calculating the residual energy between residual sub-images, 3 light field images are randomly selected from the N light field images, together with all residual sub-images on those 3 images. For each residual sub-image, the energy between it and each reference residual sub-image used in its actual encoding is calculated. Each residual sub-image has n1 reference residual sub-images, so a single residual sub-image yields n1 energy values, whose average is taken as its energy value. The 140 energy values obtained from the 140 residual sub-images of one light field image are averaged to give the energy value of that image. Finally, the 3 energy values obtained from the 3 light field images are averaged to give the inter-sub-image residual energy value E2 of the residual video.
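The per-pair energy of formula (2) can be sketched as below; reducing the matrix (A − B)² to a single number by summing over pixels is our assumption, and E1 and E2 are then obtained by averaging these values over sampled frames and sub-images exactly as described above.

```python
import numpy as np

def residual_energy(a, b):
    """Residual energy between two sub-images: E = (A - B)^2 of formula (2),
    summed over all pixels to give one scalar (the scalarisation is assumed)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float((d * d).sum())
```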
When the inter-frame residual energy value E1 is smaller than the inter-sub-image residual energy value E2, the residual video is reordered into a temporal domain residual video suited to the temporal domain multi-view coding method, which then encodes it into a temporal domain residual video code stream; when the inter-sub-image residual energy value E2 is smaller than the inter-frame residual energy value E1, the residual video is reordered into a spatial domain residual video suited to the spatial domain multi-view coding method, which then encodes it into a spatial domain residual video code stream. Finally, the coded residual video code stream is output.
The spatial domain multi-view coding method is as follows: in the spatial domain, a multi-view prediction structure is applied to all views (a view corresponds to one sub-image position; this light field video has 165 views, and the number of views is independent of the number of frames), and the views reference one another. The temporal domain multi-view coding method is as follows: in the time domain, a multi-view prediction structure is applied to all frames, and the frames reference one another. The reordering methods are: 1. spatial domain multi-view reordering concatenates all frames of each view into one video sequence, producing as many video sequences as there are views; 2. temporal domain multi-view reordering concatenates all sub-images of each frame into one video sequence, producing as many video sequences as there are frames.
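The two reorderings can be sketched as pure index shuffles over `frames[t][v]` (frame t, view v); the function names and data layout are ours, for illustration only.

```python
def reorder_spatial(frames):
    """Spatial-domain reordering: one sequence per view, concatenating all
    frames of that view; yields len(frames[0]) sequences."""
    return [[frame[v] for frame in frames] for v in range(len(frames[0]))]

def reorder_temporal(frames):
    """Temporal-domain reordering: one sequence per frame, concatenating all
    sub-images of that frame; yields len(frames) sequences."""
    return [list(frame) for frame in frames]
```

With 2 frames of 2 views, spatial reordering yields 2 sequences of 2 frames each (one per view), while temporal reordering yields 2 sequences of 2 sub-images each (one per frame).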
As shown in table 1, the proposed method was tested on 6 groups of different materials, (a)-(f), comparing three coding modes applied to the residual video: temporal domain multi-view coding, spatial domain multi-view coding, and the proposed content-adaptive light field video coding method. The experimental results show that, by analyzing the video content and adaptively selecting the better algorithm, the proposed method makes full use of the correlation between sub-views: for different materials it adaptively chooses the better of temporal and spatial multi-view coding, reducing the code stream and improving coding efficiency.
TABLE 1
[Table 1 appears as an image in the original publication; it reports the code streams of the three coding modes on materials (a)-(f).]
Those skilled in the art will recognize that numerous variations are possible in light of the above description, and therefore the examples and drawings are merely intended to describe one or more specific embodiments.
While there has been described and illustrated what are considered to be example embodiments of the present invention, it will be understood by those skilled in the art that various changes and substitutions may be made therein without departing from the spirit of the invention. In addition, many modifications may be made to adapt a particular situation to the teachings of the present invention without departing from the central concept described herein. Therefore, it is intended that the invention not be limited to the particular embodiments disclosed, but that the invention will include all embodiments and equivalents falling within the scope of the invention.

Claims (10)

1. A light field video coding method based on content self-adaptation is characterized by comprising the following steps:
s1, acquiring a light field image, and selecting a central viewpoint sub-image of the sub-images in the light field image, wherein the sub-images except the central viewpoint sub-image are adjacent sub-images;
s2, coding the central viewpoint subimages by adopting a multi-view coding method to obtain a central viewpoint light field video;
s3, subtracting the light field video of the central viewpoint and the light field video of the adjacent sub-images to obtain a residual video;
s4, calculating the correlation value R1 of the residual video between frames, and comparing it with the inter-frame correlation threshold τ;
s5, if the absolute value of the correlation value R1 is smaller than the correlation threshold τ, reordering the residual video into a spatial domain residual video, and encoding the spatial domain residual video into a spatial domain residual video code stream and then outputting the spatial domain residual video code stream;
if the absolute value of the correlation value R1 is greater than the correlation threshold τ, go to step S6;
s6, calculating the residual energy value E1 of the residual video between frames and the residual energy value E2 between residual sub-images;
s7, when the residual energy value E1 between frames is smaller than the residual energy value E2 between residual sub-images, reordering the residual video into a time domain residual video, and encoding the time domain residual video into a time domain residual video code stream for outputting;
otherwise, the residual video is reordered into the spatial domain residual video, and the spatial domain residual video is coded into a spatial domain residual video code stream and then output.
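Read as an algorithm, steps S4-S7 amount to a small decision rule. The following is a minimal sketch, not the patent's literal implementation; the threshold symbol T is an assumption, since the original formula survives only as an image placeholder:

```python
def choose_reordering(r1, t, e1, e2):
    """Content-adaptive choice between spatial- and time-domain
    reordering, sketched from steps S4-S7 of claim 1.

    r1     : inter-frame correlation value of the residual video
    t      : inter-frame correlation threshold (symbol assumed)
    e1, e2 : inter-frame / inter-sub-image residual energy values
    """
    # S5: weak inter-frame correlation -> spatial-domain reordering.
    if abs(r1) < t:
        return "spatial"
    # S6/S7: strong correlation -> compare residual energies;
    # smaller inter-frame energy E1 favors time-domain reordering.
    return "temporal" if e1 < e2 else "spatial"
```

The rule places the prediction axis along whichever dimension, time or viewpoint, leaves less residual energy, which is what makes the scheme content-adaptive.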
2. The light field video coding method based on content self-adaptation according to claim 1, wherein the light field video comprises N frames, i.e., N light field images, and each frame has 140 residual sub-images.
3. The light field video coding method based on content self-adaptation according to claim 2, wherein calculating the inter-frame correlation value R1 of the residual video comprises:
S41, randomly selecting 3 residual sub-images from the 140 residual sub-images of each frame, over all frames of the residual video in the time domain, to form 3 residual sub-videos;
S42, for each selected residual sub-image, calculating the correlation value between each frame of the corresponding residual sub-video and the reference frames used when that frame is actually coded; each frame has n1 reference frames, so a single frame yields n1 correlation values, and their average is taken as the correlation value of that frame; the N frames yield N correlation values, and their average is the correlation value of the residual sub-image;
S43, averaging the 3 correlation values obtained from the 3 residual sub-images to obtain the inter-frame correlation value R1 of the residual video.
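The three-level averaging of S41-S43 can be sketched as follows. The array shapes and the `reference_frames` map are assumptions for illustration, since the claim leaves the coder's reference structure abstract; NumPy's `corrcoef` on flattened frames gives the same value as MATLAB's `corr2`:

```python
import numpy as np

def inter_frame_correlation(sub_videos, reference_frames):
    """Sketch of S41-S43 under assumed inputs.

    sub_videos       : list of the 3 randomly selected residual
                       sub-videos, each an array of shape (N, H, W)
    reference_frames : dict mapping a frame index to the indices of
                       the n1 reference frames used when that frame
                       is actually coded (hypothetical input)
    """
    per_video_means = []
    for video in sub_videos:
        frame_means = []
        for f, refs in reference_frames.items():
            # n1 correlation values for this frame, then their mean (S42).
            vals = [np.corrcoef(video[f].ravel(), video[r].ravel())[0, 1]
                    for r in refs]
            frame_means.append(sum(vals) / len(vals))
        # Mean over frames gives the sub-image's correlation value.
        per_video_means.append(sum(frame_means) / len(frame_means))
    # S43: mean over the 3 sub-videos gives R1.
    return sum(per_video_means) / len(per_video_means)
```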
4. The light field video coding method based on content self-adaptation according to claim 3, wherein the inter-frame correlation values of the residual sub-images in step S42 are computed with the MATLAB built-in function corr2.
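MATLAB's corr2 computes the plain 2-D correlation coefficient of two equally sized matrices; an equivalent NumPy sketch (for illustration, not part of the patent):

```python
import numpy as np

def corr2(a, b):
    """2-D correlation coefficient of two equally sized matrices,
    matching the behavior of MATLAB's built-in corr2."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    # Normalized cross-correlation of the mean-removed matrices.
    return float((a * b).sum() / np.sqrt((a * a).sum() * (b * b).sum()))
```

Values lie in [-1, 1]; identical (up to positive affine scaling) matrices give 1.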
5. The light field video coding method based on content self-adaptation according to claim 1, wherein the residual energy is calculated by the formula E = Σ(A(i,j) - B(i,j))^2, summed over all elements (i,j), where A and B are two matrices and E represents the residual energy between them.
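The claim's formula survives in this text only as an image placeholder; the conventional reading of "residual energy" between two matrices is the sum of squared element-wise differences, which is the assumption in this sketch:

```python
import numpy as np

def residual_energy(a, b):
    """Residual energy between matrices A and B, assumed here to be
    the sum of squared element-wise differences (the claim's formula
    image did not survive extraction)."""
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float((d * d).sum())
```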
6. The light field video coding method based on content self-adaptation according to claim 2, wherein calculating the residual energy value E1 in step S6 comprises:
S611, for all frames in the time domain, randomly selecting 3 residual sub-images from the 140 residual sub-images of each frame;
S612, for each selected residual sub-image, calculating the energy between each frame and the reference frames used when that frame is actually coded; each frame has n1 reference frames, so a single frame yields n1 energy values, and their average is taken as the energy value of that frame; the N frames yield N energy values, and their average is the energy value of the residual sub-image;
S613, averaging the 3 energy values obtained from the 3 residual sub-images to obtain the inter-frame energy value E1 of the residual video.
7. The light field video coding method based on content self-adaptation according to claim 2, wherein calculating the residual energy value E2 in step S6 comprises:
S621, randomly selecting 3 light field images from the N light field images, and M residual sub-images on those light field images;
S622, calculating the energy between each residual sub-image and the reference residual sub-images used when it is actually coded; each residual sub-image has n1 reference residual sub-images, so a single residual sub-image yields n1 energy values, and their average is taken as the energy value of that residual sub-image; the M residual sub-images yield M energy values;
S623, averaging the M energy values to obtain the inter-sub-image energy value E2.
8. The light field video coding method based on content self-adaptation according to claim 1, wherein the spatial-domain residual video is coded with a spatial-domain multi-view coding method, and the time-domain residual video is coded with a time-domain multi-view coding method.
9. The light field video coding method based on content self-adaptation according to claim 1, wherein reordering the residual video into a time-domain residual video comprises concatenating all sub-images in each frame into one video sequence, yielding as many video sequences as there are frames; and reordering the residual video into a spatial-domain residual video comprises concatenating all frames of each viewpoint into one video sequence, yielding as many video sequences as there are viewpoints.
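With the residual video held as an array indexed (frame, viewpoint, row, column), the two reorderings of claim 9 are just two groupings of the same data. A sketch, with the array layout assumed for illustration:

```python
import numpy as np

def reorder_time_domain(residual):
    """Time-domain reordering: the sub-images of each frame form one
    video sequence, giving as many sequences as there are frames."""
    return [residual[f] for f in range(residual.shape[0])]

def reorder_spatial_domain(residual):
    """Spatial-domain reordering: all frames of each viewpoint form one
    video sequence, giving as many sequences as there are viewpoints."""
    return [residual[:, v] for v in range(residual.shape[1])]
```

Each returned element is itself a stack of pictures that a conventional video coder can treat as one pseudo-sequence.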
10. A computer-readable storage medium, in which a computer program is stored which, when executed by a processor, carries out the method of any one of claims 1 to 9.
CN201911399373.XA 2019-12-30 2019-12-30 Light field video coding method based on content self-adaptation Active CN111147848B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911399373.XA CN111147848B (en) 2019-12-30 2019-12-30 Light field video coding method based on content self-adaptation

Publications (2)

Publication Number Publication Date
CN111147848A CN111147848A (en) 2020-05-12
CN111147848B true CN111147848B (en) 2021-10-01

Family

ID=70522061

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911399373.XA Active CN111147848B (en) 2019-12-30 2019-12-30 Light field video coding method based on content self-adaptation

Country Status (1)

Country Link
CN (1) CN111147848B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114827440A (en) * 2021-01-29 2022-07-29 华为技术有限公司 Display mode conversion method and conversion device based on light field display

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106375766A (en) * 2016-09-08 2017-02-01 University of Electronic Science and Technology of China Light field image compression method
CN107295264A (en) * 2017-08-01 2017-10-24 Graduate School at Shenzhen, Tsinghua University A light field data compression method based on homography transformation
CN110392266A (en) * 2019-07-25 2019-10-29 Graduate School at Shenzhen, Tsinghua University A light field video coding method and terminal device based on pseudo video sequences

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3088954A1 (en) * 2015-04-27 2016-11-02 Thomson Licensing Method and device for processing a lightfield content
US10110935B2 (en) * 2016-01-29 2018-10-23 Cable Television Laboratories, Inc Systems and methods for video delivery based upon saccadic eye motion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Light Field Image Compression Based on HEVC Codec; Sun Xia et al.; Electronic Design Engineering; 2017-02-20 (No. 04); full text *

Also Published As

Publication number Publication date
CN111147848A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
JP7121093B2 (en) Efficient partition coding with high partitioning degrees of freedom
AU626120B2 (en) Motion estimator
JP2023025001A (en) Effective prediction using partition coding
CN103299636B (en) Method and apparatus for determining video motion vector
EP3198867A1 (en) Method of improved directional intra prediction for video coding
CN111225135B (en) Image sensor, imaging device, electronic apparatus, image processing system, and signal processing method
CN111147848B (en) Light field video coding method based on content self-adaptation
CN107682699B A near-lossless image compression method
Ma et al. A fast background model based surveillance video coding in HEVC
EP3926584A1 (en) Method, computer program and system for detecting changes and moving objects in a video view
WO2000018133A1 (en) Encoding device and method, and decoding device and method
US20090074260A1 (en) Image processing apparatus
US7092575B2 (en) Moving image encoding apparatus and moving image encoding method
US20030152147A1 (en) Enhanced aperture problem solving method using displaced center quadtree adaptive partitioning
Bao et al. Quantitative comparison of lossless video compression for multi-camera stereo and view interpolation applications
Paul et al. Pattern based video coding with uncovered background
KR20240030922A (en) Npu for distributing artificial neural networks based on mpeg-vcm and method thereof
CN114615494A (en) Image processing method, device and equipment
JPH08265795A (en) Luminance signal/color signal separation filter
JP2001119670A (en) Audio data imbedding device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant